Fonts and Non-English Alphabets

Postby Мастер » Mon Apr 08, 2013 10:04 pm

I suspect Arneb is best placed to answer this one, but maybe the issue arises for Halcyon Dayz as well.

I use a particular typesetting system for practically every written document I ever produce these days. In my opinion, it produces very nice-looking documents, but it most definitely requires practice to use. (The final result is inevitably an Adobe PDF file.) Emails are normally produced by Thunderbird, but then read on whatever the recipient happens to use.

Recently, I have been making an effort to type non-English names (usually of people or places) properly, e.g., including umlauts in German, the accent marks in French, the cedilla (the little squiggle under the "c") in Portuguese, etc. Less often, I include a word or phrase in a non-English language. In email, this seems to work reasonably well; the people who receive my emails with this sort of content generally seem able to read them as I sent them, and the PDF files generally display the way they were intended. At one point (before I had thought about it much), I thought it would be appropriate to use a font designed for the language the particular person's name, place name, or other word came from. But then I realised that would most likely look weird when embedded in an English-language document: a run of English text in one font, then all of a sudden someone's name in a different font, followed by more English back in the original font. That wouldn't look so good.

But then the problem is, there is no font which includes every character in every language in the world, at least not that I'm aware of. And even if there is, it might not look so good; the more common fonts are usually somewhat limited in this regard. So I am wondering how the different pieces of software I use seem to accomplish something which doesn't seem like it should be possible to me.
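To make the question concrete, here is a minimal Python sketch of per-character font fallback, which is one common way software copes with no single font covering everything. The font names and their coverage ranges below are entirely hypothetical, made up for illustration; real systems consult the actual fonts installed and usually try to pick fallback fonts that resemble the main one.

```python
from itertools import groupby

# HYPOTHETICAL fonts and coverage ranges, invented for this illustration.
# Each entry: (font name, list of inclusive code point ranges it covers).
FONTS = [
    ("BodyFont", [(0x0020, 0x00FF)]),                       # Basic Latin + Latin-1
    ("ExtendedLatin", [(0x0100, 0x024F), (0x20AC, 0x20AC)]),  # Latin Extended + euro
    ("CyrillicFont", [(0x0400, 0x04FF)]),                   # Cyrillic
]


def pick_font(ch):
    """Return the first font in preference order that covers ch, else None."""
    cp = ord(ch)
    for name, ranges in FONTS:
        if any(lo <= cp <= hi for lo, hi in ranges):
            return name
    return None


def font_runs(text):
    """Split text into consecutive runs that share the same chosen font."""
    return [(font, "".join(chars)) for font, chars in groupby(text, key=pick_font)]


if __name__ == "__main__":
    for font, run in font_runs("Dvořák €5"):
        print(font, repr(run))
```

In this sketch only the characters missing from the preferred font are handed to another font, so most of the text stays in one typeface; whether the result looks like a jumble depends on how well the fallback font matches.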

When I send an email, I can include German umlauted letters like ä, French accented letters like é, a Turkish dotless ı, etc. When these characters are included in a Unicode UTF-8 text document, I think I understand how this works. However, when I send myself an email containing a euro symbol € and the email character encoding is ISO-8859-1, it comes through as a euro symbol, even though this symbol isn't supposed to be part of ISO-8859-1. WTF? The character encodings like ISO-8859-1 and ISO-8859-15 include most, but not all, of the characters for most non-Cyrillic and non-Greek European languages. So when I include a euro symbol, or some Czech or Turkish letters which are not included in the character encoding, how is it that they appear quite clearly on the screen to me? (And, for the most part, appear quite clearly on the screens of most recipients of the emails I send?)

What font my computer is using to show me this euro symbol (or the other characters) on my screen is even less clear. Unicode includes practically the whole world, so I think I understand how the document is encoded. But what about the font? There aren't fonts which include every single Unicode character out there, so how is the computer showing it to me? Does it look at each character, and choose a font which has the relevant character? In that case, my screen should look like a jumble of mismatched fonts, but that doesn't seem to be the case.
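On the ISO-8859-1 puzzle, a short Python sketch can at least confirm which of these characters each encoding is able to represent. It uses only Python's built-in codecs and says nothing about what Thunderbird actually does. One possible explanation, which is an assumption and not something verified here, is that the message was really sent or decoded as Windows-1252 (cp1252) or UTF-8 despite what the label says.

```python
# Which sample characters can each encoding represent?
SAMPLES = {"a-umlaut": "ä", "e-acute": "é", "dotless i": "ı", "euro": "€"}
ENCODINGS = ["iso-8859-1", "iso-8859-15", "cp1252", "utf-8"]


def encodable(ch, encoding):
    """Return the encoded bytes, or None if the encoding lacks the character."""
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        return None


def coverage_table():
    """Map each sample name to {encoding: bytes or None}."""
    return {
        name: {enc: encodable(ch, enc) for enc in ENCODINGS}
        for name, ch in SAMPLES.items()
    }


if __name__ == "__main__":
    for name, row in coverage_table().items():
        cells = [
            f"{enc}={data.hex() if data is not None else 'missing'}"
            for enc, data in row.items()
        ]
        print(f"{name:10s}", "  ".join(cells))
```

The euro symbol is missing from ISO-8859-1 but present in ISO-8859-15 and cp1252, while the dotless ı is missing from all three and only fits in UTF-8.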

For PDF documents, it is even weirder. I can create PDF documents with various Czech letters, for example. If I cut-and-paste the Czech letters (which have various little marks above the letters, something which does not appear in English) into my email program, some of them appear as a single Czech letter which looks just like it is supposed to. But others appear as two separate characters - the accent or other mark appears first, and then the main part of the letter appears as a separate character to its right. It is also not clear that the Czech characters are "searchable" - I can search for them in the PDF document, but some of them don't seem to be found.
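The search problem can be illustrated with a minimal Python sketch using the standard `unicodedata` module. Unicode allows a letter like č to be stored either as one precomposed character or as a base letter followed by a combining mark, and a plain string comparison treats the two as different. The third form below, a standalone accent character placed before the letter, is only a guess at what the PDF might contain; it is an assumption, not something checked against any actual file.

```python
import unicodedata

composed = "\u010D"                                   # č as a single character
decomposed = unicodedata.normalize("NFD", composed)   # 'c' + U+030C combining caron
accent_first = "\u02C7c"                              # spacing caron, then 'c' (assumed PDF form)


def describe(text):
    """List the code points and Unicode names that make up text."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in text]


def same_text(a, b):
    """Compare two strings after normalizing both to composed form (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    for label, s in [("composed", composed),
                     ("decomposed", decomposed),
                     ("accent first", accent_first)]:
        print(label, describe(s))
    print("naive match:", composed == decomposed)
    print("normalized match:", same_text(composed, decomposed))
    print("accent-first match:", same_text(composed, accent_first))
```

Normalization reconciles the composed and decomposed forms, but it does not turn an accent followed by a letter back into the accented letter, so a search for č would still miss text stored that way.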

Does anyone know how all of this works?