UTF-8, HTML

Postby Мастер » Sun Oct 19, 2014 2:00 am

Would the world be a better place if everyone used UTF-8 encoding for email, always, all the time, no exceptions, ever?

And how about HTML? I resisted that one for a long time, but now I send all my emails encoded as HTML, as there are features in my employer-required signature which don't show up properly without it.

I have figured out how to set my main email client to send all emails as HTML and UTF-8, but it seems my portable devices (which run iOS) use some algorithm not under my control to decide how to send out the email. The results are not always what I would hope for.
They call me Mr Celsius!
Мастер
Moderator
Злой Мудак
Mauerspecht
 
Posts: 23937
Joined: Tue Aug 02, 2005 2:56 pm
Location: Far from Damascus

Re: UTF-8, HTML

Postby MM_Dandy » Sun Oct 19, 2014 5:26 pm

Yes, the world would be a better place if everyone used UTF-8 for sure.
HTML... I suppose that now, with everyone having web-mail clients like Gmail, HTML is here to stay. But unless the message has to be viewed in HTML, I generally don't use it on my desktop client.
MM_Dandy
Moderator
King of Obscurity
 
Posts: 4927
Joined: Thu May 12, 2005 9:02 pm
Location: Canton, SD, USA

Re: UTF-8, HTML

Postby Мастер » Sun Oct 19, 2014 10:49 pm

MM_Dandy wrote: Yes, the world would be a better place if everyone used UTF-8 for sure.
HTML... I suppose that now, with everyone having web-mail clients like Gmail, HTML is here to stay. But unless the message has to be viewed in HTML, I generally don't use it on my desktop client.


The email client I use (Thunderbird) seems to use some semi-intelligent analysis to decide whether to use HTML or not - if the email doesn't use any HTML features, then it sends it as plain text. However, it doesn't seem to scan the signature as part of the decision process - so if the signature uses HTML features, but the body of the message doesn't, then the HTML features in the signature are lost. I found an add-on which forces the email client to use HTML all the time.
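For what it's worth, the usual way to sidestep the plain-text-or-HTML decision is to send both versions in one message, each labelled as UTF-8. Here is a rough sketch using Python's standard `email` package. The addresses and body text are made up for illustration, and this is not a claim about what Thunderbird or the add-on does internally.

```python
from email.message import EmailMessage

# Hypothetical message; the addresses and text are placeholders.
msg = EmailMessage()
msg["Subject"] = "Привет"
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"

# Plain-text part first, then an HTML alternative, both declared as UTF-8.
msg.set_content("Plain-text body: Привет", charset="utf-8")
msg.add_alternative(
    "<p>HTML body: <b>Привет</b></p>", subtype="html", charset="utf-8"
)

# The message is now multipart/alternative with two parts.
print(msg.get_content_type())
for part in msg.iter_parts():
    print(part.get_content_type(), part.get_content_charset())
```

The receiving client then picks whichever part it prefers, so an HTML signature could live in the HTML part without forcing anything on plain-text readers.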

On fonts, I receive a huge number of emails which are done in some Chinese encoding (there are several). I am not sure whether this is because the senders have configured their email programs to do this (many of the people here are Chinese, and I also deal with a lot of immigrants from China), or because the email client makes the call for them. One Chinese person seems to send all her emails with Japanese encoding :? :-? :???: When I respond, the response is automatically in whatever encoding it came to me in, but I can manually override it to be UTF-8; when I do this, the appearance of the email on my screen changes substantially. (In my opinion, the fonts look much much better.) I'm not sure why that happens.

Re: UTF-8, HTML

Postby tubeswell » Mon Oct 20, 2014 12:06 am

WTF does UTF stand for?
A bus station is where a bus stops. A train station is where a train stops. On my desk, I have a work station.

If you are seeing an apparent paradox, that means you are missing something.
tubeswell
Enlightened One
 
Posts: 324867
Joined: Sun Sep 19, 2010 11:51 am
Location: 129th in-line to the Llama Throne (after the last purge)

Re: UTF-8, HTML

Postby Мастер » Mon Oct 20, 2014 1:54 am

UTF stands for Unicode (or Universal Character Set) Transformation Format. It is a family of methods, the most popular of which seems to be UTF-8, for representing the characters in the Unicode character set in digital form. The objective here is to be able to represent every character used in every language in the world. Originally, dead languages were excluded, but it was quickly realised that people working in history, linguistics, etc. wanted to use characters from even these languages in the articles they would write.

The original one-byte character encoding schemes (like ASCII) could represent 128 distinct characters, and the later 8-bit extensions could represent 256. Good enough for English, and most European languages, provided you only use one language in a document, rather than mixing them all together. If you want to create a web page which shows how to say "hello" in all the different western-European languages, then 256 is not enough, and forget about Greek, Russian, Hebrew, Arabic, and (gasp) Chinese.

The traditional system was that different encoding schemes would be used for different languages. So a document written in Russian would use a scheme which assigned each upper-case and lower-case Cyrillic character to an 8-bit code, with the leftovers used for punctuation. If someone tried to open the document on a computer in Portugal, and the computer failed to recognise that the document was Russian, it would try to interpret each byte as representing a Portuguese character, rather than a Russian character. The result would be total gibberish. (Well, even if it worked correctly, it would still look like total gibberish if you don't understand Russian.)
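A small Python sketch of that Russian-document scenario, if anyone wants to see it happen. I am assuming KOI8-R as the Cyrillic scheme and ISO-8859-1 (Latin-1) as the western-European one; other pairs of legacy encodings would garble the text in the same way.

```python
# A Russian word stored with a legacy single-byte Cyrillic encoding.
russian = "Привет"
raw = russian.encode("koi8_r")   # one byte per Cyrillic letter

# The same bytes, misread as western-European text.
wrong = raw.decode("latin_1")

# The same bytes, read with the encoding that was actually used.
right = raw.decode("koi8_r")

print(len(raw), "bytes")
print("misread:", wrong)
print("correct:", right)
```

The bytes never change. Only the table used to turn them back into characters does, which is why the receiving computer has to know (or guess) the encoding.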

Unicode was an effort to represent all the characters in a single encoding system, so there would be no ambiguity - every computer in the world would interpret text in exactly the same way, and it would be possible to put any characters you like in any text document, including many different languages mixed together. UTF-8 is one of several competing Unicode encodings (I think it has pretty much won out), which uses 8 bits to represent English characters, 16 bits to represent most other European characters as well as Greek, Russian, Hebrew, and Arabic, 24 bits to represent Chinese, Japanese, Hindi, Mongolian, etc., and 32 bits for rarer characters such as emoji.
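The number of bytes UTF-8 spends on a character is easy to check for yourself. A minimal sketch in Python follows; the sample characters and their labels are my own picks, one from each of a few scripts.

```python
# (label, character) pairs; escapes are used so the source stays plain ASCII.
samples = [
    ("Latin A", "\u0041"),
    ("e with acute", "\u00e9"),
    ("Cyrillic Ya", "\u042f"),
    ("Arabic ain", "\u0639"),
    ("Chinese zhong", "\u4e2d"),
    ("grinning face emoji", "\U0001f600"),
]

sizes = {}
for label, ch in samples:
    encoded = ch.encode("utf-8")
    sizes[label] = len(encoded)
    print(f"{label}: U+{ord(ch):04X} -> {len(encoded)} byte(s), {len(encoded) * 8} bits")
```

With these samples the Latin letter takes one byte, the accented, Cyrillic, and Arabic letters two each, the Chinese character three, and the emoji four.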

But, despite the availability of UTF-8, many text documents (including emails and web pages) still use other encodings. If there is some mistake in interpretation, and the computer displaying the information to someone uses a different encoding than the one intended, then the information will not be presented correctly, and most likely look like total crap.

If you ever see a text document where some of the characters look like hollow boxes or question marks, this is frequently an encoding-related problem.
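Here is one way those question marks can arise, sketched in Python. I am assuming the software involved substitutes a placeholder instead of raising an error, which is what `errors="replace"` does here; real programs vary in how they handle this.

```python
text = "na\u00efve caf\u00e9"   # contains two non-ASCII letters

# Forcing the text into ASCII: each unrepresentable letter becomes "?".
as_ascii = text.encode("ascii", errors="replace")
print(as_ascii)                  # b'na?ve caf?'

# Latin-1 bytes misread as UTF-8: the stray byte becomes U+FFFD,
# the replacement character, usually drawn as a question mark in a diamond.
latin1_bytes = "caf\u00e9".encode("latin_1")
misread = latin1_bytes.decode("utf-8", errors="replace")
print(misread == "caf\ufffd")    # True
```

Hollow boxes, on the other hand, often mean the text was decoded fine but the font on hand has no glyph for that character.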

Re: UTF-8, HTML

Postby tubeswell » Mon Oct 20, 2014 7:11 am

TFT Mactep

I bow to your computing knowledge

