I have an answer for you. If you only want the short version, it's at the end of this post. I want to try to explain what is happening inside the OS, in case someone is interested in the slight differences between Windows XP and Vista/7. This may get a bit wordy (as I tend to be), but I'll try to keep it as watered down and understandable as possible.
-----------------
UTF-8 (UCS Transformation Format - 8 bit) was created as a way to store and exchange Unicode text. Unicode itself is a method of encoding characters so that each particular one has a different code. UTF-8 stores those codes in 8-bit units (bytes, not 8 bytes per character) and is backward compatible with ASCII (American Standard Code for Information Interchange), which happens to be the standard method of encoding the characters found on US keyboards. It is a variable width encoding: a character takes anywhere from one to four bytes, which allows all characters to be represented.
The IETF (Internet Engineering Task Force) requires that all Internet protocols support UTF-8 so there is a standard method of interpreting documents and the like. Additionally the IMC (Internet Mail Consortium) requires that all email programs and servers support UTF-8. Basically it is being used more and more as the standard method of encoding characters in operating systems, programming languages and software. That is the background as to why there is a UTF-8.
Incidentally there was originally a UTF-1, and there is also UTF-16. They are just different attempts at being efficient. UTF-1 was cumbersome and slow. UTF-16 is less common for files and Internet traffic because the byte order has to be known, which can cause unexpected problems when reading the stream.
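To make the "variable width" and "backward compatible with ASCII" points concrete, here is a small Python sketch. It is only an illustration: the helper name utf8_bytes and the sample characters are my own choices, not anything from Windows or the game.

```python
def utf8_bytes(ch: str) -> bytes:
    """Return the UTF-8 encoding of a single character."""
    return ch.encode("utf-8")


# Characters from different parts of Unicode need different byte counts.
for ch in ["A", "\u00e9", "\u2661", "\U0001F600"]:
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X} takes {len(encoded)} byte(s): {encoded.hex()}")

# Pure ASCII text produces exactly the same bytes in ASCII and in UTF-8,
# which is what "backward compatible" means here.
ascii_compatible = "hello".encode("ascii") == "hello".encode("utf-8")
print("ASCII-compatible:", ascii_compatible)
```

The white heart (U+2661) comes out as three bytes, which matters later when those bytes get read with the wrong encoding.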
Unicode text can also start with something called a BOM (Byte Order Mark). For the 16-bit and 32-bit encodings (UTF-16 and UTF-32) the mark tells the application whether the bytes are stored in what is called Big Endian or Little Endian order, so the receiving computer knows how to put each character back together. UTF-8 is read one byte at a time and has no byte order problem, so there the BOM (the three bytes EF BB BF) only acts as a signature that says "this is UTF-8". The mark sits at the very beginning of the document so the application can determine how the character stream is encoded. It is not a requirement, though; it is just nice to have.
I think you'll recognize that from having to change your game (Left 4 Dead 2) from UTF-8 with BOM to just UTF-8. As far as I can tell, the game was expecting a BOM in the text it received, it wasn't being provided, and so the game fell back to a default and guessed wrong. Changing it to plain UTF-8 solved that because the game no longer needed to look for the BOM at the beginning of the stream and just interpreted the characters according to the standard table.
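For anyone who wants to see what "UTF-8 with BOM" versus plain UTF-8 looks like at the byte level, here is a minimal Python sketch. The sample text and the strip_bom helper are hypothetical examples of mine; "utf-8-sig" and codecs.BOM_UTF8 are part of the Python standard library.

```python
import codecs

text = "\u2661 hello"  # sample text starting with the white heart

with_bom = text.encode("utf-8-sig")   # "UTF-8 with BOM"
without_bom = text.encode("utf-8")    # plain UTF-8

print(with_bom[:3].hex())      # the three signature bytes
print(without_bom[:3].hex())   # the first real character


def strip_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM if one is present; otherwise return data unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Apart from the optional signature, the two files are byte-for-byte identical.
print(strip_bom(with_bom) == without_bom)
```

A program that decodes the BOM version as plain UTF-8 without stripping the signature ends up with an extra invisible character (U+FEFF) at the front of the text, which is one way a BOM can confuse software that does not expect it.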
With that bit of background: UTF-8 is being used as the standard encoding method for most (if not all) documents in the majority of applications and operating systems worldwide. There are, however, some problems with implementation, as I'll get to in a bit, but some more background is needed to understand your problem. I promise it's worth the time to read and understand how this all happened.
Back when Microsoft (and quite a few others) was creating its early operating systems, ASCII was the standard encoding for text in the majority of applications. Support for other languages was not as readily available at the time (some of it was vastly incomplete or just wrong) and UTF-8 wasn't widely used. Things moved along like that for a while, until the operating systems became more popular and the vendors wanted to reach a far greater number of customers. Language packs were available which could be installed so that the character sets of other countries would be interpreted properly. I remember that Windows 95/98 needed additional disks in order to do this.
When Windows XP was released, UTF-8 was used as the standard character interpretation for most applications, especially Notepad. (See how we're getting to your question about that?)
Since XP had the language packs available on the multi-language installation media, it was no problem to install them if required. Microsoft still offers multi-language installation media today. When you purchase the OS from someone like newegg, you generally get the US English version unless you specifically purchase the multi-language version. Regardless, XP had the majority of its applications using UTF-8 as the standard encoding for characters, knowing that UTF-8 is backward compatible with ASCII. Notepad under XP does not suffer from the problem of that one character (the white heart) being interpreted as three characters. Because there is no real standard for plain text files, the way one application encodes them can differ from another. This can make some weird issues crop up, such as bell sounds or screen changes where the cursor moves and the like.
Windows Vista/7 does not interpret this in the same way as XP did. There was a change made to a function call, IsTextUnicode, which makes it work differently. They adjusted the way it falls back to other encodings when trying to determine what a character is. Because a BOM (Byte Order Mark) is not required, Windows Vista/7 is more likely to interpret unknown characters as ANSI (American National Standards Institute; code page 1252) rather than Unicode (UTF-16LE). As a result, characters that can't be determined are more likely to be shown incorrectly.
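Here is a small Python sketch of what that misinterpretation looks like. To be clear, this is not how IsTextUnicode or Notepad works internally; misread_as_cp1252 and decode_guess are hypothetical helpers of mine that only show the effect of reading UTF-8 bytes as code page 1252, plus one simple "try UTF-8 first" fallback that an editor could use.

```python
WHITE_HEART = "\u2661"  # the white heart character from the question


def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were CP 1252."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


def decode_guess(data: bytes) -> str:
    """Assumed heuristic: try strict UTF-8 first, fall back to CP 1252 if that fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252", errors="replace")


garbled = misread_as_cp1252(WHITE_HEART)
print(len(garbled), garbled)  # one character turns into three

print(decode_guess(WHITE_HEART.encode("utf-8")))  # valid UTF-8 stays intact
print(decode_guess(b"caf\xe9"))                   # old CP 1252 bytes still readable
```

The heart is stored as the bytes E2 99 A1, and each of those bytes is a separate character in CP 1252, so a single symbol shows up as three wrong ones.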
"What does this all mean to me?" you ask. Well, it means that if you want to correct the problem with Notepad itself, you'd have to save all the documents as UTF-8 and make sure to only have documents which are UTF-8 compliant. There is another solution: many programmers have developed text editors which properly handle these changes, changes which I feel Microsoft should have left alone.
You can use an application called Notepad++, which will properly show these characters as it defaults to UTF-8. There are also many other applications which can be used to replace Notepad altogether.
Now to answer the obvious question about your browser. Browsers use the standard UTF-8 encoding because of the IETF requirements and will correctly display these characters on your screen. Since that is a requirement for Internet applications and services, it is independent of the operating system. As for the game, your ability to force it to UTF-8 only shows that the program accepts UTF-8 properly but was looking for the BOM in order to know how to read the stream. Since the BOM wasn't being received, it was falling back to the OS default and misinterpreting the text as ANSI. Once it was forced to UTF-8 without the BOM, it figured it all out properly.
I hope this is understandable in one shape or form. There is a lot of information there to sift through. As promised here is the 'short' explanation of the problem for those who skipped to the end.
----------------
tl;dr
UTF-8 is the standard being pushed worldwide. Microsoft made a change in the way Notepad interprets characters, and it now fails to do so properly. You'll need to install another text editor (like Notepad++) to fix that problem in text documents. No language pack is going to fix the error overall, since they made a change to the program and not to the default languages available.
post edited by James_L - 2012/08/31 08:54:27