Unicode, UTF and Wide Characters: What Does It All Mean?

As a Windows programmer used to coding in Delphi, I must admit I wasn’t ready for how strings are handled in C, the language I’m currently learning. It was a shock, to say the least, and I ran into many issues.

Among other things, I couldn’t understand how to use characters outside of the English alphabet, such as é or ¿. While I work mostly on Windows, I wanted my solution to be as cross-platform as possible. This meant I had to learn about character encoding and understand terms such as ASCII, ANSI, Unicode, UTF-8, UTF-16, UTF-32, wide characters and multibyte characters.

Through my research, I have digested a lot of information. I’m not yet ready to recommend the best approach to cross-platform character encoding. I have, however, found that the four articles below, listed in order of importance, are must-reads. Be aware that they assume a basic knowledge of computer science and C. I hope you will find them as useful as I did.

I will update this article as I gain more experience in the matter.