As a Windows programmer used to coding in Delphi, I must admit I wasn’t ready for how strings are handled in C, the language I’m currently learning. It was a shock, to say the least, and I ran into many issues.
Among other things, I couldn’t understand how to use characters outside of the English alphabet, such as é or ¿. While I work mostly on Windows, I wanted my solution to be as cross-platform as possible. This meant I had to learn about character encoding and understand terms such as ASCII, ANSI, Unicode, UTF-8, UTF-16, UTF-32, wide characters and multibyte characters.
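As a small illustration of why such characters are surprising in C, here is a minimal sketch, assuming the string is encoded as valid UTF-8. The helper name `utf8_codepoint_count` is hypothetical and not part of any library. It only shows that one character on screen can occupy several `char` elements. It is not a recommendation for any particular approach.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): counts the Unicode code
   points in a NUL-terminated string, ASSUMING it holds valid UTF-8. */
size_t utf8_codepoint_count(const char *s)
{
    size_t count = 0;
    for (; *s != '\0'; ++s) {
        /* UTF-8 continuation bytes have the bit pattern 10xxxxxx;
           every other byte starts a new code point. */
        if (((unsigned char)*s & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

With the UTF-8 bytes of é written out explicitly as `"\xC3\xA9"`, `strlen` reports 2 because it counts bytes, while `utf8_codepoint_count` reports 1. The bytes are spelled out with escapes so that the result does not depend on the encoding of the source file.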
Through my research, I have digested a lot of information. While I’m not ready to recommend a best approach to cross-platform character encoding, I have found that the four articles below, listed in order of importance, are must-reads. They do, however, assume basic knowledge of computer science and C. I hope you will find them as useful as I did.
- Unicode, Wide Characters, and All That
- wchar_t Is a Historical Accident
- Code Pages - Win32 app
- Use UTF-8 code pages in Windows apps
I will update this article as I gain more experience in the matter.