What is Unicode? And does SPSS support Unicode?
Resolving the problem
Unicode provides a unique number for each and every character, which helps in avoiding conversion issues on computer platforms (see www.unicode.org).
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number to each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English, no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
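As a small illustration of the two points above, the following Python sketch prints the Unicode number (code point) of a few characters and checks whether a legacy single-byte encoding can represent them. The helper names code_point and can_encode are made up for this example, and Latin-1 is used only as one representative legacy encoding; neither is part of SPSS.

```python
def code_point(ch: str) -> str:
    """Return the Unicode code point of one character in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Latin capital A, e with acute accent, euro sign, Greek capital omega
samples = ["A", "é", "€", "Ω"]

for ch in samples:
    # Print only ASCII so the output does not depend on the console encoding.
    print(code_point(ch),
          "latin-1:", can_encode(ch, "latin-1"),
          "utf-8:", can_encode(ch, "utf-8"))
```

Latin-1 covers the first two characters but not the euro sign or the Greek letter, whereas UTF-8, being an encoding of Unicode, covers all four.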
These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially a server) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, it runs the risk of corruption. In addition, some encodings contain more characters than a single byte can represent, so a character in such an encoding can occupy either one or two bytes, and any operation on characters must take this into account.
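The conflicts described above can be reproduced with Python's standard codecs. This is only a sketch: the encodings chosen here (Latin-1, Windows-1251, Windows-1252, ISO 8859-15, Shift JIS) are examples picked for illustration, and the helper name byte_lengths is hypothetical.

```python
# Same number, two different characters:
raw = bytes([0xE9])
as_western = raw.decode("latin-1")    # e with acute accent (U+00E9)
as_cyrillic = raw.decode("cp1251")    # Cyrillic short i (U+0439)

# Same character, two different numbers:
euro_cp1252 = "€".encode("cp1252")        # single byte 0x80
euro_latin9 = "€".encode("iso8859_15")    # single byte 0xA4

# Corruption when data written in one encoding is read as another:
garbled = "é".encode("utf-8").decode("latin-1")   # two characters, not one


def byte_lengths(text: str, encoding: str) -> list:
    """Return the number of bytes each character of text needs in encoding."""
    return [len(ch.encode(encoding)) for ch in text]


# In a double-byte encoding, characters occupy either one or two bytes.
lengths = byte_lengths("Aあ", "shift_jis")   # Latin A, then hiragana A

print(as_western != as_cyrillic, euro_cp1252 != euro_latin9,
      len(garbled), lengths)
```

Because character and byte counts can differ in this way, code that truncates or indexes strings by bytes may split a character in half unless it is aware of the encoding.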
IBM SPSS Development is working on implementing Unicode throughout our product lines. Certain SPSS output is already produced in Unicode: generally XML files, charts, and HTML files. SPSS 16.0 introduced Unicode (UTF-8) support for data and syntax. It is still possible to work in local encoding if you prefer. For more information about local encoding, please see solution 19634.