Language Support

Audience supports non-English languages through the UTF-8 character encoding.

UTF-8 stands for Unicode Transformation Format, 8-bit. It is a lossless encoding that represents each Unicode character as a sequence of one to four octets (8-bit bytes). All non-English data read by Audience.data must be encoded as UTF-8.

Accented characters are not processed unless the input data file begins with the UTF-8 header (also known as the byte order mark, or BOM).

The UTF-8 header consists of the three bytes EF BB BF (hexadecimal). Viewed as text in a Western (ANSI) code page, they look like: "ï»¿".
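To check a data file before loading it, the first three bytes can be compared against EF BB BF. The following is a minimal Python sketch, not part of Audience; the function names are hypothetical and chosen only for illustration.

```python
import codecs

# codecs.BOM_UTF8 is the byte string b"\xef\xbb\xbf" (hexadecimal EF BB BF).

def data_has_utf8_header(data: bytes) -> bool:
    """Return True if the given bytes start with the UTF-8 header."""
    return data.startswith(codecs.BOM_UTF8)

def file_has_utf8_header(path: str) -> bool:
    """Return True if the file at `path` starts with the UTF-8 header."""
    with open(path, "rb") as f:          # binary mode: inspect raw bytes
        return data_has_utf8_header(f.read(3))

print(data_has_utf8_header(b"\xef\xbb\xbfname,city"))  # True
print(data_has_utf8_header(b"name,city"))              # False
```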

A simple way to save a text file as UTF-8 is to open it in Notepad, choose "Save As", keep the same file name, and change the Encoding from ANSI to UTF-8. In newer versions of Notepad, choose "UTF-8 with BOM" so that the header is written.
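Where Notepad is not practical (for example, with many files), the same conversion can be scripted. The sketch below is an illustration only: the function name is hypothetical, and it assumes the source file is in the Western "ANSI" code page (Windows-1252); pass a different `src_encoding` if the file uses another code page. Python's `utf-8-sig` codec writes the EF BB BF header automatically.

```python
def save_as_utf8_with_header(src_path, dst_path, src_encoding="cp1252"):
    """Re-encode a text file as UTF-8 with the EF BB BF header.

    src_encoding is an assumption about the input file; "cp1252" is the
    usual "ANSI" code page on Western-language Windows systems.
    """
    # newline="" keeps the original line endings unchanged.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    # "utf-8-sig" prepends the UTF-8 header when writing.
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)
```

For example, `save_as_utf8_with_header("customers.txt", "customers_utf8.txt")` would write a UTF-8 copy of a hypothetical `customers.txt`.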

Big5 encoding (used mostly for Traditional Chinese) is also supported. Internally, the data is converted to UTF-8.
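As an illustration of what such a conversion involves, the following Python sketch decodes Big5 bytes and re-encodes them as UTF-8. It is not Audience's internal implementation, which is not described here; the function name is hypothetical.

```python
def big5_to_utf8(data: bytes) -> bytes:
    """Decode Big5-encoded bytes and re-encode the text as UTF-8."""
    text = data.decode("big5")      # raises UnicodeDecodeError on invalid input
    return text.encode("utf-8")

big5_bytes = "中文".encode("big5")   # sample Traditional Chinese text in Big5
utf8_bytes = big5_to_utf8(big5_bytes)
print(utf8_bytes.decode("utf-8"))
```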