

This document provides an overview of character encodings used by Unreal.

Assumed knowledge: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

Sections: Text Formats, Recommended Encoding for Text Files Used by Unreal, ToUpper() and ToLower() Non-Trivial in Unicode, Notes about C++ Source Code Specific to East Asian Encodings

Text Formats

There are several formats that can be used to represent text and strings. Understanding these formats and their inherent pros and cons can help in making decisions on what formats to use in your projects.

ASCII
Characters between 32 and 126 inclusive, and 0, 9, 10, and 13. (P4 type text) (This is validated with a P4 trigger on check-in)

ANSI
ASCII and the current codepage (e.g. Western European high ASCII). Needs to be stored as binary on the P4 server.

UTF-8
A string made up of single bytes which can use special character sequences to get non-ANSI characters. (a superset of ASCII) (P4 type Unicode)

UTF-16
A string made up of 2 bytes per character with a BOM. (although can go to 4 bytes with astral characters) (P4 type UTF-16) (This is validated with a P4 trigger on check-in)

These are not the technical definitions of formats, but rather simplified versions suitable for this page.

The Case for Binary

Internal format is not defined; each file can be loaded no matter what format it is.
Internal format is not defined; each file could be in a different format.
P4 stores the entirety of each version, which can unnecessarily bloat the depot size.
Requires all files of this type to be exclusive checkout.

The Case for Text

Very limiting; only ASCII characters allowed.
MSDev does not handle anything other than ASCII very well in Asian regions. This is why we validate text as ASCII during check-in.

The Case for UTF-8

Simple access to all characters we will ever need.
Is a superset of ASCII; a plain ASCII string is a perfectly valid UTF-8 string.
Still works when the game detects the string is ASCII and outputs it as such.
Can detect whether a string is UTF-8 by parsing it (with or without a BOM).
If we did have a Unicode enabled server, the files would be merge-able and exclusive checkout would not be required.
Has a different memory profile for Asian languages.
String operations are more complicated; have to parse the string to do something as simple as a length calculation.
P4 type Unicode is not enabled on our Perforce server.
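The simplified ASCII definition on this page (characters 32 to 126, plus 0, 9, 10, and 13) and the claim that UTF-8 can be detected by parsing it can both be illustrated with a short sketch. The helper names `IsPageAscii` and `IsValidUtf8` are hypothetical and are not part of Unreal or of the Perforce trigger mentioned here; the code only assumes standard C++. Passing the UTF-8 check is a strong hint rather than proof, since a short ANSI string could happen to form valid sequences.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true if every byte fits this page's simplified ASCII
// definition (32..126 inclusive, plus 0, 9, 10 and 13).
bool IsPageAscii(const std::string& Bytes)
{
    for (unsigned char C : Bytes)
    {
        const bool bPrintable = (C >= 32 && C <= 126);
        const bool bAllowedControl = (C == 0 || C == 9 || C == 10 || C == 13);
        if (!bPrintable && !bAllowedControl)
        {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: true if Bytes parses as well-formed UTF-8.
// A leading BOM (EF BB BF) is simply the encoding of U+FEFF, so input
// with or without a BOM goes through the same parse.
bool IsValidUtf8(const std::string& Bytes)
{
    const std::size_t Size = Bytes.size();
    std::size_t Index = 0;
    while (Index < Size)
    {
        const unsigned char Lead = static_cast<unsigned char>(Bytes[Index]);
        std::size_t Extra = 0;     // number of continuation bytes expected
        std::uint32_t Code = 0;    // code point being decoded
        std::uint32_t Minimum = 0; // smallest value allowed for this length
        if (Lead < 0x80)                { Extra = 0; Code = Lead;        Minimum = 0; }
        else if ((Lead & 0xE0) == 0xC0) { Extra = 1; Code = Lead & 0x1F; Minimum = 0x80; }
        else if ((Lead & 0xF0) == 0xE0) { Extra = 2; Code = Lead & 0x0F; Minimum = 0x800; }
        else if ((Lead & 0xF8) == 0xF0) { Extra = 3; Code = Lead & 0x07; Minimum = 0x10000; }
        else { return false; } // stray continuation byte or invalid lead byte

        if (Extra > Size - Index - 1)
        {
            return false; // sequence is cut off at the end of the input
        }
        for (std::size_t Offset = 1; Offset <= Extra; ++Offset)
        {
            const unsigned char Cont = static_cast<unsigned char>(Bytes[Index + Offset]);
            if ((Cont & 0xC0) != 0x80)
            {
                return false; // continuation bytes must look like 10xxxxxx
            }
            Code = (Code << 6) | (Cont & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if (Code < Minimum || Code > 0x10FFFF || (Code >= 0xD800 && Code <= 0xDFFF))
        {
            return false;
        }
        Index += Extra + 1;
    }
    return true;
}
```

Any string accepted by `IsPageAscii` that contains no NUL handling concerns is also accepted by `IsValidUtf8`, which matches the statement that a plain ASCII string is a perfectly valid UTF-8 string. A Western European high-ASCII byte such as a lone `0xE9` fails both checks.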

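Two of the points above lend themselves to a small illustration: a UTF-8 length calculation requires walking the bytes, and a UTF-16 character can grow from 2 to 4 bytes for astral characters. The following is a minimal sketch in standard C++, not Unreal's own conversion code. `Utf8CodePointCount` and `EncodeUtf16` are hypothetical names, the UTF-8 input is assumed to be well formed, and the code point passed to `EncodeUtf16` is assumed to be a valid Unicode scalar value.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a UTF-8 string that is assumed
// to be well formed. Every character starts with exactly one byte that is
// not a continuation byte (10xxxxxx), so counting those bytes is enough.
std::size_t Utf8CodePointCount(const std::string& Utf8)
{
    std::size_t Count = 0;
    for (unsigned char C : Utf8)
    {
        if ((C & 0xC0) != 0x80)
        {
            ++Count;
        }
    }
    return Count;
}

// Hypothetical helper: encodes one code point as UTF-16 code units.
// Assumes CodePoint is a valid scalar value (not a surrogate, <= 0x10FFFF).
std::u16string EncodeUtf16(char32_t CodePoint)
{
    std::u16string Out;
    if (CodePoint < 0x10000)
    {
        // Basic Multilingual Plane: one 2-byte code unit.
        Out.push_back(static_cast<char16_t>(CodePoint));
    }
    else
    {
        // Astral character: a surrogate pair, 4 bytes in total.
        const char32_t Value = CodePoint - 0x10000;
        Out.push_back(static_cast<char16_t>(0xD800 + (Value >> 10)));
        Out.push_back(static_cast<char16_t>(0xDC00 + (Value & 0x3FF)));
    }
    return Out;
}
```

For the 5-byte string `"caf\xC3\xA9"`, `size()` reports 5 while `Utf8CodePointCount` reports 4, which is the kind of mismatch that makes UTF-8 string operations more involved. `EncodeUtf16(0x1F600)` yields the two code units `0xD83D` and `0xDE00`.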