How do I use Unicode in Rev?
If you want to create stacks in languages other than English or the more common European languages then you need to use Unicode. Luckily Rev handles Unicode text very well in most cases. This lesson will cover the main tips and tricks you need to know in order to use Unicode in your stacks.
A brief introduction to Unicode
ASCII (the American Standard Code for Information Interchange) is a standard character encoding system which was developed in the 1960s. As ASCII is a 7-bit encoding, it is limited to 128 code points, numbered 0-127. ASCII assigns unique codes to upper and lower case Latin letters, numbers and common punctuation. It is still widely used today and is consistent across all operating systems.
Unfortunately, 128 code points are not nearly enough to represent all the characters in use across the world. In the late 1980s work began on Unicode, a character encoding standard designed to provide a way to display all of the world's languages by using a larger character table, originally 16 bits wide. The goal of Unicode was to assign each character in all the world's languages a unique code number.
Under the Unicode standard, there are several encoding systems, most notably UTF-8, UTF-16, and UTF-32. Revolution uses the UTF-16 encoding. However, Revolution has the ability to transcode between UTF-16 and several other common encodings.
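To make the difference between these encodings concrete, here is a small sketch written in Python rather than Rev, purely as an illustration. It reports how many bytes the same text occupies in UTF-8, UTF-16 and UTF-32. The sample strings and the encoded_sizes helper are made up for this example, and the little-endian byte order is an assumption chosen so that no byte-order mark is added.

```python
# Compare how the same text is stored under the three encodings named above.
# The sample strings are arbitrary; they are not taken from any Rev stack.
samples = {
    "ascii letter": "A",
    "greek alpha": "\u03b1",
    "russian word": "\u043c\u0438\u0440",
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32 (no byte-order mark)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


for name, text in samples.items():
    print(name, encoded_sizes(text))
```

For these samples every character takes 2 bytes in UTF-16 and 4 bytes in UTF-32, while UTF-8 uses 1 byte for the ASCII letter and 2 bytes for each Greek or Russian letter.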
For a more detailed introduction to Rev and Unicode have a look at the Unicode tutorial at
Typing Unicode into a field
This is nice and easy in Rev. Rev fields can handle Unicode text input without you needing to do anything, because Rev simply uses the text input methods supplied by the host operating system. So if you want to type Russian characters into a field, just select the Russian text input system and start typing. That's all there is to it!
Note on displaying right to left text
However, Unicode text input in Rev is not perfect. Rev has trouble rendering right-to-left languages like Hebrew and Arabic. This comes down to what is and isn't encoded as Unicode, so:
<arabic word 1> <non-unicode space><arabic word 2> would render incorrectly as
<arabic word 1> <arabic word 2>
<arabic word 1> <unicode space><arabic word 2> would render correctly as
<arabic word 2> <arabic word 1>
So when displaying right-to-left text you need to be careful and ensure that Unicode encodings are not lost.
When using non-Unicode fields you can move text and chunks of text between them by just referring to the field (1 and 2):
put field 1 into field 2
But if you want to put Unicode text into a field you need to set its unicodeText property (3 and 4):
set the unicodeText of field 2 to the unicodeText of field 1
Using character chunk expressions with unicodeText
When manipulating Unicode text in Rev you can use character chunk expressions just as you usually would, but you must remember to use the unicodeText of the field rather than just a reference to the field or the text of the field.
set the unicodeText of field 2 to character 1 to 10 of the unicodeText of field 1
Remember that each Unicode character here is represented by 2 bytes, so to display a single character you need to get 2 characters of the unicodeText. That's why what looks like 5 characters is character 1 to 10.
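The arithmetic behind "character 1 to 10" can be checked outside Rev. The following Python sketch is only an illustration of the byte counting, not of Rev's internals: the sample text, the first_chars_as_bytes helper and the little-endian byte order are all assumptions made for the example.

```python
# A ten-character Russian phrase, stored as UTF-16 with two bytes per character.
text = "\u041f\u0440\u0438\u0432\u0435\u0442 \u043c\u0438\u0440"  # "Привет мир"
utf16 = text.encode("utf-16-le")


def first_chars_as_bytes(data, count):
    """Return the UTF-16 bytes that hold the first `count` two-byte characters."""
    return data[: count * 2]


first_five = first_chars_as_bytes(utf16, 5)

print(len(first_five))                 # bytes needed for five characters
print(first_five.decode("utf-16-le"))  # the five characters themselves
```

Five visible characters occupy ten bytes, which is why the chunk expression above asks for character 1 to 10. Taking an odd number of bytes would cut a character in half.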
Other types of chunk expression, such as word and item, cannot be used directly with unicodeText, but they can still be used. We will look at how later in the lesson.
Unicode and buttons
So far we have only discussed Unicode in fields. What if you want to use Unicode text to label a button? You need to use a different method for this, as buttons don't have a unicodeText property.
Instead we need to set the textFont of the button to a Unicode font and then set the label of the button. Unicode textFont names in Rev are in the form:
font name, language
Here, font name is the name of an installed font and language is the name of the language you want to display. For displaying Unicode text it should always be "unicode". To use Russian text as the label for a button we would set the textFont to "Arial,Unicode".
set the textFont of button 1 to "Arial,Unicode"
set the label of button 1 to the unicodeText of field 1
Single and Double Byte Encoding and UTF-8
UTF-8 is the best method to store and transfer Unicode text in Rev. UTF-8 is part of the Unicode standard, and is a way to store Unicode (double-byte) text in an ASCII (single-byte) text file. This is especially important for encoding Unicode text for use in web browsers and email.
The keys to using UTF-8 text within Rev are the uniEncode() and uniDecode() functions. These functions convert between single-byte and double-byte characters, so you can convert UTF-8 to Unicode and then display it in a field or as a button label.
Importing text from a UTF-8 file
To import text from a UTF-8 file and display it in our stack we just need to read it in, uniEncode() it and then display it as we learnt earlier.
put url ("file:greek.txt") into tUTF8Text
set the unicodeText of field 1 to uniEncode(tUTF8Text,"UTF8")
If we want to export text from our stack, we simply go in the other direction:
put uniDecode(the unicodeText of field 1,"UTF8") into tUTF8Text
put tUTF8Text into url("file:greek2.txt")
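As a rough picture of what this import and export round trip does to the bytes, here is a Python sketch. It is not Rev code and does not call uniEncode() or uniDecode(); the to_utf16 and to_utf8 helpers are hypothetical stand-ins for the two directions described above, and the sample text, temporary folder and little-endian byte order are assumptions for the example.

```python
import os
import tempfile


def to_utf16(utf8_bytes):
    """UTF-8 bytes -> UTF-16 bytes (the import direction)."""
    return utf8_bytes.decode("utf-8").encode("utf-16-le")


def to_utf8(utf16_bytes):
    """UTF-16 bytes -> UTF-8 bytes (the export direction)."""
    return utf16_bytes.decode("utf-16-le").encode("utf-8")


# Sample Greek text, already encoded as it would be in a UTF-8 file.
original = "Καλημέρα κόσμε".encode("utf-8")

with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "greek.txt")
    target = os.path.join(folder, "greek2.txt")

    # Create the UTF-8 file to import.
    with open(source, "wb") as f:
        f.write(original)

    # Import: read the UTF-8 bytes and convert them to double-byte text.
    with open(source, "rb") as f:
        field_text = to_utf16(f.read())

    # Export: convert back to UTF-8 and write a second file.
    with open(target, "wb") as f:
        f.write(to_utf8(field_text))

    with open(target, "rb") as f:
        exported = f.read()

print(exported == original)
print(len(original), len(field_text))
```

If the two conversions are applied in matching directions, the exported file is byte-for-byte identical to the one that was imported.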
Using chunk expressions with UTF-8
A big advantage of UTF-8 is that it preserves ASCII characters, so it preserves the default chunk characters - return, space, tab and comma. This means that, by using UTF-8 as an intermediate stage, we can use the chunk expressions that don't work directly with unicodeText.
put uniDecode(the unicodeText of field 1,"UTF8") into tText
set the unicodeText of field 2 to uniEncode(word 1 of tText,"UTF8")
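The reason UTF-8 works as an intermediate stage can be seen at the byte level. The Python sketch below is an illustration only and makes no claim about how Rev implements chunk expressions. It splits the same Russian text on the single byte used for an ASCII space, first as UTF-8 and then as UTF-16. The sample text, the helper name and the little-endian byte order are assumptions for the example.

```python
# "Россия мир": the first letter is U+0420, whose UTF-16 bytes include 0x20,
# the same value as an ASCII space.
text = "\u0420\u043e\u0441\u0441\u0438\u044f \u043c\u0438\u0440"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")


def split_on_space_byte(data):
    """Split on the single byte 0x20, as a byte-oriented word chunk would."""
    return data.split(b" ")


utf8_words = split_on_space_byte(utf8)
utf16_pieces = split_on_space_byte(utf16)

# In UTF-8 the only 0x20 byte is the real space, so the first word survives.
print(utf8_words[0].decode("utf-8"))

# In UTF-16 a 0x20 byte also appears inside the first letter, so the split
# lands in the middle of a character and yields an extra, empty first piece.
print(len(utf8_words), len(utf16_pieces))
```

In UTF-8 every byte of a non-ASCII character has a value of 128 or more, so it can never be mistaken for return, space, tab or comma. UTF-16 gives no such guarantee, which is why the text is converted to UTF-8 before word or item chunks are taken.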