Unicode

Adapted from a newsletter article by Devin Asay

If you have ever tried to create stacks in a language other than English or the more common Western European languages, you may have run into the problem of producing all the character glyphs that the language requires. Fortunately, Unicode is there to help. This lesson teaches you how to use Unicode in your LiveCode stacks.

Note: This lesson has been tested and works as described on Mac and Windows platforms. Some aspects may not behave exactly as described on Linux.

Step 1: Typing Unicode text into fields

This is a good place to start because it's the easiest. Livecode fields can handle Unicode text input without any intervention by the developer. That is because Livecode simply uses the text input methods supplied by the host operating system. So if you want to type Japanese characters into a field, you simply select the Japanese text input system you want to use and start typing. Livecode knows how to render it properly in the field, and it is then ready for use. If you want to learn how to select the text input method on your OS, see the help documentation for that OS.

However, there are a couple of issues with Unicode text input in Livecode. Livecode currently has trouble rendering right-to-left languages like Hebrew and Arabic while you are typing them. Specifically, it will properly render characters in a word from right to left, but when you type a space to begin a new word, the new word is inserted to the right of the previous word, not to the left as it should be. For this reason it is recommended that you create Hebrew and Arabic text outside of Livecode and import it, rather than trying to type them within Livecode.

Step 2: Use of the charToNum and numToChar functions

By default, text in LiveCode is ASCII text, so let's first look at some of the ways LiveCode works with ASCII text encoding. Among its rich collection of text tools are two functions, charToNum() and numToChar(), that let us work with the ASCII value of any character. They work like this (try it in the message box):

put charToNum("a") -- 97
put numToChar(97) -- a

You can use the numToChar() function to create a rudimentary ASCII table. Just create a new field, name it "ascii" and run this routine in the message box:

put empty into field "ascii"
repeat with i = 0 to 255
   put i & tab & numToChar(i) & cr after field "ascii"
end repeat
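
The loop above also emits control characters such as tab and return, which break the two-column layout. As a minimal sketch, assuming you only care about the printable ASCII range (32 to 126), a button script could build the table in a variable first. The handler below is illustrative and not part of the original lesson.

```livecode
on mouseUp
   local tTable
   -- Codes 32 to 126 are the printable ASCII characters;
   -- skipping 0 to 31 keeps tab and return from upsetting the layout.
   repeat with i = 32 to 126
      put i & tab & numToChar(i) & cr after tTable
   end repeat
   delete the last char of tTable -- drop the trailing return
   put tTable into field "ascii"
end mouseUp
```

Building the text in a variable and writing to the field once also avoids redrawing the field on every pass through the loop.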

Step 3: Using the useUnicode property

That's how these two functions work by default. But you can tell LiveCode to expect Unicode values for these two functions by first setting the useUnicode property to true.

An important thing to remember: The useUnicode property only affects the charToNum() and numToChar() functions. No other text operations are affected by this property.

Let's look at how this works in practice. Let's say you have a field "russText" containing the sentence Я люблю тебя. The sentence begins with the upper case Russian letter 'Я'. Because each Unicode character occupies two bytes, that letter is chars 1 to 2 of the field's unicodeText. To find out which Unicode code point corresponds to that letter, you would do this:

set the useUnicode to true
put charToNum(char 1 to 2 of the unicodeText of fld "russText") -- 1071

Conversely, to render a Unicode character from its code point, do this; the letter 'Я' should appear in the field:

set the useUnicode to true
set the unicodeText of fld "russText" to numToChar(1071)
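
To see every code point in a piece of text rather than just the first, the same two-chars-at-a-time idea can be wrapped in a loop. This is a sketch under the lesson's assumptions (an engine where the unicodeText holds two bytes per character, as described here). The function name codePointList is hypothetical, and characters outside the Basic Multilingual Plane, which need two 16-bit units, would show up as two separate numbers.

```livecode
function codePointList pUnicodeText
   local tList
   -- useUnicode is local to the handler, so it resets when the function returns
   set the useUnicode to true
   -- walk the text two chars (one 16-bit unit) at a time
   repeat with i = 1 to the number of chars in pUnicodeText step 2
      put charToNum(char i to i + 1 of pUnicodeText) & comma after tList
   end repeat
   delete the last char of tList -- drop the trailing comma
   return tList
end codePointList
```

Called as put codePointList(the unicodeText of fld "russText"), the list for the sentence above should begin with 1071.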

Step 4: Using the unicodeText property

The previous example is a good way to introduce another important tool for using Unicode in LiveCode: the unicodeText property. If you want to move Unicode text from field to field, you have to use this property. In the normal ASCII world you can just do this:

put field 1 into field 2

However, if you want to put Unicode text into a field you have to set its unicodeText property:

set the unicodeText of fld "newPlace" to the unicodeText of fld "oldPlace"

Another important thing to remember: The secret to manipulating Unicode text in fields lies in the unicodeText of the field.

So if you want to move chunks of text, you have to refer to chunks of the unicodeText:

1. Copying a Unicode character to another field -

set the unicodeText of fld "letter" to char 1 to 2 of the unicodeText of fld "sentence"

2. Moving words -

set the unicodeText of fld "other" to word 1 to 2 of the unicodeText of fld "this"

3. Inserting Unicode text from one field into another -

get the unicodeText of fld "info"
set the unicodeText of fld "info" to it && word 2 of line 2 of the unicodeText of fld "bottom"

Step 5: Converting between single and double-byte encodings

When using Unicode text, especially if you are importing or exporting text from or to other systems or environments, you may need to convert your Unicode text to a single-byte encoding, or vice versa. The most common reason for doing this is reading and writing UTF-8 files. I recommend storing your Unicode text in UTF-8 format if you are planning to share it with others or send it over the internet. UTF-8 is part of the Unicode standard. It stores each character as a sequence of one to four bytes and leaves plain ASCII characters unchanged, so Unicode text can travel through systems built for single-byte text. UTF-8 is especially important for encoding Unicode text for use in web browsers and email.

The keys to using UTF-8 text in Livecode are the uniEncode() and uniDecode() functions. Let's say you've gotten some UTF-8 text from a web site and you want to display it in your Livecode stack. You store it in a file called myUniText.ut8. This is how you would read it in:

put url ("binfile:/path/to/file/myUniText.ut8") into tRawTxt
set the unicodeText of fld "display" to uniEncode(tRawTxt,"UTF8")

Conversely, to save Unicode text from Livecode to a UTF-8 file, use uniDecode():

get the unicodeText of fld "myUniText"
put uniDecode(it,"UTF8") into url "binfile:/path/to/file/myUniFile.ut8"

This is another important thing to remember: For reliably transporting Unicode text, convert it and store it as UTF-8 text.
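
The read and write steps above can be wrapped in a pair of handlers. This is a sketch only: the handler names loadUTF8IntoField and saveFieldAsUTF8 are hypothetical, and the check of the result assumes you want to report a failed file read or write rather than carry on silently.

```livecode
on loadUTF8IntoField pPath, pFieldName
   local tRaw, tError
   -- binfile: reads the bytes untouched, with no line-ending translation
   put url ("binfile:" & pPath) into tRaw
   put the result into tError
   if tError is not empty then
      answer "Could not read" && pPath & ":" && tError
      exit loadUTF8IntoField
   end if
   -- uniEncode converts the UTF-8 bytes to the double-byte form fields expect
   set the unicodeText of field pFieldName to uniEncode(tRaw, "UTF8")
end loadUTF8IntoField

on saveFieldAsUTF8 pFieldName, pPath
   local tError
   -- uniDecode converts the field's double-byte text back to UTF-8 bytes
   put uniDecode(the unicodeText of field pFieldName, "UTF8") into url ("binfile:" & pPath)
   put the result into tError
   if tError is not empty then
      answer "Could not write" && pPath & ":" && tError
   end if
end saveFieldAsUTF8
```

For example, loadUTF8IntoField "/path/to/file/myUniText.ut8", "display" would fill the field used in the snippet above.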

Step 6: Using Unicode in buttons and menus

So far, we've only been talking about Unicode text in fields. Almost none of that applies to buttons, primarily because buttons have no unicodeText property. Instead, the basic approach for displaying Unicode text in buttons and menus consists of two steps:

1. Set the textFont of the button to a Unicode font;
2. Set the label of the button to the desired Unicode text.

Unicode font names in Livecode take the form Font Name,language, where Font Name is the name of any font installed on the system, and language is the name of the language you want, or the term "unicode". For example, for Russian Cyrillic text I might use "Arial,Russian" as the font name; for Japanese, "Osaka,Japanese"; and for Greek, "Geneva,Unicode". Not every language can be used as the second part of a Unicode font name. For a complete list of valid language names see the Livecode Dictionary entry for uniEncode.

One way to assign a Unicode label to a button is to reference some existing Unicode text in a hidden field. Let's say, for example, that we are making a stack for Mandarin Chinese speakers and we want to give our Start button a Chinese label, 開始. We could type or import the Unicode text to a field and use that field as the source text for the button label:

set the textFont of button "start" to "BiauKai,Chinese"
set the label of button "start" to the unicodeText of fld "hiddenChinText"

Note: As of LiveCode 5.5 DP2 it is no longer possible to set Unicode text by marking the textFont property with a Unicode tag. The Unicode nature of the text and/or label properties of buttons, groups and graphics is now an intrinsic property that cannot be altered by script. You can copy and paste Unicode text into the label fields in the property inspector instead.

One technique that works well for creating Unicode button labels is to store the Unicode label text in a custom property of the button. When you do this, store it as UTF-8 text to avoid the byte order problem when moving the stack from machine to machine. So first you would store the Unicode text in a custom property:

set the chinLabel of button "start" to unidecode(the unicodeText of fld "hiddenChinText","UTF8")

Once that is in place, you would use the custom property as the source of the Unicode text:

set the textFont of button "start" to "BiauKai,Chinese"
set the label of button "start" to uniencode(the chinLabel of btn "start","UTF8")

One more note on Unicode buttons: Because Unicode text doesn't always "travel" well from platform to platform, I usually set Unicode button labels and menu contents each time I go to the card, in a preOpenCard handler.
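
Putting the last two snippets together, a card script along these lines would refresh the label each time the card opens. It is a sketch that assumes the chinLabel custom property has already been stored as UTF-8 as shown above and, per the note in this step, it applies only to versions before LiveCode 5.5 DP2.

```livecode
on preOpenCard
   -- the font name carries the language tag, as described earlier in this step
   set the textFont of button "start" to "BiauKai,Chinese"
   -- the custom property holds UTF-8, so convert it back to double-byte text
   set the label of button "start" to uniEncode(the chinLabel of button "start", "UTF8")
end preOpenCard
```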

Step 7: Using Unicode in Ask and Answer dialogs

Ask and answer dialogs can display Unicode prompts, but you can't pass raw Unicode text in the ask and answer command arguments. Instead, you use another handy technique for setting Unicode text: store the Unicode as entities in HTML text. Storing the htmlText of a field that contains Unicode text is another reliable way of keeping the Unicode text intact during transfers. It is also the only way to display Unicode text in ask and answer dialog prompts.

To see what this means, let's look at the Chinese start button example above. In the first case we had the Unicode Chinese text 開始 in a text field "hiddenChinText". If I were to examine the htmlText of this field it would look something like this:

<p>&#38283;&#22987;</p>

Notice that the two Chinese characters are embedded in the htmlText as Unicode entities: &#38283; (開) and &#22987; (始). HTML Unicode entities like this will reliably render as the proper Unicode characters in LiveCode, regardless of the operating system the stack is running on. So to use Unicode characters in ask and answer prompts, do something like this:

put the htmlText of fld "hiddenChinText" into tChinPrompt
answer tChinPrompt with "Cancel" or "OK"
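
If you would rather not keep a hidden field, the same kind of prompt can be assembled from code points. This is a minimal sketch, assuming the dialog accepts any htmlText-style string as the lesson describes. The function name entityPrompt is hypothetical, and 38283 and 22987 are the decimal code points of 開 and 始.

```livecode
function entityPrompt pCodePoints
   local tHTML
   -- turn each decimal code point into a numeric entity such as &#38283;
   repeat for each item tCode in pCodePoints
      put "&#" & tCode & ";" after tHTML
   end repeat
   return "<p>" & tHTML & "</p>"
end entityPrompt
```

It could then be used as answer entityPrompt("38283,22987") with "Cancel" or "OK".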

There is one other advantage of saving Unicode text as HTML entities: it is the best way to save Unicode text together with text styles like bold and italic and font attributes like size and color.

Step 8: Setting a Unicode stack title

I'll conclude this lesson with one more piece of functionality in LiveCode: the ability to use Unicode text for the title of the stack window. Just set the unicodeTitle property of the stack to a valid Unicode string. Here's an example:

set the unicodeTitle of this stack to the unicodeText of fld "russTitle"
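
As with button labels, the title text can also be kept as UTF-8 in a custom property and converted when the stack opens. This is a sketch only; the property name cTitleUTF8 is hypothetical and is assumed to have been filled beforehand using uniDecode with "UTF8".

```livecode
on preOpenStack
   -- convert the stored UTF-8 bytes to double-byte text for the title bar
   set the unicodeTitle of this stack to uniEncode(the cTitleUTF8 of this stack, "UTF8")
end preOpenStack
```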

Comments (11)

Alejandro Tejada Capellan Tuesday Oct 12 at 11:23 PM

Excellent tutorial! Many thanks, Devin.

Robert Man Monday Feb 27 at 07:36 PM

A few hints for using a LiveCode app as a front end to a MySQL database that relies on a stream of UTF-8 text data, in the on-rev context:

1) In phpMyAdmin: make sure all fields of your database are set to UTF-8 encoding. On phpMyAdmin at on-rev I had to reset this explicitly for each text field.

2) In your connection script, when you connect to the database and get the transactionId, include a:
revExecuteSql transactionId, "SET NAMES utf8"
(and check the result to confirm that all is going OK the first time!)

This makes sure that MySQL knows that it receives UTF-8 data and has to send back UTF-8 data.

3) Then you can query your data, put it into a variable, and set the unicodeText of your fields with uniEncode(myVar,"utf8")

4) The user can edit the fields, on Windows or Mac OS X

5) Last, you can send your data back from LiveCode to the server using put uniDecode(the unicodeText of myField,"utf8") into myReadyToSendVar

If this can save you a few hours... !

John Sunday Jul 22 at 08:35 AM

Thanks Robert for the advice!

David Silman Monday Aug 13 at 02:28 PM

Can someone redo these tutorials? By your own declaration, this tutorial is, well, wrong now.
Just look at page 11
http://www.runrev.com/downloads/livecode/5_5_1/LiveCodeNotes-5_5_1.pdf

Hanson Schmidt-Cornelius Friday Aug 17 at 11:19 AM

Hi David,

indeed, changes made at LiveCode 5.5 DP2 alter the behaviour of assigning Unicode to buttons. This impacts step 6 of this lesson only. All other steps of this lesson remain unchanged.

A note has been added to step 6 to indicate the changes that were made.

Kind Regards,

Hanson

Cannix Abura Monday Sep 16 at 03:42 PM

We hope the code editor can support double-byte text.

Example:
set the label of button "Start" to "开始使用"

set the label of button "Start" to "????"

cannix abura Tuesday Sep 24 at 03:32 PM

Hi,
The code editor cannot directly use double-byte strings. In the Property Inspector -> Custom Properties -> Property Contents, I cannot type Chinese, Japanese, or Korean.
Hoping this can be corrected. Thanks!

Hanson Schmidt-Cornelius Friday Sep 27 at 12:45 PM

Hi Cannix,

as of LiveCode 5.5 DP2 it is no longer possible to set the unicode text by marking the text font property with a unicode tag. You can copy and paste unicode text into the label fields in the property inspector instead.

Kind Regards,

Hanson

Hanson Schmidt-Cornelius Friday Sep 27 at 12:47 PM

Hi Cannix,

we are in the middle of restructuring LiveCode to give developers more freedom in the use of LiveCode. The progress on this is pretty good. If you are on our mailing lists, then I am sure we will update you when our updated Unicode support is ready to be shipped.

Kind Regards,

Hanson

Ince István Wednesday Nov 06 at 12:38 PM

Hi.

About this restructuring, can we find information about what stage it is at, and when it is planned to be publicly available (not in dev or RC builds)?

Kind Regards,
István

Hanson Schmidt-Cornelius Monday Dec 09 at 03:34 PM

Hi Ince,

I cannot give you concrete deadlines on when we expect to complete implementing our unicode support. Unicode is now one of our key features to be included and we have several developers working on it at the moment.

Kind Regards,

Hanson
