Convert string to/from Unicode UTF-16

Syntax

s.unicode([ss] [codepage] [length])
s.ansi([ss] [codepage] [length])

Parameters

s - str variable.

ss - subject string. Default: s. With ansi, it can be a string, word*, or BSTR.

codepage (QM 2.3.0) - code page identifier. If omitted or negative, uses the code page that QM uses everywhere, which depends on whether QM is running in Unicode mode. Read more in Remarks.

length - how much of ss to convert. If omitted or negative, converts the whole ss. With unicode it must be the number of bytes (even if there are multibyte characters). With ansi it must be the number of 2-byte characters (even if there are 4-byte characters; each of those counts as two).
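The two length units can be compared with a small sketch. This is plain Python for illustration only, not QM code; the sample text and variable names are made up for the example.

```python
# Illustration only: plain Python, not QM code.
# Sample text: 'a', 'e' with acute accent, and an emoji outside the
# Basic Multilingual Plane (stored as 4 bytes in UTF-16).
text = "a\u00e9\U0001F600"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")  # little-endian, no byte order mark

utf8_byte_count = len(utf8)         # the kind of unit a byte-based length counts
utf16_unit_count = len(utf16) // 2  # the kind of unit a 2-byte-based length counts

print(len(text), utf8_byte_count, utf16_unit_count)  # prints: 3 7 4
```

The same three characters occupy 7 bytes in UTF-8 and 4 two-byte units in UTF-16, so a length that is correct for one function is generally wrong for the other.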

Remarks

Unicode text is often stored in UTF-16 format, where each character consists of 2 bytes (sometimes 4). This format is used by most COM functions, by Windows API functions whose names end with W, and by many other functions. However, QM functions don't work with UTF-16 strings; they work with ANSI or UTF-8 strings. Therefore it is sometimes necessary to convert to or from UTF-16. Although str variables normally store ANSI or UTF-8 strings, they can also store UTF-16 strings.

unicode converts ss from ANSI or UTF-8 to UTF-16, and stores the result into s.

ansi converts ss from UTF-16 to ANSI or UTF-8, and stores the result into s.

QM 2.3.0: a BSTR variable can be passed to ansi directly. Previously you had to use its pstr member.

If ss is a variable containing binary data (that is, data with null characters), the functions get only the part up to the first null character, unless you explicitly specify length.
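The rule can be sketched on a byte string as follows. This is plain Python for illustration only, not QM code and not QM's implementation; until_first_null is a hypothetical helper written for this example.

```python
# Illustration only: plain Python, not QM code.
def until_first_null(data: bytes, length: int = -1) -> bytes:
    """Hypothetical helper: stop at the first null byte unless length is given."""
    if length >= 0:
        # An explicit length wins, so embedded nulls are kept.
        return data[:length]
    end = data.find(b"\x00")
    # No null found: take everything.
    return data if end < 0 else data[:end]

binary = b"abc\x00def"

print(until_first_null(binary))               # prints: b'abc'
print(until_first_null(binary, len(binary)))  # prints: b'abc\x00def'
```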

Variables of BSTR type store text in UTF-16 format. Operator = can also be used to convert to or from BSTR. It uses the default code page (CP_ACP in ANSI mode, CP_UTF8 in Unicode mode). Unlike ansi and unicode, it also gets binary data.

To convert to UTF-16 when calling dll functions, it is more convenient to use operator @. See example.

Note: For historical reasons these functions are incorrectly named, because UTF-8 is Unicode too. A better name for unicode would be something like toutf16, and for ansi, fromutf16. QM versions before QM 2.3.0 did not support UTF-8, so the names were accurate at the time.

If QM is running in ANSI mode (Unicode unchecked in Options), the default code page is CP_ACP (0), which is the current system Windows ANSI code page. To see its actual value, you can use GetACP. If QM is running in Unicode mode (Unicode checked in Options), the default code page is CP_UTF8, which is Unicode encoded in UTF-8 format.

To convert string from one ANSI or UTF-8 encoding to another ANSI or UTF-8 encoding, you can use str function ConvertEncoding or LoadUnicodeFile. See example.
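What such a conversion does to the bytes can be sketched as follows. This is plain Python for illustration only, not QM code; reencode is a hypothetical helper, and decoding to Unicode text as the intermediate step is an assumption of the sketch, not a statement about how ConvertEncoding is implemented.

```python
# Illustration only: plain Python, not QM code.
def reencode(data: bytes, cp_from: str, cp_to: str) -> bytes:
    """Hypothetical helper: convert bytes from one encoding to another."""
    text = data.decode(cp_from)  # source bytes -> Unicode text
    return text.encode(cp_to)    # Unicode text -> target bytes

utf8_bytes = "caf\u00e9".encode("utf-8")  # e with acute accent takes 2 bytes in UTF-8
latin1_bytes = reencode(utf8_bytes, "utf-8", "iso-8859-1")  # code page 28591

print(utf8_bytes)    # prints: b'caf\xc3\xa9'
print(latin1_bytes)  # prints: b'caf\xe9'
```

The accented letter shrinks from two bytes to one, which is why a string converted to a different encoding must not be treated as if it were still in the original one.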

Examples

int h; str s1 s2 s3
s1="QM_Editor"
s2="QM_Editor"

 call a function that uses UTF-16 string as an input parameter
h=FindWindowW(+s1.unicode 0)

 or you can use operator @
h=FindWindowW(@s2 0)

 or you can use operator L, but only with string constants
h=FindWindowW(L"QM_Editor" 0)

 call a function that uses UTF-16 string as an output parameter
BSTR b.alloc(300)
GetWindowTextW h b 300
s3.ansi(b) ;;note: don't use s3=b because it gets whole buffer

out s3

 convert from current QM encoding to iso-8859-1
s1.ConvertEncoding(_unicode 28591)