Tokenize (split) string


int tok(string arr [n] [delim] [flags] [arr2])



string - string to tokenize. Usually str variable.

arr - receives tokens. Variable of type ARRAY(str) or ARRAY(lpstr). Can also be a pointer-based array of str or lpstr. Can be 0 if not needed.

n - max number of tokens required. If omitted or -1, gets all.

delim - delimiters.



flags:

1 Modify string: replace the first delimiter character after each token with 0.

  • It is useful when arr is array of lpstr.
  • string must be of type str, not lpstr.

2 If there are more than n tokens, the last token (index n-1) receives the whole remaining right part of the string.

  • For example, if string is "a b c" and n is 2, you will get "a" and "b c" instead of "a" and "b".

4 Don't split parts enclosed in " " (double quotation marks).

  • For example, tok "a, ''b, c''" a -1 ", ''" 4 gets "a" and "b, c", not "a", "b", "c".
8 Don't split parts enclosed in ( ).
16 Don't split parts enclosed in [ ].
32 Don't split parts enclosed in { }.
64 Don't split parts enclosed in < >.
128 Don't split parts enclosed in ' '.
0x100 delim is table of delimiters.

0x200 QM 2.3.1. Recursive parsing of parts enclosed in ()[]{}<>.

  • For example, when parsing string "<a (b > c) d>" with flags 8|64, you would get 3 tokens: "a (b ", "c" and "d". With flags 8|64|0x200 will be 1 token: "a (b > c) d".

QM 2.3.1. Don't apply this default behavior of parsing parts enclosed in ()[]{}<>:

1. Characters )]}> in parts enclosed in "" are ignored.

2. A single character )]}> enclosed in ' ' is ignored.

0x1000 QM 2.3.3. Delimiters are blanks (space, tab, new line, control characters) and delim characters.

0x2000 QM 2.3.5. Always trim blanks around tokens. Also removes blank tokens.

  • For example, tok " a , b " a -1 "," 0x2000 gets "a" and "b", not " a " and " b ".
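The flag descriptions above can be tried with a short macro. This is a minimal sketch, not a definitive reference: it reuses the sample strings from the descriptions, and the value 2 for the "rest of string as last token" flag and the " " delimiter in the first call are assumptions that the text above does not state explicitly. The variable names s, a, i, n are illustrative.

str s
ARRAY(str) a
int i n
 Assumed flag 2: with n = 2, the last token should keep the rest of the string ("a" and "b c").
s = "a b c"
n = tok(s a 2 " " 2)
for(i 0 n) out a[i]
 Flag 4: the part in double quotation marks should stay in one token ("a" and "b, c").
s = "a, ''b, c''"
n = tok(s a -1 ", ''" 4)
for(i 0 n) out a[i]
 Flag 0x2000: blanks around tokens should be trimmed ("a" and "b").
s = " a , b "
n = tok(s a -1 "," 0x2000)
for(i 0 n) out a[i]

Flags can be combined with |, as in the 8|64 example above.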

arr2 - array for parts between (after) tokens. Will have the same length as arr. Can be 0 if not needed.



Parses string and stores tokens in arr. Returns number of tokens.


If arr is an array of str, it receives copies of tokens. If it is an array of lpstr, it receives pointers to tokens within string, which is faster.
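A minimal sketch of the lpstr variant follows. It assumes that the "modify string" flag has the value 1 (inferred from the flag sequence, not stated explicitly here) and uses a space as the delimiter; the variable names are illustrative.

str s = "one two three"
ARRAY(lpstr) a
int i n
 With the assumed flag 1, tok writes 0 after each token in s,
 so each pointer in a refers to a separate null-terminated token.
n = tok(s a -1 " " 1)
for(i 0 n) out a[i]

Because the pointers refer to memory owned by s, the tokens are valid only while s exists and is not changed. Note that s itself is modified by this call.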


QM 2.3.5. Applies flags 4-128 even if delim does not contain these characters. Then tokens include these characters.


QM 2.3.5. Fixed bug: flags 4-128 ignored when the enclosed part is preceded by a non-delimiter character.



Although tok can be used to get the lines of a multiline string, there are simpler ways. See example 3, foreach, findl, str.getl.

Strings can also be parsed with regular expressions (findrx, str.replacerx) and with other string functions, such as find, findc, findw.



str s = "one two three"
ARRAY(str) arr
int i nt
nt = tok(s arr)
for(i 0 nt) out arr[i]



str s = "one, (two + three) four five"
ARRAY(str) arr arr2
int i nt
nt = tok(s arr 3 ", ()" 8 arr2)
for(i 0 nt) out "'%s' '%s'" arr[i] arr2[i]
 'one' ', ('
 'two + three' ') '
 'four' ' '



str s = "one[]two[]three"
ARRAY(str) arr = s
for(int'i 0 arr.len) out arr[i]



str s="abcdef"
int i
 Split s into characters as strings:
ARRAY(str) a.create(s.len)
for(i 0 a.len) a[i].get(s i 1)
 Split s into characters as character codes:
ARRAY(int) b.create(s.len)
for(i 0 b.len) b[i]=s[i]