Regular expression options

The following flag bits can be used with the functions findrx and str.replacerx:

 

RX_CASELESS 0x100 (same as flag 1)

 

If this bit is set, letters in the pattern match both upper and lower case letters. It is equivalent to Perl's /i option, and it can be changed within a pattern by a (?i) option setting.
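
As a rough illustration of the caseless behavior, here is a minimal sketch using Python's re module rather than QM. It assumes that Python's re.IGNORECASE flag and the inline (?i) setting behave like RX_CASELESS for plain ASCII letters; the subject text is made up for the example.

```python
import re

subject = "Say HELLO"

# Without the flag, the lowercase pattern does not match the uppercase text.
no_flag = re.search(r"hello", subject)

# With the caseless option passed as a flag.
with_flag = re.search(r"hello", subject, re.IGNORECASE)

# With the option set inside the pattern instead.
inline = re.search(r"(?i)hello", subject)

print(no_flag, with_flag.group(), inline.group())
```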

 

RX_MULTILINE 0x200 (same as flag 8)

 

By default, PCRE treats the subject string as consisting of a single "line" of characters (even if it actually contains several newlines). The "start of line" metacharacter (^) matches only at the start of the string, while the "end of line" metacharacter ($) matches only at the end of the string, or before a terminating newline (unless RX_DOLLAR_ENDONLY is set). This is the same as Perl.

When RX_MULTILINE is set, the "start of line" and "end of line" constructs match immediately following or immediately before any newline in the subject string, respectively, as well as at the very start and end. This is equivalent to Perl's /m option, and it can be changed within a pattern by a (?m) option setting. If there are no "\n" characters in a subject string, or no occurrences of ^ or $ in a pattern, setting RX_MULTILINE has no effect.
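
The difference between the two modes can be sketched with Python's re module, used here only as an analog of the PCRE behavior described above (it is not QM code). The example assumes that re.MULTILINE and the inline (?m) setting correspond to RX_MULTILINE.

```python
import re

subject = "first\nsecond\nthird"

# Default mode: ^ matches only at the start of the string,
# $ only at the end.
single_line_starts = re.findall(r"^\w+", subject)
single_line_ends = re.findall(r"\w+$", subject)

# Multiline mode: ^ and $ also match after and before each newline.
multi_line_starts = re.findall(r"^\w+", subject, re.MULTILINE)
multi_line_ends = re.findall(r"(?m)\w+$", subject)

print(single_line_starts, single_line_ends)
print(multi_line_starts, multi_line_ends)
```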

 

RX_DOTALL 0x400

 

If this bit is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This option is equivalent to Perl's /s option, and it can be changed within a pattern by a (?s) option setting. A negative class such as [^a] always matches a newline character, independent of the setting of this option.
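
A small sketch of the dot behavior, written with Python's re module as a stand-in for the PCRE semantics (not QM code). It assumes that re.DOTALL corresponds to RX_DOTALL; the last search shows that a negated class matches a newline regardless of the flag.

```python
import re

subject = "a\nb"

# Default: the dot does not match the newline, so there is no match.
dot_default = re.search(r"a.b", subject)

# Dot-all mode: the dot matches the newline as well.
dot_all = re.search(r"a.b", subject, re.DOTALL)

# A negated class matches the newline even without the flag.
negated_class = re.search(r"a[^x]b", subject)

print(dot_default, repr(dot_all.group()), repr(negated_class.group()))
```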

 

RX_EXTENDED 0x800

 

If this bit is set, whitespace data characters in the pattern are totally ignored except when escaped or inside a character class. Whitespace does not include the VT character (code 11). In addition, characters between an unescaped # outside a character class and the next newline character, inclusive, are also ignored. This is equivalent to Perl's /x option, and it can be changed within a pattern by a (?x) option setting.

This option makes it possible to include comments inside complicated patterns. Note, however, that this applies only to data characters. Whitespace characters may never appear within special character sequences in a pattern, for example within the sequence (?( which introduces a conditional subpattern.
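
The following sketch shows a commented pattern of this kind. It uses Python's re.VERBOSE flag, assumed here to be a close analog of RX_EXTENDED (details such as which characters count as whitespace may differ); the date format and the sample text are made up for the example.

```python
import re

# Whitespace and "#" comments in the pattern are ignored in extended mode.
date_pattern = re.compile(
    r"""
    (\d{4})   # year
    -
    (\d{2})   # month
    -
    (\d{2})   # day
    """,
    re.VERBOSE,
)

match = date_pattern.search("released 2024-03-15")
date_parts = match.groups()

# An escaped space is still a data character and must be present.
escaped_space = re.search(r"a\ b", "a b", re.VERBOSE)

print(date_parts, escaped_space.group())
```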

 

RX_ANCHORED 0x1000

 

If this bit is set, the pattern is forced to be "anchored", that is, it is constrained to match only at the first matching point in the string which is being searched (the "subject string"). This effect can also be achieved by appropriate constructs in the pattern itself, which is the only way to do it in Perl.
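
As an illustration, Python's re module has no anchoring flag, but re.match anchors the match at the start of the subject in a comparable way, and \A is one of the in-pattern constructs that achieve the same effect. This is a sketch of the idea, not QM code.

```python
import re

subject = "abc123"

# Unanchored search: the digits are found in the middle of the string.
unanchored = re.search(r"\d+", subject)

# Anchored by the call: the match must begin at the start, so it fails.
anchored_call = re.match(r"\d+", subject)

# Anchored by a construct in the pattern itself: same result.
anchored_pattern = re.search(r"\A\d+", subject)

# An anchored match succeeds when the text at the start fits the pattern.
anchored_hit = re.match(r"[a-z]+", subject)

print(unanchored.group(), anchored_call, anchored_pattern, anchored_hit.group())
```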

 

RX_DOLLAR_ENDONLY 0x2000

 

If this bit is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this option, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). The RX_DOLLAR_ENDONLY option is ignored if RX_MULTILINE is set. There is no equivalent to this option in Perl, and no way to set it within a pattern.
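
The two behaviors of the dollar can be compared in Python's re module, which has no such flag. There, a plain $ behaves like the default described above, while Python's \Z matches only at the very end of the string. Treat this as an analog only: \Z has a different meaning in PCRE itself.

```python
import re

subject = "abc\n"

# Default dollar: matches before the final newline.
default_dollar = re.search(r"abc$", subject)

# End-only behavior: "abc" is not at the very end, so there is no match.
end_only = re.search(r"abc\Z", subject)

# The end-only assertion succeeds once the newline is consumed.
end_only_with_newline = re.search(r"abc\n\Z", subject)

print(default_dollar.group(), end_only, repr(end_only_with_newline.group()))
```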

 

RX_EXTRA 0x4000

 

This option was invented in order to turn on additional functionality of PCRE that is incompatible with Perl, but it is currently of very little use. When set, any backslash in a pattern that is followed by a letter that has no special meaning causes an error, thus reserving these combinations for future expansion. By default, as in Perl, a backslash followed by a letter with no special meaning is treated as a literal. There are at present no other features controlled by this option. It can also be set by a (?X) option setting within a pattern.

 

RX_NOTBOL 0x8000

 

The first character of the string is not the beginning of a line, so the circumflex metacharacter should not match before it. Setting this without RX_MULTILINE (at compile time) causes circumflex never to match.

 

RX_NOTEOL 0x10000

 

The end of the string is not the end of a line, so the dollar metacharacter should not match it nor (except in multiline mode) a newline immediately before it. Setting this without RX_MULTILINE (at compile time) causes dollar never to match.

 

RX_UNGREEDY 0x20000

 

This option inverts the "greediness" of the quantifiers so that they are not greedy by default, but become greedy if followed by "?". It is not compatible with Perl. It can also be set by a (?U) option setting within the pattern.
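
Python's re module has no ungreedy flag, so the sketch below only shows the two behaviors that the option swaps: a greedy quantifier and the same quantifier followed by "?". With RX_UNGREEDY set, the plain form would be expected to behave like the second search and the "?" form like the first.

```python
import re

subject = "<a><b>"

# Greedy (default without the option): takes as much as possible.
greedy = re.search(r"<.+>", subject).group()

# Not greedy: takes as little as possible.
lazy = re.search(r"<.+?>", subject).group()

print(greedy, lazy)
```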

 

RX_NOTEMPTY 0x40000

 

An empty string is not considered to be a valid match if this option is set. If there are alternatives in the pattern, they are tried. If all the alternatives match the empty string, the entire match fails.

 

RX_UTF8 0x80000 (currently not supported in QM)

 

This option causes PCRE to regard both the pattern and the subject as strings of UTF-8 characters instead of single-byte character strings.

 

RX_NO_AUTO_CAPTURE 0x100000

 

If this option is set, it disables the use of numbered capturing parentheses in the pattern. Any opening parenthesis that is not followed by ? behaves as if it were followed by ?: but named parentheses can still be used for capturing (and they acquire numbers in the usual way). There is no equivalent of this option in Perl.
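
Python's re module has no equivalent flag, but the effect can be pictured by writing the non-capturing form explicitly. In this sketch the first group is written as (?:...), which is how a plain parenthesis would be treated under the option, and the named group still captures and receives number 1. The group name and sample text are hypothetical.

```python
import re

# (?:...) does not capture; the named group does and is numbered 1.
pattern = re.compile(r"(?:\d+)-(?P<word>[a-z]+)")

match = pattern.search("item 42-blue")
captured = match.groups()
by_number = match.group(1)
by_name = match.group("word")

print(captured, by_number, by_name)
```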

 

RX_NO_UTF8_CHECK 0x200000 (currently not supported in QM)

 

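If this bit is set, the pattern and the subject are not checked for valid UTF-8. This is faster, but it can be dangerous: the behavior with an invalid UTF-8 string is undefined.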
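
Because each value listed on this page is a separate bit, several options can be combined into one flags value with bitwise OR. The sketch below does the arithmetic in Python, using the values from this page; the RX_FLAGS table and the combine helper are hypothetical names for the illustration and are not part of QM.

```python
# Flag values as listed on this page.
RX_FLAGS = {
    "RX_CASELESS": 0x100,
    "RX_MULTILINE": 0x200,
    "RX_DOTALL": 0x400,
    "RX_EXTENDED": 0x800,
    "RX_ANCHORED": 0x1000,
    "RX_DOLLAR_ENDONLY": 0x2000,
    "RX_EXTRA": 0x4000,
    "RX_NOTBOL": 0x8000,
    "RX_NOTEOL": 0x10000,
    "RX_UNGREEDY": 0x20000,
    "RX_NOTEMPTY": 0x40000,
    "RX_UTF8": 0x80000,
    "RX_NO_AUTO_CAPTURE": 0x100000,
    "RX_NO_UTF8_CHECK": 0x200000,
}


def combine(*names):
    """Return the bitwise OR of the named flags (hypothetical helper)."""
    flags = 0
    for name in names:
        flags |= RX_FLAGS[name]
    return flags


combined = combine("RX_CASELESS", "RX_DOTALL")
print(hex(combined))  # 0x500

# A single flag can be tested for with bitwise AND.
has_dotall = bool(combined & RX_FLAGS["RX_DOTALL"])
has_multiline = bool(combined & RX_FLAGS["RX_MULTILINE"])
```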