U:RDoc::TopLevel[ iI"regexp.rdoc:EFcRDoc::Parser::Simpleo:RDoc::Markup::Document:@parts[�o:RDoc::Markup::Paragraph;[I"JRegular expressions (regexps) are patterns which describe the ;TI"Pcontents of a string. They're used for testing whether a string contains a ;TI"Lgiven pattern, or extracting the portions that match. They are created ;TI"1with the /pat/ and ;TI"J%r{pat} literals or the Regexp.new ;TI"constructor.;To:RDoc::Markup::BlankLineo; ;[I"JA regexp is usually delimited with forward slashes (/). For ;TI" example:;T@o:RDoc::Markup::Verbatim;[I"!/hay/ =~ 'haystack' #=> 0 ;TI"0/y/.match('haystack') #=> # ;T:@format0o; ;[I"LIf a string contains the pattern it is said to match. A literal ;TI"string matches itself.;T@o; ;[I"PHere 'haystack' does not contain the pattern 'needle', so it doesn't match:;T@o;;[I"(/needle/.match('haystack') #=> nil ;T;0o; ;[I"?Here 'haystack' contains the pattern 'hay', so it matches:;T@o;;[I"7/hay/.match('haystack') #=> # ;T;0o; ;[I"NSpecifically, /st/ requires that the string contains the letter ;TI"D_s_ followed by the letter _t_, so it matches _haystack_, also.;T@S:RDoc::Markup::Heading: leveli: textI"!=~ and Regexp#match;T@o; ;[I"TPattern matching may be achieved by using =~ operator or Regexp#match ;TI"method.;T@S; ;i;I"=~ operator;T@o; ;[I"S=~ is Ruby's basic pattern-matching operator. When one operand is a ;TI"Qregular expression and the other is a string then the regular expression is ;TI"Tused as a pattern to match against the string. (This operator is equivalently ;TI"Sdefined by Regexp and String so the order of String and Regexp do not matter. ;TI"SOther classes may have different implementations of =~.) If a match ;TI"Qis found, the operator returns index of first match in string, otherwise it ;TI"returns +nil+.;T@o;;[ I"!/hay/ =~ 'haystack' #=> 0 ;TI"!'haystack' =~ /hay/ #=> 0 ;TI"!/a/ =~ 'haystack' #=> 1 ;TI"#/u/ =~ 'haystack' #=> nil ;T;0o; ;[I"PUsing =~ operator with a String and Regexp the $~ global ;TI"Nvariable is set after a successful match. $~ holds a MatchData ;TI"$~.;T@S; ;i;I"Regexp#match method;T@o; ;[I"2The #match method returns a MatchData object:;T@o;;[I"4/st/.match('haystack') #=> # ;T;0S; ;i;I"Metacharacters and Escapes;T@o; ;[ I"EThe following are metacharacters (, ), ;TI"M[, ], {, }, ., ?, ;TI"N+, *. They have a specific meaning when appearing in a ;TI"Opattern. To match them literally they must be backslash-escaped. To match ;TI"Ba backslash literally backslash-escape that: \\\\\\.;T@o;;[I"K/1 \+ 2 = 3\?/.match('Does 1 + 2 = 3?') #=> # ;T;0o; ;[I"HPatterns behave like double-quoted strings so can contain the same ;TI"backslash escapes.;T@o;;[I"5/\s\u{6771 4eac 90fd}/.match("Go to 東京都") ;TI"' #=> # ;T;0o; ;[I"GArbitrary Ruby expressions can be embedded into patterns with the ;TI"#{...} construct.;T@o;;[I"place = "東京都" ;TI")/#{place}/.match("Go to 東京都") ;TI"& #=> # ;T;0S; ;i;I"Character Classes;T@o; ;[ I"MA character class is delimited with square brackets ([, ;TI"K]) and lists characters that may appear at that point in the ;TI"Pmatch. /[ab]/ means _a_ or _b_, as opposed to /ab/ which ;TI"means _a_ followed by _b_.;T@o;;[I"8/W[aeiou]rd/.match("Word") #=> # ;T;0o; ;[I"IWithin a character class the hyphen (-) is a metacharacter ;TI"Ndenoting an inclusive range of characters. [abcd] is equivalent ;TI"Eto [a-d]. A range can be followed by another range, so ;TI"P[abcdwxyz] is equivalent to [a-dw-z]. The order in which ;TI"Hranges or individual characters appear inside a character class is ;TI"irrelevant.;T@o;;[I"1/[0-9a-f]/.match('9f') #=> # ;TI"1/[9f]/.match('9f') #=> # ;T;0o; ;[I"MIf the first character of a character class is a caret (^) the ;TI"Fclass is inverted: it matches any character _except_ those named.;T@o;;[I"1/[^a-eg-z]/.match('f') #=> # ;T;0o; ;[ I"KA character class may contain another character class. By itself this ;TI"Hisn't useful because [a-z[0-9]] describes the same set as ;TI"P[a-z0-9]. However, character classes also support the && ;TI"Ooperator which performs set intersection on its arguments. The two can be ;TI"combined as follows:;T@o;;[I"2/[a-w&&[^c-g]z]/ # ([a-w] AND ([^c-g] OR z)) ;T;0o; ;[I"This is equivalent to:;T@o;;[I"/[abh-w]/ ;T;0o; ;[I"EThe following metacharacters also behave like character classes:;T@o:RDoc::Markup::List: @type:BULLET:@items[o:RDoc::Markup::ListItem:@label0;[o; ;[I"3/./ - Any character except a newline.;To;;0;[o; ;[I"L/./m - Any character (the +m+ modifier enables multiline mode);To;;0;[o; ;[I"=/\w/ - A word character ([a-zA-Z0-9_]);To;;0;[o; ;[I"D/\W/ - A non-word character ([^a-zA-Z0-9_]). ;TI"RPlease take a look at {Bug #4044}[https://bugs.ruby-lang.org/issues/4044] if ;TI"7using /\W/ with the /i modifier.;To;;0;[o; ;[I"7/\d/ - A digit character ([0-9]);To;;0;[o; ;[I"</\D/ - A non-digit character ([^0-9]);To;;0;[o; ;[I"@/\h/ - A hexdigit character ([0-9a-fA-F]);To;;0;[o; ;[I"E/\H/ - A non-hexdigit character ([^0-9a-fA-F]);To;;0;[o; ;[I"E/\s/ - A whitespace character: /[ \t\r\n\f\v]/;To;;0;[o; ;[I"J/\S/ - A non-whitespace character: /[^ \t\r\n\f\v]/;T@o; ;[ I"MPOSIX bracket expressions are also similar to character classes. ;TI"NThey provide a portable alternative to the above, with the added benefit ;TI"Kthat they encompass non-ASCII characters. For instance, /\d/ ;TI"Qmatches only the ASCII decimal digits (0-9); whereas /[[:digit:]]/ ;TI"8matches any character in the Unicode _Nd_ category.;T@o;;;;[o;;0;[o; ;[I">/[[:alnum:]]/ - Alphabetic and numeric character;To;;0;[o; ;[I"2/[[:alpha:]]/ - Alphabetic character;To;;0;[o; ;[I"*/[[:blank:]]/ - Space or tab;To;;0;[o; ;[I"//[[:cntrl:]]/ - Control character;To;;0;[o; ;[I"#/[[:digit:]]/ - Digit;To;;0;[o; ;[I"L/[[:graph:]]/ - Non-blank character (excludes spaces, control ;TI"characters, and similar);To;;0;[o; ;[I">/[[:lower:]]/ - Lowercase alphabetical character;To;;0;[o; ;[I"N/[[:print:]]/ - Like [:graph:], but includes the space character;To;;0;[o; ;[I"3/[[:punct:]]/ - Punctuation character;To;;0;[o; ;[I"Q/[[:space:]]/ - Whitespace character ([:blank:], newline, ;TI"carriage return, etc.);To;;0;[o; ;[I"4/[[:upper:]]/ - Uppercase alphabetical;To;;0;[o; ;[I"L/[[:xdigit:]]/ - Digit allowed in a hexadecimal number (i.e., ;TI"0-9a-fA-F);T@o; ;[I"BRuby also supports the following non-POSIX character classes:;T@o;;;;[o;;0;[o; ;[I"I/[[:word:]]/ - A character in one of the following Unicode ;TI"4general categories _Letter_, _Mark_, _Number_, ;TI"!Connector_Punctuation;To;;0;[o; ;[I"D/[[:ascii:]]/ - A character in the ASCII character set;T@o;;[ I"3# U+06F2 is "EXTENDED ARABIC-INDIC DIGIT TWO" ;TI"B/[[:digit:]]/.match("\u06F2") #=> # ;TI"C/[[:upper:]][[:lower:]]/.match("Hello") #=> # ;TI"C/[[:xdigit:]][[:xdigit:]]/.match("A6") #=> # ;T;0S; ;i;I"Repetition;T@o; ;[I"KThe constructs described so far match a single character. They can be ;TI"Pfollowed by a repetition metacharacter to specify how many times they need ;TI"Ato occur. Such metacharacters are called quantifiers.;T@o;;;;[o;;0;[o; ;[I"$* - Zero or more times;To;;0;[o; ;[I"#+ - One or more times;To;;0;[o; ;[I".? - Zero or one times (optional);To;;0;[o; ;[I":{n} - Exactly n times;To;;0;[o; ;[I";{n,} - n or more times;To;;0;[o; ;[I";{,m} - m or less times;To;;0;[o; ;[I"L{n,m} - At least n and ;TI"at most m times;T@o; ;[I"NAt least one uppercase character ('H'), at least one lowercase character ;TI"-('e'), two 'l' characters, then one 'o':;T@o;;[I"M"Hello".match(/[[:upper:]]+[[:lower:]]+l{2}o/) #=> # ;T;0o; ;[ I"MRepetition is greedy by default: as many occurrences as possible ;TI"Gare matched while still allowing the overall match to succeed. By ;TI"Hcontrast, lazy matching makes the minimal amount of matches ;TI"Onecessary for overall success. A greedy metacharacter can be made lazy by ;TI""following it with ?.;T@o; ;[I"QBoth patterns below match the string. The first uses a greedy quantifier so ;TI"O'.+' matches ''; the second uses a lazy quantifier so '.+?' matches ;TI"'':;T@o;;[I"7/<.+>/.match("") #=> #"> ;TI"4/<.+?>/.match("") #=> #"> ;T;0o; ;[ I"NA quantifier followed by + matches possessively: once it ;TI"Mhas matched it does not backtrack. They behave like greedy quantifiers, ;TI"Jbut having matched they refuse to "give up" their match even if this ;TI"#jeopardises the overall match.;T@S; ;i;I"Capturing;T@o; ;[ I"LParentheses can be used for capturing. The text enclosed by the ;TI"Pn^th group of parentheses can be subsequently referred to ;TI"Bwith n. Within a pattern use the backreference ;TI"-\n; outside of the pattern use ;TI"+MatchData[n].;T@o; ;[I"P'at' is captured by the first group of parentheses, then referred to later ;TI"with \1:;T@o;;[I" # ;T;0o; ;[I"KRegexp#match returns a MatchData object which makes the captured text ;TI"#available with its #[] method:;T@o;;[I"H/[csh](..) [csh]\1 in/.match("The cat sat in the hat")[1] #=> 'at' ;T;0o; ;[I"ECapture groups can be referred to by name when defined with the ;TI"N(?<name>) or (?'name') ;TI"constructs.;T@o;;[I"7/\$(?\d+)\.(?\d+)/.match("$3.67") ;TI"8 => # ;TI"I/\$(?\d+)\.(?\d+)/.match("$3.67")[:dollars] #=> "3" ;T;0o; ;[I"PNamed groups can be backreferenced with \k<name>, ;TI"$where _name_ is the group name.;T@o;;[I">/(?[aeiou]).\k.\k/.match('ototomy') ;TI", #=> # ;T;0o; ;[I"B*Note*: A regexp can't use named backreferences and numbered ;TI"#backreferences simultaneously.;T@o; ;[I"OWhen named capture groups are used with a literal regexp on the left-hand ;TI"Nside of an expression and the =~ operator, the captured text is ;TI"?also assigned to local variables with corresponding names.;T@o;;[I"9/\$(?\d+)\.(?\d+)/ =~ "$3.67" #=> 0 ;TI"dollars #=> "3" ;T;0S; ;i;I" Grouping;T@o; ;[I"OParentheses also group the terms they enclose, allowing them to be ;TI"+quantified as one atomic whole.;T@o; ;[I"EThe pattern below matches a vowel followed by 2 word characters:;T@o;;[I"K/[aeiou]\w{2}/.match("Caenorhabditis elegans") #=> # ;T;0o; ;[I"QWhereas the following pattern matches a vowel followed by a word character, ;TI"5twice, i.e. [aeiou]\w[aeiou]\w: 'enor'.;T@o;;[I"6/([aeiou]\w){2}/.match("Caenorhabditis elegans") ;TI"( #=> # ;T;0o; ;[ I"GThe (?:...) construct provides grouping without ;TI"Pcapturing. That is, it combines the terms it contains into an atomic whole ;TI"Owithout creating a backreference. This benefits performance at the slight ;TI"expense of readability.;T@o; ;[I"QThe first group of parentheses captures 'n' and the second 'ti'. The second ;TI"Cgroup is referred to later with the backreference \2:;T@o;;[I"2/I(n)ves(ti)ga\2ons/.match("Investigations") ;TI"8 #=> # ;T;0o; ;[I"OThe first group of parentheses is now made non-capturing with '?:', so it ;TI"Hstill matches 'n', but doesn't create the backreference. Thus, the ;TI"2backreference \1 now refers to 'ti'.;T@o;;[I"4/I(?:n)ves(ti)ga\1ons/.match("Investigations") ;TI"2 #=> # ;T;0S; ;i;I"Atomic Grouping;T@o; ;[ I"-Grouping can be made atomic with ;TI"P(?>pat). This causes the subexpression pat ;TI"Nto be matched independently of the rest of the expression such that what ;TI"Pit matches becomes fixed for the remainder of the match, unless the entire ;TI"Isubexpression must be abandoned and subsequently revisited. In this ;TI"Lway pat is treated as a non-divisible whole. Atomic grouping is ;TI"Ftypically used to optimise patterns so as to prevent the regular ;TI"4expression engine from backtracking needlessly.;T@o; ;[ I"TThe " in the pattern below matches the first character of the string, ;TI"Tthen .* matches Quote". This causes the overall match to fail, ;TI"Nso the text matched by .* is backtracked by one position, which ;TI"Kleaves the final character of the string available to match ";T@o;;[I">/".*"/.match('"Quote"') #=> # ;T;0o; ;[I"RIf .* is grouped atomically, it refuses to backtrack Quote", ;TI"8even though this means that the overall match fails;T@o;;[I")/"(?>.*)"/.match('"Quote"') #=> nil ;T;0S; ;i;I"Subexpression Calls;T@o; ;[ I"GThe \g<name> syntax matches the previous ;TI"Msubexpression named _name_, which can be a group name or number, again. ;TI"NThis differs from backreferences in that it re-executes the group rather ;TI"2than simply trying to re-match the same text.;T@o; ;[I"TThis pattern matches a ( character and assigns it to the paren ;TI"Rgroup, tries to call that the paren sub-expression again but fails, ;TI"%then matches a literal ):;T@o;;[I"-/\A(?$\g*$)*\z/ =~ '()' ;TI" ;TI"5/\A(?$\g*$)*\z/ =~ '(())' #=> 0 ;TI" # ^1 ;TI"# ^2 ;TI"# ^3 ;TI"# ^4 ;TI"# ^5 ;TI"# ^6 ;TI"# ^7 ;TI" # ^8 ;TI" # ^9 ;TI"%# ^10 ;T;0o;;:NUMBER;[o;;0;[o; ;[I"CMatches at the beginning of the string, i.e. before the first ;TI"character.;To;;0;[o; ;[I"7Enters a named capture group called paren;To;;0;[o; ;[I"BMatches a literal (, the first character in the string;To;;0;[o; ;[I"ECalls the paren group again, i.e. recurses back to the ;TI"second step;To;;0;[o; ;[I"'Re-enters the paren group;To;;0;[o; ;[I"=Matches a literal (, the second character in the ;TI"string;To;;0;[o; ;[I"?Try to call paren a third time, but fail because ;TI"7doing so would prevent an overall successful match;To;;0;[o; ;[I"BMatch a literal ), the third character in the string. ;TI"/Marks the end of the second recursive call;To;;0;[o; ;[I"AMatch a literal ), the fourth character in the string;To;;0;[o; ;[I" Match the end of the string;T@S; ;i;I"Alternation;T@o; ;[I"OThe vertical bar metacharacter (|) combines two expressions into ;TI"Pa single one that matches either of the expressions. Each expression is an ;TI"alternative.;T@o;;[I"G/\w(and|or)\w/.match("Feliformia") #=> # ;TI"I/\w(and|or)\w/.match("furandi") #=> # ;TI"2/\w(and|or)\w/.match("dissemblance") #=> nil ;T;0S; ;i;I"Character Properties;T@o; ;[I"MThe \p{} construct matches characters with the named property, ;TI"%much like POSIX bracket classes.;T@o;;;;[o;;0;[o; ;[I"</\p{Alnum}/ - Alphabetic and numeric character;To;;0;[o; ;[I"0/\p{Alpha}/ - Alphabetic character;To;;0;[o; ;[I"(/\p{Blank}/ - Space or tab;To;;0;[o; ;[I"-/\p{Cntrl}/ - Control character;To;;0;[o; ;[I"!/\p{Digit}/ - Digit;To;;0;[o; ;[I"J/\p{Graph}/ - Non-blank character (excludes spaces, control ;TI"characters, and similar);To;;0;[o; ;[I"</\p{Lower}/ - Lowercase alphabetical character;To;;0;[o; ;[I"U/\p{Print}/ - Like \p{Graph}, but includes the space character;To;;0;[o; ;[I"1/\p{Punct}/ - Punctuation character;To;;0;[o; ;[I"O/\p{Space}/ - Whitespace character ([:blank:], newline, ;TI"carriage return, etc.);To;;0;[o; ;[I"2/\p{Upper}/ - Uppercase alphabetical;To;;0;[o; ;[I"T/\p{XDigit}/ - Digit allowed in a hexadecimal number (i.e., 0-9a-fA-F);To;;0;[o; ;[I"L/\p{Word}/ - A member of one of the following Unicode general ;TI"9category Letter, Mark, Number, ;TI""Connector\_Punctuation;To;;0;[o; ;[I"B/\p{ASCII}/ - A character in the ASCII character set;To;;0;[o; ;[I"F/\p{Any}/ - Any Unicode character (including unassigned ;TI"characters);To;;0;[o; ;[I"4/\p{Assigned}/ - An assigned character;T@o; ;[I"MA Unicode character's General Category value can also be matched ;TI"Lwith \p{Ab} where Ab is the category's ;TI"%abbreviation as described below:;T@o;;;;[,o;;0;[o; ;[I" /\p{L}/ - 'Letter';To;;0;[o; ;[I",/\p{Ll}/ - 'Letter: Lowercase';To;;0;[o; ;[I"'/\p{Lm}/ - 'Letter: Mark';To;;0;[o; ;[I"(/\p{Lo}/ - 'Letter: Other';To;;0;[o; ;[I",/\p{Lt}/ - 'Letter: Titlecase';To;;0;[o; ;[I"+/\p{Lu}/ - 'Letter: Uppercase;To;;0;[o; ;[I"(/\p{Lo}/ - 'Letter: Other';To;;0;[o; ;[I"/\p{M}/ - 'Mark';To;;0;[o; ;[I"+/\p{Mn}/ - 'Mark: Nonspacing';To;;0;[o; ;[I"2/\p{Mc}/ - 'Mark: Spacing Combining';To;;0;[o; ;[I"*/\p{Me}/ - 'Mark: Enclosing';To;;0;[o; ;[I" /\p{N}/ - 'Number';To;;0;[o; ;[I"0/\p{Nd}/ - 'Number: Decimal Digit';To;;0;[o; ;[I")/\p{Nl}/ - 'Number: Letter';To;;0;[o; ;[I"(/\p{No}/ - 'Number: Other';To;;0;[o; ;[I"%/\p{P}/ - 'Punctuation';To;;0;[o; ;[I"1/\p{Pc}/ - 'Punctuation: Connector';To;;0;[o; ;[I",/\p{Pd}/ - 'Punctuation: Dash';To;;0;[o; ;[I",/\p{Ps}/ - 'Punctuation: Open';To;;0;[o; ;[I"-/\p{Pe}/ - 'Punctuation: Close';To;;0;[o; ;[I"5/\p{Pi}/ - 'Punctuation: Initial Quote';To;;0;[o; ;[I"3/\p{Pf}/ - 'Punctuation: Final Quote';To;;0;[o; ;[I"-/\p{Po}/ - 'Punctuation: Other';To;;0;[o; ;[I" /\p{S}/ - 'Symbol';To;;0;[o; ;[I"'/\p{Sm}/ - 'Symbol: Math';To;;0;[o; ;[I"+/\p{Sc}/ - 'Symbol: Currency';To;;0;[o; ;[I"+/\p{Sc}/ - 'Symbol: Currency';To;;0;[o; ;[I"+/\p{Sk}/ - 'Symbol: Modifier';To;;0;[o; ;[I"(/\p{So}/ - 'Symbol: Other';To;;0;[o; ;[I"#/\p{Z}/ - 'Separator';To;;0;[o; ;[I"+/\p{Zs}/ - 'Separator: Space';To;;0;[o; ;[I"*/\p{Zl}/ - 'Separator: Line';To;;0;[o; ;[I"//\p{Zp}/ - 'Separator: Paragraph';To;;0;[o; ;[I"/\p{C}/ - 'Other';To;;0;[o; ;[I")/\p{Cc}/ - 'Other: Control';To;;0;[o; ;[I"(/\p{Cf}/ - 'Other: Format';To;;0;[o; ;[I"./\p{Cn}/ - 'Other: Not Assigned';To;;0;[o; ;[I"-/\p{Co}/ - 'Other: Private Use';To;;0;[o; ;[I"+/\p{Cs}/ - 'Other: Surrogate';T@o; ;[I"LLastly, \p{} matches a character's Unicode script. The ;TI"Ffollowing scripts are supported: Arabic, Armenian, ;TI"GBalinese, Bengali, Bopomofo, Braille, ;TI"OBuginese, Buhid, Canadian_Aboriginal, Carian, ;TI"ACham, Cherokee, Common, Coptic, ;TI"HCuneiform, Cypriot, Cyrillic, Deseret, ;TI"MDevanagari, Ethiopic, Georgian, Glagolitic, ;TI"PGothic, Greek, Gujarati, Gurmukhi, Han, ;TI"DHangul, Hanunoo, Hebrew, Hiragana, ;TI"IInherited, Kannada, Katakana, Kayah_Li, ;TI"OKharoshthi, Khmer, Lao, Latin, Lepcha, ;TI"BLimbu, Linear_B, Lycian, Lydian, ;TI"MMalayalam, Mongolian, Myanmar, New_Tai_Lue, ;TI"CNko, Ogham, Ol_Chiki, Old_Italic, ;TI"HOld_Persian, Oriya, Osmanya, Phags_Pa, ;TI"HPhoenician, Rejang, Runic, Saurashtra, ;TI"LShavian, Sinhala, Sundanese, Syloti_Nagri, ;TI"DSyriac, Tagalog, Tagbanwa, Tai_Le, ;TI"NTamil, Telugu, Thaana, Thai, Tibetan, ;TI"ATifinagh, Ugaritic, Vai, and Yi.;T@o; ;[I"SUnicode codepoint U+06E9 is named "ARABIC PLACE OF SAJDAH" and belongs to the ;TI"Arabic script:;T@o;;[I" # ;T;0o; ;[I"MAll character properties can be inverted by prefixing their name with a ;TI"caret (^).;T@o; ;[I"OLetter 'A' is not in the Unicode Ll (Letter; Lowercase) category, so this ;TI"match succeeds:;T@o;;[I"//\p{^Ll}/.match("A") #=> # ;T;0S; ;i;I"Anchors;T@o; ;[I"KAnchors are metacharacter that match the zero-width positions between ;TI"Ccharacters, anchoring the match to a specific position.;T@o;;;;[o;;0;[o; ;[I"+^ - Matches beginning of line;To;;0;[o; ;[I"%$ - Matches end of line;To;;0;[o; ;[I"/\A - Matches beginning of string.;To;;0;[o; ;[I"I\Z - Matches end of string. If string ends with a newline, ;TI"#it matches just before newline;To;;0;[o; ;[I"(\z - Matches end of string;To;;0;[ o; ;[I"3\G - Matches first matching position:;T@o; ;[I"bIn methods like String#gsub and String#scan, it changes on each iteration. ;TI"}It initially matches the beginning of subject, and in each following iteration it matches where the last match finished.;T@o;;[I"3" a b c".gsub(/ /, '_') #=> "____a_b_c" ;TI"3" a b c".gsub(/\G /, '_') #=> "____a b c" ;T;0o; ;[I"�In methods like Regexp#match and String#match that take an (optional) offset, it matches where the search begins.;T@o;;[I":"hello, world".match(/,/, 3) #=> # ;TI"-"hello, world".match(/\G,/, 3) #=> nil ;T;0o;;0;[o; ;[I"B\b - Matches word boundaries when outside brackets; ;TI"*backspace (0x08) when inside brackets;To;;0;[o; ;[I".\B - Matches non-word boundaries;To;;0;[o; ;[I"M(?=pat) - Positive lookahead assertion: ;TI"Iensures that the following characters match pat, but doesn't ;TI"1include those characters in the matched text;To;;0;[o; ;[I"M(?!pat) - Negative lookahead assertion: ;TI"Hensures that the following characters do not match pat, but ;TI"9doesn't include those characters in the matched text;To;;0;[o; ;[I"D(?<=pat) - Positive lookbehind ;TI"Lassertion: ensures that the preceding characters match pat, but ;TI"9doesn't include those characters in the matched text;To;;0;[o; ;[I"D(?pat) - Negative lookbehind ;TI"Cassertion: ensures that the preceding characters do not match ;TI"Ipat, but doesn't include those characters in the matched text;T@o; ;[I"IIf a pattern isn't anchored it can begin at any point in the string:;T@o;;[I"8/real/.match("surrealist") #=> # ;T;0o; ;[I"TAnchoring the pattern to the beginning of the string forces the match to start ;TI"Rthere. 'real' doesn't occur at the beginning of the string, so now the match ;TI"fails:;T@o;;[I"*/\Areal/.match("surrealist") #=> nil ;T;0o; ;[I"QThe match below fails because although 'Demand' contains 'and', the pattern ;TI"'does not occur at a word boundary.;T@o;;[I"/\band/.match("Demand") ;T;0o; ;[I"LWhereas in the following example 'and' has been anchored to a non-word ;TI"Pboundary so instead of matching the first 'and' it matches from the fourth ;TI" letter of 'demand' instead:;T@o;;[I"M/\Band.+/.match("Supply and demand curve") #=> # ;T;0o; ;[I"PThe pattern below uses positive lookahead and positive lookbehind to match ;TI"Ltext appearing in tags without including the tags in the match:;T@o;;[I"E/(?<=)\w+(?=<\/b>)/.match("Fortune favours the bold") ;TI"! #=> # ;T;0S; ;i;I"Options;T@o; ;[I"QThe end delimiter for a regexp can be followed by one or more single-letter ;TI"5options which control how the pattern can match.;T@o;;;;[ o;;0;[o; ;[I""/pat/i - Ignore case;To;;0;[o; ;[I"K/pat/m - Treat a newline as a character matched by .;To;;0;[o; ;[I"D/pat/x - Ignore whitespace and comments in the pattern;To;;0;[o; ;[I"C/pat/o - Perform #{} interpolation only once;T@o; ;[ I"Gi, m, and x can also be applied on the ;TI""subexpression level with the ;TI"I(?on-off) construct, which ;TI"Henables options on, and disables options off for the ;TI",expression enclosed by the parentheses.;T@o;;[I"4/a(?i:b)c/.match('aBc') #=> # ;TI"4/a(?i:b)c/.match('abc') #=> # ;T;0o; ;[I"7Options may also be used with Regexp.new:;T@o;;[ I"JRegexp.new("abc", Regexp::IGNORECASE) #=> /abc/i ;TI"JRegexp.new("abc", Regexp::MULTILINE) #=> /abc/m ;TI"TRegexp.new("abc # Comment", Regexp::EXTENDED) #=> /abc # Comment/x ;TI"KRegexp.new("abc", Regexp::IGNORECASE | Regexp::MULTILINE) #=> /abc/mi ;T;0S; ;i;I"#Free-Spacing Mode and Comments;T@o; ;[ I"KAs mentioned above, the x option enables free-spacing ;TI"Fmode. Literal white space inside the pattern is ignored, and the ;TI"Moctothorpe (#) character introduces a comment until the end of ;TI"Nthe line. This allows the components of the pattern to be organized in a ;TI"'potentially more readable fashion.;T@o; ;[I"HA contrived pattern to match a number with optional decimal places:;T@o;;[I"float_pat = /\A ;TI"B [[:digit:]]+ # 1 or more digits before the decimal point ;TI"& (\. # Decimal point ;TI"E [[:digit:]]+ # 1 or more digits after the decimal point ;TI"B )? # The decimal point and following digits are optional ;TI" \Z/x ;TI"=float_pat.match('3.14') #=> # ;T;0o; ;[I">There are a number of strategies for matching whitespace:;T@o;;;;[o;;0;[o; ;[I"=Use a pattern such as \s or \p{Space}.;To;;0;[o; ;[I"VUse escaped whitespace such as \ , i.e. a space preceded by a backslash.;To;;0;[o; ;[I"0Use a character class such as [ ].;T@o; ;[I"CComments can be included in a non-x pattern with the ;TI"M(?#comment) construct, where comment is ;TI"1arbitrary text ignored by the regexp engine.;T@o; ;[I"EComments in regexp literals cannot include unescaped terminator ;TI"characters.;T@S; ;i;I" Encoding;T@o; ;[I"MRegular expressions are assumed to use the source encoding. This can be ;TI"4overridden with one of the following modifiers.;T@o;;;;[ o;;0;[o; ;[I",/pat/u - UTF-8;To;;0;[o; ;[I"-/pat/e - EUC-JP;To;;0;[o; ;[I"2/pat/s - Windows-31J;To;;0;[o; ;[I"1/pat/n - ASCII-8BIT;T@o; ;[I"HA regexp can be matched against a string when they either share an ;TI"Pencoding, or the regexp's encoding is _US-ASCII_ and the string's encoding ;TI"is ASCII-compatible.;T@o; ;[I"?If a match between incompatible encodings is attempted an ;TI"?Encoding::CompatibilityError exception is raised.;T@o; ;[ I"PThe Regexp#fixed_encoding? predicate indicates whether the regexp ;TI"Ihas a fixed encoding, that is one incompatible with ASCII. A ;TI"Regexp::FIXEDENCODING as the second argument of ;TI"Regexp.new:;T@o;;[ I"Lr = Regexp.new("a".force_encoding("iso-8859-1"),Regexp::FIXEDENCODING) ;TI"r =~"a\u3042" ;TI"M #=> Encoding::CompatibilityError: incompatible encoding regexp match ;TI"3 (ISO-8859-1 regexp with UTF-8 string) ;T;0S; ;i;I"Special global variables;T@o; ;[I"2Pattern matching sets some global variables :;To;;;;[o;;0;[o; ;[I"4$~ is equivalent to Regexp.last_match;;To;;0;[o; ;[I"4$& contains the complete matched text;;To;;0;[o; ;[I".$` contains string before match;;To;;0;[o; ;[I"-$' contains string after match;;To;;0;[o; ;[I"Q$1, $2 and so on contain text matching first, second, etc ;TI"capture group;;To;;0;[o; ;[I"-$+ contains last capture group.;T@o; ;[I" Example:;T@o;;[I"Pm = /s(\w{2}).*(c)/.match('haystack') #=> # ;TI"P$~ #=> # ;TI"PRegexp.last_match #=> # ;TI" ;TI"$& #=> "stac" ;TI" # same as m[0] ;TI"$` #=> "hay" ;TI"# # same as m.pre_match ;TI"$' #=> "k" ;TI"$ # same as m.post_match ;TI"$1 #=> "ta" ;TI" # same as m[1] ;TI"$2 #=> "c" ;TI" # same as m[2] ;TI"$3 #=> nil ;TI") # no third group in pattern ;TI"$+ #=> "c" ;TI" # same as m[-1] ;T;0o; ;[I"HThese global variables are thread-local and method-local variables.;T@S; ;i;I"Performance;T@o; ;[I"OCertain pathological combinations of constructs can lead to abysmally bad ;TI"performance.;T@o; ;[I"GConsider a string of 25 as, a d, 4 as, and a ;TI"c.;T@o;;[I"(s = 'a' * 25 + 'd' + 'a' * 4 + 'c' ;TI"+#=> "aaaaaaaaaaaaaaaaaaaaaaaaadaaaac" ;T;0o; ;[I"@The following patterns match instantly as you would expect:;T@o;;[I"/(b|a)/ =~ s #=> 0 ;TI"/(b|a+)/ =~ s #=> 0 ;TI"/(b|a+)*/ =~ s #=> 0 ;T;0o; ;[I"=However, the following pattern takes appreciably longer:;T@o;;[I"/(b|a+)*c/ =~ s #=> 26 ;T;0o; ;[ I"IThis happens because an atom in the regexp is quantified by both an ;TI"Fimmediate + and an enclosing * with nothing to ;TI"Hdifferentiate which is in control of any particular character. The ;TI"Mnondeterminism that results produces super-linear performance. (Consult ;TI"@Mastering Regular Expressions (3rd ed.), pp 222, by ;TI"LJeffery Friedl, for an in-depth analysis). This particular case ;TI"Lcan be fixed by use of atomic grouping, which prevents the unnecessary ;TI"backtracking:;T@o;;[ I"A(start = Time.now) && /(b|a+)*c/ =~ s && (Time.now - start) ;TI" #=> 24.702736882 ;TI"C(start = Time.now) && /(?>b|a+)*c/ =~ s && (Time.now - start) ;TI" #=> 0.000166571 ;T;0o; ;[I"FA similar case is typified by the following example, which takes ;TI"0approximately 60 seconds to execute for me:;T@o; ;[I"OMatch a string of 29 as against a pattern of 29 optional as ;TI"(followed by 29 mandatory as:;T@o;;[I"2Regexp.new('a?' * 29 + 'a' * 29) =~ 'a' * 29 ;T;0o; ;[ I"JThe 29 optional as match the string, but this prevents the 29 ;TI"Mmandatory as that follow from matching. Ruby must then backtrack ;TI"Krepeatedly so as to satisfy as many of the optional matches as it can ;TI"Owhile still matching the mandatory 29. It is plain to us that none of the ;TI"Koptional matches can succeed, but this fact unfortunately eludes Ruby.;T@o; ;[ I"RThe best way to improve performance is to significantly reduce the amount of ;TI"Nbacktracking needed. For this case, instead of individually matching 29 ;TI"Roptional as, a range of optional as can be matched all at once ;TI"with a{0,29}:;T@o;;[I"1Regexp.new('a{0,29}' + 'a' * 29) =~ 'a' * 29;T;0: @file@:0@omit_headings_from_table_of_contents_below0