Wildcard metacharacters
Single character . (dot)
n.t => not ✔ net ✔ nut ✔ n#t ✔ n.t ✔ neat ✘
Note:
An unescaped dot matches any character, not only a literal dot:
5.00 => 5.00 ✔ 5500 ✔ 5-00 ✔
So a literal dot has to be escaped with a backslash:
5\.00 => 5.00 ✔ 5500 ✘ 5-00 ✘
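A minimal sketch of the dot examples above, assuming Python's standard re module (these notes do not name a language, so that choice is an assumption). re.fullmatch is used so that the whole string has to match the pattern.

```python
import re

words = ["not", "net", "nut", "n#t", "n.t", "neat"]
# The dot stands for exactly one character, so "neat" (two characters
# between n and t) is rejected.
dot_matches = [w for w in words if re.fullmatch(r"n.t", w)]

prices = ["5.00", "5500", "5-00"]
# Unescaped dot: any character is accepted in the second position.
unescaped = [p for p in prices if re.fullmatch(r"5.00", p)]
# Escaped dot: only a literal "." is accepted.
escaped = [p for p in prices if re.fullmatch(r"5\.00", p)]

print(dot_matches)
print(unescaped)
print(escaped)
```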
Other special characters
- Spaces
- Tabs (\t)
- Line Returns (\r, \n, \r\n)
- Non-printable characters: bell (\a), escape (\e), form feed (\f), vertical tab (\v)
- ASCII or ANSI codes in hex: 0xA9 = \xA9
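A short sketch of a few of these escapes, again assuming Python's re module. Python's re does not accept \e, so it is left out here; the sample strings are made up for illustration.

```python
import re

# \t in the pattern matches a literal tab character in the text.
tab_found = re.search(r"a\tb", "a\tb") is not None

# \xA9 is the hex code for the copyright sign.
copyright_found = re.search(r"\xA9", "\u00a9 2024") is not None

# Split on any of the three line-return styles; \r\n is listed first
# so that a Windows line ending is consumed as one separator.
lines = re.split(r"\r\n|\r|\n", "one\r\ntwo\nthree\rfour")

print(tab_found, copyright_found, lines)
```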
Character sets [ ]
[aeiou] any vowel
b[aeiou]t => bat ✔ but ✔ bait ✘
Character Ranges -
[0-9]
[A-Za-z]
[0-9][0-9][0-9] => 576
Negate a character set ^
[^aeiou] not a vowel
see[^mn] => seek ✔ seem ✘ seen ✘
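Note:
A negated set still has to match one character, and a space counts as one:
see[^mn] => 'see ' ✔ 'see' ✘
These characters should be escaped inside sets:
] - ^ \
(Most other metacharacters, such as ., are treated as literals inside a set; escapes like \n keep their meaning.)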
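The set, range and negation examples above can be checked with a sketch like this one (Python's re module is an assumption; the last sample string is made up to show the escaped characters).

```python
import re

def full(pattern, s):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# One vowel between b and t: "bait" has two, so it fails.
vowel_hits = [w for w in ["bat", "but", "bait"] if full(r"b[aeiou]t", w)]

# Three ranges in a row match exactly three digits.
three_digits = full(r"[0-9][0-9][0-9]", "576")

# The negated set needs one character that is not m or n;
# a trailing space qualifies, an empty position does not.
negated_hits = [s for s in ["seek", "seem", "seen", "see ", "see"]
                if full(r"see[^mn]", s)]

# ] - ^ and \ escaped inside a set match themselves.
literal_hits = re.findall(r"[\]\-\^\\]", r"a]b-c^d\e")

print(vowel_hits, three_digits, negated_hits, literal_hits)
```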
Shorthand character sets
- \d -> digit [0-9]
- \w -> word character [a-zA-Z0-9_]
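- \s -> whitespace [ \t\r\n]
- \D -> not digit [^0-9]
- \W -> not word [^a-zA-Z0-9_]
- \S -> not whitespace [^ \t\r\n]
[^\d\s] is not the same as [\D\S]: the first matches a character that is neither a digit nor whitespace, the second matches a character that is a non-digit or a non-whitespace, which is every character.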
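A sketch of the shorthand sets, assuming Python's re module. In Python 3 string patterns \d and \w also cover non-ASCII digits and letters by default, so the bracket equivalents listed above are the ASCII approximation. The sample text is invented.

```python
import re

text = "Room 42, floor_3\tEast"

digits = re.findall(r"\d+", text)   # runs of digits
words = re.findall(r"\w+", text)    # runs of word characters
pieces = re.split(r"\s+", text)     # split on runs of whitespace

# [^\d\s]: neither a digit nor whitespace.
neither = re.findall(r"[^\d\s]", "a1 b")
# [\D\S]: a non-digit OR a non-whitespace, so every character qualifies.
either = re.findall(r"[\D\S]", "a1 b")

print(digits, words, pieces, neither, either)
```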
Bracket Expressions (POSIX)
[:alpha:] = A-Za-z
[:digit:] = 0-9
They are only valid inside a set: [[:alpha:]]
Not supported in Java, JavaScript, .NET, Python
Supported in PHP, Perl, Ruby, Unix tools
Repetition metacharacters
* zero or more
+ one or more
? zero or one
bananas* => banana ✔ bananas ✔ bananasssss ✔
bananas+ => banana ✘ bananas ✔ bananasssss ✔
bananas? => banana ✔ bananas ✔ bananasssss ✘
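The three banana rows can be reproduced with a small sketch (Python's re module assumed). Each quantifier applies only to the final s, not to the whole word.

```python
import re

candidates = ["banana", "bananas", "bananasssss"]

def hits(pattern):
    """Return the candidates that match the pattern in full."""
    return [w for w in candidates if re.fullmatch(pattern, w)]

star = hits(r"bananas*")      # zero or more s
plus = hits(r"bananas+")      # one or more s
optional = hits(r"bananas?")  # zero or one s

print(star, plus, optional)
```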
Quantified repetition { }
{min,max} min is required, max is optional.
\d{3,5} 3 to 5 digits
\d{3} exactly 3 digits
\d{3,} 3 or more digits
\d{0,} = \d*
\d{1,} = \d+
\d{2}-\d{3}-\d{4} => 35-345-7896 ✔
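A sketch of the quantified forms, assuming Python's re module; the test strings other than 35-345-7896 are made up.

```python
import re

def full(pattern, s):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# {3,5}: at least 3 and at most 5 digits.
three_to_five = [s for s in ["12", "123", "12345", "123456"]
                 if full(r"\d{3,5}", s)]
# {3}: exactly 3 digits.
exactly_three = [s for s in ["12", "123", "1234"] if full(r"\d{3}", s)]
# {3,}: 3 digits or more, no upper limit.
three_or_more = [s for s in ["12", "123", "1234567"] if full(r"\d{3,}", s)]

number_ok = full(r"\d{2}-\d{3}-\d{4}", "35-345-7896")

print(three_to_five, exactly_three, three_or_more, number_ok)
```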
Greedy and Lazy Expressions
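.*[0-9]+ => page 120 //greedy: .* takes 'page 12' and leaves only '0' for [0-9]+
.*?[0-9]+ => page 120 //lazy: .*? takes only 'page ' so [0-9]+ gets '120'
? makes the preceding quantifier lazy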
- *?
- +?
- {min,max}?
- ??
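To see where greedy and lazy differ, the sketch below wraps each half of the pattern in a capturing group so the split of 'page 120' becomes visible. Python's re module is an assumption, and the extra parentheses are added only for inspection.

```python
import re

text = "page 120"

# Greedy: .* grabs as much as it can, then gives back just enough
# for [0-9]+ to match a single digit.
greedy = re.search(r"(.*)([0-9]+)", text).groups()

# Lazy: .*? grabs as little as it can, so [0-9]+ takes all the digits.
lazy = re.search(r"(.*?)([0-9]+)", text).groups()

print(greedy)
print(lazy)
```

Both patterns match the whole string; only the share taken by each part changes.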
Grouping metacharacter ( )
(abc)+ => abc ✔ abcabc ✔
gun(s)? => gun ✔ guns ✔
Alternation metacharacter |
mango|apple => apple ✔ mango ✔
r(u|a)n => run ✔ ran ✔
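A sketch of grouping and alternation, assuming Python's re module. The last case is an added illustration of why the parentheses in r(u|a)n matter: without them the | splits the whole pattern.

```python
import re

def full(pattern, s):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

repeated = [s for s in ["abc", "abcabc", "abcab"] if full(r"(abc)+", s)]
optional = [s for s in ["gun", "guns", "gunss"] if full(r"gun(s)?", s)]
fruit = [s for s in ["apple", "mango", "grape"] if full(r"mango|apple", s)]
run_ran = [s for s in ["run", "ran", "ruan", "rn"] if full(r"r(u|a)n", s)]

# Ungrouped: the alternatives are "ru" and "an", not "run" and "ran".
ungrouped = [s for s in ["ru", "an", "run"] if full(r"ru|an", s)]

print(repeated, optional, fruit, run_ran, ungrouped)
```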
Start and End anchors
^ Start of string/line
$ End of string/line
\A Start of string only
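\Z End of string only (in most of these engines it also matches just before a final line break; \z is the strict end)
\A and \Z are supported in Java, PHP, .NET, Perl, Python, Ruby, but not in JavaScript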
^mango or \Amango Beginning of string
mango$ or mango\Z End of string
^mango$ or \Amango\Z
Note:
^[A-Z] = ^ outside a set anchors to the beginning of the string
[^A-Z] = ^ as the first character inside a set negates it
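A sketch of the string/line difference between the anchors, assuming Python's re module. In Python ^ and $ only match at line breaks when the re.MULTILINE flag is passed, while \A and \Z ignore that flag. The two-line sample text is made up.

```python
import re

text = "mango pie\nmango juice"

# Default mode: ^ matches only at the start of the string.
default_start = re.findall(r"^mango", text)
# Multiline mode: ^ also matches at the start of every line.
multiline_start = re.findall(r"^mango", text, re.MULTILINE)
# \A stays tied to the start of the string, even in multiline mode.
string_start = re.findall(r"\Amango", text, re.MULTILINE)

pie_at_end = re.search(r"pie$", text) is not None
pie_at_line_end = re.search(r"pie$", text, re.MULTILINE) is not None
juice_at_string_end = re.search(r"juice\Z", text) is not None

# ^ outside a set is an anchor, ^ inside a set is a negation.
anchored = re.findall(r"^[A-Z]", "Ab")
negated = re.findall(r"[^A-Z]", "aB1")

print(default_start, multiline_start, string_start)
print(pie_at_end, pie_at_line_end, juice_at_string_end)
print(anchored, negated)
```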
Word Boundaries
\b word boundary (start/end of word)
\B not a word boundary
\b\w+\b => test string. I'm a boy.
matches test, string, I, m, a, boy
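The word-boundary example can be run as is in Python's re module (an assumption; the \B sample string is invented). The apostrophe is not a word character, which is why I'm splits into I and m.

```python
import re

words = re.findall(r"\b\w+\b", "test string. I'm a boy.")

# \B requires that there is NO boundary before "at", so only the
# occurrences inside a longer word are found, not the standalone "at".
inner_positions = [m.start() for m in re.finditer(r"\Bat", "cat at bat")]

print(words)
print(inner_positions)
```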
Back References
\1 to \9 (some regex engines use $1 to $9)
(mango) to \1 => mango to mango
<(i|em).+?</\1> => <i>test</i> ✔
<em>test</em> ✔
<i>test</em> ✘
Non-capturing group (?: )
(?:orange) and (apples) to \1 => orange and apples to apples
(?:orange) is not captured, so \1 refers to (apples)
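A sketch covering both the backreference and the non-capturing examples, assuming Python's re module. Python writes backreferences as \1 in the pattern and in the replacement string; the re.sub line is an added illustration with made-up input.

```python
import re

def full(pattern, s):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, s) is not None

# \1 must repeat whatever group 1 actually matched.
mango_ok = full(r"(mango) to \1", "mango to mango")
mango_bad = full(r"(mango) to \1", "mango to apple")

# The closing tag has to use the same name as the opening tag.
tag_pattern = r"<(i|em).+?</\1>"
tag_hits = [s for s in ["<i>test</i>", "<em>test</em>", "<i>test</em>"]
            if full(tag_pattern, s)]

# (?:orange) groups without capturing, so (apples) is group 1.
m = re.fullmatch(r"(?:orange) and (apples) to \1",
                 "orange and apples to apples")
captured = m.groups()

# Backreferences in a replacement: swap two words.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

print(mango_ok, mango_bad, tag_hits, captured, swapped)
```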
Look ahead assertions ?= ?!
Positive look ahead ?=
(?=regex)
(?=seashore)sea => 'sea' in seashore ✔ 'sea' in seaside ✘
sea(?=shore) matches the same as the previous pattern
e.g. find words that come right before a comma:
\b[A-Za-z']+?\b(?=,)
Negative look ahead ?!
(?!regex)
(?!seashore)sea => 'sea' in seaside ✔ 'sea' in seashore ✘
sea(?!shore) matches the same as the previous pattern
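A sketch of both lookahead forms, assuming Python's re module. The point to notice is that the lookahead text is checked but never becomes part of the match. The comma sentence is made up.

```python
import re

def found(pattern, s):
    """Return the matched text, or None when there is no match."""
    m = re.search(pattern, s)
    return m.group() if m else None

# Positive lookahead: "sea" only when "shore" follows.
pos_shore = found(r"sea(?=shore)", "seashore")
pos_side = found(r"sea(?=shore)", "seaside")
# The lookahead-first spelling finds the same text.
pos_prefix = found(r"(?=seashore)sea", "seashore")

# Negative lookahead: "sea" only when "shore" does NOT follow.
neg_side = found(r"sea(?!shore)", "seaside")
neg_shore = found(r"sea(?!shore)", "seashore")

# Words directly before a comma; the comma itself is not included.
before_comma = re.findall(r"\b[A-Za-z']+?\b(?=,)", "apples, pears, and plums")

print(pos_shore, pos_side, pos_prefix, neg_side, neg_shore, before_comma)
```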
Look behind assertions ?<= ?<!
Positive look behind ?<=
(?<=base)ball => 'ball' in baseball ✔ 'ball' in football ✘
Negative Look Behind ?<!
(?<!base)ball => 'ball' in football ✔ 'ball' in baseball ✘
Note:
Lookbehind assertions are not supported in older JavaScript engines (they were added in ES2018)
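The lookbehind examples as a sketch, assuming Python's re module, which supports lookbehind as long as the pattern inside it has a fixed width (as "base" does here).

```python
import re

def found(pattern, s):
    """Return the matched text, or None when there is no match."""
    m = re.search(pattern, s)
    return m.group() if m else None

# Positive lookbehind: "ball" only when "base" comes right before it.
pos_baseball = found(r"(?<=base)ball", "baseball")
pos_football = found(r"(?<=base)ball", "football")

# Negative lookbehind: "ball" only when "base" does NOT come before it.
neg_football = found(r"(?<!base)ball", "football")
neg_baseball = found(r"(?<!base)ball", "baseball")

print(pos_baseball, pos_football, neg_football, neg_baseball)
```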
Matching Unicode characters
Unicode indicator \u
caf\u00E9 => café ✔ cafe ✘
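A sketch of the \u escape, assuming Python's re module, which accepts \uXXXX in string patterns. The decomposed spelling (e followed by a combining accent) is an added illustration of why a code-point escape can miss text that looks identical on screen.

```python
import re

precomposed = "caf\u00e9"   # é stored as one code point
decomposed = "cafe\u0301"   # e followed by a combining acute accent

pattern = r"caf\u00E9"

matches_precomposed = re.fullmatch(pattern, precomposed) is not None
matches_plain = re.fullmatch(pattern, "cafe") is not None
# Looks the same when printed, but the code points differ.
matches_decomposed = re.fullmatch(pattern, decomposed) is not None

print(matches_precomposed, matches_plain, matches_decomposed)
```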
Unicode wildcard \X
caf\X => café ✔ cafe ✔ //Only works in PHP and Perl
Unicode property \p{property}, negated property \P{property}
- L -> Letter
- M -> Mark
- Z -> Separator
- S -> Symbol
- N -> Number
- P -> Punctuation
- C -> Other
Supported in Java, .NET, Perl, PHP, Ruby
Reference: Using Regular Expressions - Lynda.com