PCRE groups
Groups are one of the most powerful features of regular expressions. They can make a regex more readable, clearer, and more precise. PCRE defines many different group types, though, and you need to know how they differ and what each one offers to use them well in your own cases.
Group types
group type | definition | grep example | output |
---|---|---|---|
`(foo)` | capturing group; its match can be referenced as `\1`, i.e. `\<number of the group>` | `echo -e 'a a\na b' \| grep -Po '(\w) \1'` | `a a` |
`(?:foo)` | non-capturing group (it cannot be referenced) | `echo 'abc' \| grep -Po '(?:\w+\|\d+) \1'` | `grep: reference to non-existent subpattern` |
`(?#foo)` | just a comment | `echo '123qwe' \| grep -Po '\d+(?#qwe)'` | `123` |
`(?i)` | inline modifiers | `echo -e 'fo\nFo' \| grep -Po '(?i)f(?-i)o'` | `fo`, `Fo` |
`(?i:foo)` | non-capturing group with modifiers | | |
`(?\|(foo)\|(bar))` | "branch reset" group; both `foo` and `bar` are referenced as `\1` | `echo -e 'foofoo\nbarbar' \| grep -Po '(?\|(foo)\|(bar))\1'` | `foofoo`, `barbar` |
`(?'NAME'pattern)`, `(?<NAME>pattern)` | named group, referenced by `\k'NAME'` or `\k<NAME>` (see note 1) | | |
`(?(cond)true\|false)` | conditional group: matches the `true` part if `cond` is matched, and the `false` part otherwise | `echo -e '{1}\n(2)\n{3)' \| grep -Po '^(?:({)\|\()\d+(?(1)}\|\))$'` | `{1}`, `(2)` |
`(?>foo)` | atomic group: a non-capturing group that is not re-entered by backtracking after its first match | `echo into \| grep -Po '(?>in\|into\|inside)'` | `in` |
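Three of the group types above have no example, or an example whose effect is hard to see. The sketch below fills them in. It assumes a GNU grep built with PCRE support (`-P`), uses `printf` instead of `echo -e` for portability, and the variable and group names (`named`, `scoped`, `atomic`, `plain`, `ch`) are only illustrative.

```shell
# Named group: (?<ch>...) captures one word character, \k<ch> refers back to it.
named=$(printf 'a a\na b\n' | grep -Po '(?<ch>\w) \k<ch>')
echo "$named"        # a a

# Scoped modifier: only the "f" inside (?i:...) is case-insensitive,
# so "fo" and "Fo" match but "FO" does not.
scoped=$(printf 'fo\nFo\nFO\n' | grep -Po '(?i:f)o')
echo "$scoped"

# Atomic group: once "in" has matched, the "into" alternative is discarded,
# so the trailing $ can never be satisfied and grep prints nothing.
atomic=$(printf 'into\n' | grep -Po '(?>in|into)$' || true)
# The same pattern with a plain non-capturing group backtracks and matches.
plain=$(printf 'into\n' | grep -Po '(?:in|into)$')
echo "atomic: '$atomic', non-capturing: '$plain'"
```

The last pair shows why atomic groups matter: the only difference between the two patterns is whether the engine may go back and try another alternative inside the group.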
Lookaround assertions
The most important thing about lookarounds is that the cursor does not move while a lookaround group is being processed. So you can put several lookaround groups with different conditions at the same position and check all of them before moving forward or grabbing results.
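As a minimal sketch of this idea, the command below stacks two lookaheads at the start of the line. It assumes GNU grep with `-P` support; the sample lines and the variable name `checked` are made up for illustration.

```shell
# Both lookaheads are evaluated at the start of the line and consume nothing,
# so .+ still starts matching from the first character.
# Only a line containing both a digit and a lowercase letter passes.
checked=$(printf 'abc123\nabcdef\n123456\n' | grep -Po '^(?=.*\d)(?=.*[a-z]).+$')
echo "$checked"      # abc123
```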
assertion type | definition | grep example | output |
---|---|---|---|
`(?=pattern)` | positive lookahead | `echo 123qwe \| grep -Po '(?=.*\d{3}.*)(?=.*[a-z]{3}.*).'` | `1` (all lookahead conditions are checked first, then one character is matched) |
`(?!pattern)` | negative lookahead | | |
`(?<=pattern)` | positive lookbehind; because the regex engine processes expressions from left to right, lookbehind assertions must have a fixed length | `echo '<qwe>asd</qwe>' \| grep -Po '(?<=<(\w{3})>).*(?=</\1>)'` | `asd` |
`(?<!pattern)` | negative lookbehind | | |
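The two negative assertions have no example in the table, so here is a small sketch of each. It assumes GNU grep with `-P` support; the input strings and the variable names `ahead` and `behind` are illustrative only.

```shell
# Negative lookahead: "foo" must not be followed by "bar".
ahead=$(printf 'foobar\nfoobaz\n' | grep -Po 'foo(?!bar)\w+')
echo "$ahead"        # foobaz

# Negative lookbehind: a number that is not preceded by "$".
# \b keeps the engine from matching the "2" inside "42".
behind=$(printf 'price $42, qty 7\n' | grep -Po '(?<!\$)\b\d+')
echo "$behind"       # 7
```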
Recursion
Recursion may look like a backreference, but the two differ. A backreference such as `\1` refers to the substring the group matched, while a recursion such as `(?1)` calls the regex inside that group again. This makes expressions shorter and clearer when the same subpattern has to match several times. So the first example, rewritten with recursion instead of a group reference, matches both lines: `echo -e 'foo foo\nfoo bar' | grep -Po '(\w+) (?1)'`
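To see the difference side by side, the sketch below runs both forms on the same input. It assumes GNU grep with `-P` support; the variable names `backref` and `recurse` are illustrative.

```shell
# Backreference: \1 must repeat the exact text captured by group 1.
backref=$(printf 'foo foo\nfoo bar\n' | grep -Po '(\w+) \1')
echo "$backref"      # foo foo

# Recursion: (?1) re-runs the pattern \w+ itself, so any second word matches.
recurse=$(printf 'foo foo\nfoo bar\n' | grep -Po '(\w+) (?1)')
echo "$recurse"      # foo foo, then foo bar
```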
In addition, recursion lets a short regex match recursively nested strings.
recursion type | definition | grep example | output |
---|---|---|---|
`(?N)` | absolute reference to the Nth group | `echo '(123)(()()())' \| grep -Po '^(\([^()]*(?1)*\))*$'` | `(123)(()()())` (matches only bracket-balanced strings) |
`(?-N)`, `(?+N)` | relative reference to the Nth group to the left or to the right of the current position | | |
`(?0)`, `(?R)` | reference to the whole expression | `echo '(123)(()()))' \| grep -Po '\([^()]*(?0)?\)(?0)?'` | `(123)(()())` (matches the bracket-balanced part of the string) |
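Relative references have no example above, so here is a minimal sketch of both directions. It assumes GNU grep with `-P` support; the variable names `prev` and `next` are illustrative.

```shell
# (?-1) calls the nearest group opened to the left of the reference.
prev=$(printf 'foo bar\n' | grep -Po '(\w+) (?-1)')
echo "$prev"         # foo bar

# (?+1) calls the next group opened to the right of the reference.
next=$(printf 'foo bar\n' | grep -Po '(?+1) (\w+)')
echo "$next"         # foo bar
```

Relative references are handy when a pattern is assembled from fragments, because the fragment does not need to know its absolute group number.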
Note 1: NAME is just an alias for the group number. Keep this in mind when using a "branch reset" group: alternatives that share a group number should use the same name, for example `(?|(?'same'patt)|(?'same'erns))`.
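As a hedged sketch of that note, the command below gives both alternatives of a branch reset group the same name and then refers back to it. It assumes GNU grep with `-P` support and uses the `(?<same>...)` spelling of the named group; the input lines and the variable name `reset` are illustrative.

```shell
# Both alternatives are group 1 and both are called "same",
# so \k<same> repeats whichever alternative matched.
reset=$(printf 'pattpatt\nernserns\npatterns\n' \
  | grep -Po '(?|(?<same>patt)|(?<same>erns))\k<same>')
echo "$reset"        # pattpatt, then ernserns
```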