Tutorial Part 2
Pattern Elements:
(), (?:), (?!), (?=)
You can use parentheses in a variety of ways. One way is
to provide grouping for a set of pattern elements. In this
example, "{2,}" applies to the entire subpattern "foo", so
the text must contain at least "foofoo" for the pattern to match.
Regex r = new Regex("(foo){2,}");
r.search("foo");
System.out.println(""+r.didMatch());
// Prints "false"
r.search("foofoofoo");
System.out.println(r.stringMatched());
// Prints "foofoofoo"
|
Another use for parentheses is pulling out matching subpatterns.
The text matched by each such subpattern is called a "backreference."
Regex r = new Regex("[abc]([def])");
r.search("==> be <==");
System.out.println(""+r.didMatch());
// Prints "true"
System.out.println(r.stringMatched());
// Prints "be"
System.out.println(r.stringMatched(1));
// Prints "e"
// This is the contents of the first backreference
|
Each pair of parentheses has a number, and the part of the matched
string that falls within a pair is the backreference with that
number.
Regex r = new Regex("([abc])([def])");
r.search("==> be <==");
System.out.println(r.stringMatched(1));
// Prints "b"
// This is the contents of the first backreference
System.out.println(r.stringMatched(2));
// Prints "e"
// This is the contents of the second backreference
|
The backreferences are given numbers according to the
position of the "(" character in the pattern. The leftmost
gets 1, the next one to the right gets 2, and so on. This
is especially useful to know if you like to nest backreferences.
Regex r = new Regex("(ab(cd))ef");
r.search("==>abcdef<==");
System.out.println(r.stringMatched());
// Prints "abcdef"
System.out.println(r.stringMatched(1));
// Prints "abcd"
System.out.println(r.stringMatched(2));
// Prints "cd"
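The numbering rule can also be checked mechanically. The following is a minimal sketch that uses the standard java.util.regex classes rather than this package's Regex class (an assumption made so the example runs without extra libraries); the class name GroupNumbering and the helper groups() are hypothetical. It lists the whole match followed by every numbered group, in order of the "(" characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Returns the whole match (group 0) followed by each numbered group,
    // or an empty list if the pattern is not found in the text.
    static List<String> groups(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Opening parentheses, left to right:
        // "(a..." is 1, "(b)" is 2, "(c)" is 3, "(d)" is 4.
        System.out.println(groups("(a(b)(c))(d)", "==>abcd<=="));
        // Prints "[abcd, abc, b, c, d]"
    }
}
```

Group 1 encloses groups 2 and 3, yet it still gets the lowest number because its "(" comes first.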
|
Please note how the following patterns
behave, as they bring out a few
subtleties of pattern writing.
Regex r = new Regex("(a)+b*");
r.search("==>aaaabbb<==");
System.out.println(r.stringMatched(1));
// Prints "a"
// Note that the subpattern is just the
// literal character "a" so that is what
// the backreference sees.
r = new Regex("(a+)b*");
r.search("==>aaaabbb<==");
System.out.println(r.stringMatched(1));
// Prints "aaaa"
// Now the () contains the + as well, so
// all the matching a's are returned in
// the backreference.
r = new Regex("([abc])+");
r.search("==>aaabbbc<==");
System.out.println(r.stringMatched(1));
// Prints "c"
// When you have something of the form (...)+
// or (...)*, the backreference returns the
// last thing that matched.
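Because a repeated group keeps only its last repetition, getting every repetition takes a second step. The sketch below shows one possible approach using the standard java.util.regex classes instead of this package's Regex class (an assumption); the class RepeatedGroup and its methods are hypothetical names. It captures the whole run first and then walks through it one element at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroup {
    // A repeated group is overwritten on each pass, so only the
    // last repetition is left in group 1.
    static String lastRepetition(String text) {
        Matcher m = Pattern.compile("([abc])+").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    // To see every repetition, capture the whole run with ([abc]+)
    // and then scan that run one character at a time.
    static List<String> allRepetitions(String text) {
        List<String> out = new ArrayList<>();
        Matcher run = Pattern.compile("([abc]+)").matcher(text);
        if (run.find()) {
            Matcher each = Pattern.compile("[abc]").matcher(run.group(1));
            while (each.find()) {
                out.add(each.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lastRepetition("==>aaabbbc<=="));
        // Prints "c"
        System.out.println(allRepetitions("==>aaabbbc<=="));
        // Prints "[a, a, a, b, b, b, c]"
    }
}
```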
|
Note: You can also use methods left(1) and right(1) to get
the text to the left and right of backreference one
just as you can use the methods left() and right() to
get the text to the left and right of the entire match.
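For readers who want to see what "text to the left and right" amounts to, here is a rough sketch built on the standard java.util.regex classes, not on this package's left() and right() methods. It assumes that "left" means everything in the searched text before the group and "right" means everything after it; the class LeftRight and the helpers leftOf() and rightOf() are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LeftRight {
    // Text before the given group (group 0 is the entire match),
    // or null if the pattern is not found.
    static String leftOf(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? text.substring(0, m.start(group)) : null;
    }

    // Text after the given group, or null if the pattern is not found.
    static String rightOf(String regex, String text, int group) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? text.substring(m.end(group)) : null;
    }

    public static void main(String[] args) {
        String regex = "[abc]([def])";
        String text = "==> be <==";
        // Square brackets are added so the spaces are visible.
        System.out.println("[" + leftOf(regex, text, 0) + "]");
        // Prints "[==> ]"
        System.out.println("[" + rightOf(regex, text, 0) + "]");
        // Prints "[ <==]"
        System.out.println("[" + leftOf(regex, text, 1) + "]");
        // Prints "[==> b]"
        System.out.println("[" + rightOf(regex, text, 1) + "]");
        // Prints "[ <==]"
    }
}
```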
Another use of parentheses is to select one of a set
of patterns. The character "|" is used to separate
the different patterns. For example:
Regex r = new Regex("(apple|banana|pear|orange)");
r.search("apple");
System.out.println(""+r.didMatch());
// Prints "true"
r.search("orange");
System.out.println(""+r.didMatch());
// Prints "true"
r.search("grape");
System.out.println(""+r.didMatch());
// Prints "false"
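Since the alternatives sit inside a capturing group, the backreference also tells you which alternative was found. The sketch below illustrates this with the standard java.util.regex classes rather than this package's Regex class (an assumption); Alternation and firstFruit() are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Alternation {
    // Returns the first alternative found anywhere in the text,
    // or null if none of them occurs.
    static String firstFruit(String text) {
        Matcher m = Pattern.compile("(apple|banana|pear|orange)").matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstFruit("one orange, two pears"));
        // Prints "orange"
        System.out.println(firstFruit("pineapple"));
        // Prints "apple" (the search is not anchored, so a match
        // inside a longer word still counts)
        System.out.println(firstFruit("grape"));
        // Prints "null"
    }
}
```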
|
If you just want the grouping ability of ()'s but are
not interested in getting a backreference, it is more
efficient to use (?:) instead. (By the way, if speed in matching,
as opposed to compiling, is what you're after, you should
always call the optimize() method or include "(?o)" near
the front of your pattern.)
Regex r1 = new Regex("(?:foo){2,}");
// is the same as
Regex r2 = new Regex("(foo){2,}");
// except that r1 produces no backreference.
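The difference between the two forms shows up in the number of groups the pattern defines. This is a small sketch using the standard java.util.regex classes instead of this package's Regex class (an assumption); NonCapturing, groupCount() and found() are hypothetical names. It says nothing about the speed claim above, only about the missing backreference.

```java
import java.util.regex.Pattern;

public class NonCapturing {
    // Number of capturing groups the pattern defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    // True if the pattern is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(groupCount("(foo){2,}"));
        // Prints "1"
        System.out.println(groupCount("(?:foo){2,}"));
        // Prints "0"
        // Both forms accept and reject the same text.
        System.out.println(found("(?:foo){2,}", "foofoofoo"));
        // Prints "true"
        System.out.println(found("(?:foo){2,}", "foo"));
        // Prints "false"
    }
}
```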
|
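The pattern (?=) looks ahead in the text being searched: the
subpattern must match at the current position, but the lookahead
itself is always a zero-length match, so the text it matched is
not consumed. Otherwise, it behaves as (?:).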
Regex r = new Regex("(?i)foo(?=bar)");
r.search("Foo or foobar?");
System.out.println(r.stringMatched());
// Prints "foo"
// Matches on the lower case version of
// foo because it is followed by bar -- but
// since the match is zero-width "bar" is
// not part of the matched string.
r = new Regex("(?i)foo");
r.search("Foo or foobar?");
System.out.println(r.stringMatched());
// Prints "Foo"
|
The pattern element (?!) also provides lookahead with a
zero-width match, but it succeeds only if the subpattern
fails to match at that position.
Regex r = new Regex("(?i)foo(?!bar)");
r.search("Foobar or foo?");
System.out.println(r.stringMatched());
// Prints "foo"
// Cannot match on "Foo" because it is followed
// by bar.
r = new Regex("(?i)foo");
r.search("Foobar or foo?");
System.out.println(r.stringMatched());
// Prints "Foo"
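To see which occurrences a negative lookahead skips, it helps to list the position of every match. This is a minimal sketch written against the standard java.util.regex classes rather than this package's Regex class (an assumption); NegativeLookahead and matchStarts() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NegativeLookahead {
    // Start index of every match of the pattern in the text.
    static List<Integer> matchStarts(String regex, String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        String text = "foobar foo food Foobaz";
        System.out.println(matchStarts("(?i)foo", text));
        // Prints "[0, 7, 11, 16]"
        System.out.println(matchStarts("(?i)foo(?!bar)", text));
        // Prints "[7, 11, 16]"
        // Only the "foo" at index 0 is rejected, because it is the
        // only one followed by "bar"; "Foobaz" still matches.
    }
}
```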
|
Review:
Parentheses have three basic functions:
- Grouping of patterns
- Producing backreferences
- Selecting one of a set of patterns to match
- (?: ... ) is like ( ... ) except no backreference
is produced.
- (?= ... ) is like (?: ... ) except it produces a
match of zero width.
- (?! ... ) is like (?= ... ) but it only matches if
the pattern inside is not found.