Regular Expression Syntax Reference
The following articles describe the regular expression syntax you can use to specify patterns found in log file records when configuring various WebTrends Analytics advanced features. For example, you can use them to match URL parameters or page names when configuring Content Groups, custom reports, and URL Rebuilding definitions. Regular expressions are useful because they provide more flexible and powerful pattern matching than you can achieve with wildcards.
Express Analysis profiles do not support regular expressions.
In general, regular expressions are not as resource-efficient as absolute definitions, and the more complex the regular expression, the more resources are allocated to matching. If you find that using complex regular expressions causes performance problems, keep in mind that regular expressions using beginning-of-string matching (^) and end-of-string matching ($) are the most efficient.
The regular expression syntax supported by WebTrends Analytics may not be identical to the syntax used in other applications. You can verify your regular expressions using the Test button in dialog boxes that accept regular expressions.
Regular Expression Components
Basic Elements
The following table provides examples of basic elements.
Basic element | Example |
---|---|
. | Matches any single character. For example, cou.h matches couth, couch, and cough. |
\ followed by a single character | Lets a special character be used as a literal character, or “escaped”. For example, to use . as a literal period, precede it with a backslash: \. Escaped characters are especially useful when describing paths. For example, \.html$ matches any string ending in .html. The following characters need to be preceded by a backslash if they are to be used without special meaning: \ . $ * ? + [ ] ( ) | |
$ | Matches any string where the specified pattern occurs at the end of the string. For example, cause$ matches cause and because but not causes. |
^ | Matches any string where the specified pattern occurs at the beginning of the string. For example, ^couch matches couches and couch but not uncouch. Use this element carefully when specifying a domain. For example, the expression ^/couch matches /couch/index.htm, but not www.domain.com/couch/index.htm. |
[ ] | Matches any single character in the range or set enclosed in the brackets. For example, [aeiou] matches any vowel. You can use a shorthand notation for a range of characters. For example, [0-9] matches any decimal digit. If the sequence is preceded by a caret (^), it matches any single character not from the range or set. For example, [^a-z] matches any character that is not a letter of the alphabet. |
| | Indicates an OR operator. For example, couch|chair finds couch or chair. |
a regular expression in parentheses () | Used for grouping expressions. The expression (couch[0-9])|(bed[0-9]) matches couch36A or full_bed33b but not couch. |
a single character | Matches any string containing the single character to be matched. For example, a matches cause. You could also combine several characters together, such as couch. |
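The basic elements in the table above can be tried out in a short script. The following is a minimal sketch that uses Python's re module as a stand-in; WebTrends Analytics may use a different engine, so treat the results as illustrative only and confirm real patterns with the Test button. The matches helper is a hypothetical name introduced for this example.

```python
import re

def matches(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if pattern is found anywhere in value."""
    # re.search looks for the pattern anywhere in the string, which mirrors
    # the "contains" behavior described in the table.
    return re.search(pattern, value) is not None

# . matches any single character
assert matches(r"cou.h", "couch") and matches(r"cou.h", "cough")

# \. is a literal period; $ anchors the pattern at the end of the string
assert matches(r"\.html$", "/index.html")
assert not matches(r"\.html$", "/index.html?x=1")

# $ and ^ anchors
assert matches(r"cause$", "because") and not matches(r"cause$", "causes")
assert matches(r"^couch", "couches") and not matches(r"^couch", "uncouch")
assert not matches(r"^/couch", "www.domain.com/couch/index.htm")

# sets, negated sets, alternation, and grouping
assert matches(r"[^a-z]", "couch7") and not matches(r"[^a-z]", "couch")
assert matches(r"(couch[0-9])|(bed[0-9])", "full_bed33b")
assert not matches(r"(couch[0-9])|(bed[0-9])", "couch")
```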
Qualifying Characters
Any regular expression element can be qualified by one of the following three characters: *, +, or ?.
Qualifying Character | Example |
---|---|
* | Matches 0 or more occurrences of the element that precedes it. For example, couch_[a-z0-9]* matches couch_ followed by 0 or more alphanumeric values. This expression matches couch_0, couch_aaa, couch_a33, and couch_. Do not confuse * in a regular expression with * used as a wildcard character. To match all html files in the following path, for example, specify: /mydir/.*\.html$ not /mydir/*.html The correct regular expression above specifies any string containing /mydir/, followed by 0 or more characters, followed by .html, for example /mydir/index.html. |
+ | Matches 1 or more occurrences of the element it follows. For example, couch_[a-z0-9]+ matches couch_ followed by one or more alphanumeric values. This expression matches couch_0, couch_aaa, and couch_a33, but not couch_. |
? | Matches 0 or 1 occurrences of the element it follows. For example, couch_[a-z0-9]? matches couch_ followed by 0 or 1 alphanumeric values. This expression matches couch_0, couch_, and couch_a. Because the pattern is not anchored, it also matches couch_bb, where the match covers couch_b. |
Because parameter names and values are not case-sensitive, you do not need to match case in your regular expressions.
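The difference between the qualifiers, and between * as a qualifier and * as a wildcard, can be checked with a small script. This is a sketch only: it uses Python's re module as a stand-in for the WebTrends engine, and the re.IGNORECASE flag is an assumption chosen to imitate the case-insensitive matching described above. The matches helper is a hypothetical name.

```python
import re

def matches(pattern: str, value: str) -> bool:
    """Hypothetical helper: case-insensitive 'contains' match."""
    # re.IGNORECASE is an assumption that imitates case-insensitive matching.
    return re.search(pattern, value, re.IGNORECASE) is not None

star = r"couch_[a-z0-9]*"
plus = r"couch_[a-z0-9]+"

assert matches(star, "couch_")       # * allows zero occurrences
assert not matches(plus, "couch_")   # + requires at least one
assert matches(plus, "couch_a33")
assert matches(plus, "COUCH_A33")    # case is ignored

# Wildcard-style syntax does not mean the same thing in a regular expression:
# here * qualifies the preceding / instead of standing for "anything".
assert matches(r"/mydir/.*\.html$", "/mydir/index.html")
assert not matches(r"/mydir/*.html", "/mydir/index.html")
```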
Building Regular Expressions
Most regular expressions that you use with WebTrends are very simple, often consisting of a few basic elements.
Example 1
To match all of the values that begin with couch, use the following regular expression:
^couch
Example 2
To match all of the values that end with couch, use the following regular expression:
couch$
Example 3
To match all values containing either couch or chair, including blue_chair, chair_55, and big_couch_55, use the following regular expression:
couch|chair
Example 4
To match the URL of any product news HTML page from January, February, or March, use the following regular expression:
/product/news/(jan|feb|mar)/.+\.htm
This expression matches any item, such as a URL, containing the string /product/news/, followed by either jan, feb, or mar, followed by / and one or more of any character (.+), followed by .htm.
This expression matches the following URLs:
/product/news/jan/chair.htm
/product/news/feb/mirror.htm
/product/news/mar/couch.htm
/product/news/jan/table.htm
/product/news/jan/table.html
but not these URLs:
/product/news/jan/chair.asp
/product/news/jan/chair.gif
/product/news/apr/chair.htm
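The lists above can be reproduced with a short script. This sketch uses Python's re module as a stand-in for the WebTrends engine, so the results are illustrative. It also shows why table.html matches: the expression has no end-of-string anchor, so anything may follow .htm. The variable names are assumptions made for this example.

```python
import re

# The expression from Example 4, compiled without case sensitivity (an assumption).
pattern = re.compile(r"/product/news/(jan|feb|mar)/.+\.htm", re.IGNORECASE)

urls = [
    "/product/news/jan/chair.htm",
    "/product/news/feb/mirror.htm",
    "/product/news/mar/couch.htm",
    "/product/news/jan/table.html",
    "/product/news/jan/chair.asp",
    "/product/news/jan/chair.gif",
    "/product/news/apr/chair.htm",
]

# search() finds the pattern anywhere in the URL ("contains" matching).
matched = [u for u in urls if pattern.search(u)]
rejected = [u for u in urls if not pattern.search(u)]

# Adding $ would restrict the match to URLs that end in .htm exactly.
strict = re.compile(pattern.pattern + "$", re.IGNORECASE)
strict_matched = [u for u in urls if strict.search(u)]

print(matched)
print(rejected)
```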
Example 5
To match all URLs indicating that an individual product in the furniture category has been registered, use the following regular expression:
^/product/furniture/.+/register.htm
This expression matches all URLs that begin with /product/furniture/, followed by one or more occurrences of any character, followed by /register.htm.
This expression matches the following URLs:
/product/furniture/couch/register.htm
/product/furniture/chair/register.htm
/product/furniture/bedroom/armoire/register.htm
but not this URL:
/product/furniture/index.htm
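As a quick check of Example 5, the following sketch runs the expression against a few URLs using Python's re module, which is only a stand-in for the WebTrends engine. The extra URLs beyond those listed above are hypothetical values added to show the effect of the ^ anchor and of the .+ element.

```python
import re

pattern = re.compile(r"^/product/furniture/.+/register.htm", re.IGNORECASE)

def is_registration(url: str) -> bool:
    """Hypothetical helper: True if the URL matches the Example 5 expression."""
    return pattern.search(url) is not None

assert is_registration("/product/furniture/couch/register.htm")
assert is_registration("/product/furniture/bedroom/armoire/register.htm")
assert not is_registration("/product/furniture/index.htm")

# .+ requires at least one character between furniture/ and /register.htm,
# so a register page directly under the category does not match.
assert not is_registration("/product/furniture/register.htm")

# ^ requires the URL to begin with /product/furniture/.
assert not is_registration("/shop/product/furniture/couch/register.htm")
```

Note that the period in register.htm is not escaped in the expression, so it matches any single character at that position. Writing register\.htm would restrict it to a literal period.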
Matching Order Rules
There are several rules involved with how regular expression matching occurs:
- The first match found takes priority over other matches found if there are two matching input strings.
- In a list of concatenated expressions, the left-most match takes priority.
- The matches found using *, +, and ? are considered longest first.
- Nested constructs are evaluated from the outside in.
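Two of these rules, left-most priority and longest-first matching for qualifiers, can be observed by inspecting the text that a match actually covers. This is a sketch using Python's re module; it is an assumption that the WebTrends engine behaves the same way in these simple cases, and the sample strings are hypothetical.

```python
import re

# Left-most match: the alternative that occurs earliest in the input wins,
# regardless of the order in which the alternatives are written.
leftmost = re.search(r"couch|chair", "blue_chair_and_couch").group()

# Longest first: .+ takes as many characters as it can while still
# allowing the rest of the expression (the final /) to match.
longest = re.search(r"/product/.+/", "/product/furniture/couch/register.htm").group()

# ? also prefers one occurrence over zero when one is available.
optional = re.search(r"couch_[a-z0-9]?", "couch_bb").group()

print(leftmost, longest, optional)
```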
Comparing Regular Expressions to Wildcards
This table shows how you might use a wildcard or regular expression to accomplish the same thing.
Wildcard (*) | Regular expression | Meaning |
---|---|---|
*chair* | chair | Contains chair |
*chair | chair$ | Ends with chair |
chair* | ^chair | Begins with chair |
chair (no wildcard) | ^chair$ | Matches chair exactly |
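The equivalences in this table can be checked mechanically. The sketch below compares each wildcard with its regular expression over a handful of sample values, using Python's fnmatch and re modules as stand-ins; the wildcard and regular expression handling in WebTrends Analytics may differ in detail. The sample values and the agree helper are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# (wildcard, regular expression) pairs taken from the table.
pairs = [
    ("*chair*", r"chair"),
    ("*chair", r"chair$"),
    ("chair*", r"^chair"),
    ("chair", r"^chair$"),
]

samples = ["chair", "armchair", "chairs", "armchairs", "couch"]

def agree(wildcard: str, regex: str) -> bool:
    """True if the wildcard and the regex accept exactly the same samples."""
    return all(
        fnmatchcase(s, wildcard) == bool(re.search(regex, s))
        for s in samples
    )

results = {wildcard: agree(wildcard, regex) for wildcard, regex in pairs}
print(results)
```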