Regular Expression Syntax Reference

The following articles describe the regular expression syntax you can use to specify patterns found in log file records when configuring various WebTrends Analytics advanced features. For example, you can use them to match URL parameters or page names when configuring Content Groups, custom reports, and URL Rebuilding definitions. Regular expressions are useful because they provide more flexible and powerful pattern matching than you can achieve with wildcards.

Note

Express Analysis profiles do not support regular expressions.

In general, regular expressions are not as resource-efficient as absolute definitions, and the more complex the regular expression, the more resources are allocated to matching. If you find that using complex regular expressions causes performance problems, keep in mind that regular expressions using beginning-of-string matching (^) and end-of-string matching ($) are the most efficient.

The regular expression syntax supported by WebTrends Analytics may not be identical to the syntax used in other applications. You can verify your regular expressions using the Test button in dialog boxes that accept regular expressions.

Regular Expression Components

Basic Elements

The following table provides examples of basic elements.

Basic element	Example
.	Matches any single character. For example, `cou.h` matches `couth`, `couch`, and `cough`.
\ followed by a single character	Escapes a special character so that it is treated as a literal character. For example, to match a literal period, precede it with a backslash: `\.` Escaped characters are especially useful when describing paths. For example, `\.html$` matches any string ending in `.html`. The following characters must be preceded by a backslash if they are to be used without special meaning: `\` `.` `$` `*` `?` `+` `[ ]` `(` `)` `|`
$	Matches any string where the specified pattern occurs at the end of the string. For example, `cause$` matches `cause` and `because` but not `causes`.
^	Matches any string where the specified pattern occurs at the beginning of the string. For example, `^couch` matches `couches` and `couch` but not `uncouch`. Use this element carefully when specifying a domain. For example, the expression `^/couch` matches `/couch/index.htm`, but not `www.domain.com/couch/index.htm`.
[ ]	Matches any single character in the range or set enclosed in the brackets. For example, `[aeiou]` matches any vowel. You can use a shorthand notation for a range of characters. For example, `[0-9]` matches any decimal digit. If the range or set is preceded by a caret (`^`), the brackets match any single character that is not in the range or set. For example, `[^a-z]` matches any character that is not a letter of the alphabet.
|	Indicates an OR operator. For example, `couch|chair` finds `couch` or `chair`.
a regular expression in parentheses ()	Used for grouping expressions. The expression `(couch[0-9])|(bed[0-9])` matches `couch36A` or `full_bed33b` but not `couch`.
a single character	Matches any string containing the single character to be matched. For example, `a` matches `cause`. You could also combine several characters together, such as `couch`.
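
The basic elements in the table can be tried out with a short script. The following is a minimal sketch that uses Python's `re` module as a stand-in; it is an assumption that Python's syntax agrees with WebTrends Analytics for these simple patterns, so confirm each expression with the Test button. The helper name `matches` is hypothetical.

```python
import re

def matches(pattern, value):
    """Return True if the pattern occurs anywhere in value (unanchored search)."""
    # re.IGNORECASE is an assumption that mirrors case-insensitive matching.
    return re.search(pattern, value, re.IGNORECASE) is not None

# . matches any single character
assert matches(r"cou.h", "couch") and matches(r"cou.h", "cough")

# \. is a literal period, and $ anchors the pattern at the end of the string
assert matches(r"\.html$", "/couch/index.html")
assert not matches(r"\.html$", "/couch/index.htm")

# $ and ^ anchor the pattern at the end and the beginning of the string
assert matches(r"cause$", "because") and not matches(r"cause$", "causes")
assert matches(r"^couch", "couches") and not matches(r"^couch", "uncouch")
assert not matches(r"^/couch", "www.domain.com/couch/index.htm")

# Parentheses group expressions, and | selects between them
grouped = r"(couch[0-9])|(bed[0-9])"
assert matches(grouped, "couch36A") and matches(grouped, "full_bed33b")
assert not matches(grouped, "couch")
```

The helper uses a search rather than a whole-string match because the elements above match any string that contains the pattern unless `^` or `$` is present.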

Qualifying Characters

Any regular expression element can be qualified by one of the following three characters: *, +, or ?.

Qualifying Character	Example
*	Matches 0 or more occurrences of the element that precedes it. For example, `couch_[a-z0-9]*` matches `couch_` followed by 0 or more alphanumeric values. This expression matches `couch_0`, `couch_aaa`, `couch_a33`, and `couch_`. Do not confuse `*` in a regular expression with `*` used as a wildcard character. To match all HTML files in the following path, for example, specify `/mydir/.*\.html$`, not `/mydir/*.html`. The correct regular expression specifies any string containing `/mydir/`, followed by 0 or more characters, followed by `.html` at the end of the string, for example `/mydir/index.html`.
+	Matches 1 or more occurrences of the element it follows. For example, `couch_[a-z0-9]+` matches `couch_` followed by one or more alphanumeric values. This expression matches `couch_0`, `couch_aaa`, and `couch_a33`, but not `couch_`.
?	Matches 0 or 1 occurrences of the element it follows. For example, `couch_[a-z0-9]?` matches `couch_` followed by 0 or 1 alphanumeric values. This expression matches `couch_0`, `couch_`, and `couch_a`. It also matches `couch_bb`, because the pattern is not anchored and `couch_b` occurs within that string.
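
To see how the three qualifying characters differ, it can help to run them against the same values. This is a minimal sketch using Python's `re` module, which is assumed (not confirmed) to treat `*`, `+`, and `?` the same way WebTrends Analytics does; the helper name `matches` is hypothetical.

```python
import re

def matches(pattern, value):
    """Return True if the pattern occurs anywhere in value (unanchored search)."""
    return re.search(pattern, value, re.IGNORECASE) is not None

star = r"couch_[a-z0-9]*"      # 0 or more alphanumeric values
plus = r"couch_[a-z0-9]+"      # 1 or more alphanumeric values
optional = r"couch_[a-z0-9]?"  # 0 or 1 alphanumeric values

# * accepts the bare prefix as well as longer values
for value in ("couch_0", "couch_aaa", "couch_a33", "couch_"):
    assert matches(star, value)

# + needs at least one character after the underscore
assert matches(plus, "couch_0") and not matches(plus, "couch_")

# ? is unanchored here, so it finds "couch_b" inside "couch_bb"
assert re.search(optional, "couch_bb").group(0) == "couch_b"

# Anchoring both ends limits the value to at most one trailing character
assert not matches(r"^couch_[a-z0-9]?$", "couch_bb")

# The regular expression equivalent of the wildcard /mydir/*.html
assert matches(r"/mydir/.*\.html$", "/mydir/index.html")
```

If a qualifier should limit the length of the whole value rather than of a substring, add `^` and `$` as in the anchored pattern above.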

Note

Because parameter names and values are not case-sensitive, you do not need to match case in your regular expressions.

Building Regular Expressions

Most regular expressions that you use with WebTrends are very simple, often consisting of a few basic elements.

Example 1

To match all of the values that begin with couch, use the following regular expression:
^couch

Example 2

To match all of the values that end with couch, use the following regular expression:
couch$

Example 3

To match all values containing either couch or chair, including blue_chair, chair_55, and big_couch_55, use the following regular expression:
couch|chair
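
Examples 1 through 3 can be compared side by side by filtering one list of values with each expression. This is a sketch only: the sample values and the helper name `select` are hypothetical, and Python's `re` module stands in for the WebTrends Analytics matcher.

```python
import re

# Hypothetical sample values
values = ["couch", "couches", "big_couch_55", "blue_chair", "chair_55", "table"]

def select(pattern, candidates):
    """Return the candidates in which the pattern is found."""
    return [v for v in candidates if re.search(pattern, v, re.IGNORECASE)]

begins_with_couch = select(r"^couch", values)   # Example 1
ends_with_couch = select(r"couch$", values)     # Example 2
couch_or_chair = select(r"couch|chair", values) # Example 3

print(begins_with_couch)
print(ends_with_couch)
print(couch_or_chair)
```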

Example 4

To match a qualifying page URL that contains any product news HTML pages from January, February, or March, use the following regular expression:
/product/news/(jan|feb|mar)/.+\.htm

This expression matches any item such as a URL containing the string /product/news/, followed by either jan, feb, or mar, followed by / and one or more of any character (.+), followed by .htm.

This expression matches the following URLs:

/product/news/jan/chair.htm
/product/news/feb/mirror.htm
/product/news/mar/couch.htm
/product/news/jan/table.htm
/product/news/jan/table.html

but not these URLs:

/product/news/jan/chair.asp
/product/news/jan/chair.gif
/product/news/apr/chair.htm
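
The two URL lists for Example 4 can be checked mechanically. The sketch below uses Python's `re` module as an approximation of the WebTrends Analytics matcher; the names `NEWS_PATTERN`, `expected_matches`, and `expected_misses` are hypothetical.

```python
import re

NEWS_PATTERN = re.compile(r"/product/news/(jan|feb|mar)/.+\.htm", re.IGNORECASE)

expected_matches = [
    "/product/news/jan/chair.htm",
    "/product/news/feb/mirror.htm",
    "/product/news/mar/couch.htm",
    "/product/news/jan/table.htm",
    "/product/news/jan/table.html",
]
expected_misses = [
    "/product/news/jan/chair.asp",
    "/product/news/jan/chair.gif",
    "/product/news/apr/chair.htm",
]

for url in expected_matches:
    assert NEWS_PATTERN.search(url) is not None, url
for url in expected_misses:
    assert NEWS_PATTERN.search(url) is None, url

# The pattern has no end anchor, which is why table.html also matches.
# With $ appended, only URLs that end in .htm would match.
strict = re.compile(r"/product/news/(jan|feb|mar)/.+\.htm$", re.IGNORECASE)
assert strict.search("/product/news/jan/table.html") is None
```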

Example 5

To match all URLs indicating that an individual product in the furniture category has been registered, use the following regular expression:
^/product/furniture/.+/register\.htm

This expression matches all URLs that begin with /product/furniture/, followed by one or more occurrences of any character, followed by /register.htm.

This expression matches the following URLs:
/product/furniture/couch/register.htm
/product/furniture/chair/register.htm
/product/furniture/bedroom/armoire/register.htm

but not this URL:
/product/furniture/index.htm
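
Example 5 can be checked the same way. In this sketch the period before `htm` is escaped so that it matches only a literal period. Python's `re` module is assumed to be a reasonable approximation of the WebTrends Analytics matcher, and the name `REGISTER_PATTERN` is hypothetical.

```python
import re

REGISTER_PATTERN = re.compile(r"^/product/furniture/.+/register\.htm", re.IGNORECASE)

registered = [
    "/product/furniture/couch/register.htm",
    "/product/furniture/chair/register.htm",
    "/product/furniture/bedroom/armoire/register.htm",
]

for url in registered:
    assert REGISTER_PATTERN.search(url) is not None, url

# The category index page is not a registration page
assert REGISTER_PATTERN.search("/product/furniture/index.htm") is None

# .+ requires at least one character between the category and /register.htm,
# so a registration page directly under the category does not match
assert REGISTER_PATTERN.search("/product/furniture/register.htm") is None

# ^ anchors the pattern, so a URL with a leading domain does not match
assert REGISTER_PATTERN.search("www.domain.com/product/furniture/couch/register.htm") is None
```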

Matching Order Rules

There are several rules involved with how regular expression matching occurs:

The first match found takes priority over other matches found if there are two matching input strings.
In a list of concatenated expressions, the left-most match takes priority.
The matches found using *, +, and ? are considered longest first.
Nested constructs are evaluated from the outside in.
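
The leftmost and longest-first rules can be observed by looking at the text that a search actually returns. The sketch below shows how Python's `re` module behaves; it is only an assumption that WebTrends Analytics resolves competing matches the same way. The helper name `first_match` is hypothetical.

```python
import re

def first_match(pattern, value):
    """Return the text of the first match found in value, or None."""
    found = re.search(pattern, value, re.IGNORECASE)
    return found.group(0) if found else None

# Leftmost: chair occurs earlier in the input than couch, so it is found first
assert first_match(r"couch|chair", "chair_beside_couch") == "chair"

# Longest first: * takes as many characters as it can
assert first_match(r"couch_[a-z0-9]*", "couch_a33") == "couch_a33"

# Longest first: ? takes one character when one is available
assert first_match(r"couch_[a-z0-9]?", "couch_bb") == "couch_b"
```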

Comparing Regular Expressions to Wildcards

This table shows how you might use a wildcard or regular expression to accomplish the same thing.

Wildcard (*)	Regular expression	Meaning
*chair*	chair	Contains chair
*chair	chair$	Ends with chair
chair*	^chair	Begins with chair
chair (no wildcard)	^chair$	Matches chair exactly
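
The correspondence in the table can be expressed as a small conversion routine: a leading or trailing `*` leaves that end of the expression unanchored, and any other `*` becomes `.*`. The function `wildcard_to_regex` below is a hypothetical helper written in Python for illustration; it is not part of WebTrends Analytics.

```python
import re

def wildcard_to_regex(wildcard):
    """Translate a wildcard that uses only * into an unanchored-search regex."""
    starts_open = wildcard.startswith("*")
    ends_open = wildcard.endswith("*")
    # Escape the literal pieces so characters such as . lose their special meaning
    pieces = [re.escape(part) for part in wildcard.strip("*").split("*")]
    body = ".*".join(pieces)  # an interior * matches 0 or more characters
    prefix = "" if starts_open else "^"
    suffix = "" if ends_open else "$"
    return prefix + body + suffix

for wildcard in ("*chair*", "*chair", "chair*", "chair"):
    print(wildcard, "->", wildcard_to_regex(wildcard))
```

An interior wildcard such as `/mydir/*.html` produces an expression anchored at both ends, with the period escaped.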