Finding 2 letter domains with Oracle regular expressions
I had cause recently to search for domains that contained 2 and only 2 letters. One way of performing this is to use the relatively new Oracle support for querying with regular expressions - regexp_like. The Oracle implementation of regular expressions follow enhanced POSIX support.
First off how not to do it. If you attempt something like the following:
select * from domains where key like '__';
This will indeed return all domains containing 2 characters, but it also returns domains with numbers in them as well and this was not what we were interested in. It really had to be a regular expression:
select *
from domains
where regexp_like (key, '^[a-z]{2}$');
So how do we decode this expression?
- ^ In this position actually donates the beginning of the data. We need this as we want to find 2 occurances and only 2 occurances so we start counting from the beginning of the data, rather than at some arbitrary point.
- [a-z] is the standard find me any alphabetic character, we know the data is lowercase.
- {2} states we need to find 2 occurances of the previous expression, i.e. 2 alphabetic characters.
- The $ signifies the end of the line. All domain names contain 2 or more characters we do not want to find occurrances that have more than 2 characters.
I’m sure there is more than one way of performing this little bit of sql, but certainly regular expressions make it easier, though they can take a bit of getting used to.


August 15th, 2008 at 11:03 am
Note that regexp matching is quite expensive.
I’ve tested the same data and found that changing the query to check for “length(key) = 2″ before matching the regexp gives about a three-fold increase in speed.
Furthermore, adding a computed index on “length(key)” makes the query more than 10 times faster than the original.