Friday, January 23, 2009

Using REGEXP_REPLACE in oracle 10g

I ran into an issue where I need to update all matching records in a table so that their clob content changed all instances of
<a href="javascript:CustomHandler(unknownnumberofcharacters@someurl);">somelinkname</a>
to
<a href="someurl">somelinkname</a>

I ended up using a regular expression with the REGEXP_REPLACE oracle command to do the work.
UPDATE mytable SET myclob = REGEXP_REPLACE(myclob,'javascript:CustomHandler[^@]*@([[:alnum:][:punct:][:space:]]+)\);','\1')
WHERE REGEXP_LIKE(myclob,'javascript:CustomHandler');

The real trick was using the POSIX character classes rather than matching on the dot wildcard. Using the dot to match any character worked only when one match was found in the clob content. In cases where I expected more than one match, it did all kinds of strange subtitutions that I didnt expect. So now I know to avoid using the dot wildcard in regex if at all possible XD

For more info on oracle 10g regex, check out http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/rischert_regexp_pt2.html, its got some great examples, lists out all the available character classes, etc.

No comments: