[Cialug] OT: can somebody help me with a regular expression?

Nathan C. Smith nathan.smith at ipmvs.com
Fri Jul 25 14:46:05 CDT 2008


Do we have any regex wizards in our midst?

I need help creating a regular expression that can find a US patent number in a block of text.

Elements that should match could look like 7,225,309 or 7225309 and 88,809 or 88809 and D339,456 or D339456.  Some numbers also start with PP for plant patent.

Obviously 7225309 looks like pretty much any number, but I'm using a free-form input where any number is likely to be a patent.  I'm not sure whether I should use one, two, or more regexes, or really how to approach the problem.  Do I try to match the numbers as numbers or as characters, or handle the comma format one way and the non-comma format another?

This was my first attempt: \b([D]?[0-9](,?)|D)[0-9]?[0-9]?[0-9](,?)[0-9][0-9][0-9]

It matches most of my test cases, but I need to figure out how to account for shorter variations without commas (use more question marks?) and probably add more alternations with '|'.  It also has the drawback of matching part of 3.14159.  It also looks a little long, and I suspect there is a better way to do some of it.
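To make the 3.14159 drawback concrete, here is a small check of the first attempt using java.util.regex. The class and method names are made up for illustration. The pattern has no guard against a preceding decimal point, so the search can start at the word boundary right after the "." and match the fractional digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for probing the first-attempt pattern from this post.
class FirstAttemptCheck {
    // The first attempt, copied verbatim (backslash doubled for a Java string).
    static final Pattern FIRST_ATTEMPT = Pattern.compile(
        "\\b([D]?[0-9](,?)|D)[0-9]?[0-9]?[0-9](,?)[0-9][0-9][0-9]");

    // Returns the first substring the pattern finds, or null if there is none.
    static String firstMatch(String text) {
        Matcher m = FIRST_ATTEMPT.matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("7,225,309")); // the intended kind of match
        System.out.println(firstMatch("3.14159"));   // matches the digits after the "."
    }
}
```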

It looks like I would be happy matching just about anything that isn't pi, a Social Security number, or an IP address.  The expression engine I am using is written in Java but is *supposed* to be PCRE-compatible.
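One possible approach, sketched with java.util.regex under some assumptions: a single pattern that treats the number as characters, with an optional D or PP prefix, followed by either comma-grouped digits or a plain run of digits. The 4-to-8 digit range for the plain form, the uppercase-only prefixes, and the class and method names are all guesses for illustration, not anything settled above. Lookarounds reject candidates that touch a ".", "-", or another digit group, which is what keeps pi, Social Security numbers, and IP addresses out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; names and digit limits are assumptions.
class PatentNumberFinder {
    static final Pattern PATENT = Pattern.compile(
        "(?<![\\w.-])(?<!\\d,)"                    // not glued to a word, '.', '-', or "digit,"
      + "(?:PP|D)?"                                // optional plant or design prefix
      + "(?:\\d{1,3}(?:,\\d{3}){1,2}|\\d{4,8})"    // 88,809 / 7,225,309 or a plain digit run
      + "(?![\\w-])(?![.,]\\d)");                  // not followed by more number-like text

    // Returns every match with the commas stripped, e.g. "D339,456" becomes "D339456".
    static List<String> findPatentNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PATENT.matcher(text);
        while (m.find()) {
            found.add(m.group().replace(",", ""));
        }
        return found;
    }

    public static void main(String[] args) {
        String sample = "See 7,225,309, 88809, D339,456 and PP12,345. "
                      + "Not pi 3.14159, SSN 123-45-6789, or IP 192.168.1.100.";
        System.out.println(findPatentNumbers(sample));
    }
}
```

Stripping the commas after matching means the comma and non-comma spellings end up in the same form, so one regex covers both. A plain number of fewer than four digits is deliberately not matched, since in free-form text it would be too ambiguous.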

Thanks.

-Nate



