[Cialug] multiline regular expression

Matthew Nuzum newz at bearfruit.org
Thu Jun 28 09:23:19 CDT 2007


On 6/28/07, Jeffrey C. Ollie <jeff at ocjtech.us> wrote:
> > > I use kodos to design complex regular expressions.
> Yeah, Kodos rocks...  Here's the monstrosity that I came up with (using
> Kodos of course):
>
> ^\s+(?P<LATA>\d+)\s+(?P<USGE_GP>\S+)\s+...
>
> This is Python syntax, dunno how close to Java syntax that is.

Python's named groups (couldn't remember the right term at first) use
their own syntax.  For example, (?P<LATA>\d+) names the group "LATA."
iirc, Java's regexes are Perl-style and don't accept the (?P<...>) form,
so you'll have to use a slightly different syntax for the named groups,
or fall back to plain numbered groups.
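
To make the named-group bit concrete, here's a rough sketch on the Python
side. It uses only the first two groups of Jeff's pattern (the rest is
elided above and I haven't tried to guess it). The sample line and the
leading_fields() helper are made up for illustration, not real data or
anything from Jeff's code.

```python
import re

# Only the first two groups of the quoted pattern; the remainder is elided
# in the original message and is deliberately not reconstructed here.
PATTERN = re.compile(r"^\s+(?P<LATA>\d+)\s+(?P<USGE_GP>\S+)")


def leading_fields(line):
    """Return the named groups as a dict, or None if the line doesn't match.

    Hypothetical helper, for illustration only.
    """
    match = PATTERN.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line (assumed layout: leading whitespace, then fields).
sample = "   524   ABC123   rest of the record"
fields = leading_fields(sample)
```

With named groups you get the values back by name (fields["LATA"]) instead
of counting parentheses, which is most of the appeal in a pattern this long.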

> It would seem to me though that the fields are of fixed size.  If so,
> it'd probably be faster to code something up that pulls out the data
> based upon that rather than relying on a regular expression.

I agree. I'd read line by line, validating each and then splitting on
boundaries (either white space or fixed field size). Any row that
didn't work out could be logged and handled manually.

Especially since any missing line would be lost revenue. (Murphy's law
says that the 18-hour call to Kashmir would be the line that gets
dropped.)
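
Something along these lines is what I have in mind, as a sketch only. It
splits on whitespace rather than fixed widths because I don't know the real
column positions. The two field names come from Jeff's pattern; the checks,
the "rest" key, the logger name, and the choice to skip blank lines are my
assumptions.

```python
import logging

log = logging.getLogger("usage_import")  # hypothetical logger name


def parse_line(line):
    """Split one record on whitespace and validate it.

    Returns a dict of fields, or raises ValueError if the line looks wrong.
    The validation rules here are assumptions, not the real record spec.
    """
    parts = line.split()
    if len(parts) < 2:
        raise ValueError("expected at least 2 fields, got %d" % len(parts))
    lata, usge_gp = parts[0], parts[1]
    if not lata.isdigit():
        raise ValueError("LATA is not numeric: %r" % lata)
    return {"LATA": lata, "USGE_GP": usge_gp, "rest": parts[2:]}


def parse_lines(lines):
    """Parse every line; keep the rejects instead of silently dropping them."""
    good, rejected = [], []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are assumed not to be records
        try:
            good.append(parse_line(line))
        except ValueError as err:
            log.warning("line %d rejected: %s", number, err)
            rejected.append((number, line))
    return good, rejected
```

The point of returning the rejects alongside the good rows is that nothing
disappears: whatever didn't parse is still there, with its line number, for
someone to fix by hand.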
-- 
Matthew Nuzum
newz2000 on freenode

