[Cialug] multiline regular expression

Jeffrey C. Ollie jeff at ocjtech.us
Thu Jun 28 08:29:36 CDT 2007


On Thu, 2007-06-28 at 08:09 -0500, Dave Weis wrote:
> Kendall Bailey wrote:
> > I use kodos to design complex regular expressions.  It's a Python
> > (PyQt) program.  You can find it on sourceforge.  It has checkboxes
> > for things like multi-line, dotall and verbose and constructs code
> > samples.  You plug in an example of the string you are trying to match
> > and then as you write the regexp you get to see the match attributes
> > immediately.
> 
> I might give that a try. Looks helpful.

Yeah, Kodos rocks...  Here's the monstrosity that I came up with (using
Kodos of course):

^\s+(?P<LATA>\d+)\s+(?P<USGE_GP>\S+)\s+(?P<DATE>\d{2}/\d{2}/\d{2})\s+(?P<TIME>\d+:\d+(?:A|P))\s+(?P<DEST_CITY>\d+\s+\d+-\d+-\d+)\s+(?P<DESTINATION>.*?)\s+(?P<RECS>\d+)\s+(?P<MINUTES>\d*\.\d+)\s+(?P<AMOUNT_1>\d*\.\d+)\s+(?P<AMOUNT_2>\d*\.\d+)\s+(?P<VOL_AMT>\d*\.\d+)\s+$.*^\s+(?P<ANI>Y|N)\s+(?P<STATUS>\S+)\s+(?P<ACT_DUR>\d+:\d+:\d+)\s+(?P<ORIG_CITY>\d+\s+\d+-\d+-\d+)\s+(?P<ORIGINATION>.*?)\s+$

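This is Python syntax; I dunno how close to Java syntax that is, though
the (?P<NAME>...) named groups are a Python-ism and will likely need
converting.  You'd need to compile this regular expression with the
multiline and dot-all flags (re.MULTILINE and re.DOTALL in Python,
Pattern.MULTILINE and Pattern.DOTALL in Java).  The MISC-1, MISC-2, and
VOL-COD fields would still need to be figured out (they weren't in your
sample text).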
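
For what it's worth, here is a rough sketch of compiling that expression
with both flags in Python and running it against the one sample record
from the quoted message.  The names RECORD_RE and SAMPLE are mine, purely
for illustration, and the sample keeps the line wrapping and the blank
line as they appear in the quoted mail, which may not match the real file.

```python
import re

# The expression from this message, split across string literals only
# for readability; the concatenated pattern is unchanged.
RECORD_RE = re.compile(
    r"^\s+(?P<LATA>\d+)\s+(?P<USGE_GP>\S+)"
    r"\s+(?P<DATE>\d{2}/\d{2}/\d{2})\s+(?P<TIME>\d+:\d+(?:A|P))"
    r"\s+(?P<DEST_CITY>\d+\s+\d+-\d+-\d+)\s+(?P<DESTINATION>.*?)"
    r"\s+(?P<RECS>\d+)\s+(?P<MINUTES>\d*\.\d+)"
    r"\s+(?P<AMOUNT_1>\d*\.\d+)\s+(?P<AMOUNT_2>\d*\.\d+)"
    r"\s+(?P<VOL_AMT>\d*\.\d+)\s+$"
    r".*"
    r"^\s+(?P<ANI>Y|N)\s+(?P<STATUS>\S+)\s+(?P<ACT_DUR>\d+:\d+:\d+)"
    r"\s+(?P<ORIG_CITY>\d+\s+\d+-\d+-\d+)\s+(?P<ORIGINATION>.*?)\s+$",
    re.MULTILINE | re.DOTALL,
)

# Assumption: the record laid out as it was quoted in the mail (wrapped,
# with a blank line before the second half), not as in the real file.
SAMPLE = (
    "     324     NOR  05/17/07 10:21A    0000000   999-999-9999 COLUMBUS OH\n"
    "      1      .9        .0700        .0000        .0000\n"
    "\n"
    "      Y    CURRENT         00:00:54  0000000   999-999-9999 DES MOINES IA\n"
)

match = RECORD_RE.search(SAMPLE)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name:12} {value}")
```

Two caveats.  With several records in one string, the greedy .* under
dot-all can run from the first record's first line to the last record's
second line, so feeding it one record at a time (or making that .* lazy)
is probably safer.  The trailing \s+$ on each half also expects some
whitespace before the end of the line.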

It would seem to me, though, that the fields are of fixed width.  If so,
it'd probably be faster to code something up that pulls out the data
by column position rather than relying on a regular expression.
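
A minimal sketch of that fixed-width idea in Python, for the second line
of a record only.  The column offsets in SECOND_LINE_FIELDS are guesses
read off the sample as quoted in this thread (mail wrapping may well have
shifted things), and slice_fields is just a hypothetical helper name, so
check the numbers against the real file.

```python
# Hypothetical layout for the second line of a record: (name, start, end).
# The offsets are assumptions taken from the quoted sample, not from any
# specification of the file format.
SECOND_LINE_FIELDS = [
    ("ANI", 0, 11),
    ("STATUS", 11, 27),
    ("ACT_DUR", 27, 36),
    ("ORIG_CITY", 36, 60),
    ("ORIGINATION", 60, None),  # None: runs to the end of the line
]


def slice_fields(line, layout):
    """Cut a fixed-width line into whitespace-stripped fields."""
    return {name: line[start:end].strip() for name, start, end in layout}


# The second line of the sample record, written one column per literal so
# the assumed widths are visible (11, 16, 9, 24, then the remainder).
SAMPLE_LINE = (
    "      Y    "
    "CURRENT         "
    "00:00:54 "
    " 0000000   999-999-9999 "
    "DES MOINES IA"
)

record = slice_fields(SAMPLE_LINE, SECOND_LINE_FIELDS)
print(record)
```

The first line of each record could be handled the same way with its own
layout table, and header lines skipped by checking whether the LATA
column holds digits.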

> > On 6/28/07, Dave Weis <djweis at internetsolver.com> wrote:
> >>
> >> I need to parse a text file that contains two lines per record in this
> >> format:
> >>      324     NOR  05/17/07 10:21A    0000000   999-999-9999 COLUMBUS OH
> >>       1      .9        .0700        .0000        .0000
> >>
> >>       Y    CURRENT         00:00:54  0000000   999-999-9999 DES MOINES IA
> >>
> >> There are other lines in the file that are similar like
> >>     LATA   USGE-GP  DATE   TIME     DEST-CITY
> >> --------DESTINATION--------   #RECS  MINUTES  AMOUNT-1     AMOUNT-2
> >> VOL-AMT
> >>      ANI   STATUS          ACT-DUR  ORIG-CITY
> >> --------ORIGINATION--------           MISC-1               MISC-2
> >> VOL-COD
> >> that is the header.
> >>
> >> I'll be using the java regexp but if anyone can direct me on any regexp
> >> setup I'll convert it myself.
> >>
> >> Thanks
> >> dave
> >>
> > _______________________________________________
> > Cialug mailing list
> > Cialug at cialug.org
> > http://cialug.org/mailman/listinfo/cialug