[Cialug] Text Processing Choices

John Lengeling John.Lengeling at radisys.com
Tue Jun 30 11:42:06 CDT 2009


> Except for perl, I'm with Daniel. If the command line tools don't work
> out I often move to a spreadsheet. For example I commonly need to get
> a bunch of data ready to put into a database so I will bring the raw
> ...
> ...

You usually end up having to use a mix of tools.

I don't like spreadsheets since they are pretty restrictive on the
size of data.  There are limits on the number of rows and also on the
maximum size of a cell.

I work a lot with multiline text data and spreadsheets just don't handle
multiline text well.

UNIX tools like sed, awk, and cut, as well as perl, don't have as many
limits.  I use the UNIX tools for simpler stuff (anything that fits in
10-20 lines) and perl for more complicated stuff like data validation,
data translation, code page conversion, screen scraping, or XML
formatting.
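
To give a rough idea of the "more complicated" kind of job, here is a
minimal sketch of a code page conversion combined with a simple
validation pass.  It is written in Python purely for illustration (the
same thing is a few lines of perl).  The function name
convert_codepage, the cp1252 source encoding, and the tab-separated
three-field layout are all assumptions made up for the example, not
anything from real data.

```python
def convert_codepage(raw, src="cp1252", fields=3):
    """Decode bytes from a legacy code page and sort rows into good/bad.

    Assumptions (hypothetical): the input is tab-separated, one record
    per line, and a valid record has exactly `fields` columns.
    """
    text = raw.decode(src)  # code page conversion: legacy bytes -> str
    good, bad = [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        cols = line.split("\t")
        if len(cols) == fields:
            good.append(cols)            # record passes validation
        else:
            bad.append((lineno, line))   # keep line number for reporting
    return good, bad


if __name__ == "__main__":
    # Made-up sample: one valid row with a non-ASCII character, one short row.
    sample = "1\tcaf\xe9\tok\n2\tbroken\n".encode("cp1252")
    good, bad = convert_codepage(sample)
    for row in good:
        # Re-encode as UTF-8 on the way out.
        print("\t".join(row).encode("utf-8"))
    for lineno, line in bad:
        print(f"line {lineno}: expected 3 fields, got: {line!r}")
```

Rejected rows are collected with their line numbers rather than
dropped, so they can be fixed by hand before the data goes anywhere
else.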

johnl

