[Cialug] Algorithm; Cutting Up A File

Todd Walton tdwalton at gmail.com
Tue Dec 12 16:52:08 CST 2006


Hey scripters,

I'm having trouble concocting an algorithm to cut up a text file into
blocks.  I'm going to have text files that have three distinct blocks
of information in them, and each block will be marked in some way,
probably by HTML-style tags.  For example:

~/filez> cat datafile.txt
<description>
This is a data file.  It holds data.
</description>

<procedure>
1. Read the file.
2. Ponder meaning of existence.
3. Write new file.
</procedure>

<reference>
/usr/dict/datafile
</reference>

~/filez> _

What I can assume about these files is that each will have three
pre-defined blocks of text, enclosed by HTML-style tags.  Each tag is
on its own line.  There may or may not be text outside of these
three blocks.  There may or may not be blank lines between the blocks.
The blocks may or may not be in a given order.  Etc.
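
To pin those assumptions down, here is a rough Python sketch of how a
single line might be classified.  The names BLOCK_NAMES, TAG_RE and
classify_line are made up for illustration, and it assumes a tag line
holds nothing but the tag itself, apart from surrounding whitespace.

```python
import re

# The three block names from the example file (assumed fixed and known up front).
BLOCK_NAMES = ("description", "procedure", "reference")

# A tag line holds only <name> or </name>, ignoring surrounding whitespace.
TAG_RE = re.compile(r"^\s*<(/?)(%s)>\s*$" % "|".join(BLOCK_NAMES))


def classify_line(line):
    """Return ("open", name), ("close", name), or ("text", None)."""
    match = TAG_RE.match(line)
    if match is None:
        # Ordinary text, a blank line, or a tag we do not know about.
        return ("text", None)
    kind = "close" if match.group(1) else "open"
    return (kind, match.group(2))
```

Anything that is not one of the three known tags, including blank lines
and stray text outside the blocks, comes back as plain text.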

How can I read in the file's contents, take out the text between the
tags (but not the tags!), and write that text to a file?  I begin with
datafile.txt, I run the script, and I end up with
datafile-description.txt, datafile-procedure.txt, and
datafile-reference.txt.  Here's what I have so far:

while datafile.position != end
       strLine = datafile.readNextLine

       # The block for description.
       if strLine contains "<description>" then
               # Skip the opening tag, then copy lines until the closing tag.
               # Checking before writing keeps the closing tag out of the file.
               strLine = datafile.readNextLine
               until strLine contains "</description>"
                       write strLine to datafile-description.txt
                       strLine = datafile.readNextLine
               end until
       end if

       # The block for procedure. (same as for description)
       # The block for reference. (same as for description)
end while

So, the script runs through the text file line by line, until it finds
the opening description tag and then, starting with the next line,
writes it all out to a new file until it comes to the end-description
tag.  Same for the other two.  Will this work?  If the blocks are out
of order in the datafile will this still work?  Should I change
something?
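
For comparison, here is one way the same idea might look in Python, as
a rough sketch rather than a finished script.  The names split_blocks
and write_blocks are invented for this example.  It assumes each tag
sits alone on its line and that every opening tag has a matching
closing tag.  It remembers which block it is currently inside, so the
order of the blocks in the file should not matter.

```python
from pathlib import Path

# The three block names from the example file (assumed known in advance).
BLOCK_NAMES = ("description", "procedure", "reference")


def split_blocks(lines, names=BLOCK_NAMES):
    """Collect the lines between each <name> ... </name> pair, in one pass."""
    blocks = {}
    current = None  # name of the block we are inside, or None when outside
    for line in lines:
        stripped = line.strip()
        if current is None:
            # Outside any block: only an opening tag matters, other text is skipped.
            for name in names:
                if stripped == "<%s>" % name:
                    current = name
                    blocks[name] = []
                    break
        elif stripped == "</%s>" % current:
            current = None  # closing tag: stop collecting, do not keep the tag
        else:
            blocks[current].append(line)
    return blocks


def write_blocks(path, names=BLOCK_NAMES):
    """Write e.g. datafile-description.txt next to datafile.txt for each block found."""
    source = Path(path)
    with source.open() as handle:
        blocks = split_blocks(handle, names)
    for name, body in blocks.items():
        target = source.with_name("%s-%s%s" % (source.stem, name, source.suffix))
        target.write_text("".join(body))
    return blocks
```

Calling write_blocks("datafile.txt") would then produce
datafile-description.txt, datafile-procedure.txt, and
datafile-reference.txt in the same directory, each without its tags.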

-todd

