[Cialug] Algorithm; Cutting Up A File

Claus cniesen at gmx.net
Wed Dec 13 16:04:53 CST 2006


I do something like that in PHP for my web site.  I take the <header>
and <body> content of an HTML file and create a new dynamic file from
it.  It has worked quite well for several years now.

I do that by reading the whole HTML file into a variable.  Then I
search for the starting positions of the opening and closing tags using
strpos().  Once I have those, I adjust the positions to exclude the tags
themselves and copy the data in between into another variable.  Voilà!
No magic here, and it performs quite decently.
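
A minimal sketch of that substring approach, applied to the tagged data
file from the question below.  It is written in Python rather than PHP
(str.find() plays the role of strpos()), and the function name
extract_block and the sample text are assumptions made for illustration,
not code from the actual site.

```python
def extract_block(text, tag):
    """Return the text between <tag> and </tag>, or None if a tag is missing."""
    open_tag = f"<{tag}>"
    close_tag = f"</{tag}>"

    # Locate the opening tag (str.find returns -1 when it is absent).
    start = text.find(open_tag)
    if start == -1:
        return None

    # Move the start position past the opening tag so the tag is excluded.
    start += len(open_tag)

    # Search for the closing tag only after the opening one.
    end = text.find(close_tag, start)
    if end == -1:
        return None

    # Drop the newlines that separate the tags from the content.
    return text[start:end].strip("\n")


# Hypothetical sample input; the blocks may appear in any order.
sample = """<reference>
/usr/dict/datafile
</reference>
some stray text outside the blocks

<description>
This is a data file.  It holds data.
</description>
"""

for name in ("description", "procedure", "reference"):
    print(name, "->", repr(extract_block(sample, name)))
```

Because each tag is searched for independently in the whole string, the
order of the blocks and any text outside them do not matter.  Writing
each non-None result to its own output file would be one more step.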

See http://niesens.com for the end results.  ;)  The bottom of the 
source will even reveal the elapsed time on the server.

   Claus

On 12/12/2006 4:52 PM, Todd Walton wrote:
> Hey scripters,
> 
> I'm having trouble concocting an algorithm to cut up a text file into
> blocks.  I'm going to have text files that have three distinct blocks
> of information in them, and each block will be marked in some way.  By
> HTML style tags, I suppose.  For example:
> 
> ~/filez> cat datafile.txt
> <description>
> This is a data file.  It holds data.
> </description>
> 
> <procedure>
> 1. Read the file.
> 2. Ponder meaning of existence.
> 3. Write new file.
> </procedure>
> 
> <reference>
> /usr/dict/datafile
> </reference>
> 
> ~/filez> _
> 
> What I can assume about these files is that each will have three
> pre-defined blocks of text, enclosed by HTML style tags.  The tags are
> on their own line.  There may or may not be text outside of these
> three blocks.  There may or may not be blank lines between the blocks.
> The blocks may or may not be in a given order.  Etc.
> 
> How can I read in the file's contents, take out the text between the
> tags (but not the tags!), and write that text to a file?  I begin with
> datafile.txt, I run the script, and I end up with
> datafile-description.txt, datafile-procedure.txt, and
> datafile-reference.txt.  Here's what I have so far:
> 
> while datafile.position != end
>       # The block for description.
>       strLine = datafile.readNextLine
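>       if strLine contains "<description>" then
>               # Read first, then test, so the closing tag is never written.
>               strLine = datafile.readNextLine
>               until strLine = "</description>" or datafile.position = end
>                       write strLine to datafile-description.txt
>                       strLine = datafile.readNextLine
>               end until
>       end if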
> 
>       # The block for procedure. (same as for description)
>       # The block for reference. (same as for description)
> end while
> 
> So, the script runs through the text file line by line, until it finds
> the opening description tag and then, starting with the next line,
> writes it all out to a new file until it comes to the end-description
> tag.  Same for the other two.  Will this work?  If the blocks are out
> of order in the datafile will this still work?  Should I change
> something?
> 
> -todd
> _______________________________________________
> Cialug mailing list
> Cialug at cialug.org
> http://cialug.org/mailman/listinfo/cialug
> 


