[Cialug] Interesting problem [OT?]

Stuart Thiessen thiessenstuart at aol.com
Fri Nov 14 17:56:18 CST 2008


The img tag shows (for example)
<img src="http://www.signbank.org/swis/glyph.php?code=45568">.
The glyph script returns the appropriate image file for that code
number. I have the dump of the HTML, but it is these <img> tags that
are blocking me from having a complete offline copy. That was why I
was also thinking of trying to automate some kind of PDF dump of each
page.
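
A minimal sketch of one way around the broken links, in Python. It rests on a few assumptions: the glyphs are saved as PNG files (the script may return another format), they are stored in a local directory named glyphs/, and the helper names localize_glyphs and fetch_glyphs are hypothetical. The idea is to rewrite each glyph.php URL in the saved HTML to a local file name, then download each code once.

```python
import re
from pathlib import Path
from urllib.request import urlretrieve

# Matches the src attribute of a glyph image. The \s* tolerates a stray
# space or line break before the closing quote, as seen in the dumped HTML.
GLYPH_SRC = re.compile(
    r'src="http://www\.signbank\.org/swis/glyph\.php\?code=(\d+)\s*"'
)


def localize_glyphs(html, img_dir="glyphs", ext="png"):
    """Point glyph <img> tags at local files; return (new_html, codes).

    img_dir and ext are assumptions, not something the site specifies.
    """
    codes = []

    def swap(match):
        code = match.group(1)
        if code not in codes:
            codes.append(code)
        return f'src="{img_dir}/{code}.{ext}"'

    return GLYPH_SRC.sub(swap, html), codes


def fetch_glyphs(codes, img_dir="glyphs", ext="png"):
    """Download each glyph once into img_dir (requires network access)."""
    Path(img_dir).mkdir(exist_ok=True)
    for code in codes:
        url = f"http://www.signbank.org/swis/glyph.php?code={code}"
        urlretrieve(url, str(Path(img_dir) / f"{code}.{ext}"))
```

Running localize_glyphs over each saved page and writing the result back, then passing the collected codes to fetch_glyphs, should leave pages whose symbols display offline, provided the file extension matches what the script actually returns.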

Tony, I did look at the Selenium website, but I am not sure how it  
will help with this task. Could you explain?

Thanks,

Stuart

On Nov 14, 2008, at 17:36 , Matthew Nuzum wrote:

> On Fri, Nov 14, 2008 at 5:20 PM, Stuart Thiessen
> <thiessenstuart at aol.com> wrote:
>> Anyway, my technical challenge is that the organization developing this
>> writing system has published a PHP database of the symbols at:
>> http://www.signbank.org/swis/data.php?subset=&bs_code=*
>>
>> I need to get an offline dump of each of the basesymbol child pages
>> listed on that page. I can't do a simple download of the page as HTML
>> because the image file showing the symbol is actually a link to a script
>> that finds the right symbol and plugs it in, so when I use programs like
>> wget, a broken link for the symbol image appears when I try to look at it
>> offline.
>
> Does wget download the symbols and give them a funny name like
> glyph.php?code=368.html or glyph.php?code=368.png?
>
> I don't see anything sneaky on that page that would prevent you from
> using an automated tool. My guess is that the names are getting
> mangled. If so, view the source of your downloaded page and see what
> the filename is that it's expecting and how it differs from what was
> actually generated. If they don't differ and the problem is really
> that the name is illegal for the filesystem then you may be able to
> just use a script to rename the images and the paths to the images in
> the html files.
>
> I've run into this problem before and just tried a different
> downloader program. It's been a while since I've used one so I can't
> think of one to suggest at the moment.
>
> -- 
> Matthew Nuzum
> newz2000 on freenode


