[Cialug] Interesting problem [OT?]

Lars Althof lars at larch.dk
Fri Nov 14 18:35:33 CST 2008


I have seen websites that do not allow links to images generated by scripts.
In order to bypass those restrictions I have used the Apache rewrite engine.
You could do the same, and make the browser think it is getting a normal
png file.

A rule like this would do it:
RewriteRule   ^glyph\.([^.]+)\.png$  glyph.php?code=$1

Then change the img tags to read glyph.123.png
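
For an offline copy, the same naming can be applied to the saved HTML
instead of on the server. What follows is only a minimal sketch in Python,
assuming the pages are already saved locally and that each image has been
fetched separately and stored as glyph.<code>.png next to them. The function
names are made up for illustration, and treating the codes as digits only is
an assumption based on the one example URL in this thread.

```python
import re

# Matches a src attribute that points at glyph.php?code=<digits>, with or
# without the site prefix. The URL shape comes from the example in this
# thread; digit-only codes are an assumption.
GLYPH_SRC = re.compile(
    r'src="\s*(?:https?://www\.signbank\.org/swis/)?glyph\.php\?code=(\d+)\s*"'
)


def localize_img_tags(html):
    """Rewrite script-style glyph URLs to static names like glyph.123.png."""
    return GLYPH_SRC.sub(lambda m: f'src="glyph.{m.group(1)}.png"', html)


def glyph_codes(html):
    """Return the distinct glyph codes referenced in the page, in order."""
    seen = []
    for code in GLYPH_SRC.findall(html):
        if code not in seen:
            seen.append(code)
    return seen


if __name__ == "__main__":
    page = '<img src="http://www.signbank.org/swis/glyph.php?code=45568">'
    print(localize_img_tags(page))
    print(glyph_codes(page))
```

glyph_codes() gives the list of images that still need to be downloaded and
saved under the matching file names; localize_img_tags() only edits the HTML.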


On Fri, Nov 14, 2008 at 6:14 PM, Stuart Thiessen <thiessenstuart at aol.com> wrote:

> Oh, ok. I haven't used these tools before, so I wasn't sure where the
> connection was. Thanks for explaining. :)
> Stuart
>
> On Nov 14, 2008, at 18:06 , Tony Bibbs wrote:
>
> A lot of the UI test tools automate doing things in the browser.  I was
> suggesting that it or something like it (Bad Boy + Jmeter) might be able to
> do what you want.
>
> --Tony
>
> On Fri, Nov 14, 2008 at 5:56 PM, Stuart Thiessen <thiessenstuart at aol.com> wrote:
>
>> The img tag shows (for example)
>> <img src="http://www.signbank.org/swis/glyph.php?code=45568">. The glyph script
>> returns the appropriate image file for that code number. I have the dump of
>> the html, but it is these <img> tags that are blocking me from having a
>> complete offline copy. That was why I was also thinking of trying to
>> automate some kind of PDF dump of each page.
>>
>> Tony, I did look at the Selenium website, but I am not sure how it will
>> help with this task. Could you explain?
>>
>> Thanks,
>>
>> Stuart
>>
>> On Nov 14, 2008, at 17:36 , Matthew Nuzum wrote:
>>
>>  On Fri, Nov 14, 2008 at 5:20 PM, Stuart Thiessen <thiessenstuart at aol.com>
>>> wrote:
>>>
>>>> Anyway, my technical challenge is that the organization developing this
>>>> writing system has published a PHP database of the symbols at:
>>>> http://www.signbank.org/swis/data.php?subset=&bs_code=*
>>>>
>>>> I need to get an offline dump of each of the basesymbol child pages listed
>>>> on that page. I can't do a simple download of the page as HTML because the
>>>> image file showing the symbol is actually a link to a script that finds the
>>>> right symbol and plugs it in, so when I use programs like wget, a broken
>>>> link for the symbol image appears when I try to look at it offline.
>>>>
>>>
>>> Does wget download the symbols and give them a funny name like
>>> glyph.php?code=368.html or glyph.php?code=368.png?
>>>
>>> I don't see anything sneaky on that page that would prevent you from
>>> using an automated tool; my guess is that the names are getting
>>> mangled.  If so, view the source of your downloaded page and see what
>>> the filename is that it's expecting and how it differs from what was
>>> actually generated. If they don't differ and the problem is really
>>> that the name is illegal for the filesystem then you may be able to
>>> just use a script to rename the images and the paths to the images in
>>> the html files.
>>>
>>> I've run into this problem before and just tried a different
>>> downloader program. It's been a while since I've used one so I can't
>>> think of one to suggest at the moment.
>>>
>>> --
>>> Matthew Nuzum
>>> newz2000 on freenode
>>> _______________________________________________
>>> Cialug mailing list
>>> Cialug at cialug.org
>>> http://cialug.org/mailman/listinfo/cialug
>>>