[ciapug] Extract files used on a page

Darcy Baston ciapug@cialug.org
Tue, 22 Feb 2005 18:24:53 -0600



OK. This page has code that extracts the links from a page. The example
checks each link's status, but you could change it to get file sizes instead.

http://www.webreference.com/programming/php/cookbook/chap11/2/3.html

This bit seems the most useful:

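function pc_link_extractor($s) {
    $a = array();
    // Match <a ... href=...>text</a>, case-insensitively; capture group 1
    // is the URL and capture group 2 is the link text.
    if (preg_match_all('/<A\s+.*?HREF=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/A>/i',
                       $s,$matches,PREG_SET_ORDER)) {
        foreach($matches as $match) {
            // Store each link as a (URL, link text) pair.
            array_push($a,array($match[1],$match[2]));
        }
    }
    return $a;
}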
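
The function above only picks up <a> links. For the files a page pulls in
(CSS, JS, images), a minimal sketch using PHP 5's bundled DOM extension might
look like the following. The function name extract_page_assets and the list of
tags it checks are illustrative assumptions, not something from the cookbook
page, and it ignores other sources such as CSS @import rules or inline styles.

```php
<?php
// Hypothetical helper (not from the cookbook): return the URLs of the
// stylesheets, scripts, and images referenced by an HTML string.
function extract_page_assets($html) {
    if (trim($html) === '') {
        return array();            // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of printing them, then discard them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute holding the file URL (assumed list).
    $sources = array('link' => 'href', 'script' => 'src', 'img' => 'src');

    $assets = array();
    foreach ($sources as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $url = trim($node->getAttribute($attr));
            if ($url === '') {
                continue;          // e.g. an inline <script> with no src
            }
            // Only keep <link> elements that point at stylesheets.
            if ($tag === 'link'
                && stripos($node->getAttribute('rel'), 'stylesheet') === false) {
                continue;
            }
            $assets[] = $url;
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($assets));
}
```

Each returned URL could then be handed to an existing file-size function.
Relative URLs come back exactly as written in the page, so they would still
need to be resolved against the page's own address first.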

Darcy


jcbailey@code0.net wrote:

>I already have a function to get the file size. I need something that will
>parse HTML and get all the files it's linking to (CSS, JS, images, etc.).
>
>
>Jon

