[ciapug] Extract files used on a page
Darcy Baston
ciapug@cialug.org
Tue, 22 Feb 2005 18:24:53 -0600
OK. This page has code showing how to extract the links from a page. It
checks each link's status, but you can change it to get file sizes instead.
http://www.webreference.com/programming/php/cookbook/chap11/2/3.html
This bit seems the most useful:

    function pc_link_extractor($s) {
        $a = array();
        // Match each <a ... href=...>text</a>, case-insensitively.
        // Group 1 is the URL, group 2 is the link text.
        if (preg_match_all('/<A\s+.*?HREF=[\"\']?([^\"\' >]*)[\"\']?[^>]*>(.*?)<\/A>/i',
                           $s, $matches, PREG_SET_ORDER)) {
            foreach ($matches as $match) {
                array_push($a, array($match[1], $match[2]));
            }
        }
        return $a;
    }

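
Note that the function above only picks up <a href> links. For the files a
page pulls in (CSS, JS, images), here is a rough sketch that uses PHP's DOM
extension instead of a regex. The name extract_page_files and the choice of
tags (img, script, and link rel="stylesheet") are my own assumptions, not
something from the cookbook page, so treat it as a starting point:

```php
<?php
// Hypothetical helper (not from the cookbook page): collect the URLs of the
// files a page pulls in, rather than the pages it links to.
function extract_page_files($html) {
    // DOMDocument::loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return array();
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml's warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute that holds the file URL (an assumed, partial list).
    $sources = array('img' => 'src', 'script' => 'src', 'link' => 'href');

    $files = array();
    foreach ($sources as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            $url = trim($node->getAttribute($attr));
            if ($url === '') {
                continue; // e.g. an inline <script> with no src
            }
            // Only keep <link> elements that point at a stylesheet.
            if ($tag === 'link'
                && stripos($node->getAttribute('rel'), 'stylesheet') === false) {
                continue;
            }
            $files[] = $url;
        }
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($files));
}
```

The URLs come back exactly as written in the page, so relative ones still
need to be resolved against the page's address before you hand them to your
file size function.
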
Darcy
jcbailey@code0.net wrote:
>I already have a function to get the file size. I need something that will
>parse HTML and get all the files it's linking to (CSS, JS, images, etc.).
>
>Jon