[ciapug] Logging outgoing traffic while being crawler friendly

Claus ciapug@cialug.org
Fri, 30 May 2003 09:14:03 -0500


LOL, that's exactly what I did (see the quoted original below).  It looks
like a link, but only a link to the redirection script, not to
www.ames.ia.us.  The goal here is to log the usage of outgoing links while
still having those links count with search engines.  Some search engines
rank a page higher the more sites link to it.

I think my goal needs to be to make the link appear as <a
href="http://www.ames.ia.us"> at least to the search engines.  Maybe
include some detection script that creates different links for crawlers
and for other browsers.  Darn, that means I have to parse the whole HTML
document before serving it.
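A minimal sketch of such a detection step, assuming the decision is made
from the User-Agent request header.  The helper names is_probable_crawler()
and outgoing_href() and the list of crawler tokens are hypothetical, chosen
only for illustration:

```php
<?php
// Hypothetical helper: guess from the User-Agent header whether the
// request comes from a search engine crawler. The token list is an
// illustrative assumption, not a complete list of crawlers.
function is_probable_crawler($user_agent)
{
    $tokens = array('googlebot', 'slurp', 'msnbot', 'crawler', 'spider');
    $ua = strtolower($user_agent);
    foreach ($tokens as $token) {
        if (strpos($ua, $token) !== false) {
            return true;
        }
    }
    return false;
}

// Hypothetical helper: crawlers get the plain target URL so it counts as
// a real link; everyone else gets the logging redirect through /go.php.
function outgoing_href($url, $user_agent)
{
    return is_probable_crawler($user_agent) ? $url : '/go.php?' . $url;
}

// Usage: build the Ames link for the current request.
$ua = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
echo '<a href="' . htmlspecialchars(outgoing_href('http://www.ames.ia.us', $ua))
   . '">Ames City Government</a>';
```

User-Agent strings are self-reported, so this is only a heuristic: a
crawler that does not identify itself still gets the /go.php link.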

Until now I have just extracted the header and body of the HTML file and
surrounded them with my template layout (background, table, site index,
and stuff like that).  Has anybody done something similar to modify the
hrefs of an existing HTML file on the fly?
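One way to do this, sketched under the assumption that outgoing links are
absolute http:// or https:// URLs in quoted href attributes, is a
regular-expression pass over the extracted body.  rewrite_outgoing_links()
is a hypothetical helper name, and a regex is only a rough substitute for
a real HTML parser:

```php
<?php
// Hypothetical helper: point every absolute http(s) link in an HTML
// fragment at the /go.php logging redirect. Assumes the href value is
// quoted with " or '; relative and unquoted hrefs are left unchanged.
function rewrite_outgoing_links($html)
{
    return preg_replace_callback(
        '/(<a\s[^>]*?href\s*=\s*)(["\'])(https?:\/\/[^"\']+)\2/i',
        function ($m) {
            // $m[1] = "<a ... href=", $m[2] = quote character, $m[3] = URL
            return $m[1] . $m[2] . '/go.php?' . $m[3] . $m[2];
        },
        $html
    );
}

// Usage on the link from the original post.
echo rewrite_outgoing_links('<A HREF="http://www.ames.ia.us">Ames City Government</A>');
// prints <A HREF="/go.php?http://www.ames.ia.us">Ames City Government</A>
```

It could be applied to $page_body before the body is wrapped in the
template.  As written it would also rewrite absolute links back to one's
own site, so those may need to be excluded.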

   Claus

---- header and body extraction logic ----
<?php
   /* Verify that file exists */
   if (!file_exists($page)) {
     $error_msg = "File " . $page . " does not exist!";
     error_routine("/template.php",$error_msg);
   }

   /* Extract header and body from html file */
   $page_content = file_get_contents($page);  // whole file as one string
   $page_content_low_case = strtolower($page_content);

   // Extract header
   $position = strpos($page_content_low_case,"<head>");
   if ($position === false) {
     $error_msg = "Parsing the file " . $page
                . " to find the <head> tag failed!";
     error_routine("/template.php",$error_msg);
   }
   $head_start = $position + 6;

   $position = strpos($page_content_low_case,"</head>");
   if ($position === false) {
     $error_msg = "Parsing the file " . $page
                . " to find the </head> tag failed!";
     error_routine("/template.php",$error_msg);
   }
   $head_length = $position - $head_start;

   $page_head = substr($page_content,$head_start,$head_length);

   // Extract body
   $position = strpos($page_content_low_case,"<body");
   if ($position === false) {
     $error_msg = "Parsing the file " . $page
                . " to find the <body tag failed!";
     error_routine("/template.php",$error_msg);
   }
   $position = strpos($page_content_low_case,">",$position);
   if ($position === false) {
     $error_msg = "Parsing the file " . $page
                . " to find the > of the <body tag failed!";
     error_routine("/template.php",$error_msg);
   }
   $body_start = $position + 1;

   $position = strpos($page_content_low_case,"</body>");
   if ($position === false) {
     $error_msg = "Parsing the file " . $page
                . " to find the </body> tag failed!";
     error_routine("/template.php",$error_msg);
   }
   $body_length = $position - $body_start;

   $page_body = substr($page_content,$body_start,$body_length);
?>
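The strpos() approach above only finds a bare <head> tag, so a file with
<head lang="en"> would trigger the error.  A more tolerant sketch uses one
case-insensitive PCRE pattern per section via preg_match(); the function
name extract_section() is hypothetical:

```php
<?php
// Hypothetical alternative to the strpos() extraction: return what lies
// between <tag ...> and </tag>, matching case-insensitively and allowing
// attributes on the opening tag. Returns null if the section is missing.
function extract_section($html, $tag)
{
    $t = preg_quote($tag, '/');
    // \b stops "head" from matching "<header>"; the s flag lets . span lines
    $pattern = '/<' . $t . '\b[^>]*>(.*?)<\/' . $t . '\s*>/is';
    if (preg_match($pattern, $html, $m)) {
        return $m[1];
    }
    return null;
}

// Usage on a small page whose tags carry attributes and mixed case.
$html = "<HTML><Head lang=\"en\"><title>T</title></HEAD>\n"
      . "<body bgcolor=\"white\">\n<p>Hi</p>\n</BODY></HTML>";
$page_head = extract_section($html, 'head');  // "<title>T</title>"
$page_body = extract_section($html, 'body');  // "\n<p>Hi</p>\n"
```

A null return can still be handed to error_routine() the same way the
strpos() checks do.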

At 05:03 PM 05/29/2003, Lathrop Preston wrote:
>or have go.php do something like this
>
><?php
>
>//log the outlink
>
>header("Location: " . $golink);
>exit;
>
>?>
>
>that should look like a link
>I think
>
>
>
>David Champion wrote:
>>Could you maybe do something with either mod_rewrite or some kind of error
>>handling, and make the link like:
>><a href="go/www.ames.ia.us">Go to the City of Ames site</a>
>>...then your magic thingy looks for anything after "go/" and redirects.
>>-dc
>>Claus wrote:
>>
>>>Hello
>>>
>>>I have a php redirect page that I use to redirect outgoing links 
>>>through.  That way I can keep track of what links people use from my 
>>>site.  Things work great, except that search engines / crawlers don't
>>>count these links.  So, for instance, the link to
>>>/go.php?http://www.ames.ia.us will never get put into Google's "Find web
>>>pages that link to www.ames.ia.us" list.  That's because it doesn't
>>>appear as a link.
>>>
>>>Is there another way that could do both, track outgoing links and be 
>>>crawler friendly?
>>>
>>>Thanks,
>>>   Claus
>>>
>>>----Link----
>>><A HREF="/go.php?http://www.ames.ia.us"
>>>  onMouseOver="window.status='http://www.ames.ia.us'; return true"
>>>  onMouseOut="window.status=''; return true">
>>>  Ames City Government</A>
>>>
>>>----Redirection/Logging Page----
>>><?php
>>>   $link = getenv('QUERY_STRING');
>>>
>>>   if ($_SERVER['REMOTE_ADDR'] != 'insert my ip here' &&
>>>       $_SERVER['REMOTE_ADDR'] != 'insert my other ip here') {
>>>     require_once('DB.php');
>>>     $db = DB::connect('pgsql://www@unix+localhost/database');
>>>     if (DB::isError($db)) {die ($db->getMessage());}
>>>
>>>     // "?" placeholders let PEAR DB quote the values, so a crafted
>>>     // query string cannot break out of the SQL statement.
>>>     $result = $db->query("insert into web_stats_links_visited
>>>         (timestamp, server_name, remote_ip, remote_name, url)
>>>         values (current_timestamp, ?, ?, ' ', ?)",
>>>         array($_SERVER['SERVER_NAME'], $_SERVER['REMOTE_ADDR'], $link));
>>>     if (DB::isError($result)) {die ($result->getMessage());}
>>>     $db->disconnect();
>>>   }
>>>
>>>   header("Location: " . $link);
>>>   exit;
>>>?>
>>>
>>>_______________________________________________
>>>ciapug mailing list
>>>ciapug@cialug.org
>>>http://cialug.org/mailman/listinfo/ciapug