perl - Using curlmirror.pl gives different outputs
Using http://curl.haxx.se/programs/curlmirror.txt, I'm looking to download a website and check for changes between the newly downloaded website and the one I downloaded previously. When I download the same website, sometimes the links on the website use relative paths and sometimes they use absolute paths, and this counts as a "change" even though the website did not change.
Usage:

curlmirror.pl -l -d 3 -o someoutputfiledirectory/url http://url

Output 1:

<td><a href="testing.htm">link</a></td>

Output 2:

<td><a href="http://mydomain.com/testing.htm">link</a></td>

Is there a way to convert relative paths to absolute paths, or the other way around? I need to standardize the download so that these links do not appear as "changes".
updated
I assume the URL is stored in the $url variable. You can try the following:
perl -pe 'BEGIN {$url="http://mymain.org"} s!(\b(?:url|href)=")([^/]+)(")!$1$url/$2$3!gi' << XXX
<td><a href="testing.htm">link</a></td>
<td><a href="http://mydomain.com/testing.htm">link</a></td>
<meta http-equiv="refresh" content="0;url="home">
XXX

Output:
<td><a href="http://mymain.org/testing.htm">link</a></td>
<td><a href="http://mydomain.com/testing.htm">link</a></td>
<meta http-equiv="refresh" content="0;url="http://mymain.org/home">

It replaces href="..." or url="..." patterns (case-insensitive) with href="$url/..." or url="$url/...", but only if ... does not contain a / character.
If the input is a file, you can replace these patterns in the file directly:
cat >tfile << XXX
<td><a href="testing.htm">link</a></td>
<td><a href="http://mydomain.com/testing.htm">link</a></td>
<meta http-equiv="refresh" content="0;url="home">
XXX
cat tfile
perl -i -pe 'BEGIN {$url="http://mymain.org"} s!(\b(?:url|href)=")([^/]+)(")!$1$url/$2$3!gi' tfile
echo "---"
cat tfile

Output:
<td><a href="testing.htm">link</a></td>
<td><a href="http://mydomain.com/testing.htm">link</a></td>
<meta http-equiv="refresh" content="0;url="home">
---
<td><a href="http://mymain.org/testing.htm">link</a></td>
<td><a href="http://mydomain.com/testing.htm">link</a></td>
<meta http-equiv="refresh" content="0;url="http://mymain.org/home">