Ruby on Rails - How do I convert a Nokogiri statement into Mechanize for screen scraping?
I'm trying to use Mechanize to scrape tags from a page. I've used Nokogiri to scrape them before, but now I'm trying to combine them into a wider Mechanize class. Here is the Nokogiri statement:
    page = Nokogiri::HTML(open(@model.url, "User-Agent" => request.env['HTTP_USER_AGENT']))
    @model.icons = page.css("link[rel='apple-touch-icon']").to_s
And here is what I thought the Mechanize equivalent would be, but it's not working:
    agent = Mechanize.new
    page = agent.get(@model.url, "User-Agent" => request.env['HTTP_USER_AGENT'])
    @model.icons = page.search("link[rel='apple-touch-icon']").to_s
The first one returns the link tag as expected: <link rel="apple-touch-icon" etc etc..></link>. The second statement returns a blank string. If I take the to_s off the end I get a super long output. I assume it's an error, or the actual Mechanize object, or something.
Here is a link to the long output when not converting to a string: https://gist.github.com/eadam/5583541
Without sample HTML it's difficult to recreate the problem, but some general information might help you.
That "long output" is the inspect output of the Nokogiri::XML::NodeSet you got when you used the search method. If search returns multiple nodes, or the nodes have lots of children, the inspect output can go on a ways, but that's what it should do.
css and search are very similar, in that they both return a NodeSet. css assumes the string passed in is a CSS accessor, while search is more generic and attempts to figure out whether what was passed in is a CSS or an XPath expression. If it guesses wrong, the odds are poor that the pattern will find a match. You can use at or search to be generic and let Nokogiri figure it out, or use at_css and at_xpath, or css and xpath, respectively, to replace them. The at derivations all return the first matching Node, similar to using search('some_path').first.
to_s turns the NodeSet into a representation of the source that was passed in. I prefer to be more explicit, using either to_xml, to_xhtml or to_html.
Why don't you get the same output from search that you get from css? I don't know, because I can't test against the HTML you're parsing. Answering questions, just like data processing, is a GIGO (garbage in, garbage out) situation.