ruby on rails - How do I convert a Nokogiri statement into Mechanize for screen scraping?


I'm trying to use Mechanize to scrape tags from a page. I've used Nokogiri to scrape them before, but now I'm trying to combine them into a wider Mechanize class. Here is the Nokogiri statement:

    page = Nokogiri::HTML(open(@model.url, "User-Agent" => request.env['HTTP_USER_AGENT']))
    @model.icons = page.css("link[rel='apple-touch-icon']").to_s

And here is what I thought was the Mechanize equivalent, but it's not working:

    agent = Mechanize.new
    page = agent.get(@model.url, "User-Agent" => request.env['HTTP_USER_AGENT'])
    @model.icons = page.search("link[rel='apple-touch-icon']").to_s

The first one returns the link tag as expected: <link rel="apple-touch-icon" etc etc..></link>. The second statement returns a blank string. If I take to_s off the end, I get a super long output. I assume it's an error, or the actual Mechanize object, or something.

Here is a link to the long output when not converting to a string: https://gist.github.com/eadam/5583541

Without sample HTML it's difficult to recreate the problem, but here is some general information that might help you.

That "long output" is the inspect output of the Nokogiri::XML::NodeSet you got when you used the search method. If search returns multiple nodes, or the nodes have lots of children, the inspect output can go on for a ways, but that's what it should do.

css and search are similar, in that both return a NodeSet. css assumes the string passed in is a CSS accessor, while search is more generic, and attempts to figure out whether what was passed in is a CSS or XPath expression. If it figures wrong, the odds are bad that the pattern will find a match. You can use at or search to be generic and let Nokogiri figure it out, or you can use at_css and at_xpath, or css and xpath, respectively, to replace them. The at derivations all return the first matching node, similar to using search('some_path').first.

to_s turns the NodeSet into a representation of the source that was passed in. I prefer to be more explicit, using either to_xml, to_xhtml or to_html.

Why don't you get output from search like you do from css? I don't know, because I can't test against the HTML you're parsing. Answering questions, like data processing, is a GIGO (garbage in, garbage out) situation.

