Perceived mismatch between hand-rolled re and XPath selector in Scrapy shell


I've opened the Scrapy shell on the URL I want, and I'm trying to select all instances of p tags such that:

<div class="foo"><p>blah</p></div> 

But there seems to be a mismatch: I can't get all instances of the tags.

In [12]: len(hxs.re("<div class=\"foo"))
Out[12]: 13

In [13]: len(hxs.select('//div[contains(@class, "foo")]'))
Out[13]: 1

And in fact, I can't get a full account of the p tags with XPath at all...

In [14]: len(hxs.select('//p'))
Out[14]: 6

What am I missing? I thought line [14] would give me all instances of p tags in the document.

The HTML I was trying to select was embedded in a script block, so it wasn't considered valid HTML by XPath. This seems to be a common issue for new Scrapy users when a page has AJAX/JavaScript content, detectable by a hashtag in the URI: http://example.com/content1#slide1

All of the content resides in the HTML code, but the browser needs to run the JavaScript to populate whatever content the hashtag points to into the DOM itself, which is what XPath/bs4 look for.
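To make the mismatch concrete, here is a minimal sketch using only the Python standard library rather than Scrapy itself. The page contents and the names `PAGE`, `DivCounter`, `count_by_regex`, and `count_by_parser` are made up for illustration. The regex sees every textual occurrence of the div, while an HTML parser treats the inside of a script block as raw text and only reports the div that is real markup.

```python
import re
from html.parser import HTMLParser

# Hypothetical page: one real div in the markup, two more inside a script block.
PAGE = """<html><body>
<div class="foo"><p>visible</p></div>
<script type="text/template">
<div class="foo"><p>slide 1</p></div>
<div class="foo"><p>slide 2</p></div>
</script>
</body></html>"""


class DivCounter(HTMLParser):
    """Counts <div class="foo"> start tags that the parser sees as elements."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div" and dict(attrs).get("class") == "foo":
            self.count += 1


def count_by_regex(html):
    # Plain text search: does not care whether the match is inside a script.
    return len(re.findall(r'<div class="foo"', html))


def count_by_parser(html):
    # Script contents are handed over as raw data, not as start tags.
    counter = DivCounter()
    counter.feed(html)
    counter.close()
    return counter.count


print(count_by_regex(PAGE), count_by_parser(PAGE))
```

The two counts disagree for the same reason as lines [12] and [13] above: the extra matches are text inside a script, not elements in the parsed tree.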

It will, however, be pullable with regular expressions, if you're bold (hacky) enough. I'm considering other alternatives, such as making a new XML DOM out of the contents of the script block.
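A rough sketch of that last idea, again with the standard library only. It assumes the script block holds a well-formed markup fragment, which may not be true for a given page. `SAMPLE`, `SCRIPT_RE`, and `paragraphs_in_scripts` are hypothetical names, and the XPath subset in `xml.etree.ElementTree` has no `contains()`, so this matches the class attribute exactly.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page source; in practice this would be the raw response body.
SAMPLE = (
    '<html><body>'
    '<script type="text/template">'
    '<div class="foo"><p>blah</p></div>'
    '<div class="foo"><p>more</p></div>'
    '</script>'
    '<script>if (a < b) { run(); }</script>'
    '</body></html>'
)

# Captures the body of each script block (non-greedy, across newlines).
SCRIPT_RE = re.compile(r"<script\b[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def paragraphs_in_scripts(html):
    """Return the text of each <p> directly under <div class="foo"> in script blocks."""
    texts = []
    for body in SCRIPT_RE.findall(html):
        try:
            # Wrap the fragment so it has a single root element.
            root = ET.fromstring("<root>" + body + "</root>")
        except ET.ParseError:
            continue  # not well-formed XML (e.g. JavaScript containing '<'); skip it
        for p in root.findall(".//div[@class='foo']/p"):
            texts.append(p.text)
    return texts


print(paragraphs_in_scripts(SAMPLE))
```

If the embedded markup is sloppy HTML rather than well-formed XML, a more forgiving HTML parser would be needed in place of `ET.fromstring`.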

