java - How do I extract only the bold text from an HTML document?
I need to extract the bold snippets in the body of an HTML document. I need to do this on the server side using Java (not in the browser).
The text on a page can be bold because of tags, e.g. <b>, <h1>, etc., or because of inline CSS styling such as style="font-weight:bold;", or because of external CSS styling using CSS classes.
I am using jsoup, but I can use another library to get this done.
Thanks for your time!
A plain JavaScript solution: on sufficiently new browsers, you can use getComputedStyle together with its getPropertyValue method to retrieve the computed style of an element. You can then traverse the document tree and check the text nodes; text nodes do not have a style of their own, so you need to check their parents:
    function consume(string) {
        console.log(string);
    }

    function traverse(tree) {
        var i;
        // nodeType 3 is a text node; it has no style, so inspect its parent
        if (tree.nodeType === 3) {
            if (getComputedStyle(tree.parentNode).getPropertyValue('font-weight') === 'bold') {
                consume(tree.textContent);
            }
        }
        // Recurse into every child node
        for (i = 0; i < tree.childNodes.length; i++) {
            traverse(tree.childNodes[i]);
        }
    }

    traverse(document.body);
Replace consume with your own function that processes the bold texts.
It seems that the computed value of font-weight is bold when it is declared as 700.
Note that this will only pick up text whose font weight is set to bold (700). Elements with a computed font weight of 600, 800, or 900 will also appear in bold (depending on the availability of typefaces, of course). This can be covered by making the obvious modification to the test.
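One way that modification might look is sketched below. The names isBoldWeight and traverseBold are hypothetical, and the threshold of 600 is an assumption taken from the weights listed above. Because a browser may report the computed font-weight either as a keyword (bold) or as a number (700), the helper accepts both forms.

```javascript
// Hypothetical helper: decide whether a computed font-weight value renders as bold.
// The 600 threshold is an assumption based on the weights mentioned above.
function isBoldWeight(value) {
    var keywords = { normal: 400, bold: 700 };
    var text = String(value).trim().toLowerCase();
    var weight = keywords.hasOwnProperty(text) ? keywords[text] : parseFloat(text);
    return !isNaN(weight) && weight >= 600;
}

// Same traversal as the original answer, with the test swapped for isBoldWeight.
function traverseBold(tree, consume) {
    var i;
    if (tree.nodeType === 3 && tree.parentNode) {
        var weight = getComputedStyle(tree.parentNode).getPropertyValue('font-weight');
        if (isBoldWeight(weight)) {
            consume(tree.textContent);
        }
    }
    for (i = 0; i < tree.childNodes.length; i++) {
        traverseBold(tree.childNodes[i], consume);
    }
}

// Only run the traversal where a DOM is available (i.e. in a browser).
if (typeof document !== 'undefined') {
    traverseBold(document.body, function (text) { console.log(text); });
}
```

Passing consume as a parameter is just a convenience in this sketch, so the processing function can be swapped without editing the traversal.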