java - How do I extract only the bold text from an HTML document?


I need to extract the bold snippets in the body of an HTML document. I need to do this on the server side using Java (not in the browser).

The text on the page can be bold because of tags, e.g. <b>, <h1>, etc., or because of inline CSS styling such as style="font-weight:bold;", or because of external CSS styling using CSS classes.

I am using jsoup, but I can use any other library to get this done.

Thanks for your time!

A plain JavaScript solution: on sufficiently new browsers, you can use getComputedStyle together with its getPropertyValue method to retrieve the computed style of an element. You can then traverse the document tree and check the text nodes; since text nodes do not have a style of their own, you need to check their parents:

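    // Receives each bold text snippet; replace with your own processing.
    function consume(string) {
      console.log(string);
    }

    function traverse(tree) {
      var i, weight;
      // nodeType 3 is a text node; its style comes from the parent element.
      if (tree.nodeType === 3) {
        weight = getComputedStyle(tree.parentNode).getPropertyValue('font-weight');
        // Some browsers report the keyword 'bold', others the number '700'.
        if (weight === 'bold' || weight === '700') {
          consume(tree.textContent);
        }
      }
      for (i = 0; i < tree.childNodes.length; i++) {
        traverse(tree.childNodes[i]);
      }
    }

    traverse(document.body);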

Replace consume with your own function that processes the bold texts.

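It seems that the computed value of font-weight is reported as bold when it is declared as 700. Some browsers report the numeric string 700 instead, so it is safest to accept both.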

Note that this only picks up text whose font weight is set to bold (700). Elements with a computed font weight of 600, 800, or 900 may also appear in bold (depending on the availability of typefaces, of course). This can be covered by making the obvious modification to the test.
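One way that modification to the test might look is sketched below. The helper name isBoldWeight and its optional threshold parameter are hypothetical and not part of the original answer. The default cutoff of 600 is an assumption based on the weights listed in the note above, and whether a given weight actually renders bold still depends on the available typefaces.

```javascript
// Hypothetical helper (not from the original answer): decides whether a
// computed font-weight value should count as bold.
// Assumption: weights of 600 and above are treated as bold by default.
function isBoldWeight(value, threshold) {
  // Keywords that can appear in place of a number.
  var keywords = { normal: 400, bold: 700 };
  var text = String(value).trim().toLowerCase();
  var weight = Object.prototype.hasOwnProperty.call(keywords, text)
    ? keywords[text]
    : parseFloat(text);
  // Anything that is not a known keyword or a number is treated as not bold.
  if (isNaN(weight)) {
    return false;
  }
  return weight >= (threshold === undefined ? 600 : threshold);
}
```

Inside traverse, the comparison against 'bold' could then be replaced by a call such as isBoldWeight(getComputedStyle(tree.parentNode).getPropertyValue('font-weight')).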

