java - Count the number of words in a string that contains HTML tags


Consider the following string, which contains HTML tags:

"<p>article</p> <p>article</p> <p>article</p> <p>&nbsp</p>"; 

Now I want to count the number of words contained in the string above.

It produces the wrong output:

instead of a word count of 3, it displays 4.

It wrongly counts <p>&nbsp</p> as a word.

Please correct the following program:

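import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

String str = "<p>article</p> <p>article</p> <p>article</p> <p>&nbsp</p>";
Document dom = Jsoup.parse(str);             // parse the HTML
String str2 = dom.text();                    // extract the visible text
System.out.println(str2.split(" ").length);  // count space-separated tokens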
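The single-space split in the program above is fragile on its own: runs of spaces produce empty tokens, and a non-breaking space (U+00A0, which is what &nbsp; decodes to) is neither matched by Java's \s (which covers only [ \t\n\x0B\f\r] unless Pattern.UNICODE_CHARACTER_CLASS is set) nor removed by String.trim(). Below is a minimal sketch of a more defensive counter for the string returned by dom.text(). It assumes that non-breaking spaces should count as separators. The class WordCount and the method countWords are hypothetical names, not part of jsoup.

```java
public class WordCount {

    /**
     * Counts words in plain text, e.g. the result of Jsoup's dom.text().
     * Assumption: U+00A0 (non-breaking space) is treated as an ordinary separator.
     * Other Unicode spaces (e.g. em space) are not handled by this sketch.
     */
    public static int countWords(String text) {
        if (text == null) {
            return 0;
        }
        // Java's \s does not match U+00A0 by default, so normalize it to ' ' first;
        // trim() then removes leading/trailing ordinary whitespace.
        String normalized = text.replace('\u00A0', ' ').trim();
        if (normalized.isEmpty()) {
            return 0; // "".split(...) would otherwise report one (empty) token
        }
        // Split on runs of whitespace so repeated spaces don't create empty tokens.
        return normalized.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("article article article"));        // 3
        System.out.println(countWords("article article article \u00A0")); // 3
        System.out.println(countWords("article  article"));               // 2
    }
}
```

Splitting on \\s+ after trimming also makes the count independent of how many spaces end up between the paragraphs in the extracted text.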

What changes should be made to get the correct output?

Thanks in advance.

As Benjamin mentioned in a comment, you should add a semicolon after &nbsp (&nbsp;). Without it, the entity is not parsed the way you intend: it is treated as "one element", which is not what you want.
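If the HTML comes from elsewhere and cannot be fixed by hand, one option is to add the missing semicolon before the string is passed to Jsoup.parse. This is an assumption, not part of the answer above. Below is a minimal sketch using only java.util.regex. FixNbsp and addMissingNbspSemicolons are hypothetical names, and only the &nbsp entity is handled.

```java
import java.util.regex.Pattern;

public class FixNbsp {

    // Matches "&nbsp" only when it is NOT followed by ';' or another letter/digit,
    // so "&nbsp;" is left alone and longer names are not touched.
    private static final Pattern BARE_NBSP = Pattern.compile("&nbsp(?![;A-Za-z0-9])");

    /** Hypothetical pre-processing step: rewrites a bare "&nbsp" as "&nbsp;". */
    public static String addMissingNbspSemicolons(String html) {
        return BARE_NBSP.matcher(html).replaceAll("&nbsp;");
    }

    public static void main(String[] args) {
        String str = "<p>article</p> <p>article</p> <p>article</p> <p>&nbsp</p>";
        // The fixed string would then go to Jsoup.parse(...) as in the question.
        System.out.println(addMissingNbspSemicolons(str));
        // prints: <p>article</p> <p>article</p> <p>article</p> <p>&nbsp;</p>
    }
}
```

Because the lookahead rejects an existing semicolon, running the helper on already-correct HTML leaves it unchanged.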

