java - Count the number of words in a string that contains HTML tags


Consider the following string, which contains HTML tags:

"<p>article</p> <p>article</p> <p>article</p> <p>&nbsp</p>"; 

I want to count the number of words in the string above.

It produces the wrong output: instead of a word count of 3, it displays 4.

It counts <p>&nbsp</p> as a word, which is wrong.

Please correct the following program:

    import org.jsoup.Jsoup;

    String str = "<p>article</p> <p>article</p> <p>article</p> <p>&nbsp</p>";
    org.jsoup.nodes.Document dom = Jsoup.parse(str);
    String str2 = dom.text();
    System.out.println(str2.split(" ").length);
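A minimal sketch of a more tolerant word count, assuming (this is an assumption about jsoup's output, which may vary between versions) that `dom.text()` returns the three words followed by a non-breaking space (U+00A0, the character `&nbsp;` stands for). `split(" ")` only splits on ordinary spaces, so the lone non-breaking space survives as a fourth "word". The hypothetical helper `countWords` treats U+00A0 as whitespace and ignores empty input. It uses only the standard library, so it runs without jsoup.

```java
class WordCount {

    // Counts words separated by runs of whitespace, treating the
    // non-breaking space (U+00A0) like an ordinary space.
    static int countWords(String text) {
        // String.trim() does not remove U+00A0, so convert it first.
        String normalized = text.replace('\u00A0', ' ').trim();
        if (normalized.isEmpty()) {
            return 0; // nothing but whitespace: no words
        }
        // Split on one or more whitespace characters.
        return normalized.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Assumed result of dom.text(): three words plus a non-breaking space.
        String extracted = "article article article \u00A0";
        System.out.println(extracted.split(" ").length); // prints 4 (naive split)
        System.out.println(countWords(extracted));       // prints 3
    }
}
```

In the original program, `countWords(dom.text())` could replace `str2.split(" ").length`.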

What changes should I make to get the correct output?

Thanks in advance.

As Benjamin mentioned in a comment, you should add a semicolon after &nbsp, i.e. write &nbsp; instead. If you leave it out, the entity is not parsed the way you intend and is treated as "one element" (an extra word), which is not what you want.
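If the HTML comes from elsewhere and cannot be fixed by hand, the missing semicolon could be added before the string is passed to `Jsoup.parse`. Below is a minimal sketch using a regular expression with a negative lookahead. `EntityFix` and `addMissingNbspSemicolons` are hypothetical names. The sketch assumes only `&nbsp` needs repairing and will also insert a semicolon into any longer text that happens to start with `&nbsp`.

```java
class EntityFix {

    // Adds ';' after every "&nbsp" that is not already followed by one.
    // Assumption: &nbsp is the only entity that needs repairing.
    static String addMissingNbspSemicolons(String html) {
        // (?!;) is a negative lookahead: match "&nbsp" only when no ';' follows.
        return html.replaceAll("&nbsp(?!;)", "&nbsp;");
    }

    public static void main(String[] args) {
        String html = "<p>article</p> <p>article</p> <p>article</p> <p>&nbsp</p>";
        // prints <p>article</p> <p>article</p> <p>article</p> <p>&nbsp;</p>
        System.out.println(addMissingNbspSemicolons(html));
    }
}
```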

