PHP - libxml fails to parse correctly


For some reason, libxml fails to parse the text of this page correctly. I want to get the text starting with "in last few days ...", which is the main text of the article, but libxml exits saying that it couldn't find the end of a start tag. This is the code:

[niko@dev1 tmp]$ cat domtest2.php
<?php

    $url="http://www.journaldev.com/253/65-html5-tutorials-examples-and-resources-for-web-developers";
    $ch=curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    $htmltext=curl_exec($ch);

    $dom=new DOMDocument;
    $result=$dom->loadHTML($htmltext);
    $full_text=$dom->textContent;

    echo $full_text;
?>
[niko@dev1 tmp]$

This is the output of the code:

[niko@dev1 tmp]$ php -f domtest2.php
PHP Warning:  DOMDocument::loadHTML(): htmlParseStartTag: misplaced <head> tag in Entity, line: 57 in /tmp/domtest2.php on line 10
PHP Stack trace:
PHP   1. {main}() /tmp/domtest2.php:0
PHP   2. DOMDocument->loadHTML() /tmp/domtest2.php:10
PHP Warning:  DOMDocument::loadHTML(): Tag header invalid in Entity, line: 69 in /tmp/domtest2.php on line 10
PHP Stack trace:
PHP   1. {main}() /tmp/domtest2.php:0
PHP   2. DOMDocument->loadHTML() /tmp/domtest2.php:10
PHP Warning:  DOMDocument::loadHTML(): AttValue: " expected in Entity, line: 76 in /tmp/domtest2.php on line 10
PHP Stack trace:
PHP   1. {main}() /tmp/domtest2.php:0
PHP   2. DOMDocument->loadHTML() /tmp/domtest2.php:10
PHP Warning:  DOMDocument::loadHTML(): Tag section invalid in Entity, line: 76 in /tmp/domtest2.php on line 10
PHP Stack trace:
PHP   1. {main}() /tmp/domtest2.php:0
PHP   2. DOMDocument->loadHTML() /tmp/domtest2.php:10
PHP Warning:  DOMDocument::loadHTML(): Couldn't find end of Start Tag section in Entity, line: 76 in /tmp/domtest2.php on line 10
PHP Stack trace:
PHP   1. {main}() /tmp/domtest2.php:0
PHP   2. DOMDocument->loadHTML() /tmp/domtest2.php:10
65 html5 tutorials, examples , resources web developers - journaldevwindow._wpemojisettings={"baseurl":"https:\/\/s.w.org\/images\/core\/emoji\/72x72\/","ext":".png","source":{"concatemoji":"http:\/\/www.journaldev.com\/wp-includes\/js\/wp-emoji-release.min.js"}};!function(a,b,c){function d(a){var c,d,e,f=b.createelement("canvas"),g=f.getcontext&&f.getcontext("2d"),h=string.fromcharcode;if(!g||!g.filltext)return!1;switch(g.textbaseline="top",g.font="600 32px arial",a){case"flag":return g.filltext(h(55356,56806,55356,56826),0,0),f.todataurl().length>3e3;case"diversity":return g.filltext(h(55356,57221),0,0),c=g.getimagedata(16,16,1,1).data,d=c[0]+","+c[1]+","+c[2]+","+c[3],g.filltext(h(55356,57221,55356,57343),0,0),c=g.getimagedata(16,16,1,1).data,e=c[0]+","+c[1]+","+c[2]+","+c[3],d!==e;case"simple":return g.filltext(h(55357,56835),0,0),0!==g.getimagedata(16,16,1,1).data[0];case"unicode8":return g.filltext(h(55356,57135),0,0),0!==g.getimagedata(16,16,1,1).data[0]}return!1}function e(a){var c=b.createelement("script");c.src=a,c.type="text/javascript",b.getelementsbytagname("head")[0].appendchild(c)}var 
f,g,h,i;for(i=array("simple","flag","unicode8","diversity"),c.supports={everything:!0,everythingexceptflag:!0},h=0;h<i.length;h++)c.supports[i[h]]=d(i[h]),c.supports.everything=c.supports.everything&&c.supports[i[h]],"flag"!==i[h]&&(c.supports.everythingexceptflag=c.supports.everythingexceptflag&&c.supports[i[h]]);c.supports.everythingexceptflag=c.supports.everythingexceptflag&&!c.supports.flag,c.domready=!1,c.readycallback=function(){c.domready=!0},c.supports.everything||(g=function(){c.readycallback()},b.addeventlistener?(b.addeventlistener("domcontentloaded",g,!1),a.addeventlistener("load",g,!1)):(a.attachevent("onload",g),b.attachevent("onreadystatechange",function(){"complete"===b.readystate&&c.readycallback()})),f=c.source||{},f.concatemoji?e(f.concatemoji):f.wpemoji&&f.twemoji&&(e(f.twemoji),e(f.wpemoji)))}(window,document,window._wpemojisettings);img.wp-smiley,img.emoji{display:inline !important;border:none !important;box-shadow:none !important;height:1em !important;width:1em !important;margin:0 .07em !important;vertical-align:-0.1em !important;background:none !important;padding:0 !important}jquery(function(){jquery('.wpdm-popup').click(function(){tb_show(jquery(this).html(),this.href+'&modal=1&width=600&height=400');return false;});jquery('.haspass').click(function(){var url=jquery(this).attr('href');var id=jquery(this).attr('rel');var password=jquery('#pass_'+id).val();jquery.post('http://www.journaldev.com/',{download:id,password:password},function(res){if(res=='error'){jquery('#wpdm_file_'+id+' .perror').html('wrong password');settimeout("jquery('#wpdm_file_"+id+" .perror').html('');",3000);return false;}else{location.href='http://www.journaldev.com/?wpdmact=process&did='+res;}});return false;});}).enews .screenread{height:1px;left:-1000em;overflow:hidden;position:absolute;top:-1000em;width:1px}(function(i,s,o,g,r,a,m){i['googleanalyticsobject']=r;i[r]=i[r]||function(){(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new 
date();a=s.createelement(o),m=s.getelementsbytagname(o)[0];a.async=1;a.src=g;m.parentnode.insertbefore(a,m)})(window,document,'script','//www.google-analytics.com/analytics.js','ga');ga('create','ua-12171637-4','auto');ga('send','pageview');.site-title a{background:url(http://cdn.journaldev.com/wp-content/uploads/2014/05/cropped-final-jd-logo.png) no-repeat !important}.simple-social-icons ul li a, .simple-social-icons ul li a:hover{background-color:#f6f5f2 !important;border-radius:3px;color:#aaa !important;border:0px #fff solid !important;font-size:18px;padding:9px}.simple-social-icons ul li a:hover{background-color:#000 !important;border-color:#fff !important;color:#fff !important}.mctb-bar,.mctb-response,.mctb-close{background:#f7682c !important}.mctb-bar,.mctb-label,.mctb-close{color:#fff !important}.mctb-button{background:#096abf !important;border-color:#096abf !important}.mctb-email:focus{outline-color:#096abf !important}.mctb-button{color:#fff !important}journaldevjava, java ee, android, web development tutorials [niko@dev1 tmp]$ 
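
The warnings in this output can also be collected as structured data instead of being printed, which makes it easier to see which line and column of the fetched markup each one refers to. The following is only a minimal sketch: the helper name `parseHtmlCollectingErrors` and the sample markup are hypothetical and not part of the original script, and which errors libxml reports (if any) depends on the installed libxml2 version.

```php
<?php
// Hypothetical helper (not from the original script): parse an HTML string,
// buffer libxml's parse errors, and return them with the extracted text.
function parseHtmlCollectingErrors(string $html): array
{
    // Buffer libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each $error is a LibXMLError with line, column and message fields.
        $messages[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }

    // Reset the error buffer and restore the previous error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [
        'loaded' => (bool) $loaded,
        'text'   => $loaded ? (string) $dom->textContent : '',
        'errors' => $messages,
    ];
}

// Made-up sample markup using HTML5 tags similar to those named in the warnings.
$sample = '<html><head><title>t</title></head><body>'
        . '<header>Top</header><section><p>In last few days</p></section>'
        . '</body></html>';

$result = parseHtmlCollectingErrors($sample);
echo count($result['errors']), " parse error(s) reported\n";
foreach ($result['errors'] as $message) {
    echo $message, "\n";
}
```

Passing the string returned by `curl_exec()` to the same helper would list the reported line numbers, which can then be checked against the corresponding lines of the fetched markup.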

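How can I find out what is wrong here? You can reproduce the error if you copy the code and run it in a terminal. Also, I want to point out that when I manually copied the HTML source from the URL into a file and ran the test on that file, it worked, so I suspect there might be an encoding issue. However, the article is written in English, so that doesn't make sense.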
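
One way to test the suspicion that the fetched bytes differ from the manually saved copy is to summarize both and compare them. The sketch below is an assumption about what might be worth checking, not a diagnosis: the helper `summarizeBody`, the script name `inspect.php`, and the command-line arguments are all hypothetical. It reports the byte length, an MD5 hash, whether the data starts with the gzip magic bytes, and whether it is valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original script): summarize a response
// body so it can be compared with a manually saved copy of the same page.
function summarizeBody(string $body): array
{
    return [
        'bytes'         => strlen($body),
        'md5'           => md5($body),
        // gzip streams start with the magic bytes 0x1f 0x8b.
        'looks_gzipped' => strncmp($body, "\x1f\x8b", 2) === 0,
        // With the /u modifier, preg_match() fails on invalid UTF-8 input.
        'valid_utf8'    => preg_match('//u', $body) === 1,
    ];
}

// Runs only when a URL and a saved file are given on the command line, e.g.
//   php inspect.php <url> <saved-file>
if (PHP_SAPI === 'cli' && isset($argv[1], $argv[2])) {
    $ch = curl_init($argv[1]);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $fetched = curl_exec($ch);

    if ($fetched === false) {
        fwrite(STDERR, 'curl error: ' . curl_error($ch) . "\n");
        exit(1);
    }

    // The Content-Type header often names the charset the server claims to send.
    echo 'Content-Type: ', var_export(curl_getinfo($ch, CURLINFO_CONTENT_TYPE), true), "\n";
    echo "fetched:\n";
    print_r(summarizeBody($fetched));

    $saved = file_get_contents($argv[2]);
    if ($saved !== false) {
        echo "saved copy:\n";
        print_r(summarizeBody($saved));
    }
}
```

If the two summaries differ, diffing the fetched body against the saved file around the line numbers in the warnings (57, 69 and 76) should show where the markup diverges.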

