regex - Create new list from old using re.sub() in Python 2.7


My goal is to take an XML file, pull out all instances of a specific element, remove the XML tags, and then work on the remaining text.

I started with this, which works to remove the XML tags, but from the entire XML file:

    from urllib import urlopen
    import re

    url = [url of xml file here]  # the URL of the file to search
    raw = urlopen(url).read()     # open the file and read it into a variable

    exp = re.compile(r'<.*?>')
    text_only = exp.sub('', raw).strip()

I've also got this, text2 = soup.find_all('quoted-block'), which creates a list of all the quoted-block elements (yes, I know I need to import BeautifulSoup).

But I can't figure out how to apply the regex to the list resulting from soup.find_all. I've tried text_only = [item for item in text2 if exp.sub('',item).strip()] and variations of it, but I keep getting this error: TypeError: expected string or buffer

What am I doing wrong?

You don't want to use a regex for this. Instead, use BeautifulSoup's existing support for grabbing text:

    quoted_blocks = soup.find_all('quoted-block')
    text_chunks = [block.get_text() for block in quoted_blocks]
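
As for why the attempted comprehension failed, there look to be two separate problems. First, exp.sub() only accepts strings, and find_all returns Tag objects rather than strings, which is where the TypeError comes from. Second, an if clause in a list comprehension only filters, so even with strings the items would come back with their tags still in place. Here is a minimal sketch of the corrected shape, assuming the elements have already been converted to strings; raw_blocks and its sample fragments are made up for illustration and are not from the original file:

```python
import re

# Same pattern as in the question: non-greedy match of anything between < and >.
exp = re.compile(r'<.*?>')

# Hypothetical stand-ins for the quoted-block elements, already converted to strings.
raw_blocks = [
    '<quoted-block><p>First quoted passage.</p></quoted-block>',
    '<quoted-block>  Second <i>quoted</i> passage.  </quoted-block>',
]

# The substitution goes in the expression position, so each item is transformed.
text_only = [exp.sub('', block).strip() for block in raw_blocks]

# An `if` clause only filters; the items themselves come back untouched.
filtered_only = [block for block in raw_blocks if exp.sub('', block).strip()]

print(text_only)
```

With real Tag objects you would have to pass something like str(block) to exp.sub() instead of the Tag itself, but get_text() is still the simpler route since it needs no regex at all.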
