python - How to get scraped items from the main script using Scrapy?
I would like to get the list of scraped items in my main script, instead of using the scrapy shell.
I know there is a method parse defined in the class FooSpider, and this method returns a list of items. The Scrapy framework calls this method itself, but how can I get the returned list myself?
I found many posts about this, but I don't understand what they are saying.
For context, I've put the official example code here:
import scrapy
from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/",
    ]

    def parse(self, response):
        # Follow every category link and parse its contents.
        for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        result = []
        for sel in response.xpath('//ul/li'):
            item = DmozItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            result.append(item)
        return result
How can I get the returned result in my main Python script, main.py or run.py?
if __name__ == "__main__":
    ...
    result = xxxx()
    for item in result:
        print item
Could someone provide a code snippet in which the returned list is obtained somewhere?
Thanks so much!
If you want to work with/process/transform or store the items, you should look into an Item Pipeline, and the usual scrapy crawl will do the trick.
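For what it's worth, a common way to do exactly what the question asks is to run the crawl from a plain script with CrawlerProcess and collect the items through Scrapy's item_scraped signal. The following is a minimal sketch, not a definitive implementation; the import path tutorial.spiders.dmoz_spider for DmozSpider is an assumption about the project layout, so adjust it to match yours.

# run.py -- sketch: run the spider in-process and gather its items.
from scrapy import signals
from scrapy.crawler import CrawlerProcess

# Assumed import path; change it to wherever DmozSpider actually lives.
from tutorial.spiders.dmoz_spider import DmozSpider


if __name__ == "__main__":
    items = []

    def collect_item(item, response, spider):
        # Fires once for every item the spider yields or returns.
        items.append(item)

    process = CrawlerProcess({"LOG_LEVEL": "WARNING"})
    crawler = process.create_crawler(DmozSpider)
    crawler.signals.connect(collect_item, signal=signals.item_scraped)
    process.crawl(crawler)
    process.start()  # blocks until the crawl is finished

    for item in items:
        print(item)

If you prefer the Item Pipeline route mentioned above, a collecting pipeline could look like this sketch (again an assumption, not an official recipe):

# pipelines.py -- sketch: stash every item on the spider instance.
class CollectItemsPipeline(object):
    def open_spider(self, spider):
        spider.collected_items = []

    def process_item(self, item, spider):
        spider.collected_items.append(item)  # or transform/store it here
        return item

Enable it in settings.py with something like ITEM_PIPELINES = {'tutorial.pipelines.CollectItemsPipeline': 300}, and the items will accumulate on the spider while scrapy crawl dmoz runs.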