python - How to get scraped items from the main script using Scrapy?
I would like to get the list of scraped items in my main script, instead of using the scrapy shell.
I know there is a method parse defined in the class FooSpider, and this method returns a list of items. The Scrapy framework calls this method itself, but how can I get the returned list myself?
I found many posts about this, but I don't understand what they are saying.
For context, I've put the official example code here:
import scrapy
from tutorial.items import DmozItem

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/",
    ]

    def parse(self, response):
        # Follow every category link and parse its contents.
        for href in response.css("ul.directory.dir-col > li > a::attr('href')"):
            url = response.urljoin(href.extract())
            yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        result = []
        for sel in response.xpath('//ul/li'):
            item = DmozItem()
            item['title'] = sel.xpath('a/text()').extract()
            item['link'] = sel.xpath('a/@href').extract()
            item['desc'] = sel.xpath('text()').extract()
            result.append(item)
        return result
How can I get the returned result in my main Python script, main.py or run.py?
if __name__ == "__main__":
    ...
    result = xxxx()
    for item in result:
        print item
Could someone provide a code snippet in which the returned list is obtained somewhere?
Thanks so much!
If you want to work with/process/transform or store the items, you should look into an Item Pipeline, and the usual scrapy crawl will do the trick.
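For what it's worth, a common way to do exactly what the question asks is to run the crawl from a plain script with CrawlerProcess and collect the items through Scrapy's item_scraped signal. The following is a minimal sketch, not a definitive implementation; the import path tutorial.spiders.dmoz_spider for DmozSpider is an assumption about the project layout, so adjust it to match yours.

# run.py -- sketch: run the spider in-process and gather its items.
from scrapy import signals
from scrapy.crawler import CrawlerProcess

# Assumed import path; change it to wherever DmozSpider actually lives.
from tutorial.spiders.dmoz_spider import DmozSpider


if __name__ == "__main__":
    items = []

    def collect_item(item, response, spider):
        # Fires once for every item the spider yields or returns.
        items.append(item)

    process = CrawlerProcess({"LOG_LEVEL": "WARNING"})
    crawler = process.create_crawler(DmozSpider)
    crawler.signals.connect(collect_item, signal=signals.item_scraped)
    process.crawl(crawler)
    process.start()  # blocks until the crawl is finished

    for item in items:
        print(item)

If you prefer the Item Pipeline route mentioned above, a collecting pipeline could look like this sketch (again an assumption, not an official recipe):

# pipelines.py -- sketch: stash every item on the spider instance.
class CollectItemsPipeline(object):
    def open_spider(self, spider):
        spider.collected_items = []

    def process_item(self, item, spider):
        spider.collected_items.append(item)  # or transform/store it here
        return item

Enable it in settings.py with something like ITEM_PIPELINES = {'tutorial.pipelines.CollectItemsPipeline': 300}, and the items will accumulate on the spider while scrapy crawl dmoz runs.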