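Think about it: every experiment and example so far has used a single spider. A real crawler project, however, will certainly have more than one. That raises two questions: 1. How do you create multiple spiders in the same project? 2. Once there are several spiders, how do you run them all?
Note: this article builds on the earlier articles and experiments in this series. If you missed them, or something here is unclear, see:
Scrapy crawler growth diary: creating the project, extracting data, and saving it as JSON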
I. Creating spiders
1. Create multiple spiders with: scrapy genspider spidername domain

    scrapy genspider CnblogsHomeSpider cnblogs.com

The command above creates a spider whose name is CnblogsHomeSpider and whose start_urls is http://www.cnblogs.com/.
2. List the spiders in the project with: scrapy list

    [root@bogon cnblogs]# scrapy list
    CnblogsHomeSpider
    CnblogsSpider

This shows that my project contains two spiders, one named CnblogsHomeSpider and the other named CnblogsSpider.
For more on Scrapy commands, see: http://doc.scrapy.org/en/latest/topics/commands.html
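II. Running several spiders at the same time
Our project now has two spiders, so how do we get both of them running at once? You might suggest a shell script that calls them one by one, or a Python script that runs them one by one. On stackoverflow.com I found that quite a few people do implement it that way. The official documentation, however, describes the following approaches.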
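For comparison, the "run them one by one from a Python script" idea mentioned above could look roughly like the sketch below. The helper names build_crawl_commands and run_one_by_one are made up for illustration, and the script is assumed to be started from the project directory so that the scrapy crawl command can find the project.

```python
import subprocess


def build_crawl_commands(spider_names):
    """Return one `scrapy crawl <name>` argument list per spider."""
    return [["scrapy", "crawl", name] for name in spider_names]


def run_one_by_one(spider_names):
    """Run each spider in its own process, waiting for each one to finish.

    Returns the exit code of every `scrapy crawl` call, in order.
    """
    return [subprocess.call(cmd) for cmd in build_crawl_commands(spider_names)]


# Only build and show the commands here; call run_one_by_one(...) to execute them.
print(build_crawl_commands(["CnblogsHomeSpider", "CnblogsSpider"]))
```

Each spider gets its own process this way, so the spiders run strictly one after another and the start-up cost is paid for every spider.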
1. Run Scrapy from a script

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class MySpider(scrapy.Spider):
        # Your spider definition
        ...

    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })

    process.crawl(MySpider)
    process.start()  # the script will block here until the crawling is finished

Here scrapy.crawler.CrawlerProcess is what runs a spider from inside a script. More examples can be found at: https://github.com/scrapinghub/testspiders
2. Running multiple spiders in the same process
- Using CrawlerProcess

    import scrapy
    from scrapy.crawler import CrawlerProcess

    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...

    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...

    process = CrawlerProcess()
    process.crawl(MySpider1)
    process.crawl(MySpider2)
    process.start()  # the script will block here until all crawling jobs are finished

- Using CrawlerRunner

    import scrapy
    from twisted.internet import reactor
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging

    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...

    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...

    configure_logging()
    runner = CrawlerRunner()
    runner.crawl(MySpider1)
    runner.crawl(MySpider2)
    d = runner.join()
    d.addBoth(lambda _: reactor.stop())

    reactor.run()  # the script will block here until all crawling jobs are finished

- Using CrawlerRunner and chaining deferreds to run the spiders sequentially

    import scrapy  # needed for scrapy.Spider below
    from twisted.internet import reactor, defer
    from scrapy.crawler import CrawlerRunner
    from scrapy.utils.log import configure_logging

    class MySpider1(scrapy.Spider):
        # Your first spider definition
        ...

    class MySpider2(scrapy.Spider):
        # Your second spider definition
        ...

    configure_logging()
    runner = CrawlerRunner()

    @defer.inlineCallbacks
    def crawl():
        yield runner.crawl(MySpider1)
        yield runner.crawl(MySpider2)
        reactor.stop()

    crawl()
    reactor.run()  # the script will block here until the last crawl call is finished

These are the ways the official documentation offers for running spiders from a script.
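The difference between the last two variants (start everything and wait for all of it, versus chaining one crawl after another) is easier to see in a toy model. The sketch below deliberately uses asyncio from the standard library as a stand-in; Scrapy itself is built on Twisted, and fake_crawl is a made-up placeholder for runner.crawl(...), so this only illustrates the ordering, not Scrapy's API.

```python
import asyncio


async def fake_crawl(name, delay, log):
    # Hypothetical stand-in for runner.crawl(SomeSpider): records start and finish.
    log.append("start " + name)
    await asyncio.sleep(delay)
    log.append("finish " + name)


async def run_concurrently(log):
    # Analogous to scheduling both crawls and then waiting on runner.join().
    await asyncio.gather(
        fake_crawl("spider1", 0.05, log),
        fake_crawl("spider2", 0.01, log),
    )


async def run_sequentially(log):
    # Analogous to yielding each runner.crawl(...) in turn inside inlineCallbacks.
    await fake_crawl("spider1", 0.05, log)
    await fake_crawl("spider2", 0.01, log)


concurrent_log, sequential_log = [], []
asyncio.run(run_concurrently(concurrent_log))
asyncio.run(run_sequentially(sequential_log))
print(concurrent_log)
print(sequential_log)
```

In the concurrent run both spiders start before either finishes; in the sequential run the second spider only starts after the first one has finished.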
III. Running the spiders through a custom Scrapy command
1. Create the commands directory

    mkdir commands

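Note: the commands directory sits at the same level as the spiders directory.
2. Add a file named crawlall.py under commands
The idea is to adapt Scrapy's own crawl command so that it runs all the spiders together. The source of crawl can be found at: https://github.com/scrapy/scrapy/blob/master/scrapy/commands/crawl.py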

    from scrapy.commands import ScrapyCommand
    from scrapy.exceptions import UsageError  # was missing, but is raised below
    from scrapy.utils.conf import arglist_to_dict

    class Command(ScrapyCommand):

        requires_project = True

        def syntax(self):
            return '[options]'

        def short_desc(self):
            return 'Runs all of the spiders'

        def add_options(self, parser):
            # optparse-style options, matching the Scrapy 1.0 release used in this article
            ScrapyCommand.add_options(self, parser)
            parser.add_option("-a", dest="spargs", action="append", default=[], metavar="NAME=VALUE",
                              help="set spider argument (may be repeated)")
            parser.add_option("-o", "--output", metavar="FILE",
                              help="dump scraped items into FILE (use - for stdout)")
            parser.add_option("-t", "--output-format", metavar="FORMAT",
                              help="format to use for dumping items with -o")

        def process_options(self, args, opts):
            ScrapyCommand.process_options(self, args, opts)
            try:
                # turn ["NAME=VALUE", ...] into {"NAME": "VALUE", ...}
                opts.spargs = arglist_to_dict(opts.spargs)
            except ValueError:
                raise UsageError("Invalid -a value, use -a NAME=VALUE", print_help=False)

        def run(self, args, opts):
            # settings = get_project_settings()
            spider_loader = self.crawler_process.spider_loader
            # run the spiders named on the command line, or every spider in the project
            for spidername in args or spider_loader.list():
                print("*********crawlall spidername************" + spidername)
                self.crawler_process.crawl(spidername, **opts.spargs)
            self.crawler_process.start()

The key points are that self.crawler_process.spider_loader.list() returns every spider in the project, and self.crawler_process.crawl then schedules each of them to run.
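To make the -a handling in process_options more concrete, here is a minimal pure-Python sketch. parse_spider_args is a hypothetical stand-in for arglist_to_dict, written on the assumption that each item is split on its first "="; it is not Scrapy's code.

```python
def parse_spider_args(arglist):
    # Split every "NAME=VALUE" item on the first "=" only, so values may contain "=".
    # An item without "=" yields a one-element list, which makes dict() raise ValueError.
    return dict(item.split("=", 1) for item in arglist)


print(parse_spider_args(["category=python", "query=a=b"]))

try:
    parse_spider_args(["category"])
except ValueError:
    # The command above turns this case into a UsageError.
    print("Invalid -a value, use -a NAME=VALUE")
```

Because the resulting dictionary is passed as **opts.spargs, every spider started by the command receives the same keyword arguments.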
3. Add an __init__.py file in the commands directory

    touch __init__.py

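Note: this step must not be skipped. I lost a whole day to exactly this problem. Embarrassing... I blame it on being self-taught.
If you skip it, you get an exception like this: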

    Traceback (most recent call last):
      File "/usr/local/bin/scrapy", line 9, in <module>
        load_entry_point('Scrapy==1.0.0rc2', 'console_scripts', 'scrapy')()
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 122, in execute
        cmds = _get_commands_dict(settings, inproject)
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 50, in _get_commands_dict
        cmds.update(_get_commands_from_module(cmds_module, inproject))
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 29, in _get_commands_from_module
        for cmd in _iter_command_classes(module):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/cmdline.py", line 20, in _iter_command_classes
        for module in walk_modules(module_name):
      File "/usr/local/lib/python2.7/site-packages/Scrapy-1.0.0rc2-py2.7.egg/scrapy/utils/misc.py", line 63, in walk_modules
        mod = import_module(path)
      File "/usr/local/lib/python2.7/importlib/__init__.py", line 37, in import_module
        __import__(name)
    ImportError: No module named commands

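At first I could not find the cause no matter where I looked. It cost me a whole day, until I got help from people on http://stackoverflow.com/. Thanks once again to the almighty internet; how wonderful it would be if that wall did not exist! I digress; back to the topic.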
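4. Create setup.py in the same directory as settings.py (Leaving this step out makes no difference here. The Scrapy documentation describes setup.py entry points as a way to register commands shipped in an external library, so a command that lives inside the project only needs the COMMANDS_MODULE setting from step 5.)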

    from setuptools import setup, find_packages

    setup(name='scrapy-mymodule',
      entry_points={
        'scrapy.commands': [
          'crawlall=cnblogs.commands:crawlall',
        ],
      },
    )

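This file defines a crawlall command: cnblogs.commands is the package (directory) that holds the command files, and crawlall is the command name.
5. Add the following setting to settings.py: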

    COMMANDS_MODULE = 'cnblogs.commands'

6. Run the command: scrapy crawlall
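Why does a file called crawlall.py turn into a command called crawlall? As far as I understand it, Scrapy imports every module in the package named by COMMANDS_MODULE and uses the module name as the command name. The sketch below imitates that idea with the standard library only. discover_command_names and the throwaway demo_commands package are made up for illustration, and the check is simplified (real Scrapy looks for ScrapyCommand subclasses rather than any class called Command).

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path


def discover_command_names(package_name):
    """Return the names of submodules that define a class called Command."""
    package = importlib.import_module(package_name)
    names = []
    for info in pkgutil.iter_modules(package.__path__):
        module = importlib.import_module(package_name + "." + info.name)
        if hasattr(module, "Command"):
            names.append(info.name)  # the module name doubles as the command name
    return sorted(names)


# Build a throwaway package that mirrors the commands directory from this article.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_commands"
pkg.mkdir()
(pkg / "__init__.py").write_text("")  # marks the directory as a package
(pkg / "crawlall.py").write_text("class Command(object):\n    pass\n")
(pkg / "helpers.py").write_text("VALUE = 1\n")  # no Command class, so it is ignored

sys.path.insert(0, str(root))
importlib.invalidate_caches()
print(discover_command_names("demo_commands"))
```

Renaming crawlall.py in this toy package changes the discovered name accordingly, which matches the behaviour seen with scrapy crawlall.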
Finally, the updated source code is available here: https://github.com/jackgitgz/CnblogsSpider