码迷,mamicode.com
首页 > 其他好文 > 详细

FIRST SCRAPY PRJ

时间:2014-12-19 23:14:04      阅读:497      评论:0      收藏:0      [点我收藏+]

标签:

zpc@Lenovo-PC:/prj/pyscrapy/a$ scrapy startproject helloword
New Scrapy project ‘helloword‘ created in:
    /cygdrive/e/01.prj/pyscrapy/a/helloword

You can start your first spider with:
    cd helloword
    scrapy genspider example example.com

 

zpc@Lenovo-PC:/prj/pyscrapy/a/helloword$ scrapy genspider baidu www.baidu.com
Created spider ‘baidu‘ using template ‘basic‘ in module:
  helloword.spiders.baidu

 

问题:


zpc@Lenovo-PC:/prj/pyscrapy/a/tutorial$ scrapy crawl dmoz
/cygdrive/e/01.prj/pyscrapy/a/tutorial/tutorial/spiders/dmoz_spider.py:3: ScrapyDeprecationWarning: tutorial.spiders.dmoz_spider.DmozSpider inherits from deprecated class scrapy.spider.BaseSpider, please inherit from scrapy.spider.Spider. (warning only on first subclass, there may be others)
  class DmozSpider(BaseSpider):
2014-12-17 11:32:38+0000 [scrapy] INFO: Scrapy 0.24.4 started (bot: tutorial)
2014-12-17 11:32:38+0000 [scrapy] INFO: Optional features available: ssl, http11
2014-12-17 11:32:38+0000 [scrapy] INFO: Overridden settings: {‘NEWSPIDER_MODULE‘: ‘tutorial.spiders‘, ‘SPIDER_MODULES‘: [‘tutorial.spiders‘], ‘BOT_NAME‘: ‘tutorial‘}
2014-12-17 11:32:40+0000 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-12-17 11:32:41+0000 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-12-17 11:32:41+0000 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-12-17 11:32:41+0000 [scrapy] INFO: Enabled item pipelines:
2014-12-17 11:32:41+0000 [dmoz] INFO: Spider opened
2014-12-17 11:32:41+0000 [dmoz] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-12-17 11:32:41+0000 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2014-12-17 11:32:41+0000 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080


 

iconv [选项...] [文件...]
有如下选项可用:
输入/输出格式规范:
-f, --from-code=名称 原始文本编码
-t, --to-code=名称 输出编码
信息:
-l, --list 列举所有已知的字符集
输出控制:
-c 从输出中忽略无效的字符
-o, --output=FILE 输出文件
-s, --silent 关闭警告
--verbose 打印进度信息

iconv -f utf-8 -t gb2312 /server_test/reports/software_.txt > /server_test/reports/software_asserts.txt

FIRST SCRAPY PRJ

标签:

原文地址:http://www.cnblogs.com/zhang-pengcheng/p/4174777.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!