
Python Crawler Framework Scrapy Study Notes 6 ------- Basic Commands

Date: 2015-01-07 19:07:36

Tags: scrapy

1. Some Scrapy commands are available only from the root directory of a Scrapy project, for example the crawl command.
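For instance (an illustrative session; the exact error text can vary between Scrapy versions), running crawl outside a project directory fails:

scrapy crawl taobao
Unknown command: crawl

Use "scrapy" to see available commands

From inside the project directory, the same command runs the spider.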


2. scrapy genspider taobao detail.tmall.com

genspider takes a spider name and a domain (the product page http://detail.tmall.com/item.htm?id=12577759834 is served under detail.tmall.com). It automatically generates taobao.py in the spiders directory:

# -*- coding: utf-8 -*-
import scrapy


class TaobaoSpider(scrapy.Spider):
    name = "taobao"
    allowed_domains = ["detail.tmall.com"]
    start_urls = (
        'http://detail.tmall.com/',
    )

    def parse(self, response):
        pass
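The generated spider can then be run from the project root with:

scrapy crawl taobao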

Other templates are also available.
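scrapy genspider -l lists the templates that ship with Scrapy; on this version the output looks roughly like:

scrapy genspider -l
Available templates:
  basic
  crawl
  csvfeed
  xmlfeed

For example, generating a spider from the crawl template: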

scrapy genspider taobao2 detail.tmall.com --template=crawl

# -*- coding: utf-8 -*-
import scrapy
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule

from project004.items import Project004Item


class Taobao2Spider(CrawlSpider):
    name = 'taobao2'
    allowed_domains = ['detail.tmall.com']
    start_urls = ['http://detail.tmall.com/']

    rules = (
        # Follow links whose URL matches "Items/" and parse each page with parse_item
        Rule(LinkExtractor(allow=r'Items/'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        i = Project004Item()
        #i['domain_id'] = response.xpath('//input[@id="sid"]/@value').extract()
        #i['name'] = response.xpath('//div[@id="name"]').extract()
        #i['description'] = response.xpath('//div[@id="description"]').extract()
        return i
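The crawl template imports Project004Item from the project's items.py. A minimal sketch of that item class, assuming only the three fields referenced in the commented-out lines above, could look like this:

# project004/items.py -- illustrative; field names mirror the commented-out assignments
import scrapy


class Project004Item(scrapy.Item):
    domain_id = scrapy.Field()
    name = scrapy.Field()
    description = scrapy.Field()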


3. List all spiders in the current project: scrapy list
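With the two spiders generated above, the output is simply one spider name per line:

taobao
taobao2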

4. Usage of the fetch command

A. scrapy fetch --nolog http://www.example.com/some/page.html

   (downloads the page with the Scrapy downloader and prints the response body)

B. scrapy fetch --nolog --headers http://www.example.com/

   (prints the response headers instead of the body)

5. The view command opens a page in the browser, showing it as Scrapy "sees" it:

   scrapy view http://www.example.com/some/page.html

6. Inspect settings

scrapy settings --get BOT_NAME
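Run from inside this project, the command above prints the BOT_NAME set by startproject in settings.py:

project004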

7. Run a self-contained spider, without creating a project:

scrapy runspider <spider_file.py>
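A self-contained spider is just a single .py file containing a Spider subclass. A minimal sketch (the file name and URL are purely illustrative):

# standalone_spider.py -- minimal self-contained spider, runnable without a project
import scrapy


class StandaloneSpider(scrapy.Spider):
    name = 'standalone'
    start_urls = ['http://www.example.com/']

    def parse(self, response):
        # Log the page title extracted from the response
        self.log(response.xpath('//title/text()').extract())

It is run directly with:

scrapy runspider standalone_spider.py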

8. Deploying a Scrapy project: scrapy deploy

Deploying spiders requires a server environment to run them on; scrapyd is the usual choice.

Install scrapyd: pip install scrapyd

Documentation: http://scrapyd.readthedocs.org/en/latest/install.html
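As a rough sketch (assuming a scrapyd instance running locally on its default port 6800), the project's scrapy.cfg gets a [deploy] target that scrapy deploy uploads to:

# scrapy.cfg -- illustrative deploy target pointing at a local scrapyd
[settings]
default = project004.settings

[deploy]
url = http://localhost:6800/
project = project004

Then start the server (in a separate shell) and push the project from the project root:

scrapyd
scrapy deploy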

9. All available commands

C:\Users\IBM_ADMIN\PycharmProjects\pycrawl\project004>scrapy

Scrapy 0.24.4 - project: project004


Usage:

  scrapy <command> [options] [args]


Available commands:

  bench         Run quick benchmark test

  check         Check spider contracts

  crawl         Run a spider

  deploy        Deploy project in Scrapyd target

  edit          Edit spider

  fetch         Fetch a URL using the Scrapy downloader

  genspider     Generate new spider using pre-defined templates

  list          List available spiders

  parse         Parse URL (using its spider) and print the results

  runspider     Run a self-contained spider (without creating a project)

  settings      Get settings values

  shell         Interactive scraping console

  startproject  Create new project

  version       Print Scrapy version

  view          Open URL in browser, as seen by Scrapy








Original source: http://dingbo.blog.51cto.com/8808323/1600296
