
Web Scraping: A Simple CrawlSpider Example Using dushu.com (读书网)

Date: 2019-10-26 16:45:45


In the spider's .py file under the project directory:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class DsSpider(CrawlSpider):
    name = 'ds'
    allowed_domains = ['dushu.com']
    start_urls = ['https://www.dushu.com/book/1163_1.html']

    # Follow every link found inside the pagination block and parse each page.
    rules = (
        Rule(LinkExtractor(restrict_xpaths='//div[@class="pages"]'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # print(response.url)
        lis = response.xpath('//div[@class="bookslist"]/ul/li')
        for li in lis:
            # Build a fresh dict per book so yielded items do not share state.
            item = {}
            item['name'] = li.xpath('.//h3/a/text()').extract_first()
            item['link'] = li.xpath('.//h3/a/@href').extract_first()
            item['author'] = li.xpath('.//p[1]/a/text()').extract_first()
            # Leftover lines from the crawl spider template:
            # item['domain_id'] = response.xpath('//input[@id="sid"]/@value').get()
            # item['name'] = response.xpath('//div[@id="name"]').get()
            # item['description'] = response.xpath('//div[@id="description"]').get()
            yield item
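
One detail in parse_item deserves a note: where the item dict is created. A dict created once before the loop is the same object on every yield, so any consumer that holds on to the items (for example a pipeline that buffers them) may end up seeing only the last book's values. The following is a minimal sketch of that effect in plain Python, with no Scrapy involved; the function names and the book titles are hypothetical and exist only for illustration.

```python
def shared_dict(names):
    # One dict created before the loop: every yield hands out the same object.
    item = {}
    for n in names:
        item['name'] = n
        yield item


def fresh_dict(names):
    # A new dict per iteration: each yielded item is independent.
    for n in names:
        item = {'name': n}
        yield item


names = ['Book A', 'Book B', 'Book C']  # hypothetical titles

# list() stands in for any consumer that keeps the items around after they are yielded.
shared = [it['name'] for it in list(shared_dict(names))]
fresh = [it['name'] for it in list(fresh_dict(names))]

print(shared)  # ['Book C', 'Book C', 'Book C']
print(fresh)   # ['Book A', 'Book B', 'Book C']
```

Whether this actually bites in a given Scrapy project depends on how the items are consumed, but creating the dict inside the loop avoids the question entirely.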

 


Original article: https://www.cnblogs.com/zry-yt/p/11743410.html
