
How to Control Export Order in Scrapy

Date: 2021-04-05 12:28:39


1. The problem

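When exporting items with Scrapy, I found that the fields came out in the wrong order (apparently sorted alphabetically) rather than in the order defined in items.py. So how can the order be controlled?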

2. fields_to_export

While reading the official documentation I found this attribute, which is described as follows:

fields_to_export

A list with the name of the fields that will be exported, or None if you want to export all fields. Defaults to None.

Some exporters (like CsvItemExporter) respect the order of the fields defined in this attribute.

When using item objects that do not expose all their possible fields, exporters that do not support exporting a different subset of fields per item will only export the fields found in the first item exported. Use fields_to_export to define all the fields to be exported.

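In short: this attribute (a list) controls which fields are exported, and some exporters, such as CsvItemExporter, also write the fields in the order given by the list.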

So all you need to do is pass a fields_to_export argument when creating the exporter, and you control both which fields are exported and their order.

3. Example

pipelines.py

from scrapy.exporters import JsonLinesItemExporter, CsvItemExporter
from itemadapter import ItemAdapter

# Fields to export, in the order they should appear in the output
fields_to_export = ['city_name', 'house_addr', 'house_class', 'house_size', 'house_facility', 'house_price',
                    'house_release_time']


class JsonLinesItemPipeline:
    """Write every item to a JSON Lines file with a fixed field order."""

    def __init__(self):
        # The 'storages' directory must already exist
        self.file = open('storages/renting.jl', 'wb')
        self.exporter = JsonLinesItemExporter(self.file, encoding='utf-8', fields_to_export=fields_to_export)

    def open_spider(self, spider):
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()


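class CsvItemPipeline:
    """Write every item to a CSV file whose columns follow fields_to_export."""

    def __init__(self):
        # The 'storages' directory must already exist
        self.file = open('storages/renting.csv', 'wb')
        self.exporter = CsvItemExporter(self.file, fields_to_export=fields_to_export)

    def open_spider(self, spider):
        self.exporter.start_exporting()

    def process_item(self, item, spider):
        self.exporter.export_item(item)
        return item

    def close_spider(self, spider):
        self.exporter.finish_exporting()
        self.file.close()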
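
Scrapy only runs pipelines that are enabled through the ITEM_PIPELINES setting. A minimal sketch of the matching settings.py entry, assuming the project package is called myproject (a hypothetical name; substitute your own) and with arbitrarily chosen priority values:

```python
# settings.py (fragment)
# 'myproject' is a hypothetical package name used only for illustration.
# Lower numbers run first; 300 and 400 are arbitrary example priorities.
ITEM_PIPELINES = {
    "myproject.pipelines.JsonLinesItemPipeline": 300,
    "myproject.pipelines.CsvItemPipeline": 400,
}
```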


Original article: https://www.cnblogs.com/pineapple-py/p/14613390.html
