码迷,mamicode.com
首页 > 编程语言 > 详细

python爬虫爬取豆瓣电影前250名电影及评分(requests+pyquery)

时间:2018-05-25 01:39:12      阅读:524      评论:0      收藏:0      [点我收藏+]

标签:[1]   tps   int   []   ext   download   top   面向   pen   

写了两个版本:

1、面向过程版本:

import requests
from pyquery import PyQuery as pq
url=https://movie.douban.com/top250
moves=[]
def sec(item):
    return item[1]
for i in range(0,255,25):
    content=requests.get(url+"?start="+str(i))#?start=25
    for  movie in pq(content.text).find(.item):
        moves.append([pq(movie).find(.title).html(),pq(movie).find(.rating_num).html()])
moves.sort(key=sec,reverse=True)
for move in moves:
    print(move[0],move[1])

2、面向对象版本:

import requests
from pyquery import PyQuery as pq

class Douban:
    def __init__(self):
        self.moves=[]
    def geturl(self):
        url=https://movie.douban.com/top250?start=%s
        urls=[]
        for i in range(0,250,25):
            urls.append(url%i)
        return urls
    def downloader(self,url):
        r=requests.get(url)
        return r.text
    def html_parser(self,page):
        for movie in pq(page).find(.item):
            title=pq(movie).find(.title).html()
            score=pq(movie).find(.rating_num).html()
            self.moves.append({
                    title:title,
                    score:score,
                    })
    def output(self):
        self.moves.sort(key=lambda x:x[score],reverse=True)
        for move in self.moves:
            print(move[title],move[score])
    def start(self):
        for url in self.geturl():
            #print(url)
            page=self.downloader(url)
            self.html_parser(page)
        self.output()
dou=Douban()
dou.start()

 

python爬虫爬取豆瓣电影前250名电影及评分(requests+pyquery)

标签:[1]   tps   int   []   ext   download   top   面向   pen   

原文地址:https://www.cnblogs.com/babihuang/p/9085867.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!