Install: pip install selenium
PhantomJS download: http://phantomjs.org/download.html
Browser driver downloads: http://www.seleniumhq.com/download
ChromeDriver (version 2.22): http://chromedriver.storage.googleapis.com/index.html?path=2.22/
Quick test: open a page in Chrome and print its title.

#!/usr/bin/env python
# encoding: utf-8
from selenium import webdriver

driver = webdriver.Chrome()
url = 'http://www.toutiao.com/news_fashion/'
driver.get(url)
print(driver.title)
driver.quit()  # end the browser session
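Example: scraping Toutiao. The script refreshes the page to get a new set of articles; I haven't yet figured out how to control mouse scrolling to load more.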
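#!/usr/bin/env python
# encoding: utf-8
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
url = 'http://www.toutiao.com/news_fashion/'
driver.get(url)
print(driver.title)  # driver.get() returns None, so print the page title instead

# Refresh the page twice; each refresh loads a new batch of articles.
# These class names match the page as it was when this post was written;
# the site's markup may have changed since then.
for x in range(2):
    driver.refresh()
    titles = driver.find_elements(By.CLASS_NAME, "title-box")
    contents = driver.find_elements(By.CLASS_NAME, "abstract")
    # find_elements (plural) returns every matching image; find_element would
    # return only the first one. zip() pairs items by position, which assumes
    # every article has exactly one .feedimg.
    imgs = driver.find_elements(By.CSS_SELECTOR, ".feedimg")
    for title, content, img in zip(titles, contents, imgs):
        data = {
            'title': title.text,
            'content': content.text,
            'img': img.get_attribute('src'),
        }
        print(data)
    time.sleep(10)  # pause between refreshes

driver.quit()  # quit() ends the whole session; close() only closes the current window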
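Scrolling is not handled above. A common alternative to refreshing is to scroll with JavaScript through `driver.execute_script`, so that an infinite-scroll feed appends more items to the same page. Below is a minimal sketch, assuming the Toutiao feed loads more articles when you reach the bottom of the page (not verified here). The helper name `scroll_to_bottom` and its `rounds`/`pause` parameters are hypothetical; it relies only on `execute_script`, which every Selenium WebDriver provides.

```python
import time


# Hypothetical helper: scroll to the bottom repeatedly so that a page with
# infinite scrolling can append more items. Uses only driver.execute_script().
def scroll_to_bottom(driver, rounds=3, pause=2.0):
    """Scroll down up to `rounds` times; stop early once the page stops growing.

    Returns how many scrolls actually made the page taller.
    """
    grew = 0
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(rounds):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give the page's JavaScript time to load more items
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded; assume we reached the end
        grew += 1
        last_height = new_height
    return grew
```

For example, call `scroll_to_bottom(driver)` right after `driver.get(url)`, then run the `find_elements` calls once to collect everything that has loaded, instead of looping over `driver.refresh()`. Comparing page heights is a simple stop condition; a page that loads content without changing its height would need a different check.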
Original post: http://www.cnblogs.com/jwong/p/5671426.html