
Working with Data Sources 2



Web Scraping:

1. We can also use requests.get to fetch the HTML of a webpage.
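A minimal sketch of this step (the URL and variable names below are placeholders for illustration, not from the original notes):

  import requests

  url = "http://example.com/index.html"  # hypothetical URL used only for illustration
  response = requests.get(url)           # send an HTTP GET request for the page
  print(response.status_code)            # 200 means the request succeeded
  content = response.content             # the raw HTML, which the parser below will consume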

2. If we want to extract content from the webpage, we can use the BeautifulSoup library.

  from bs4 import BeautifulSoup

  parser = BeautifulSoup(content, 'html.parser')  # initialize the parser by passing the page content to BeautifulSoup

  body = parser.body  # get the <body> tag from the parsed document

  p = body.p  # get the first <p> tag inside <body>

  head = parser.head  # get the <head> tag

  title_text = head.title.text  # get the text content of <title>
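Putting point 2 together, a minimal self-contained sketch (the HTML string is invented so the example runs on its own):

  from bs4 import BeautifulSoup

  html = "<html><head><title>Example</title></head><body><p>First paragraph.</p></body></html>"
  parser = BeautifulSoup(html, 'html.parser')
  print(parser.head.title.text)  # Example
  print(parser.body.p.text)      # First paragraph.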

3. We can use the find_all method to find all the relevant content in the webpage. find_all can only be used on bs4 elements (tags).

  head = parser.find_all("head")  # find all elements with the tag head and save them as a list in the variable head

  title = head[0].find_all("title")  # find all <title> tags inside the first <head>

  title_text = title[0].text  # the text of the first matching <title>
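A short self-contained sketch of finding elements by tag name (the sample HTML is invented for illustration):

  from bs4 import BeautifulSoup

  html = "<html><head><title>Example</title></head><body><p>First.</p><p>Second.</p></body></html>"
  parser = BeautifulSoup(html, 'html.parser')
  paragraphs = parser.find_all("p")  # a list containing every <p> tag
  print(len(paragraphs))             # 2
  print(paragraphs[1].text)          # Second.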

4. The find_all method can also find content by its id. find_all always returns a list.

  second_paragraph_text = parser.find_all("p", id="second")[0].text  # the <p> tag with id="second"
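For example, with an invented HTML snippet containing two paragraphs with id attributes:

  from bs4 import BeautifulSoup

  html = '<body><p id="first">First.</p><p id="second">Second.</p></body>'
  parser = BeautifulSoup(html, 'html.parser')
  print(parser.find_all("p", id="second")[0].text)  # Second. -- find_all returns a list, so take index [0]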

5. The find_all method can also find content by class.

  second_inner_paragraph_text = parser.find_all("p", class_="inner-text")[1].text  # "p" restricts the match to <p> tags with that class; [1] takes the second match
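Similarly, a small invented example for filtering by class:

  from bs4 import BeautifulSoup

  html = '<body><p class="inner-text">One</p><p class="inner-text">Two</p></body>'
  parser = BeautifulSoup(html, 'html.parser')
  print(parser.find_all("p", class_="inner-text")[1].text)  # Two -- index [1] picks the second match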

6. We can also use CSS selectors to find specific content. Like find_all, the select method works on bs4 objects and returns a list.

  first_outer_text = parser.select(".outer-text")[0].text  # "." selects by class

  second_text = parser.select("#second")[0].text  # "#" selects by id
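A minimal sketch of both selector forms (the HTML string is invented for illustration):

  from bs4 import BeautifulSoup

  html = '<body><p class="outer-text">Outer.</p><p id="second">Second.</p></body>'
  parser = BeautifulSoup(html, 'html.parser')
  print(parser.select(".outer-text")[0].text)  # class selectors use a "." prefix
  print(parser.select("#second")[0].text)      # id selectors use a "#" prefix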

 

Original source: http://www.cnblogs.com/kingoscar/p/6072286.html
