Extracting specific content from web page source with regular expressions

Posted: 2020-01-30 21:05:50

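import requests  # only needed when downloading real pages, e.g. requests.get(url).text
import re

# A sample line copied from the page source
txt = '<a href="https://www.vgirls.com/13404.html" class="list-title text-md h-2x" target="_blank">想把夏日的阳光寄给冬日的你</a>'

# Capture only the link address; the link text is matched by .*? but not captured
urla = re.findall('<a href="(.*?)" class="list-title text-md h-2x" target="_blank">.*?</a>', txt)
for i in urla:
    print(i)

# Capture only the link text; the address is matched by .*? but not captured
urlb = re.findall('<a href=".*?" class="list-title text-md h-2x" target="_blank">(.*?)</a>', txt)
for i in urlb:
    print(i)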
Output:

https://www.vgirls.com/13404.html
想把夏日的阳光寄给冬日的你
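
The two patterns can also be merged into one with two capture groups: re.findall() then returns one (url, title) tuple per match, and because it scans the whole string, every sibling entry that shares the same structure is picked up. A minimal sketch, assuming the page lists posts with the same class attribute as txt. The function name extract_posts, the second entry and its example.com URL are hypothetical, and in a real script the HTML would come from something like requests.get(url).text rather than a hard-coded string.

```python
import re

# Same structure as the txt snippet; only href and the link text vary.
# Two capture groups, so findall() returns (url, title) tuples.
PATTERN = r'<a href="(.*?)" class="list-title text-md h-2x" target="_blank">(.*?)</a>'


def extract_posts(html):
    """Return (url, title) pairs for every list-title link in html (hypothetical helper)."""
    return re.findall(PATTERN, html)


# Hypothetical page fragment with two sibling entries.
page = (
    '<li><a href="https://www.vgirls.com/13404.html" class="list-title text-md h-2x" '
    'target="_blank">想把夏日的阳光寄给冬日的你</a></li>\n'
    '<li><a href="https://example.com/2.html" class="list-title text-md h-2x" '
    'target="_blank">Second post</a></li>'
)

for url, title in extract_posts(page):
    print(url, title)
```

Keeping the address and the text in one tuple avoids having to line up two separate result lists when a page contains many entries.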

Summary:

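1. In the page source, find the key location, mainly by analyzing what the sibling elements at the same level have in common.
2. Pick a representative snippet, such as the content of txt, and copy it.
3. Paste it between the quotes of an empty call: urla=re.findall('', txt)
4. Replace the part you want to extract with (.*?). Replace parts you do not want but that change from entry to entry with .*?, and never put parentheses around these: every pair of parentheses is a capture group, so findall() would return tuples that include them.
5. So the first pattern extracts only the hyperlink address, and the second extracts only the text inside the <a> tag.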
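
Steps 2 to 4 can be sketched as a small helper that turns a copied snippet into a pattern. make_pattern is a hypothetical function, not part of the re module or of the original post: it escapes the snippet with re.escape() so characters such as "." are matched literally, then swaps the parts you want for (.*?) and the varying parts you don't want for .*?.

```python
import re


def make_pattern(snippet, capture=(), skip=()):
    """Hypothetical helper: build a findall() pattern from a copied snippet.

    `capture` lists substrings to extract (they become "(.*?)");
    `skip` lists substrings that vary but are not wanted (they become ".*?",
    without parentheses). Everything else is matched literally.
    """
    # Escape the literal text so "." and similar characters lose their regex meaning.
    pattern = re.escape(snippet)
    for part in capture:
        pattern = pattern.replace(re.escape(part), "(.*?)", 1)
    for part in skip:
        pattern = pattern.replace(re.escape(part), ".*?", 1)
    return pattern


txt = '<a href="https://www.vgirls.com/13404.html" class="list-title text-md h-2x" target="_blank">想把夏日的阳光寄给冬日的你</a>'
url = "https://www.vgirls.com/13404.html"
title = "想把夏日的阳光寄给冬日的你"

# First pattern: capture the address, skip the text.
print(re.findall(make_pattern(txt, capture=[url], skip=[title]), txt))
# Second pattern: skip the address, capture the text.
print(re.findall(make_pattern(txt, capture=[title], skip=[url]), txt))
```

Passing both url and title in capture would create two groups, and findall() would then return (url, title) tuples instead of plain strings, which is exactly why step 4 leaves the skipped parts without parentheses.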

Original post: https://www.cnblogs.com/xkdn/p/12243681.html