
Python at Work: Analyzing Text Files

Date: 2020-01-14 21:01:29


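Contents:

  • Finding files whose names end with a given suffix
  • Computing file sizes
  • Analyzing Apache access logs with Python

Finding the files ending in .py in a directory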

[smcuser@smc-postman-script test]$ ll
total 4
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 1.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 2.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 3.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 4.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:13 5.txt
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 a.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 b.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 c.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 d.py
-rw-rw-r--. 1 smcuser smcuser   0 Jan 12 23:14 e.py
-rw-rw-r--. 1 smcuser smcuser 116 Jan 12 23:35 test.py
#!/usr/bin/env python
#
#
import os

# Keep only the entries of the current directory whose names end in .py
test = [item for item in os.listdir('.') if item.endswith('.py')]

print(test)
Output:
[smcuser@smc-postman-script test]$ python test.py 
['a.py', 'b.py', 'c.py', 'd.py', 'e.py', 'test.py']
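The listing above can be wrapped in a small reusable helper. The following is a minimal sketch and not part of the original post: the name `find_by_suffix` and its parameters are hypothetical. It relies on `str.endswith` accepting a tuple of suffixes, and it uses `os.path.isfile` to skip directories whose names happen to match.

```python
import os


def find_by_suffix(directory, suffixes):
    """Return the sorted names of regular files in `directory` that end
    with any of `suffixes`. Hypothetical helper, for illustration only."""
    if isinstance(suffixes, str):
        suffixes = (suffixes,)      # allow a single suffix such as '.py'
    suffixes = tuple(suffixes)      # str.endswith accepts a tuple of suffixes
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(suffixes)
        and os.path.isfile(os.path.join(directory, name))
    )
```

For example, `find_by_suffix('.', '.py')` lists the Python files in the current directory, and `find_by_suffix('.', ('.py', '.txt'))` matches both kinds. Sorting makes the result stable, since `os.listdir` returns entries in arbitrary order.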

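Computing file sizes

#!/usr/bin/env python
#
#
import os

# List and join against the same directory so the paths always resolve
path = '/tmp/test'
txt = [item for item in os.listdir(path) if item.endswith('.txt')]

# Total size, in bytes, of all .txt files in the directory
sum_size = sum(os.path.getsize(os.path.join(path, item)) for item in txt)

print(sum_size)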
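When the total alone is not enough, the same idea can report each file's size. This is a sketch under assumptions: `sizes_by_suffix` and `total_size` are hypothetical names that do not appear in the original post, and sizes are in bytes as returned by `os.path.getsize`.

```python
import os


def sizes_by_suffix(directory, suffix):
    """Map each regular file in `directory` ending with `suffix`
    to its size in bytes. Hypothetical helper, for illustration only."""
    sizes = {}
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        if name.endswith(suffix) and os.path.isfile(full_path):
            sizes[name] = os.path.getsize(full_path)
    return sizes


def total_size(directory, suffix):
    """Sum the sizes, in bytes, of all matching files."""
    return sum(sizes_by_suffix(directory, suffix).values())
```

Calling `total_size('/tmp/test', '.txt')` gives the same number as the script above, while `sizes_by_suffix` keeps the per-file breakdown available for finding the largest files.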

Analyzing Apache access logs with Python

Sample Apache log
193.252.243.232 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.232 - - [29/Mar/2009:06:05:34+0200] "GET /index.html HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.231 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.230 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.237 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.237 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 200 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.230 - - [29/Mar/2009:06:05:34+0200] "GET /index.html HTTP/1.1" 400 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"
193.252.243.232 - - [29/Mar/2009:06:05:34+0200] "GET /index.php HTTP/1.1" 503 8714 "-" "Mozilla/5.0 (compatible; PJBot/3.0; +http://crawl.pagesjaunes.fr/robot)"

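Getting the site's PV and UV from client IPs (PV is the number of requests the site received; UV is the number of unique visitors)

#!/usr/bin/env python

ips = []
with open('access.log') as f:
    for line in f:
        # The client IP is the first whitespace-separated field
        ips.append(line.split()[0])

print("pv is {0}".format(len(ips)))       # every request counts once
print("uv is {0}".format(len(set(ips))))  # distinct IPs only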
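The PV/UV count can also be written as a function over any iterable of lines, which makes it easy to try without a log file. This is a minimal sketch: `count_pv_uv` is a hypothetical name, and the sample lines reuse only the client IPs from the log excerpt above, with the rest of each line abbreviated.

```python
def count_pv_uv(lines):
    """Return (pv, uv) for an iterable of access-log lines.

    Assumes the client IP is the first whitespace-separated field.
    Hypothetical helper, for illustration only.
    """
    pv = 0
    unique_ips = set()  # a set keeps one entry per visitor, not one per request
    for line in lines:
        fields = line.split()
        if not fields:  # skip blank lines instead of raising IndexError
            continue
        pv += 1
        unique_ips.add(fields[0])
    return pv, len(unique_ips)


# Client IPs taken from the sample log; the remaining fields are abbreviated.
sample_ips = [
    "193.252.243.232", "193.252.243.232", "193.252.243.231", "193.252.243.230",
    "193.252.243.237", "193.252.243.237", "193.252.243.230", "193.252.243.232",
]
sample_lines = [ip + " - - (rest of line omitted)" for ip in sample_ips]

pv, uv = count_pv_uv(sample_lines)
print("pv is {0}".format(pv))  # pv is 8
print("uv is {0}".format(uv))  # uv is 4
```

Since an open file is itself an iterable of lines, `count_pv_uv(f)` works inside a `with open('access.log') as f:` block. Counting PV with an integer and UV with a set avoids holding one list entry per request in memory.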

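Finding the site's most requested resources. Counter is a subclass of dict, and for plain counting it is more convenient than a regular dictionary.
#!/usr/bin/env python

from collections import Counter

c = Counter()

with open('access.log') as f:
    for line in f:
        # Field 5 is the requested path in the sample layout above, where the
        # timestamp and UTC offset form one token; standard Apache logs put a
        # space between them, which shifts the path to index 6
        c[line.split()[5]] += 1

# The ten most requested paths, most frequent first
print(c.most_common(10))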
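To make the comparison between Counter and a regular dictionary concrete, here is a short side-by-side sketch. The paths are the eight requested resources from the sample log; the variable names are illustrative only.

```python
from collections import Counter

# Requested paths from the sample log, in order
paths = [
    "/index.php", "/index.html", "/index.php", "/index.php",
    "/index.php", "/index.php", "/index.html", "/index.php",
]

# Plain dict: a missing key must be given a default before incrementing.
plain = {}
for p in paths:
    plain[p] = plain.get(p, 0) + 1

# Counter: missing keys read as 0, so += 1 works directly.
counts = Counter()
for p in paths:
    counts[p] += 1

print(counts.most_common(1))  # [('/index.php', 6)]
print(counts["/missing"])     # 0, and no KeyError is raised
```

Both approaches end up with the same tallies, but Counter removes the default-value bookkeeping and provides `most_common` for ranking.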

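Measuring user experience: a request whose HTTP status code is 4xx or 5xx counts as failed; compute the failure rate.
#!/usr/bin/env python
#
d = {}
with open('access.log') as f:
    for line in f:
        # Field 7 is the status code in the sample layout above
        # (index 8 in standard Apache logs, which split the timestamp in two)
        key = line.split()[7]
        d.setdefault(key, 0)
        d[key] += 1
print(d)
sum_requests = 0
error_requests = 0

# items() works on Python 2 and 3; iteritems() exists only on Python 2
for key, val in d.items():
    if int(key) >= 400:
        error_requests += val
    sum_requests += val

# 100.0 keeps the division floating-point on Python 2 as well
print("error rate: {0:.2f}%".format(error_requests * 100.0 / sum_requests))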
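The error-rate calculation can be isolated from file handling and checked against the sample log, whose eight requests returned 200 six times, 400 once, and 503 once. This is a sketch, not the author's code: `error_rate` is a hypothetical name, and returning 0.0 for an empty input is an assumption made here to avoid dividing by zero.

```python
from collections import Counter


def error_rate(status_codes):
    """Return the percentage of requests whose status code is 4xx or 5xx.

    Hypothetical helper, for illustration only.
    """
    counts = Counter(status_codes)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an empty log has no errors
    errors = sum(n for code, n in counts.items() if 400 <= int(code) < 600)
    return errors * 100.0 / total


# Status codes from the sample log, in order
statuses = ["200", "200", "200", "200", "200", "200", "400", "503"]
print("error rate: {0:.2f}%".format(error_rate(statuses)))  # error rate: 25.00%
```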

 


Original article: https://www.cnblogs.com/weidongliu/p/12193624.html
