码迷,mamicode.com
首页 > 编程语言 > 详细

Python—使用xml.sax解析xml文件

时间:2015-09-14 19:23:32      阅读:338      评论:0      收藏:0      [点我收藏+]

标签:

什么是sax?

SAX是一种基于事件驱动的API。

利用SAX解析XML文档牵涉到两个部分:解析器和事件处理器。

解析器负责读取XML文档,并向事件处理器发送事件,如元素开始跟元素结束事件;

而事件处理器则负责对事件作出相应,对传递的XML数据进行处理。

sax适于处理下面的问题:

  • 1、对大型文件进行处理;
  • 2、只需要文件的部分内容,或者只需从文件中得到特定信息;
  • 3、想建立自己的对象模型的时候。

在python中使用sax方式处理xml要先引入xml.sax中的parse函数,还有xml.sax.handler中的ContentHandler。

 

movies.xml:需要解析的xml文件,上一篇博客中使用dom解析的一样

<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>

 

xmltest.py:解析代码如下

# -*- coding:UTF-8 -*-

‘‘‘
Created on 2015年9月10日

@author: xiaowenhui
‘‘‘

import xml.sax

#第二种方法,sax解析 
class MovieHandler(xml.sax.ContentHandler):  #继承于xml.sax.ContentHandler类
        
    def __init__(self):
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.episodes = ""
        self.rating = ""
        self.stars = ""
        self.description = ""
        self.title = ""

    # 元素开始事件处理
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print "*****Movie*****"
            self.title = attributes["title"]
            print "Title:", self.title

    # 内容事件处理 
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content  
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "episodes":
            self.episodes = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content
            
    # 元素结束事件处理
    def endElement(self, tag):
        if self.CurrentData == "type":
            print "Type:", self.type
        elif self.CurrentData == "format":
            print "Format:", self.format
        elif self.CurrentData == "year":
            print "Year:", self.year
        elif self.CurrentData == "episodes":
            print "Episodes:", self.episodes
        elif self.CurrentData == "rating":
            print "Rating:", self.rating
        elif self.CurrentData == "stars":
            print "Stars:", self.stars
        elif self.CurrentData == "description":
            print "Description:", self.description
                           
  
# 创建一个 XMLReader
parser = xml.sax.make_parser()
# turn off namepsaces
parser.setFeature(xml.sax.handler.feature_namespaces, 0)

# 重写 ContextHandler
Handler = MovieHandler()
parser.setContentHandler( Handler )
   
parser.parse("movies.xml")

 

输出结果如下:

疑问:不知道为什么会多输出一个description,可能是sax解析的时候哪里写的不对,现在还没找到原因,我把

 elif self.CurrentData == "description":
 print "Description:", self.description

改成
 elif self.CurrentData == "description":
 print  self.description
后就没有输出“description”,只输出了self.description这个参数

*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
description: Talk about a US-Japan war
description: 

*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
description: A schientific fiction
description: 

*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Episodes: 4
Rating: PG
Stars: 10
description: Vash the Stampede!
description: 

*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
description: Viewable boredom
description: 

description: 

 

Python—使用xml.sax解析xml文件

标签:

原文地址:http://www.cnblogs.com/xiaowenhui/p/4807924.html

(0)
(0)
   
举报
评论 一句话评论(0
登录后才能评论!
© 2014 mamicode.com 版权所有  联系我们:gaon5@hotmail.com
迷上了代码!