Python's set, like its counterparts in other languages, is an unordered collection with no duplicate elements. Its basic uses are membership testing and eliminating duplicates. Set objects also support mathematical operations such as union, intersection, difference, and symmetric difference.
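As a quick illustration of those operations (a minimal sketch with made-up values, not from the original post):

#coding:utf-8
a = set([1, 2, 3, 4])
b = set([3, 4, 5, 6])

print(a | b)   # union: 1, 2, 3, 4, 5, 6
print(a & b)   # intersection: 3, 4
print(a - b)   # difference (elements of a not in b): 1, 2
print(a ^ b)   # symmetric difference: 1, 2, 5, 6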
The simplest way to deduplicate is l = list(set(l)):
#coding:utf-8
l = [1, 2, 3, 4, 4, 5, 6, 6]
s = set(l)
print s
l = list(s)
print l

Output:

set([1, 2, 3, 4, 5, 6])
[1, 2, 3, 4, 5, 6]

Usage of fromkeys: dict.fromkeys(seq[, value]), where value defaults to None.
Description: creates and returns a new dictionary, using the elements of the sequence seq as its keys and value as the initial value for every key (None by default).
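A small sketch of fromkeys itself (the sequence and fill value here are just illustrative):

#coding:utf-8
seq = ['a', 'b', 'c']
print(dict.fromkeys(seq))       # every key maps to None
print(dict.fromkeys(seq, 0))    # every key maps to 0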
{}.fromkeys(l).keys() builds a dictionary directly from the contents of l, with the values of l as keys and None as every value, then takes the keys. This also deduplicates l, and is supposed to perform better.
#coding:utf-8
l = [1, 2, 3, 4, 4, 5, 6, 6]
l2 = {}.fromkeys(l).keys()
print l2

Output:

[1, 2, 3, 4, 5, 6]

Next, build a list of 20 million items via list comprehension, with each value appearing twice, and deduplicate it.
#coding:utf-8
import time
l1 = [x for x in range(10000000)]
l2 = [x for x in range(10000000)]
for x in l2:
    l1.append(x)
print 'start'
start = time.clock()
s = set(l1)
l = list(s)
end = time.clock()
print 'end'
print("Elapsed: %.03f seconds" % (end - start))
print len(l)

Output:

start
end
Elapsed: 0.990 seconds
10000000

#coding:utf-8
import time
l1 = [x for x in range(10000000)]
l2 = [x for x in range(10000000)]
for x in l2:
    l1.append(x)
print 'start'
start = time.clock()
l2 = {}.fromkeys(l1).keys()
end = time.clock()
print 'end'
print("Elapsed: %.03f seconds" % (end - start))
print len(l2)

Output:

start
end
Elapsed: 1.246 seconds
10000000

Contrary to expectations, the result shows the set approach is faster. Could that be caused by the particular shape of the data?
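For reference, the same comparison could be written for Python 3, where time.clock() no longer exists; this is only a minimal sketch using time.perf_counter(), not the author's original script:

#coding:utf-8
import time

# 20 million items, every value appearing twice, as in the test above
l1 = list(range(10000000)) * 2

start = time.perf_counter()
dedup_set = list(set(l1))
print("set:      %.3f seconds, %d items" % (time.perf_counter() - start, len(dedup_set)))

start = time.perf_counter()
dedup_keys = list(dict.fromkeys(l1))
print("fromkeys: %.3f seconds, %d items" % (time.perf_counter() - start, len(dedup_keys)))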
Adjusting the test to 10 million distinct numbers, each appearing 20 times, gives 200 million items to deduplicate.
set timing:

200000000
start
end
Elapsed: 10.679 seconds
10000000

200000000
start
end
Elapsed: 8.654 seconds
10000000

fromkeys timing:
200000000
start
end
Elapsed: 9.293 seconds
10000000

200000000
start
end
Elapsed: 9.255 seconds
10000000

The timings are about the same, so use whichever you prefer; sometimes set is even faster. If you want to look clever, use fromkeys.
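One practical difference worth keeping in mind when picking between the two (an aside, not part of the original benchmark): on Python 3.7+ dictionaries preserve insertion order, so dict.fromkeys keeps the first-seen order of the list, while set makes no ordering guarantee. A minimal sketch:

#coding:utf-8
l = [3, 1, 3, 2, 1]
print(list(set(l)))             # deduplicated, but order is not guaranteed
print(list(dict.fromkeys(l)))   # [3, 1, 2] on Python 3.7+: first-seen order is kept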
Original article: http://www.cnblogs.com/assd2001/p/5914967.html