I. Sets
1. Defining a set
In [74]: s = {}  # empty braces create an empty dict, not a set

In [75]: type(s)
Out[75]: dict

In [76]: s = set()  # an empty set has to be created with set()

In [77]: type(s)
Out[77]: set

In [78]: help(set)
Help on class set in module builtins:

class set(object)
 |  set() -> new empty set object
 |  set(iterable) -> new set object
 |
 |  Build an unordered collection of unique elements.
 |
 |  Methods defined here:

In [80]: s = set([1, 2])

In [81]: s
Out[81]: {1, 2}

In [82]: s = set("xxj")

In [83]: s
Out[83]: {'j', 'x'}

In [84]: s = {1, 2, 1, 3}

In [85]: s
Out[85]: {1, 2, 3}
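A set is unordered, its elements are unique, and every element must be hashable (built-in mutable types such as list, dict, and set are not hashable).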
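The hashability requirement can be checked directly with the built-in hash(). The sketch below is only an illustration; the helper name is_hashable is made up for this example and is not part of the standard library.

```python
def is_hashable(obj):
    """Hypothetical helper: return True if obj could be stored in a set."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


# A tuple is hashable only if everything inside it is hashable too.
candidates = [1, "xxj", (1, 2), [1, 2], {"k": 1}, (1, [2])]
for item in candidates:
    print(repr(item), is_hashable(item))
```

Note the last candidate: the tuple itself is immutable, but it contains a list, so it still cannot be a set element.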
II. Set operations
1. Adding elements
## set.add()

In [86]: s
Out[86]: {1, 2, 3}

In [87]: s.add("a")  # adds a single element in place; the element must be hashable

In [88]: s
Out[88]: {1, 2, 3, 'a'}

In [89]: s.add(3)

In [90]: s
Out[90]: {1, 2, 3, 'a'}

In [93]: s.add([1, 2])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-93-2beaf0c16593> in <module>()
----> 1 s.add([1, 2])

TypeError: unhashable type: 'list'

In [94]: help(s.add)

In [95]: s.add((1, 2))

In [96]: s
Out[96]: {(1, 2), 1, 2, 3, 'a'}

## set.update()  # adds the elements of one or more iterables in place

In [99]: help(s.update)
Help on built-in function update:

update(...) method of builtins.set instance
    Update a set with the union of itself and others.

In [127]: s = set()

In [128]: s
Out[128]: set()

In [129]: type(s)
Out[129]: set

In [101]: s.update(10)
-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-101-c184888ad9c5> in <module>()
----> 1 s.update(10)

TypeError: 'int' object is not iterable

In [131]: s.update(["a"])

In [132]: s
Out[132]: {'a'}

In [133]: s.update(["a"], ["b"])

In [134]: s
Out[134]: {'a', 'b'}

In [135]: s.update(["a"], ["b"], 1)
-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-135-fc556b8d9726> in <module>()
----> 1 s.update(["a"], ["b"], 1)

TypeError: 'int' object is not iterable

In [136]: s.update(["a"], ["b"], "xj")

In [137]: s
Out[137]: {'a', 'b', 'j', 'x'}

In [139]: s.update([["S", "B"]])
-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-139-da563f39a191> in <module>()
----> 1 s.update([["S", "B"]])

TypeError: unhashable type: 'list'
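A common stumbling block is passing a string: add() treats it as one element, while update() iterates over it character by character. A small sketch of the difference, using made-up values:

```python
s = set()

s.add("xj")     # one element: the whole string
s.update("xj")  # two elements: each character of the string
print(s == {"xj", "x", "j"})  # True

# update() accepts several iterables at once; a dict contributes its keys.
s.update([1, 2], (3,), {4: "four"})
print(len(s))  # 7
```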
2. Removing elements
## set.remove()

In [142]: s
Out[142]: {'a', 'b', 'j', 'x'}

In [143]: s.remove("a")

In [144]: s
Out[144]: {'b', 'j', 'x'}

In [151]: s.remove("S")
-----------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-151-332efdd48daa> in <module>()
----> 1 s.remove("S")

KeyError: 'S'

## set.pop()

In [153]: s = {1, 2, 3, 4}

In [154]: s.pop()
Out[154]: 1

In [155]: s
Out[155]: {2, 3, 4}

In [156]: s.pop(5)
-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-156-23a1c03efc29> in <module>()
----> 1 s.pop(5)

TypeError: pop() takes no arguments (1 given)

In [157]: s.pop()
Out[157]: 2

In [158]: s.pop()
Out[158]: 3

In [159]: s.pop()
Out[159]: 4

In [160]: s.pop()
-----------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-160-e76f41daca5e> in <module>()
----> 1 s.pop()

KeyError: 'pop from an empty set'

## set.discard()

In [165]: help(set.discard)
Help on method_descriptor:

discard(...)
    Remove an element from a set if it is a member.

    If the element is not a member, do nothing.

In [166]: s = {1, 2, 3}

In [167]: s.discard(2)

In [168]: s.discard(1, 3)
-----------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-168-8702b734cbc4> in <module>()
----> 1 s.discard(1, 3)

TypeError: discard() takes exactly one argument (2 given)

In [169]: s.discard(2)  # no error when the element is absent

In [170]: s
Out[170]: {1, 3}

## set.clear()

In [32]: s.clear()

In [33]: s
Out[33]: set()

In [47]: del(s)

In [48]: s
-----------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-48-f4d5d0c0671b> in <module>()
----> 1 s

NameError: name 's' is not defined
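Summary:
remove: deletes the given element; raises KeyError if the element is absent.
discard: deletes the given element; does nothing if the element is absent.
pop: removes and returns an arbitrary element (not a chosen or truly random one); raises KeyError if the set is empty.
clear: removes all elements from the set.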
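The four removal methods can be compared side by side in a short script. This is a minimal sketch with made-up values; the variable names are for illustration only.

```python
s = {1, 2, 3}

s.discard(99)  # absent element: silently ignored

try:
    s.remove(99)  # absent element: raises KeyError
except KeyError as exc:
    missing = exc.args[0]  # the key that was not found

popped = s.pop()    # removes and returns an arbitrary element
remaining = set(s)  # copy of what is left after the pop

s.clear()           # the set is now empty
print(missing, popped in {1, 2, 3}, len(remaining), s)
```

Prefer discard() when a missing element is acceptable, and remove() when a missing element indicates a bug that should surface as an exception.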
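3. Modifying elements
A set cannot modify an individual element in place. To change a value, remove the old element and add the new one.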
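Since there is no in-place modification, a "change" is really a removal followed by an addition. The helper below, replace_element, is a hypothetical name used only for this sketch.

```python
def replace_element(s, old, new):
    """Hypothetical helper: swap old for new in the set s, in place."""
    s.remove(old)  # raises KeyError if old is not in the set
    s.add(new)     # no effect if new is already present


s = {1, 2, 3}
replace_element(s, 2, 20)
print(s == {1, 3, 20})  # True
```

If the new value is already in the set, the set ends up one element smaller, because duplicates are never stored.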
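4. Lookup
A set cannot be indexed: it is not a linear structure, so it has no indexes.
A set has no method for accessing a single element.
A set has no search method.
For membership tests (in and not in), a set is far more efficient than a list: O(1) on average versus O(n).
O(1) is not guaranteed to beat O(n) in actual running time, though; for small collections the difference is negligible, so the data size matters.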
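The effect of data size can be measured with the standard timeit module. The sketch below is only a rough benchmark under assumed sizes (5 and 10,000 elements); the function name time_membership is made up, and absolute timings will vary by machine, so no expected numbers are shown.

```python
import timeit


def time_membership(n, repeats=200):
    """Time a worst-case `in` test (absent value) on a list and a set of size n."""
    data_list = list(range(n))
    data_set = set(data_list)
    target = -1  # not present, so the list has to scan every element
    t_list = timeit.timeit(lambda: target in data_list, number=repeats)
    t_set = timeit.timeit(lambda: target in data_set, number=repeats)
    return t_list, t_set


for n in (5, 10_000):
    t_list, t_set = time_membership(n)
    print(f"n={n}: list {t_list:.6f}s, set {t_set:.6f}s")
```

With only a handful of elements the two timings are typically close; the gap widens as n grows.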
III. Set algebra
1. Intersection
## set.intersection()

In [1]: s1 = {1, 2, 3}

In [2]: s2 = {2, 3, 4}

In [3]: s1.intersection()
Out[3]: {1, 2, 3}

In [4]: s1.intersection(s2)  # returns the intersection; does not modify the original set
Out[4]: {2, 3}

In [26]: s2.intersection(s1)
Out[26]: {2, 3}

In [5]: s1.intersection([2,3])
Out[5]: {2, 3}

In [6]: help(set.intersection)

In [7]: s1.intersection(2)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-94b820092aa3> in <module>()
----> 1 s1.intersection(2)

TypeError: 'int' object is not iterable

In [17]: s1.intersection_update(s2)  # the _update version of set.intersection: modifies the set in place and returns None

In [18]: s1
Out[18]: {2, 3}

In [19]: s2
Out[19]: {2, 3, 4}

In [20]: s1 = {1, 2, 3}

In [21]: s2 = {2, 3, 4}

In [22]: s1 & s2  # set overloads the bitwise AND operator & to compute the intersection
Out[22]: {2, 3}

In [23]: s1
Out[23]: {1, 2, 3}

In [24]: s2
Out[24]: {2, 3, 4}
2. Difference
In [27]: s1
Out[27]: {1, 2, 3}

In [28]: s2
Out[28]: {2, 3, 4}

In [29]: s1.difference(s2)
Out[29]: {1}

In [30]: s2.difference(s1)
Out[30]: {4}

In [31]: s1
Out[31]: {1, 2, 3}

In [32]: s2
Out[32]: {2, 3, 4}

In [33]: s1.difference_update(s2)  # in-place version; returns None

In [34]: s1
Out[34]: {1}

In [35]: s2
Out[35]: {2, 3, 4}

In [36]: s1 = {1, 2, 3}  # reset s1 before continuing

In [38]: s1
Out[38]: {1, 2, 3}

In [39]: s2
Out[39]: {2, 3, 4}

In [40]: s1 - s2  # set overloads the - operator to compute the difference; equivalent to s1.difference(s2)
Out[40]: {1}

In [41]: s2 - s1
Out[41]: {4}

In [42]: s1 + s2
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-42-1659087814e1> in <module>()
----> 1 s1 + s2

TypeError: unsupported operand type(s) for +: 'set' and 'set'

In [50]: s1.symmetric_difference(s2)  # symmetric difference
Out[50]: {1, 4}

In [51]: s1.symmetric_difference_update(s2)

In [52]: s1
Out[52]: {1, 4}

In [53]: s2
Out[53]: {2, 3, 4}

In [54]: s1 = {1, 2, 3}  # reset s1 before continuing

In [55]: s1
Out[55]: {1, 2, 3}

In [56]: s2
Out[56]: {2, 3, 4}

In [57]: s1 ^ s2  # set overloads the XOR operator ^ to compute the symmetric difference
Out[57]: {1, 4}
3. Union
In [58]: s1
Out[58]: {1, 2, 3}

In [59]: s2
Out[59]: {2, 3, 4}

In [60]: s1.union(s2)  # is there an _update version of union? Yes: update() is the in-place version of union()
Out[60]: {1, 2, 3, 4}

In [61]: s1 | s2  # set overloads the | operator to compute the union
Out[61]: {1, 2, 3, 4}
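Each of the four operators also has an augmented-assignment form that matches the corresponding _update method (with |= matching update()). The sketch below works on copies of two sample sets so the originals stay unchanged; all variable names are illustrative.

```python
a = {1, 2, 3}
b = {2, 3, 4}

u = set(a)
u |= b  # same effect as u.update(b)

i = set(a)
i &= b  # same effect as i.intersection_update(b)

d = set(a)
d -= b  # same effect as d.difference_update(b)

x = set(a)
x ^= b  # same effect as x.symmetric_difference_update(b)

print(u, i, d, x)

# Operators require a set on both sides; the methods accept any iterable.
try:
    a | [4]
except TypeError:
    operator_rejects_list = True
print(a.union([4]) == {1, 2, 3, 4})  # True
```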
4. Set predicates
In [68]: s1 = {2, 3}

In [69]: s2 = {1, 2, 3, 4}

In [70]: s1.isdisjoint(s2)  # True if the two sets have no elements in common
Out[70]: False

In [71]: s1.issubset(s2)  # is s1 a subset of s2?
Out[71]: True

In [72]: s1.issuperset(s2)  # is s1 a superset of s2?
Out[72]: False

In [73]: s2.issuperset(s1)
Out[73]: True

In [74]: s1 = {"a", "b"}

In [75]: s1.isdisjoint(s2)
Out[75]: True
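The subset and superset tests are also available as comparison operators, which additionally distinguish proper subsets. A brief sketch using the same sample values as the session above:

```python
s1 = {2, 3}
s2 = {1, 2, 3, 4}

print(s1 <= s2)  # True: same as s1.issubset(s2)
print(s1 < s2)   # True: proper subset (subset and not equal)
print(s2 >= s1)  # True: same as s2.issuperset(s1)
print(s1 <= s1)  # True: every set is a subset of itself
print(s1 < s1)   # False: but never a proper subset of itself

# As with the other operators, both sides must be sets;
# the method form accepts any iterable.
print(s1.issubset([1, 2, 3, 4]))  # True
```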
IV. Uses and limitations of sets
Sets are commonly used for deduplication and for fast membership tests on large data.
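Calling set(items) removes duplicates but does not promise to keep the original order. When order matters, a set can still do the membership bookkeeping while a list holds the result. The function name dedupe_keep_order is made up for this sketch.

```python
def dedupe_keep_order(items):
    """Return the unique items in first-seen order (items must be hashable)."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # O(1) average membership test
            seen.add(item)
            result.append(item)
    return result


print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```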
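Other built-in containers restrict their elements too: bytes and bytearray only hold 8-bit integers (0-255), and str only holds characters.
Set elements must be unique and hashable; built-in mutable types such as list, dict, set, and bytearray are not hashable.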
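One consequence is that a set cannot contain another set. The built-in frozenset, an immutable and hashable variant of set, is the usual workaround. A minimal sketch with made-up values:

```python
groups = set()
groups.add(frozenset({1, 2}))
groups.add(frozenset({2, 1}))  # equal to the first one, so nothing is added

try:
    groups.add({3, 4})  # a plain set is mutable and therefore unhashable
except TypeError as exc:
    error = str(exc)

print(len(groups))                     # 1
print(frozenset({1, 2}) in groups)     # True
```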