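Dictionaries and Sets
The dict type is a cornerstone of the Python language. It is defined in the builtins module (builtins.dict); the stub below lists its methods together with their docstrings.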
class dict(object):
    """
    dict() -> new empty dictionary
    dict(mapping) -> new dictionary initialized from a mapping object's
        (key, value) pairs
    dict(iterable) -> new dictionary initialized as if via:
        d = {}
        for k, v in iterable:
            d[k] = v
    dict(**kwargs) -> new dictionary initialized with the name=value pairs
        in the keyword argument list. For example: dict(one=1, two=2)
    """
    def clear(self): # real signature unknown; restored from __doc__
        """ D.clear() -> None. Remove all items from D. """
        pass

    def copy(self): # real signature unknown; restored from __doc__
        """ D.copy() -> a shallow copy of D """
        pass

    @staticmethod # known case
    def fromkeys(*args, **kwargs): # real signature unknown
        """ Returns a new dict with keys from iterable and values equal to value. """
        pass

    def get(self, k, d=None): # real signature unknown; restored from __doc__
        """ D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None. """
        pass

    def items(self): # real signature unknown; restored from __doc__
        """ D.items() -> a set-like object providing a view on D's items """
        pass

    def keys(self): # real signature unknown; restored from __doc__
        """ D.keys() -> a set-like object providing a view on D's keys """
        pass

    def pop(self, k, d=None): # real signature unknown; restored from __doc__
        """
        D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
        If key is not found, d is returned if given, otherwise KeyError is raised
        """
        pass

    def popitem(self): # real signature unknown; restored from __doc__
        """
        D.popitem() -> (k, v), remove and return some (key, value) pair as a
        2-tuple; but raise KeyError if D is empty.
        """
        pass

    def setdefault(self, k, d=None): # real signature unknown; restored from __doc__
        """ D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D """
        pass

    def update(self, E=None, **F): # known special case of dict.update
        """
        D.update([E, ]**F) -> None. Update D from dict/iterable E and F.
        If E is present and has a .keys() method, then does: for k in E: D[k] = E[k]
        If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v
        In either case, this is followed by: for k in F: D[k] = F[k]
        """
        pass

    def values(self): # real signature unknown; restored from __doc__
        """ D.values() -> an object providing a view on D's values """
        pass

    def __contains__(self, *args, **kwargs): # real signature unknown
        """ True if D has a key k, else False. """
        pass

    def __delitem__(self, *args, **kwargs): # real signature unknown
        """ Delete self[key]. """
        pass

    def __eq__(self, *args, **kwargs): # real signature unknown
        """ Return self==value. """
        pass

    def __getattribute__(self, *args, **kwargs): # real signature unknown
        """ Return getattr(self, name). """
        pass

    def __getitem__(self, y): # real signature unknown; restored from __doc__
        """ x.__getitem__(y) <==> x[y] """
        pass

    def __ge__(self, *args, **kwargs): # real signature unknown
        """ Return self>=value. """
        pass

    def __gt__(self, *args, **kwargs): # real signature unknown
        """ Return self>value. """
        pass

    def __init__(self, seq=None, **kwargs): # known special case of dict.__init__
        """
        dict() -> new empty dictionary
        dict(mapping) -> new dictionary initialized from a mapping object's
            (key, value) pairs
        dict(iterable) -> new dictionary initialized as if via:
            d = {}
            for k, v in iterable:
                d[k] = v
        dict(**kwargs) -> new dictionary initialized with the name=value pairs
            in the keyword argument list. For example: dict(one=1, two=2)
        # (copied from class doc)
        """
        pass

    def __iter__(self, *args, **kwargs): # real signature unknown
        """ Implement iter(self). """
        pass

    def __len__(self, *args, **kwargs): # real signature unknown
        """ Return len(self). """
        pass

    def __le__(self, *args, **kwargs): # real signature unknown
        """ Return self<=value. """
        pass

    def __lt__(self, *args, **kwargs): # real signature unknown
        """ Return self<value. """
        pass

    @staticmethod # known case of __new__
    def __new__(*args, **kwargs): # real signature unknown
        """ Create and return a new object. See help(type) for accurate signature. """
        pass

    def __ne__(self, *args, **kwargs): # real signature unknown
        """ Return self!=value. """
        pass

    def __repr__(self, *args, **kwargs): # real signature unknown
        """ Return repr(self). """
        pass

    def __setitem__(self, *args, **kwargs): # real signature unknown
        """ Set self[key] to value. """
        pass

    def __sizeof__(self): # real signature unknown; restored from __doc__
        """ D.__sizeof__() -> size of D in memory, in bytes """
        pass

    __hash__ = None
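Because dictionaries are so important, Python's implementation of them is highly optimized. The hash table is the fundamental reason for dict's performance, and set is implemented with a hash table as well.
A list is stored as a linear sequence, so searching it for a value is O(n). A dict is stored as a hash table, so looking up a key is O(1) in the best (and average) case. For that reason, replacing a list with a dict or set can sometimes speed up code that performs many lookups, while keeping the algorithm otherwise the same.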
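As a rough illustration of the difference, the sketch below times a membership test against a list and against a set built from the same values. The names `items_list` and `items_set` and the sizes are arbitrary choices for this example, and absolute timings will vary by machine.

```python
import timeit

# Hypothetical data: the same 100,000 integers stored two ways.
items_list = list(range(100_000))
items_set = set(items_list)

target = 99_999  # worst case for the list: the last element

# A list is scanned element by element (O(n)).
list_time = timeit.timeit(lambda: target in items_list, number=100)
# A set (like a dict) hashes the value and goes straight to its slot (O(1) on average).
set_time = timeit.timeit(lambda: target in items_set, number=100)

print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

The same reasoning applies to dict keys: `key in some_dict` uses the hash table rather than a scan.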
Python's built-in mapping type is dict, which stores key/value pairs.
frozenset is the immutable set type.
1. List comprehensions
2. Dict comprehensions
3. Set comprehensions
l = [x for x in range(10)]
case = {'a': 10, 'b': 34}
d = {b: a for a, b in case.items()}  # dict comprehension: quickly swap keys and values
s = {x for x in range(10)}  # set comprehension
print(l)
>>>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(d)
>>>
{10: 'a', 34: 'b'}
print(s)
>>>
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
bbb = ((1, 11), (2, 22), (3, 33))
bb = {a: b for a, b in bbb}
print(bb)
>>>
{1: 11, 2: 22, 3: 33}
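3.1 Generic Mapping Types
The collections.abc module provides two abstract base classes, Mapping and MutableMapping, which define the formal interface of dict and similar types. Concrete mapping classes usually do not inherit from these ABCs directly, however; they extend dict or collections.UserDict instead. The ABCs mainly serve as formal documentation, and they can be used with isinstance to check whether an object is a mapping in the broad sense.
import collections.abc  # the old aliases collections.Mapping / collections.MutableMapping were removed in Python 3.10
my_dict = {}
print(isinstance(my_dict, collections.abc.Mapping))
print(isinstance(my_dict, collections.abc.MutableMapping))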
>>>
True
True
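All mapping types in the standard library are built on dict, so they share one restriction: only hashable objects can be used as keys (values do not need to be hashable).
What is a hashable type?
An object is hashable if its hash value never changes during its lifetime and it implements the __hash__() method. It also needs __eq__() so that it can be compared with other keys, and objects that compare equal must have the same hash value.
The atomic immutable types (str, bytes, and numeric types such as int) are all hashable. A frozenset is also hashable, because by definition it can hold only hashable elements. A tuple is hashable only if every element it contains is hashable.
Instances of user-defined classes are hashable by default: their hash value is derived from id(), and they compare unequal to every object except themselves. A class that defines __eq__() without also defining __hash__() becomes unhashable.
Several ways to create a dictionary: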
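The rules above can be checked directly with hash(). The following is a small sketch; the classes `Point` and `EqOnlyPoint` are hypothetical and exist only for this illustration.

```python
# A tuple whose elements are all hashable is itself hashable.
t_ok = (1, 2, (30, 40))
print(isinstance(hash(t_ok), int))

# A tuple that contains a list is not hashable.
t_bad = (1, 2, [30, 40])
try:
    hash(t_bad)
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False
print(tuple_with_list_hashable)

# frozenset is hashable, so it can serve as a dict key.
lookup = {frozenset([30, 40]): "ok"}


class Point:
    """Hypothetical class with no __eq__/__hash__: hashable by identity."""

    def __init__(self, x, y):
        self.x, self.y = x, y


class EqOnlyPoint(Point):
    """Defining __eq__ alone sets __hash__ to None, so instances are unhashable."""

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)


print(isinstance(hash(Point(1, 2)), int))
print(EqOnlyPoint.__hash__ is None)
```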
a = dict(one=1, two=2, three=3)
b = dict(((1, 11), (2, 22), (3, 33)))
c = dict(zip([1,2,3], [4,5,6]))
print(a)
print(b)
print(c)
>>>
{'one': 1, 'two': 2, 'three': 3}
{1: 11, 2: 22, 3: 33}
{1: 4, 2: 5, 3: 6}
Handling missing keys with setdefault
self.registered_admins.setdefault(app_label, {}).update({model._meta.model_name: admin_class})
strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
counts = {}
for kw in strings:
    counts[kw] = counts.setdefault(kw, 0) + 1
print(counts)
>>>
{'puppy': 5, 'kitten': 2, 'weasel': 1}
my_dict.setdefault(key, []).append(new_value)
is equivalent to:
if key not in my_dict:
    my_dict[key] = []
my_dict[key].append(new_value)
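To make the pattern concrete, here is a minimal runnable sketch that groups words by their first letter with setdefault. The word list and the name `index` are made up for the example.

```python
# Hypothetical data: group words by their first letter.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

index = {}
for word in words:
    # Fetch the list for this letter, creating an empty one first if the key is missing.
    index.setdefault(word[0], []).append(word)

print(index)
```

Compared with the explicit `if key not in my_dict` version, the key is looked up once per word instead of two or three times.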
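3.4 Mappings with Flexible Key Lookup
Sometimes it is convenient to get a default value when reading a key that does not exist in the mapping. There are two ways to do this: use defaultdict, or write a dict subclass that implements the __missing__ method.
Example: counting how often each word occurs. With a plain dict the first increment fails, because the key does not exist yet: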
strings = ('puppy', 'kitten', 'puppy', 'puppy',
           'weasel', 'puppy', 'kitten', 'puppy')
counts = {}
for kw in strings:
    counts[kw] += 1
print(counts)
>>>
KeyError: 'puppy'
import collections
strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
counts = collections.defaultdict(int)  # int is the default factory, so a missing key starts at int() == 0
for kw in strings:
    counts[kw] += 1
print(counts)
>>>
defaultdict(<class 'int'>, {'puppy': 5, 'kitten': 2, 'weasel': 1})
import collections  # collections.Counter makes the count even simpler
strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
print(collections.Counter(strings))
How defaultdict is implemented
Now that the usage of defaultdict is clear, how does the class implement its default values? The key is the __missing__() method:
def __missing__(self, key): # real signature unknown; restored from __doc__
    """
    __missing__(key) # Called by __getitem__ for missing key; pseudo-code:
      if self.default_factory is None: raise KeyError((key,))
      self[key] = value = self.default_factory()
      return value
    """
    pass
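When __getitem__() is used to access a key that does not exist, it calls __missing__() to obtain the default value, and defaultdict adds that key to the dictionary. __missing__() is called only by __getitem__(); other lookups, such as get() and the in operator, do not trigger it.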
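The second approach mentioned earlier, a dict subclass with its own __missing__, can be sketched as follows. `StrKeyDict` is a hypothetical class written for this illustration, not something from the standard library; it retries a failed lookup with the string form of the key.

```python
from collections import defaultdict


class StrKeyDict(dict):
    """Hypothetical dict subclass: falls back to the str() form of a missing key."""

    def __missing__(self, key):
        # Called by dict.__getitem__ only when `key` is not found.
        if isinstance(key, str):
            raise KeyError(key)   # already a str, so the key really is missing
        return self[str(key)]     # retry the lookup with the string form


d = StrKeyDict({"1": "one", "2": "two"})
print(d[1])        # d[...] goes through __getitem__, which calls __missing__
print(d.get(1))    # get() does not call __missing__
print(1 in d)      # neither does the `in` operator

# defaultdict behaves the same way: only d[key] creates the default entry.
dd = defaultdict(list)
dd.get("x")
created_by_get = "x" in dd
dd["x"]
created_by_getitem = "x" in dd
print(created_by_get, created_by_getitem)
```

Unlike defaultdict, this subclass does not store anything on a miss; whether __missing__ inserts the key is up to the implementation.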
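3.7 Immutable Mappings
Since Python 3.3, the types module has provided a wrapper class named MappingProxyType. Given a mapping, it returns a read-only view of that mapping. The view is read-only but dynamic: changes made to the original mapping are visible through the view, but the original mapping cannot be modified through the view.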
from types import MappingProxyType
a = {1: "AA"}
a_proxy = MappingProxyType(a)
print(a_proxy)
>>>
{1: 'AA'}
a_proxy[1] = "BB"
TypeError: 'mappingproxy' object does not support item assignment
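The dynamic side of the view can be shown with a short sketch. The names `source` and `view` are arbitrary and chosen only for this example.

```python
from types import MappingProxyType

source = {1: "AA"}                 # hypothetical underlying dict
view = MappingProxyType(source)

source[2] = "BB"                   # change the original mapping...
print(view[2])                     # ...and the change is visible through the view

try:
    view[3] = "CC"                 # writing through the view is rejected
    write_allowed = True
except TypeError:
    write_allowed = False
print(write_allowed)
```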
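3.8 Set Theory
Sets are commonly used for removing duplicates and for comparing collections (intersection, union, difference).
The elements of a set must be hashable. The set type itself is not hashable, but frozenset is.
An empty set must be written as set(), because {} creates an empty dict.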
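The points above can be tried out with a few lines. This is only a sketch with made-up values.

```python
words = ["spam", "eggs", "spam", "bacon", "eggs"]
unique = set(words)          # duplicates are dropped (original order is not kept)
print(len(words), len(unique))

a = {1, 2, 3, 4}
b = {3, 4, 5}
print(a & b)                 # intersection
print(a | b)                 # union
print(a - b)                 # difference
print({3, 4} <= a)           # subset test

empty = set()                # {} would create an empty dict instead
print(type(empty), type({}))

# A set cannot contain another set, but it can contain a frozenset.
nested = {frozenset({1, 2}), frozenset({3})}
try:
    bad = {{1, 2}}
except TypeError:
    bad = None
print(bad)
```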
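3.9 Consequences of How dict Is Implemented
1. Keys must be hashable.
2. Dictionaries have a large memory overhead.
Because a dict uses a hash table, and a hash table must be sparse (it is in effect a sparse array, meaning an array that always keeps some empty slots), it is not space-efficient. If you need to store a very large number of records, a list of tuples or named tuples is likely to be a better choice.
3. Key lookup is very fast.
The dict implementation is a classic trade of space for time.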
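A rough way to see the memory cost is to compare sys.getsizeof for the same record stored as a dict, a tuple, and a named tuple. This is a sketch that assumes CPython; the `Record` type and its field values are invented for the example, and the exact byte counts depend on the Python version and platform.

```python
import sys
from collections import namedtuple

Record = namedtuple("Record", ["name", "age", "city"])  # hypothetical record type

as_dict = {"name": "Ann", "age": 30, "city": "Oslo"}
as_tuple = ("Ann", 30, "Oslo")
as_named = Record("Ann", 30, "Oslo")

# sys.getsizeof reports the container's own size, not the objects it refers to.
dict_size = sys.getsizeof(as_dict)
tuple_size = sys.getsizeof(as_tuple)
named_size = sys.getsizeof(as_named)
print(dict_size, tuple_size, named_size)
```

With millions of records the per-record difference adds up, which is the reason for preferring a list of tuples or named tuples in that situation.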