Dictionaries and Sets

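The dict type is a cornerstone of the Python language. Like the other built-ins, it lives in the builtins module, so from the main module it is reachable as __builtins__.dict. Note that dict is a class, not a module.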


class dict(object):
    """
    dict() -> new empty dictionary
    dict(mapping) -> new dictionary initialized from a mapping object's
        (key, value) pairs
    dict(iterable) -> new dictionary initialized as if via:
        d = {}
        for k, v in iterable:
            d[k] = v
    dict(**kwargs) -> new dictionary initialized with the name=value pairs
        in the keyword argument list.  For example:  dict(one=1, two=2)
    """
    def clear(self): # real signature unknown; restored from __doc__
        """ D.clear() -> None.  Remove all items from D. """
        pass

    def copy(self): # real signature unknown; restored from __doc__
        """ D.copy() -> a shallow copy of D """
        pass

    @staticmethod # known case
    def fromkeys(*args, **kwargs): # real signature unknown
        """ Returns a new dict with keys from iterable and values equal to value. """
        pass

    def get(self, k, d=None): # real signature unknown; restored from __doc__
        """ D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None. """
        pass

    def items(self): # real signature unknown; restored from __doc__
        """ D.items() -> a set-like object providing a view on D's items """
        pass

    def keys(self): # real signature unknown; restored from __doc__
        """ D.keys() -> a set-like object providing a view on D's keys """
        pass

    def pop(self, k, d=None): # real signature unknown; restored from __doc__
        """
        D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
        If key is not found, d is returned if given, otherwise KeyError is raised
        """
        pass

    def popitem(self): # real signature unknown; restored from __doc__
        """
        D.popitem() -> (k, v), remove and return some (key, value) pair as a
        2-tuple; but raise KeyError if D is empty.
        """
        pass

    def setdefault(self, k, d=None): # real signature unknown; restored from __doc__
        """ D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D """
        pass

    def update(self, E=None, **F): # known special case of dict.update
        """
        D.update([E, ]**F) -> None.  Update D from dict/iterable E and F.
        If E is present and has a .keys() method, then does:  for k in E: D[k] = E[k]
        If E is present and lacks a .keys() method, then does:  for k, v in E: D[k] = v
        In either case, this is followed by: for k in F:  D[k] = F[k]
        """
        pass

    def values(self): # real signature unknown; restored from __doc__
        """ D.values() -> an object providing a view on D's values """
        pass

    def __contains__(self, *args, **kwargs): # real signature unknown
        """ True if D has a key k, else False. """
        pass

    def __delitem__(self, *args, **kwargs): # real signature unknown
        """ Delete self[key]. """
        pass

    def __eq__(self, *args, **kwargs): # real signature unknown
        """ Return self==value. """
        pass

    def __getattribute__(self, *args, **kwargs): # real signature unknown
        """ Return getattr(self, name). """
        pass

    def __getitem__(self, y): # real signature unknown; restored from __doc__
        """ x.__getitem__(y) <==> x[y] """
        pass

    def __ge__(self, *args, **kwargs): # real signature unknown
        """ Return self>=value. """
        pass

    def __gt__(self, *args, **kwargs): # real signature unknown
        """ Return self>value. """
        pass

    def __init__(self, seq=None, **kwargs): # known special case of dict.__init__
        """
        dict() -> new empty dictionary
        dict(mapping) -> new dictionary initialized from a mapping object's
            (key, value) pairs
        dict(iterable) -> new dictionary initialized as if via:
            d = {}
            for k, v in iterable:
                d[k] = v
        dict(**kwargs) -> new dictionary initialized with the name=value pairs
            in the keyword argument list.  For example:  dict(one=1, two=2)
        # (copied from class doc)
        """
        pass

    def __iter__(self, *args, **kwargs): # real signature unknown
        """ Implement iter(self). """
        pass

    def __len__(self, *args, **kwargs): # real signature unknown
        """ Return len(self). """
        pass

    def __le__(self, *args, **kwargs): # real signature unknown
        """ Return self<=value. """
        pass

    def __lt__(self, *args, **kwargs): # real signature unknown
        """ Return self<value. """
        pass

    @staticmethod # known case of __new__
    def __new__(*args, **kwargs): # real signature unknown
        """ Create and return a new object.  See help(type) for accurate signature. """
        pass

    def __ne__(self, *args, **kwargs): # real signature unknown
        """ Return self!=value. """
        pass

    def __repr__(self, *args, **kwargs): # real signature unknown
        """ Return repr(self). """
        pass

    def __setitem__(self, *args, **kwargs): # real signature unknown
        """ Set self[key] to value. """
        pass

    def __sizeof__(self): # real signature unknown; restored from __doc__
        """ D.__sizeof__() -> size of D in memory, in bytes """
        pass

    __hash__ = None

__builtins__.dict

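Because dictionaries are so important, Python's implementation of them is highly optimized. The hash table is the underlying reason for the dict type's performance, and set is built on a hash table as well.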

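A Python list is stored as a linear sequence, so searching it for a value takes O(n) time. A dict is stored as a hash table, so looking up a key takes O(1) time in the best (and average) case. Replacing a list with a dict or set can therefore speed up code that performs many lookups, while keeping the algorithm essentially the same.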
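The swap described above can be sketched as follows. This is a minimal illustration; the names names_list, names_set, records and age_by_name are made up for the example and do not come from the original notes.

```python
# Membership tests: a list scans its elements one by one,
# while a set hashes the value and jumps to its slot.
names_list = ["user%d" % i for i in range(1000)]
names_set = set(names_list)          # same elements, hash-based storage

# Both containers give the same answer; only the lookup cost differs.
assert ("user999" in names_list) == ("user999" in names_set)

# Index a list of records by key once, then look each key up in average O(1).
records = [("alice", 30), ("bob", 25), ("carol", 41)]
age_by_name = dict(records)
print(age_by_name["bob"])            # 25
```

Building the set or dict costs O(n) once, so the change pays off when the same collection is searched repeatedly.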

 

Python's built-in mapping type is dict, which maps keys to values (key: value).

frozenset is the immutable set type.

 

1. List comprehensions

2. Dict comprehensions

3. Set comprehensions

l = [x for x in range(10)]    # list comprehension

case = {'a': 10, 'b': 34}
d = {b: a for a, b in case.items()}    # dict comprehension: quickly swap keys and values

s = {x for x in range(10)}    # set comprehension

print(l)
>>>
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(d)
>>>
{10: 'a', 34: 'b'}

print(s)
>>>
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

bbb = ((1, 11), (2, 22), (3, 33))
bb = {a: b for a, b in bbb}    # dict comprehension over an iterable of pairs
print(bb)
>>>
{1: 11, 2: 22, 3: 33}

 

3.1 Generic Mapping Types

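The collections.abc module provides two abstract base classes, Mapping and MutableMapping, which define the formal interface for dict and similar types. Concrete mapping types usually do not inherit from these ABCs directly; they extend dict or collections.UserDict instead. The ABCs serve mainly as formal documentation of the mapping interface.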

from collections import abc    # the ABCs live in collections.abc (collections.Mapping was removed in Python 3.10)

my_dict = {}
print(isinstance(my_dict, abc.Mapping))
print(isinstance(my_dict, abc.MutableMapping))

>>>
True
True

 

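All mapping types in the standard library are implemented on top of dict, so they share one limitation: only hashable types can be used as keys in these mappings (values do not need to be hashable).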

What is a hashable type?

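  An object is hashable if its hash value never changes during its lifetime and it implements the __hash__() method. It also needs __eq__() so that it can be compared with other keys, and objects that compare equal must have the same hash value.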

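  The atomic immutable types (str, bytes, int) are all hashable. frozenset is hashable too, because by definition a frozenset can only contain hashable elements. A tuple is hashable only if every element it contains is hashable.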
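The tuple rule is easy to check by calling hash() directly. The following is a small sketch; is_hashable is a hypothetical helper written only for this illustration.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for illustration)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, 2, (30, 40))))               # True: nested tuple of ints
print(is_hashable((1, 2, [30, 40])))               # False: the tuple contains a list
print(is_hashable((1, 2, frozenset([30, 40]))))    # True: frozenset is hashable
```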

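  Instances of user-defined classes are hashable by default; their hash value is derived from the value returned by id(). A class that defines __eq__() without also defining __hash__() loses this default and its instances become unhashable.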
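A minimal sketch of how this plays out for user-defined classes. The class names Plain, Point and HashablePoint are invented for the example.

```python
class Plain:
    """Defines neither __eq__ nor __hash__: inherits identity-based hashing."""


class Point:
    """Defines __eq__ only, so Python sets its __hash__ to None."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class HashablePoint(Point):
    """Restores hashability with a hash that is consistent with __eq__."""

    def __hash__(self):
        return hash((self.x, self.y))


p = Plain()
print({p: "ok"}[p])                      # ok: default instances work as dict keys

try:
    hash(Point(1, 2))
except TypeError as exc:
    print("unhashable:", exc)

# Equal points hash equally, so the set keeps only one of them.
print(len({HashablePoint(1, 2), HashablePoint(1, 2)}))    # 1
```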

 

Several ways to create a dictionary:

a = dict(one=1, two=2, three=3)
b = dict(((1, 11), (2, 22), (3, 33)))
c = dict(zip([1,2,3], [4,5,6]))

print(a)
print(b)
print(c)

>>>
{'one': 1, 'two': 2, 'three': 3}
{1: 11, 2: 22, 3: 33}
{1: 4, 2: 5, 3: 6}

 

Handling missing keys with setdefault

self.registered_admins.setdefault(app_label, {}).update({model._meta.model_name: admin_class})

strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
counts = {}
for kw in strings:
    counts[kw] = counts.setdefault(kw, 0) + 1
print(counts)
>>>
{'puppy': 5, 'kitten': 2, 'weasel': 1}

my_dict.setdefault(key, []).append(new_value)
is equivalent to
if key not in my_dict:
    my_dict[key] = []
my_dict[key].append(new_value)
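The setdefault(key, []).append(...) idiom is commonly used for grouping. A short runnable sketch follows; the word list and the name index are made up for illustration.

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

index = {}
for word in words:
    # One lookup: fetch the list for this letter, creating it if it is missing.
    index.setdefault(word[0], []).append(word)

print(index)
# {'a': ['apple', 'avocado'], 'b': ['banana', 'blueberry'], 'c': ['cherry']}
```

Compared with the explicit if-block, setdefault performs a single key lookup when the key already exists.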

 

3.4 Mappings with Flexible Key Lookup

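Sometimes it is convenient for a lookup to return a default value even when the key does not exist in the mapping (dict). There are two ways to get this behavior: use defaultdict, or write your own dict subclass that implements the __missing__ method.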

Example: counting how often each word occurs. With a plain dict, the first missing key raises KeyError:

strings = ('puppy', 'kitten', 'puppy', 'puppy',
           'weasel', 'puppy', 'kitten', 'puppy')
counts = {}
for kw in strings:
    counts[kw] += 1

print(counts)
>>>
KeyError: 'puppy'
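
With collections.defaultdict, a missing key is created on first access using the default factory:

import collections

strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
counts = collections.defaultdict(int)    # default_factory is int, so missing keys start at 0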
for kw in strings:
    counts[kw] += 1

print(counts)
>>>
defaultdict(<class 'int'>, {'puppy': 5, 'kitten': 2, 'weasel': 1})

import collections   # collections.Counter makes the count even simpler

strings = ('puppy', 'kitten', 'puppy', 'puppy', 'weasel', 'puppy', 'kitten', 'puppy')
print(collections.Counter(strings))

How defaultdict is implemented

Having seen how defaultdict is used, how does the class actually provide default values? The key is the __missing__() method:

def __missing__(self, key): # real signature unknown; restored from __doc__
        """
        __missing__(key) # Called by __getitem__ for missing key; pseudo-code:
          if self.default_factory is None: raise KeyError((key,))
          self[key] = value = self.default_factory()
          return value
        """
        pass

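When __getitem__() is called with a key that does not exist, it calls __missing__() to obtain the default value and adds the key to the dictionary. __missing__() is only ever called by __getitem__(), so other lookups such as get() or the in operator do not trigger it.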
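The second approach mentioned earlier, a dict subclass with its own __missing__, can be sketched as below. StrKeyDict is an illustrative class (modeled on the well-known textbook example), not part of the standard library. The sketch also shows that get() and in bypass __missing__.

```python
import collections


class StrKeyDict(dict):
    """Illustrative subclass: falls back to the str() form of a missing key."""

    def __missing__(self, key):
        if isinstance(key, str):
            # The string form is missing too: give up and avoid infinite recursion.
            raise KeyError(key)
        return self[str(key)]


d = StrKeyDict({"1": "one", "2": "two"})
print(d[1])         # one: d[...] goes through __getitem__, which calls __missing__
print(d.get(1))     # None: get() does not call __missing__
print(1 in d)       # False: neither does the in operator

counts = collections.defaultdict(int)
print(counts.get("x"))    # None: no default is created by get()
print(counts["x"])        # 0: created by __getitem__ via __missing__
print("x" in counts)      # True: the key now exists
```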

 

3.7 Immutable Mappings

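Since Python 3.3, the types module has provided a wrapper class called MappingProxyType. Given a mapping, it returns a read-only view of that mapping. Although the view is read-only, it is dynamic: any change made to the original mapping is visible through the view, but the original mapping cannot be modified through the view.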

from types import MappingProxyType

a = {1: "AA"}
a_proxy = MappingProxyType(a)
print(a_proxy)
>>>
{1: 'AA'}

a_proxy[1] = "BB"
>>>
TypeError: 'mappingproxy' object does not support item assignment
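The dynamic side of the view can be seen by changing the original dict and then reading through the proxy. This is a small sketch that reuses the names a and a_proxy from the example above.

```python
from types import MappingProxyType

a = {1: "AA"}
a_proxy = MappingProxyType(a)

# Changes to the underlying dict show up in the proxy immediately.
a[2] = "BB"
print(a_proxy[2])       # BB
print(len(a_proxy))     # 2

# Writes through the proxy are rejected, so the original stays untouched.
try:
    a_proxy[3] = "CC"
except TypeError as exc:
    print("read-only:", exc)
```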

 

3.8 Set Theory

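Sets are commonly used for removing duplicates and for comparing collections (intersection, union, difference, and so on).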

The elements of a set must be hashable. The set type itself is not hashable, but frozenset is.

An empty set must be written as set(); a bare {} is interpreted as an empty dict.
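The points above can be checked with a short sketch; all variable names here are chosen for illustration only.

```python
# Deduplication: building a set drops repeated elements.
items = ["a", "b", "a", "c", "b"]
unique = set(items)
print(sorted(unique))        # ['a', 'b', 'c']

# Relationship tests between two sets.
x = {1, 2, 3, 4}
y = {3, 4, 5}
print(sorted(x & y))         # [3, 4]           intersection
print(sorted(x | y))         # [1, 2, 3, 4, 5]  union
print(sorted(x - y))         # [1, 2]           difference
print(sorted(x ^ y))         # [1, 2, 5]        symmetric difference
print({3, 4} <= x)           # True             subset test

# {} is an empty dict; an empty set needs set().
print(type({}).__name__, type(set()).__name__)    # dict set

# A set cannot contain a set, but it can contain frozensets.
try:
    bad = {set([1, 2])}
except TypeError as exc:
    print("unhashable:", exc)
nested = {frozenset({1, 2}), frozenset({2, 1})}
print(len(nested))           # 1: the two frozensets are equal
```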

3.9 How dict Is Implemented and the Consequences

1. Keys must be hashable

2. Dictionaries have a significant memory overhead

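  Because a dict uses a hash table, and a hash table must be sparse (a hash table is essentially a sparse array, that is, an array that always keeps some empty slots), it is not space-efficient. If you need to store a very large number of records, a list of tuples or named tuples is usually the better choice.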

3. Key lookup is very fast

  The dict implementation is a classic example of trading space for time.
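The memory cost described in point 2 can be observed roughly with sys.getsizeof. This is a sketch under assumptions: the exact byte counts are CPython-specific and vary between versions, and getsizeof reports only the container itself, not the objects it refers to. The record layout (x, y, z) and the name Point are invented for the example.

```python
import sys
from collections import namedtuple

Point = namedtuple("Point", ["x", "y", "z"])

as_dict = {"x": 1, "y": 2, "z": 3}
as_tuple = (1, 2, 3)
as_named = Point(1, 2, 3)

# Shallow size of each container holding the same three values.
for label, obj in (("dict", as_dict), ("tuple", as_tuple), ("namedtuple", as_named)):
    print(label, sys.getsizeof(obj), "bytes")
```

With millions of records, the per-record difference adds up, which is why a list of tuples or named tuples is suggested above.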