Python: Writing a Vector to HDFS / Python Vector Variables

Reposted

mob64ca140b466e 2024-06-04 14:05:30

Tags: 2d, iteration, strings. Category: Python, backend development

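Object Representations

Every object-oriented language has at least one standard way of getting a string representation of an object. Python has two:

repr()

Returns a string representing the object as the developer wants to see it.

str()

Returns a string representing the object as the user wants to see it.

As you know, we implement the special methods __repr__ and __str__ to support repr() and str().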
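The difference between the two representations is easy to see with a standard library type. The sketch below uses datetime.date purely as an illustration (it is not part of the original example); the variable names are arbitrary.

```python
import datetime

d = datetime.date(2024, 6, 4)

developer_view = repr(d)   # 'datetime.date(2024, 6, 4)'
user_view = str(d)         # '2024-06-04'

# In a format string, {} uses the user-facing form and {!r} forces repr()
line = '{} / {!r}'.format(d, d)
print(line)
```

The repr form looks like the source code needed to rebuild the object, while the str form is what an end user would expect to read.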

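Vector Class Redux

To demonstrate the many methods used to generate object representations, we will use a Vector2d class. This section and the next few build it up step by step. The basic behavior we expect from Vector2d instances is shown below.

Vector2d instances have several representations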

>>> v1 = Vector2d(3, 4)
>>> print(v1.x, v1.y)  # the components of a Vector2d can be accessed directly as attributes
3.0 4.0
>>> x, y = v1  # a Vector2d can be unpacked to a tuple of variables
>>> x, y
(3.0, 4.0)
>>> v1  # the repr of a Vector2d looks like the source code for constructing the instance
Vector2d(3.0, 4.0)
>>> v1_clone = eval(repr(v1))  # eval shows that the repr is a faithful representation of the constructor call
>>> v1 == v1_clone  # Vector2d supports comparison with ==, which is useful for testing
True
>>> print(v1)  # print calls str, which for Vector2d produces an ordered pair
(3.0, 4.0)
>>> octets = bytes(v1)  # bytes uses the __bytes__ method to produce a binary representation
>>> octets
b'd\x00\x00\x00\x00\x00\x00\x08@\x00\x00\x00\x00\x00\x00\x10@'
>>> abs(v1)  # abs uses the __abs__ method to return the magnitude of the Vector2d
5.0
>>> bool(v1), bool(Vector2d(0, 0))  # bool uses __bool__: False for a Vector2d of zero magnitude, True otherwise
(True, False)

Implementation in vector2d_v0.py

from array import array
import math


class Vector2d:
    typecode = 'd'  # class attribute

    def __init__(self, x, y):
        # the constructor takes two arguments, x and y, and converts them to float
        self.x = float(x)
        self.y = float(y)

    def __iter__(self):
        # makes a Vector2d iterable, which is what makes unpacking work,
        # e.g. x, y = my_vector
        return (i for i in (self.x, self.y))

    def __repr__(self):
        # interpolate the components with {!r} to get their repr; because
        # Vector2d is iterable, *self feeds the x and y components to format
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)

    def __str__(self):
        # from an iterable Vector2d it is easy to build a tuple, displayed as an ordered pair
        return str(tuple(self))

    def __bytes__(self):
        # convert the typecode to bytes, then append the bytes of an array
        # built by iterating over the instance
        return (bytes([ord(self.typecode)]) +
                bytes(array(self.typecode, self)))

    def __eq__(self, other):
        # to quickly compare all components, build tuples out of the operands
        return tuple(self) == tuple(other)

    def __abs__(self):
        # the magnitude is the length of the hypotenuse of the right triangle formed by x and y
        return math.hypot(self.x, self.y)

    def __bool__(self):
        # use abs(self) to compute the magnitude, then convert it to bool:
        # 0.0 becomes False, nonzero is True
        return bool(abs(self))

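An Alternative Constructor

We can now convert a Vector2d instance to bytes; naturally, we should also be able to build a Vector2d from a byte sequence. array.array, which we used earlier, has a method named .frombytes, and we borrow that name for a Vector2d class method.

Example: just add a class method to the vector2d_v0.py we created above

    @classmethod  # the classmethod decorator marks a class method
    def frombytes(cls, octets):  # no self argument; the class itself is passed as cls
        typecode = chr(octets[0])  # read the typecode from the first byte
        memv = memoryview(octets[1:]).cast(typecode)  # create a memoryview from the octets and cast it using the typecode
        return cls(*memv)  # unpack the memoryview into the pair of arguments the constructor needs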
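To see what __bytes__ and frombytes do to the data, the round trip can be reproduced without the class at all. This is a minimal sketch using only array and memoryview; the variable names are chosen here for illustration, and the 17-byte length assumes the usual 8-byte double for typecode 'd'.

```python
from array import array

typecode = 'd'
components = [3.0, 4.0]

# Encode: one typecode byte followed by the raw bytes of the array
octets = bytes([ord(typecode)]) + bytes(array(typecode, components))

# Decode: read the typecode back, then reinterpret the remaining bytes as doubles
decoded_typecode = chr(octets[0])
memv = memoryview(octets[1:]).cast(decoded_typecode)
decoded = list(memv)

print(len(octets), decoded_typecode, decoded)
```

The first byte carries enough information to interpret the rest, which is why frombytes needs no extra arguments.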

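classmethod Versus staticmethod

Let's start with classmethod. The example above shows its use: defining a method that operates on the class rather than on instances. classmethod changes the way the method is called, so it receives the class itself as the first argument instead of an instance. Its most common use is for alternative constructors, such as frombytes in the example above. Note how the last line of frombytes uses the cls argument to build a new instance: cls(*memv). By convention, the first parameter of a class method is named cls (although Python does not care how it is named).

The staticmethod decorator also changes the way a method is called, but the first argument is nothing special. In essence, a static method is just a plain function that happens to live in a class body instead of being defined at module level. The next example contrasts the behavior of classmethod and staticmethod.

Example: comparing the behavior of classmethod and staticmethod

class Demo:

    @classmethod
    def klassmeth(*args):
        return args         # return all positional arguments received by klassmeth

    @staticmethod
    def statmeth(*args):
        return args         # return all positional arguments received by statmeth

print(Demo.klassmeth())     # no matter how Demo.klassmeth is invoked, its first argument is always the Demo class
print(Demo.klassmeth('spam'))
print('-'*40)
print(Demo.statmeth())      # Demo.statmeth behaves just like a plain function
print(Demo.statmeth('spam'))

Running the code above prints: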

(<class '__main__.Demo'>,)
(<class '__main__.Demo'>, 'spam')
----------------------------------------
()
('spam',)
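A slightly more practical sketch of the same distinction follows. The Temperature and BodyTemperature classes and their method names are hypothetical, invented only for this illustration: the class method is an alternative constructor that respects subclasses, and the static method is a helper that needs neither the class nor an instance.

```python
class Temperature:  # hypothetical class for illustration
    def __init__(self, celsius):
        self.celsius = float(celsius)

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is whatever class the method was called on, so subclasses
        # get instances of their own type
        return cls((fahrenheit - 32) * 5 / 9)

    @staticmethod
    def is_valid(celsius):
        # no access to the class or an instance: just a namespaced function
        return celsius >= -273.15


class BodyTemperature(Temperature):
    pass


t = BodyTemperature.from_fahrenheit(212)
print(type(t).__name__, t.celsius)
print(Temperature.is_valid(-300))
```

Because from_fahrenheit builds the result with cls(...) rather than naming Temperature directly, calling it on the subclass yields a BodyTemperature.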

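Formatted Displays

The format() built-in function and the str.format() method delegate the actual formatting to each type by calling its .__format__(format_spec) method. The format_spec is a formatting specifier, which is either:

the second argument in format(my_obj, format_spec), or
whatever appears after the colon in a replacement field delimited with {} inside a format string used with str.format()

Example: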

>>> brl = 1/2.43
>>> brl
0.4115226337448559
>>> format(brl, '0.4f')  # first argument: the value to format; second: the format specifier ('0.4f' keeps four decimal places in fixed-point notation)
'0.4115'
>>> '1 BRL = {rate:0.2f} USD'.format(rate=brl)  # 'rate' before the colon names a keyword argument passed to format(); '0.2f' after the colon keeps two decimal places
'1 BRL = 0.41 USD'

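The Format Specification Mini-Language provides presentation codes for some built-in types. For example, b and x stand for base 2 and base 16 for the int type, f for a fixed-point float, and % for a percentage display: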

>>> format(42, 'b')
'101010'
>>> format(2/3, '.1%')
'66.7%'
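The ways of supplying a format_spec can be checked side by side. The sketch below calls __format__ directly only to make the delegation visible; in normal code you would use format() or str.format(). The variable names are arbitrary.

```python
value = 1 / 2.43

# The same format_spec, '0.4f', supplied three ways
via_builtin = format(value, '0.4f')
via_method = '{:0.4f}'.format(value)
via_dunder = value.__format__('0.4f')

print(via_builtin, via_method, via_dunder)

# A few presentation codes for built-in types
print(format(255, 'x'))      # hexadecimal int
print(format(255, '08b'))    # binary int, zero-padded to 8 digits
print(format(0.25, '.0%'))   # percentage
```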

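The Format Specification Mini-Language is extensible, because each class gets to interpret the format_spec argument as it likes. For instance, the classes in the datetime module use the same format codes in their __format__ methods as in the strftime() functions. Here are a couple of examples using the format() built-in and the str.format() method: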

>>> from datetime import datetime
>>> now = datetime.now()
>>> now
datetime.datetime(2017, 8, 21, 14, 33, 46, 527811)
>>> format(now, '%H:%M:%S')
'14:33:46'
>>> "It's now {:%I:%M %p}".format(now)
"It's now 02:33 PM"

If a class has no __format__ method, the one inherited from object returns str(my_object). Because Vector2d has a __str__ method, this works:

>>> v1 = Vector2d(3, 4)
>>> format(v1)
'(3.0, 4.0)'

However, if you pass a format specifier, object.__format__ raises TypeError:

>>> format(v1, '.3f')
Traceback (most recent call last):
...
TypeError: non-empty format string passed to object.__format__

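We will fix that by implementing our own format mini-language. The first step is to assume that the format specifier provided by the user is meant to format each float component of the vector. This is the result we want: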

>>> v1 = Vector2d(3, 4)
>>> format(v1)
'(3.0, 4.0)'
>>> format(v1, '.2f')
'(3.00, 4.00)'
>>> format(v1, '.3e')
'(3.000e+00, 4.000e+00)'

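Example: Vector2d.__format__, take #1, which produces the output shown above

    def __format__(self, format_spec=''):
        # apply format_spec to each component, building an iterable of formatted strings
        components = (format(c, format_spec) for c in self)
        # plug the formatted strings into the template '(x, y)'
        return '({}, {})'.format(*components)

Now let's add a custom format code to our mini-language: if the format specifier ends with 'p', the vector is displayed in polar coordinates, <r, θ>, where r is the magnitude and θ (theta) is the angle in radians. The rest of the specifier (whatever comes before the 'p') is interpreted as before.

For polar coordinates we already have __abs__ for the magnitude, so we only need a simple angle method that uses math.atan2() to compute the angle. Here is the code:

    def angle(self):  # angle of the vector, in radians
        return math.atan2(self.y, self.x)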
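As a worked example of the rectangular-to-polar conversion, here is the same arithmetic on plain floats, outside the class. It is only a sketch; the values x = y = 1.0 are chosen because the result is easy to verify by hand (r = √2, θ = π/4).

```python
import math

x, y = 1.0, 1.0

r = math.hypot(x, y)        # magnitude: sqrt(x**2 + y**2)
theta = math.atan2(y, x)    # angle in radians, measured from the positive x axis

print('<{:.5f}, {:.5f}>'.format(r, theta))

# atan2 looks at the signs of both arguments, so it picks the right quadrant
second_quadrant = math.degrees(math.atan2(1.0, -1.0))
print(second_quadrant)
```

Using atan2(y, x) instead of atan(y / x) avoids a division by zero when x is 0 and distinguishes, for example, (-1, 1) from (1, -1).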

Example: Vector2d.__format__, take #2, now with polar coordinates

    def __format__(self, format_spec=''):
        if format_spec.endswith('p'):  # a trailing 'p' means polar coordinates
            format_spec = format_spec[:-1]  # remove the 'p' suffix from format_spec
            coords = (abs(self), self.angle())  # build a tuple of polar coordinates: (magnitude, angle)
            outer_fmt = '<{}, {}>'  # outer format with angle brackets
        else:  # otherwise, use the x, y components of self for rectangular coordinates
            coords = self
            outer_fmt = '({}, {})'  # outer format with parentheses
        components = (format(c, format_spec) for c in coords)  # generate an iterable of formatted component strings
        return outer_fmt.format(*components)  # plug the formatted strings into the outer format

The code above produces:

>>> format(Vector2d(1, 1), 'p')
'<1.4142135623730951, 0.7853981633974483>'
>>> format(Vector2d(1, 1), '.3ep')
'<1.414e+00, 7.854e-01>'
>>> format(Vector2d(1, 1), '0.5fp')
'<1.41421, 0.78540>'

A Hashable Vector2d

As defined so far, Vector2d instances are unhashable, so they cannot be put in a set:

>>> v1 = Vector2d(3, 4)
>>> hash(v1)
Traceback (most recent call last):
...
TypeError: unhashable type: 'Vector2d'
>>> set([v1])
Traceback (most recent call last):
...
TypeError: unhashable type: 'Vector2d'

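To make a Vector2d hashable, we must implement __hash__ (__eq__ is also required, and we already have it). We also need to make vector instances immutable.

Right now, anyone can assign a new value to a component, such as v1.x = 7, and nothing in the Vector2d code prevents it. This is the behavior we want: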

>>> v1.x, v1.y
(3.0, 4.0)
>>> v1.x = 7
Traceback (most recent call last):
...
AttributeError: can't set attribute

To get there, we make the x and y components read-only properties:

class Vector2d:
    typecode = 'd'  # class attribute

    def __init__(self, x, y):  # the constructor takes x and y and converts them to float
        self.__x = float(x)  # two leading underscores make the attribute private
        self.__y = float(y)

    @property  # the @property decorator marks the getter method of a property
    def x(self):  # the getter is named after the public property it exposes: x
        return self.__x  # just return self.__x

    @property
    def y(self):
        return self.__y

    def __iter__(self):  # supports iteration, and therefore unpacking such as x, y = my_vector
        return (i for i in (self.x, self.y))
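The read-only behavior can be confirmed on a stripped-down class. Point below is a hypothetical one-attribute class written only for this check; it is not part of the Vector2d listing.

```python
class Point:  # hypothetical class, used only for this demonstration
    def __init__(self, x):
        self.__x = float(x)

    @property
    def x(self):
        return self.__x  # getter only: no setter is defined


p = Point(3)
print(p.x)

try:
    p.x = 7
except AttributeError as exc:
    # the exact wording of the message varies between Python versions
    print('AttributeError:', exc)
```

Reading p.x works as if it were a plain attribute, while assignment fails because the property defines no setter.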

Implementing the __hash__ method

    def __hash__(self):
        return hash(self.x) ^ hash(self.y)  # XOR the hashes of the components

With the addition of the __hash__ method, we now have hashable vectors:

>>> v1 = Vector2d(3, 4)
>>> v2 = Vector2d(3.1, 4.2)
>>> hash(v1), hash(v2)
(7, 384307168202284039)
>>> set([v1, v2])
{Vector2d(3.1, 4.2), Vector2d(3.0, 4.0)}
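One property of the XOR approach is worth knowing: XOR is symmetric, so swapping the components produces the same hash. That is still a valid hash (unequal objects are allowed to collide), but it may be a consideration if many mirrored vectors end up in one set. The sketch below uses two hypothetical classes, XorVector and TupleVector, to contrast the XOR scheme with one possible alternative, hashing the tuple of components; it is not the article's implementation.

```python
class XorVector:  # hypothetical minimal class mirroring the XOR approach
    def __init__(self, x, y):
        self.x, self.y = float(x), float(y)

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash(self.x) ^ hash(self.y)


class TupleVector(XorVector):  # hypothetical variant: hash the tuple of components
    def __hash__(self):
        return hash((self.x, self.y))


# XOR is symmetric, so swapping the components gives the same hash
print(hash(XorVector(3, 4)) == hash(XorVector(4, 3)))

# Equal vectors must still hash equally under either scheme
print(hash(TupleVector(3, 4)) == hash(TupleVector(3.0, 4.0)))
```

Either scheme satisfies the basic requirement that objects comparing equal have equal hashes.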

vector2d_v3.py: the complete listing

from array import array
import math


class Vector2d:
    typecode = 'd'

    def __init__(self, x, y):
        self.__x = float(x)
        self.__y = float(y)

    @property
    def x(self):
        return self.__x

    @property
    def y(self):
        return self.__y

    def __iter__(self):
        return (i for i in (self.x, self.y))

    def __repr__(self):
        class_name = type(self).__name__
        return '{}({!r}, {!r})'.format(class_name, *self)

    def __str__(self):
        return str(tuple(self))

    def __bytes__(self):
        return (bytes([ord(self.typecode)]) +
                bytes(array(self.typecode, self)))

    def __eq__(self, other):
        return tuple(self) == tuple(other)

    def __hash__(self):
        return hash(self.x) ^ hash(self.y)

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __bool__(self):
        return bool(abs(self))

    def angle(self):
        return math.atan2(self.y, self.x)

    def __format__(self, fmt_spec):
        if fmt_spec.endswith('p'):
            fmt_spec = fmt_spec[:-1]
            coords = (abs(self), self.angle())
            outer_fmt = '<{}, {}>'
        else:
            coords = self
            outer_fmt = '({}, {})'
        components = (format(c, fmt_spec) for c in coords)
        return outer_fmt.format(*components)

    @classmethod
    def frombytes(cls, octets):
        typecode = chr(octets[0])
        memv = memoryview(octets[1:]).cast(typecode)
        return cls(*memv)

The doctests for the code above are as follows:

"""
A two-dimensional vector class
>>> v1 = Vector2d(3, 4)
>>> print(v1.x, v1.y)
3.0 4.0
>>> x, y = v1
>>> x, y
(3.0, 4.0)
>>> v1
Vector2d(3.0, 4.0)
>>> v1_clone = eval(repr(v1))
>>> v1 == v1_clone
True
>>> print(v1)
(3.0, 4.0)
>>> octets = bytes(v1)
>>> octets
b'd\\x00\\x00\\x00\\x00\\x00\\x00\\x08@\\x00\\x00\\x00\\x00\\x00\\x00\\x10@'
>>> abs(v1)
5.0
>>> bool(v1), bool(Vector2d(0, 0))
(True, False)
Test of ``.frombytes()`` class method:
>>> v1_clone = Vector2d.frombytes(bytes(v1))
>>> v1_clone
Vector2d(3.0, 4.0)
>>> v1 == v1_clone
True
Tests of ``format()`` with Cartesian coordinates:
>>> format(v1)
'(3.0, 4.0)'
>>> format(v1, '.2f')
'(3.00, 4.00)'
>>> format(v1, '.3e')
'(3.000e+00, 4.000e+00)'
Tests of the ``angle`` method::
>>> Vector2d(0, 0).angle()
0.0
>>> Vector2d(1, 0).angle()
0.0
>>> epsilon = 10**-8
>>> abs(Vector2d(0, 1).angle() - math.pi/2) < epsilon
True
>>> abs(Vector2d(1, 1).angle() - math.pi/4) < epsilon
True
Tests of ``format()`` with polar coordinates:
>>> format(Vector2d(1, 1), 'p') # doctest:+ELLIPSIS
'<1.414213..., 0.785398...>'
>>> format(Vector2d(1, 1), '.3ep')
'<1.414e+00, 7.854e-01>'
>>> format(Vector2d(1, 1), '0.5fp')
'<1.41421, 0.78540>'
Tests of `x` and `y` read-only properties:
>>> v1.x, v1.y
(3.0, 4.0)
>>> v1.x = 123
Traceback (most recent call last):
...
AttributeError: can't set attribute
Tests of hashing:
>>> v1 = Vector2d(3, 4)
>>> v2 = Vector2d(3.1, 4.2)
>>> hash(v1), hash(v2)
(7, 384307168202284039)
>>> len(set([v1, v2]))
2
"""

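Private and "Protected" Attributes in Python

Consider this scenario: someone wrote a class named Dog that uses a mood instance attribute internally, without exposing it. You subclass Dog as Beagle. If you create your own mood instance attribute without being aware of the name clash, you will clobber the mood attribute used by the methods inherited from Dog. This would be a pain to debug.

To prevent this, if you name an instance attribute in the form __mood (two leading underscores and zero or at most one trailing underscore), Python stores the name in the instance __dict__ prefixed with a leading underscore and the class name. So in the Dog class, __mood becomes _Dog__mood, and in Beagle it becomes _Beagle__mood. This language feature goes by the name of name mangling.

Example: private attribute names are "mangled" by prefixing an underscore and the class name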

>>> v1 = Vector2d(3, 4)
>>> v1.__dict__
{'_Vector2d__y': 4.0, '_Vector2d__x': 3.0}
>>> v1._Vector2d__x
3.0
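The Dog and Beagle scenario described above can be sketched directly. The method names dog_mood and beagle_mood and the mood values are made up for this illustration; only the class names and the __mood attribute come from the text.

```python
class Dog:
    def __init__(self):
        self.__mood = 'calm'      # stored as _Dog__mood

    def dog_mood(self):
        return self.__mood        # compiled as self._Dog__mood


class Beagle(Dog):
    def __init__(self):
        super().__init__()
        self.__mood = 'excited'   # stored as _Beagle__mood, so no clash

    def beagle_mood(self):
        return self.__mood        # compiled as self._Beagle__mood


b = Beagle()
print(sorted(vars(b)))
print(b.dog_mood(), b.beagle_mood())
```

Both attributes coexist in the same instance because each class body rewrites __mood with its own class name.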

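The Python interpreter does not give any special treatment to attribute names with a single leading underscore, but it is a very strong convention among Python programmers that such attributes should not be accessed from outside the class. It is easy to respect the privacy of an object that marks its attributes with a single underscore, just as it is easy to respect the convention that variables in ALL_CAPS should be treated as constants.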

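Saving Space with the __slots__ Class Attribute

By default, Python stores instance attributes in a per-instance dict named __dict__. Dictionaries have a significant memory overhead because of the underlying hash table used to provide fast access. If you are dealing with millions of instances with few attributes, the __slots__ class attribute can save a lot of memory, by letting the interpreter store the instance attributes in a tuple-like structure instead of a dict.

Note:

Python only takes into account the __slots__ attribute defined in each class individually, so a subclass that does not declare its own __slots__ gets a __dict__ again.

To define __slots__, you create a class attribute with that name and assign it an iterable of str with identifiers for the instance attributes. I like to use a tuple for that, because it conveys the message that the __slots__ definition cannot change.

Example: vector2d_v3_slots.py, where the __slots__ attribute is the only addition to Vector2d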

class Vector2d:
    __slots__ = ('__x', '__y')

    typecode = 'd'

# the methods follow (omitted here to save space)

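By defining __slots__ in the class, you are telling the interpreter: "These are all the instance attributes in this class." Python then stores them in a tuple-like structure in each instance, avoiding the memory overhead of the per-instance __dict__. This can make a huge difference in memory usage if you have millions of instances active at the same time.

Note:

When __slots__ is specified in a class, its instances are not allowed to have any attributes other than those named in __slots__. This is really a side effect, and not the reason why __slots__ exists. Do not use __slots__ just to prevent users of your class from creating new instance attributes. __slots__ is meant for optimization, not for constraining programmers.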
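Both effects, the missing __dict__ and the restriction on new attributes, can be observed with two small classes. PlainPoint and SlotPoint are hypothetical names introduced only for this comparison.

```python
class PlainPoint:  # hypothetical class without __slots__
    def __init__(self, x, y):
        self.x, self.y = x, y


class SlotPoint:  # hypothetical class with __slots__
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x, self.y = x, y


plain = PlainPoint(3, 4)
slotted = SlotPoint(3, 4)

print(hasattr(plain, '__dict__'), hasattr(slotted, '__dict__'))

plain.z = 5            # fine: stored in plain.__dict__
try:
    slotted.z = 5      # 'z' is not listed in __slots__
except AttributeError as exc:
    print('AttributeError:', exc)
```

The actual memory saved depends on the Python version and the number of attributes, so it is worth measuring on your own workload before committing to __slots__.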
