Unicode and character encoding
1. The encoding Go uses
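Go's character encoding scheme follows the Unicode standard. More precisely, Go source code is made up of Unicode characters, and all Go source code must be encoded in UTF-8, one of the encoding forms defined by Unicode.
Go source files must be stored as UTF-8. If a source file contains bytes that are not valid UTF-8, the go command reports the error "illegal UTF-8 encoding" when building, installing, or running.
Go not only has the rune type, which can represent a single Unicode character on its own, but also a for statement (for ... range) that splits a string value into Unicode characters.
So: string literals in Go source are UTF-8 encoded Unicode text. (A string value itself is just a byte sequence and is not required to hold valid UTF-8, although by convention it usually does.)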
example1:
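package main

import "fmt"

func main() {
	str := "Go爱好者"
	fmt.Printf("The string: %q\n", str)
	// Converting to []rune splits the string into Unicode code points.
	fmt.Printf(" => runes(char): %q\n", []rune(str))
	fmt.Printf(" => runes(hex): %x\n", []rune(str))
	fmt.Printf(" => runes(d): %d\n", []rune(str))
	// Converting to []byte exposes the underlying UTF-8 bytes.
	fmt.Printf(" => bytes(hex): [% x]\n", []byte(str))
}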
output:
The string: "Go爱好者"
=> runes(char): ['G' 'o' '爱' '好' '者']
=> runes(hex): [47 6f 7231 597d 8005]
=> runes(d): [71 111 29233 22909 32773]
=> bytes(hex): [47 6f e7 88 b1 e5 a5 bd e8 80 85]
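The difference between the rune view and the byte view also shows up in lengths and indexing. The following is a minimal sketch, not part of the original notes; byteAndRuneCount is a hypothetical helper name, while len and utf8.RuneCountInString are standard.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) returns the number of bytes
// and the number of Unicode code points in s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	str := "Go爱好者"
	nBytes, nRunes := byteAndRuneCount(str)
	fmt.Println(nBytes, nRunes) // 11 5

	// Indexing a string yields a single byte, not a character.
	fmt.Printf("%x\n", str[2]) // e7, the first byte of '爱'
}
```

len counts bytes, so the three Chinese characters contribute three bytes each, matching the bytes(hex) line above.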
example2:
package main

import "fmt"

func main() {
	str := "Go爱好者"
	// i is the byte index at which the character starts; c is a rune.
	for i, c := range str {
		fmt.Printf("%d: %q [% x]\n", i, c, []byte(string(c)))
	}
}
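A string value is made up of Unicode characters, each of which can be held by a value of type rune.
Under the hood, a string value is a byte sequence that can express a number of UTF-8 encoded values.
The for statement first treats the iterated string as a byte sequence, then tries to find each UTF-8 encoded value, that is, each Unicode character, in it. The index values of adjacent Unicode characters are not necessarily consecutive; that depends on whether the preceding character is a single-byte character.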
output:
0: 'G' [47]
1: 'o' [6f]
2: '爱' [e7 88 b1]
5: '好' [e5 a5 bd]
8: '者' [e8 80 85]
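What the range loop does can be approximated by decoding the byte sequence by hand. The sketch below is an illustration under that assumption, not the compiler's actual implementation; runeOffsets is a hypothetical helper, and utf8.DecodeRuneInString is the standard library function it relies on.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeOffsets (hypothetical helper) returns the starting byte index of
// each rune in s by decoding one UTF-8 sequence at a time.
func runeOffsets(s string) []int {
	var offsets []int
	for i := 0; i < len(s); {
		// size is the number of bytes the decoded rune occupies.
		_, size := utf8.DecodeRuneInString(s[i:])
		offsets = append(offsets, i)
		i += size
	}
	return offsets
}

func main() {
	fmt.Println(runeOffsets("Go爱好者")) // [0 1 2 5 8]

	// A byte that is not valid UTF-8 is reported as utf8.RuneError
	// (U+FFFD) and advances the index by one byte.
	for i, c := range "a\xffb" {
		fmt.Printf("%d: %U\n", i, c)
	}
}
```

The offsets match the indexes printed above: each index jumps by the byte width of the preceding character.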
The strings package and string operations
strings.Builder and strings.Reader
strings.Builder
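What advantages does a strings.Builder value have over a string value?
- Compared with string values, the advantages of Builder values show mainly in string concatenation:
- existing content is immutable, but more content can be appended;
- fewer memory allocations and content copies;
- the content can be reset, so the value can be reused.
A Builder value has a container that holds its content: a slice with byte as its element type.
Builder structure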
type Builder struct {
addr *Builder // of receiver, to detect copies by value
buf []byte
}
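A Builder value does not allow arbitrary modification of its internal elements, so its existing content is effectively immutable. You can append more content with the methods it provides without worrying that they will affect what is already there.
Content can be appended with Write, WriteByte, WriteRune, and WriteString.
A Builder value grows its content container automatically; the automatic growth strategy is the same as that of slices.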
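When the final size is known in advance, automatic growth can be avoided by calling Grow once before writing. The following is a minimal sketch of that idea; joinWords is a hypothetical helper written for illustration (the standard library already offers strings.Join for real use).

```go
package main

import (
	"fmt"
	"strings"
)

// joinWords (hypothetical helper) concatenates words with sep using a
// strings.Builder whose capacity is reserved up front.
func joinWords(words []string, sep string) string {
	if len(words) == 0 {
		return ""
	}
	// Compute the exact number of bytes the result will need.
	n := len(sep) * (len(words) - 1)
	for _, w := range words {
		n += len(w)
	}
	var b strings.Builder
	b.Grow(n) // reserve space once so the writes below need not reallocate
	for i, w := range words {
		if i > 0 {
			b.WriteString(sep)
		}
		b.WriteString(w)
	}
	return b.String()
}

func main() {
	fmt.Println(joinWords([]string{"Go", "爱好者"}, " ")) // Go 爱好者
}
```

Repeated concatenation with + would allocate a new string at each step; here the content is written into one buffer.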
example:
package main

import (
	"fmt"
	"strings"
)

func main() {
	// Example 1: append content with the Write methods.
	var builder1 strings.Builder
	builder1.WriteString("A Builder is used to efficiently build a string using Write methods.")
	fmt.Printf("The first output(%d):\n%q\n", builder1.Len(), builder1.String())
	fmt.Println()
	builder1.WriteByte(' ')
	builder1.WriteString("It minimizes memory copying. The zero value is ready to use.")
	builder1.Write([]byte{'\n', '\n'})
	builder1.WriteString("Do not copy a non-zero Builder.")
	fmt.Printf("The second output(%d):\n\"%s\"\n", builder1.Len(), builder1.String())
	fmt.Println()

	// Example 2: grow the container manually.
	fmt.Println("Grow the builder ...")
	builder1.Grow(10) // ensure room for at least 10 more bytes
	fmt.Printf("The length of contents in the builder is %d.\n", builder1.Len())
	fmt.Println(builder1.Cap())
	fmt.Println()

	// Example 3: reset the content.
	fmt.Println("Reset the builder ...")
	builder1.Reset() // reset to empty
	fmt.Printf("The third output(%d):\n%q\n", builder1.Len(), builder1.String())
}
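The addr field in the Builder structure exists to detect copies by value, which is why the text written in the example says "Do not copy a non-zero Builder." The sketch below illustrates the consequence; writeToCopy is a hypothetical helper that recovers from the panic so the program can keep running.

```go
package main

import (
	"fmt"
	"strings"
)

// writeToCopy (hypothetical helper) copies a used Builder by value,
// writes to the copy, and reports whether that write panicked.
func writeToCopy() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	var b1 strings.Builder
	b1.WriteString("Go")
	b2 := b1            // copy by value after first use
	b2.WriteString("!") // panics: the copy check fails
	return false
}

func main() {
	fmt.Println(writeToCopy()) // true

	// Passing a pointer shares the same Builder and is safe.
	var b1 strings.Builder
	b1.WriteString("Go")
	p := &b1
	p.WriteString("!")
	fmt.Println(b1.String()) // Go!
}
```

A zero-value Builder that has not been written to can still be copied; the restriction applies only once it has been used.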
strings.Reader
Reader structure:
type Reader struct {
s string
i int64 // current reading index
prevRune int // index of previous rune; or < 0
}
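- reader.Read() records the number of bytes read (the read count) as it reads.
- reader.ReadAt() neither uses nor modifies the read count.
- reader.Seek() updates the index position; the next Read continues from that position.
The second parameter of Seek, whence, takes one of three values, meaning the offset is applied relative to the start, the current position, or the end of the reader. Seek returns the resulting index.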
- SeekStart = 0 // seek relative to the origin of the file
- SeekCurrent = 1 // seek relative to the current offset
- SeekEnd = 2 // seek relative to the end
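The key to efficient reading with a Reader value is its internal read count. The count is the starting index of the next read, and it is easy to compute as reader.Size() - int64(reader.Len()).
The Seek method of a Reader value can set the read count directly.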
example:
package main

import (
	"fmt"
	"io"
	"strings"
)

func main() {
	// Example 1: Read advances the read count.
	reader1 := strings.NewReader(
		"NewReader returns a new Reader reading from s. " +
			"It is similar to bytes.NewBufferString but more efficient and read-only.")
	fmt.Printf("The size of reader: %d\n", reader1.Size())
	fmt.Printf("The len of reader: %d\n", reader1.Len())
	fmt.Printf("The reading index in reader: %d\n",
		reader1.Size()-int64(reader1.Len()))
	buf1 := make([]byte, 47)
	n, _ := reader1.Read(buf1) // reads up to len(buf1) bytes; returns the number of bytes read
	fmt.Printf("%d bytes were read. (call Read)\n", n)
	fmt.Printf("The reading index in reader: %d\n",
		reader1.Size()-int64(reader1.Len()))
	fmt.Printf("buf1:%s\n", buf1)
	fmt.Println(reader1)
	fmt.Println()

	// Example 2: ReadAt leaves the read count unchanged.
	buf2 := make([]byte, 21)
	offset1 := int64(64)
	n, _ = reader1.ReadAt(buf2, offset1)
	fmt.Printf("%d bytes were read. (call ReadAt, offset: %d)\n", n, offset1)
	fmt.Printf("The reading index in reader: %d\n",
		reader1.Size()-int64(reader1.Len()))
	fmt.Printf("buf2:%s\n", buf2)
	fmt.Println(reader1)
	fmt.Println()
	n, _ = reader1.Read(buf2)
	fmt.Println(reader1)
	fmt.Println()

	// Example 3: Seek sets the read count directly.
	offset2 := int64(17)
	expectedIndex := reader1.Size() - int64(reader1.Len()) + offset2
	fmt.Printf("Seek with offset %d and whence %d ...\n", offset2, io.SeekCurrent)
	readingIndex, _ := reader1.Seek(offset2, io.SeekCurrent)
	fmt.Printf("The reading index in reader: %d (returned by Seek)\n", readingIndex)
	fmt.Printf("The reading index in reader: %d (computed by me)\n", expectedIndex)
	fmt.Println(reader1)
	n, _ = reader1.Read(buf2)
	fmt.Printf("%d bytes were read. (call Read)\n", n)
	fmt.Printf("The reading index in reader: %d\n",
		reader1.Size()-int64(reader1.Len()))
	fmt.Println(reader1)
}
output:
The size of reader: 119
The len of reader: 119
The reading index in reader: 0
47 bytes were read. (call Read)
The reading index in reader: 47
buf1:NewReader returns a new Reader reading from s.
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 47 -1}
21 bytes were read. (call ReadAt, offset: 64)
The reading index in reader: 47
buf2:bytes.NewBufferString
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 47 -1}
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 68 -1}
Seek with offset 17 and whence 1 ...
The reading index in reader: 85 (returned by Seek)
The reading index in reader: 85 (computed by me)
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 85 -1}
21 bytes were read. (call Read)
The reading index in reader: 106
&{NewReader returns a new Reader reading from s. It is similar to bytes.NewBufferString but more efficient and read-only. 106 -1}
Other string functions: to be added.