索引技术总结

原创

Dave__Hu 2024-05-10 11:50:38 ©著作权

文章标签 索引 文章分类 MySQL 数据库

©著作权归作者所有：来自51CTO博客作者Dave__Hu的原创作品，请联系作者获取转载授权，否则将追究法律责任

整体分类

Hash
- Linear Hashing 线性哈希。可以动态扩容
- 多维哈希 Multi-dimensional Hashing (mah)。可以做任意字段组合的查询
- 一致性hash
有序索引
- B+Tree
  - B+Tree的一个节点就是一个Page, 一个Page内可以存多达500个值
  - 索引和数据的读取都以Page为单位
- LSM-tree
- 跳表
基于签名的索引 Signature-based Indexing
- 因为签名很小，所以可以放到单页里，根据签名过滤出数据可能出现在哪些页里进而减少需要检索的页
- Superimposed Codewords (SIMC)
- Concatenated Codewords (CATC)
多维索引
- kd-trees
- Quad-trees
- R-trees
全文检索

Hash Indexes

采用类似LSM-tree的内存存储最新的数据，超过阈值写入磁盘的设计
磁盘存segments
内存存每个segment对应的hash map, value是offset
异步合并segments

LSM-tree (Log-Structured Merge-Tree)

used by LevelDB, RocksDB, DocDB
LSM tree has 2 primary components - Memtable and SSTs
this principle of merging and compacting sorted files are often called LSM storage engines.
Sorted String Table (SSTables)
- 磁盘存segments, segment内key有序，内存存对应的sparse index
- 最新的数据写入内存的memtable(可以是红黑树)，超过阈值则写入到磁盘
- In order to serve a read request, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next-older segment, etc.
- WAL to restore the memtable after a crash.
- Google’s Bigtable paper [9] (which introduced the terms SSTable and memtable).
Performance optimizations
- bloom filters
- compaction
  - size-tiered
    - write optimized
    - In size-tiered compaction, newer and smaller SSTables are successively merged into older and larger SSTables.
    - 需要预留50%的磁盘空间，读取可能需要访问所有的segments
    - 实现：HBase, Cassandra
  - leveled
    - read optimized
    - In leveled compaction, the key range is split up into smaller SSTables and older data is moved into separate “levels,”
    - 每一层都是分片的，越往下的层分片越多。每一层的数据整体有序，所以读取单key的时候每一层最多只需要访问1个sgement。
    - L0 压缩到L1的时候，需要写入L1的所有segments.
    - L1, 往后就可以只访问下一层的部分segments, 从而减少了文件的读取量，并可以并发合并
    - 实现：LevelDB, RocksDB, Cassandra
    - https://cassandra.apache.org/doc/stable/cassandra/operating/compaction/lcs.html
- 磁盘配置
  - HDD: one for the commit log and the other for everything else
  - SSD: 一个就可以

B-Trees

A four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB
WAL used to restore the B-tree back to a con‐ sistent state
careful concurrency control is required
- 可以用读写锁，写锁需要锁定所有需要修改的page后，再开始修改操作
- 可以用Multiple-Granularity Locks尽量减少冲突

B-trees VS LSM-trees

LSM-trees are typically faster for writes, whereas B-trees are thought to be faster for reads
LSM-trees 的并发控制相对简单得多
B-trees attractive in databases that want to offer strong transactional semantics
B-tree 性能更稳定
B-tree 在数据库使用广泛且成熟

Value的存储

clustered index
- value存到索引里
- in MySQL’s InnoDB storage engine, the primary key of a table is always a clustered index, and
  - secondary indexes refer to the primary key (rather than a heap file location)
    - 这里location是会变化的，所以只能引用primary key
- In SQL Server, you can specify one clustered index per table [32].
nonclustered index
- value存到heap里，索引存value的引用
- 这个方案在数据的存储管理上不灵活，比如压缩数据意味着需要跟新所有索引里的地址信息。而加一个映射层又会带来额外的开销。因此主流应该是clustered index方案？