索引技术总结
原创
©著作权归作者所有:来自51CTO博客作者Dave__Hu的原创作品,请联系作者获取转载授权,否则将追究法律责任
整体分类
- Hash
- Linear Hashing 线性哈希。可以动态扩容
- 多维哈希 Multi-dimensional Hashing (mah)。可以做任意字段组合的查询
- 一致性hash
- 有序索引
- B+Tree
- B+Tree的一个节点就是一个Page, 一个Page内可以存多达500个值
- 索引和数据的读取都以Page为单位
- LSM-tree
- 跳表
- 基于签名的索引 Signature-based Indexing
- 因为签名很小,所以可以放到单页里,根据签名过滤出数据可能出现在哪些页里进而减少需要检索的页
- Superimposed Codewords (SIMC)
- Concatenated Codewords (CATC)
- 多维索引
- kd-trees
- Quad-trees
- R-trees
- 全文检索
Hash Indexes
- 采用类似LSM-tree的内存存储最新的数据,超过阈值写入磁盘的设计
- 磁盘存segments
- 内存存每个segment对应的hash map, value是offset
- 异步合并segments
LSM-tree (Log-Structured Merge-Tree)
- used by LevelDB, RocksDB, DocDB
- LSM tree has 2 primary components - Memtable and SSTs
- this principle of merging and compacting sorted files are often called LSM storage engines.
- Sorted String Table (SSTables)
- 磁盘存segments, segment内key有序,内存存对应的sparse index
- 最新的数据写入内存的memtable(可以是红黑树),超过阈值则写入到磁盘
- In order to serve a read request, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next-older segment, etc.
- WAL to restore the memtable after a crash.
- Google’s Bigtable paper [9] (which introduced the terms SSTable and memtable).
- Performance optimizations
- bloom filters
- compaction
- size-tiered
- write optimized
- In size-tiered compaction, newer and smaller SSTables are successively merged into older and larger SSTables.
- 需要预留50%的磁盘空间,读取可能需要访问所有的segments
- 实现:HBase, Cassandra
- leveled
- read optimized
- In leveled compaction, the key range is split up into smaller SSTables and older data is moved into separate “levels,”
- 每一层都是分片的,越往下的层分片越多。每一层的数据整体有序,所以读取单key的时候每一层最多只需要访问1个sgement。
- L0 压缩到L1的时候,需要写入L1的所有segments.
- L1, 往后就可以只访问下一层的部分segments, 从而减少了文件的读取量,并可以并发合并
- 实现:LevelDB, RocksDB, Cassandra
- https://cassandra.apache.org/doc/stable/cassandra/operating/compaction/lcs.html
- 磁盘配置
- HDD: one for the commit log and the other for everything else
- SSD: 一个就可以
B-Trees
- A four-level tree of 4 KB pages with a branching factor of 500 can store up to 256 TB
- WAL used to restore the B-tree back to a con‐ sistent state
- careful concurrency control is required
- 可以用读写锁,写锁需要锁定所有需要修改的page后,再开始修改操作
- 可以用Multiple-Granularity Locks尽量减少冲突
B-trees VS LSM-trees
- LSM-trees are typically faster for writes, whereas B-trees are thought to be faster for reads
- LSM-trees 的并发控制相对简单得多
- B-trees attractive in databases that want to offer strong transactional semantics
- B-tree 性能更稳定
- B-tree 在数据库使用广泛且成熟
Value的存储
- clustered index
- value存到索引里
- in MySQL’s InnoDB storage engine, the primary key of a table is always a clustered index, and
- secondary indexes refer to the primary key (rather than a heap file location)
- 这里location是会变化的,所以只能引用primary key
- In SQL Server, you can specify one clustered index per table [32].
- nonclustered index
- value存到heap里,索引存value的引用
- 这个方案在数据的存储管理上不灵活,比如压缩数据意味着需要跟新所有索引里的地址信息。而加一个映射层又会带来额外的开销。因此主流应该是clustered index方案?