Clustered and Secondary Indexes(聚集索引和二级索引)

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

每张使用 InnoDB 作为存储引擎的表都有一个特殊的索引称为聚集索引,它保存着每一行的数据,通常,聚集索引就是主键索引。为了得到更高效的查询、插入以及其他的数据库操作的性能,你必须理解 InnoDB 引擎是如何使用聚集索引来优化常见的查找和 DML 操作。

  • When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.
    如果你的表定义了一个主键,InnoDB 就使用它作为聚集索引。因此,尽可能的为你的表定义一个主键,如果实在没有一个数据列是唯一且非空的可以作为主键列,建议添加一个自动递增列作为主键列。

  • If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
    如果你的表没有定义主键,InnoDB 会选择第一个唯一非空索引来作为聚集索引。

  • If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
    如果你的表既没有主键,又没有合适的唯一索引,InnoDB 内部会生成一个隐式聚集索引 —— GEN_CLUST_INDEX,该索引建立在由 rowid 组成的合成列上。数据行根据 InnoDB 分配的 rowid 排序,rowid 是一个 6 字节的字段,随着数据插入而单调递增。也就是说,数据行根据 rowid 排序实际上是根据插入顺序排序。

How the Clustered Index Speeds Up Queries(聚集索引如何提升查询效率)

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.
通过聚集索引来访问一行数据是非常快的,这是因为所有的行数据和索引在同一页上。如果表特别大,相较于行数据和索引在不同页上存储结构(比如 myisam 引擎),这将大大节省磁盘 I/O 资源。

How Secondary Indexes Relate to the Clustered Index(二级索引和聚集索引如何关联)

All indexes other than the clustered index are known as secondary indexes. In InnoDB, each record in a secondary index contains the primary key columns for the row, as well as the columns specified for the secondary index. InnoDB uses this primary key value to search for the row in the clustered index.
除了聚集索引外的其他索引类型都属于二级索引。在 InnoDB 中,二级索引中的每个记录都包含该行的主键列,以及二级索引指定的列;聚集索引中,InnoDB 通过主键值来查询数据行。

If the primary key is long, the secondary indexes use more space, so it is advantageous to have a short primary key.
如果主键过长,二级索引就需要更大的空间,因此,使用短的主键列是很有利的。

对于二级索引,叶子节点并不包含行记录的全部数据。叶子节点除了包含键值以外,每个叶子节点中的索引行中还包含了一个书签 —— 相应行数据的聚集索引键。

如果在一棵高度为 3 的二级索引树中查找数据,那需要对这颗二级索引树遍历3次找到指定聚集索引键。如果聚集索引树的高度同样为 3 ,那么还需要对聚集索引树进行 3 次查找,最终找到一个完整的行数据所在的页,因此一共需要 6 次逻辑 IO 访问以得到最终的一个数据页。