The practical role of database indexes in SQL Server: database indexes and SQL statements

Reposted

话不是这么说的 2024-05-24 22:15:34

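The main purpose of a database index is to speed up data retrieval, but more indexes are not always better.

Indexes take up additional storage, and every insert, update, or delete has to maintain them, so modifying data takes longer.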

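1. Creating an index

The SQL statement (PostgreSQL syntax) is as follows:

```sql
CREATE INDEX idx_commodity
    ON commodity        -- table name
    USING btree         -- implemented as a B-tree
    (commodity_id);     -- the column being indexed
```

2. Dropping an index

```sql
DROP INDEX idx_commodity;
```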

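3. Advantages of adding indexes

Creating indexes can greatly improve system performance.

First, and most importantly, they greatly speed up data retrieval.

Second, a unique index guarantees that each row in the table is unique on the indexed columns.

Third, they speed up joins between tables, which matters especially when enforcing referential integrity.

Fourth, they can significantly reduce the time spent on grouping and sorting when a query uses grouping or sorting clauses.

Fifth, with indexes available, the query optimizer has more efficient access paths to choose from, which improves system performance.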

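4. Disadvantages of adding indexes

First, creating and maintaining indexes takes time, and that time grows with the volume of data.

Second, indexes occupy physical space: in addition to the space used by the table itself, every index needs its own storage.

Third, whenever rows are inserted, deleted, or updated, the indexes must be maintained as well, which slows down data modification.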

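5. Choosing what to index

In general, create indexes on the following columns:

First, columns that are searched frequently, to speed up searches.

Second, primary key columns, to enforce uniqueness and organize how the data in the table is arranged.

Third, columns that are frequently used in joins, which are mostly foreign keys, to speed up joins.

Fourth, columns that are frequently searched by range, because the index is sorted and the requested range is contiguous in it.

Fifth, columns that are frequently sorted on, because the index is already sorted and queries can reuse that order instead of sorting again.

Sixth, columns that appear in WHERE clauses, to speed up condition evaluation.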

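In general, columns with the following characteristics should not be indexed:

First, columns that are rarely used or referenced in queries. Because they are rarely used, an index on them does not speed queries up; it only slows down maintenance and increases space requirements.

Second, columns with very few distinct values. With so few values, the rows matching a query make up a large share of the table, so a large share of the table has to be read anyway, and an index does not noticeably speed up retrieval.

Third, columns of type text, image, or bit, because their values are either very large or have very few distinct values.

Fourth, tables where modification performance matters far more than retrieval performance. The two pull in opposite directions: adding indexes improves retrieval but slows modification, while removing indexes does the reverse. When writes dominate, indexes should not be created.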
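The second rule (few distinct values) can be turned into a quick check before creating an index. A minimal sketch, assuming selectivity is measured as distinct values divided by total rows; the two sample columns are hypothetical.

```python
def selectivity(values):
    """Fraction of distinct values: close to 1.0 is selective, close to 0 is not."""
    values = list(values)
    return len(set(values)) / len(values) if values else 0.0

user_ids = list(range(10_000))               # hypothetical: one value per row
is_active = [i % 2 for i in range(10_000)]   # hypothetical: only two values

print(selectivity(user_ids))    # every row distinct
print(selectivity(is_active))   # two values shared by all rows
```

A column like `is_active` matches half the table for either value, so an index on it rarely pays for its maintenance cost.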

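Supplement: PostgreSQL index types and usage

1. Kinds of indexes

PostgreSQL supports single-column indexes, multicolumn (composite) indexes, partial indexes, unique indexes, expression indexes, implicit indexes, and indexes built concurrently.

2. Index methods

PostgreSQL supports the B-tree, hash, GiST, and GIN index methods.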

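3. Where each index method applies

1). B-tree

A B-tree index can be used when a query involves equality (=) or range operators (<, <=, >, >=, BETWEEN, and IN).

2). hash

A hash index handles only the equality operator (=); it is not suitable for range operators.

3). GiST

Suited to custom complex types; examples include rtree_gist, btree_gist, intarray, tsearch, ltree, and cube.

4). GIN

A GIN index takes roughly three times as much space as a GiST index. It suits complex LIKE patterns, for example LIKE '%ABC12%' (in practice together with a trigram operator class such as the one in pg_trgm).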
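The difference between the hash and B-tree cases can be imitated with two in-memory structures. This is only an analogy under stated assumptions: a Python dict stands in for a hash index, a sorted list searched with `bisect` stands in for a B-tree, and the `prices` values are hypothetical.

```python
from bisect import bisect_left, bisect_right

prices = [120, 45, 300, 87, 45, 210]   # hypothetical column values, position = row number

# Hash-style index: value -> row numbers. Works for "=", has no notion of order.
hash_index = {}
for row, value in enumerate(prices):
    hash_index.setdefault(value, []).append(row)

# B-tree-style index: (value, row) pairs kept in sorted order.
sorted_index = sorted((value, row) for row, value in enumerate(prices))
keys = [value for value, _ in sorted_index]

def equal(value):
    """Equality lookup through the hash-style index."""
    return hash_index.get(value, [])

def between(low, high):
    """Range lookup: two binary searches bound a contiguous slice."""
    lo, hi = bisect_left(keys, low), bisect_right(keys, high)
    return [row for _, row in sorted_index[lo:hi]]

print(equal(45))
print(between(80, 200))
```

Answering `between` from the dict would require looking at every key, which is why a hash index is not offered for range operators.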

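4. Notes on using indexes

1). When a table has many rows, indexing its columns becomes important.

2). Choose good candidate columns for indexing: foreign keys, or keys used to fetch minimum and maximum values. Column selectivity largely determines how effective an index is.

3). For better performance, drop unused indexes, and rebuild all indexes about once a month to clear out dead rows.

4). For very large amounts of data, use table partitioning together with indexes.

5). When a column contains NULL values, consider a partial index that excludes the NULLs.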

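How B-tree indexes work in PostgreSQL

Structure

A B-tree index is suitable for data that can be sorted. In other words, the greater than, greater than or equal, less than, and less than or equal operators must be defined for the data type.

As usual, the index rows of a B-tree are packed into pages.

In leaf pages, each row contains the indexed data (keys) and a reference to a table row (TID).

In internal pages, each row references a child page of the index and contains the minimum value in that child page.

B-trees have a few important properties:

1. A B-tree is balanced: every leaf page is separated from the root by the same number of internal pages, so searching for any value takes roughly the same time.

2. A B-tree is multi-branched: each page (usually 8 KB) holds many TIDs. As a result the tree is shallow, and 4 to 5 levels are usually enough even for very large tables.

3. Data in the index is stored in nondecreasing order (both between pages and within each page), and pages of the same level are connected by a doubly linked list. An ordered data set can therefore be obtained by walking the list, without returning to the root each time.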
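The claim that a few levels are enough follows from simple arithmetic on the branching factor. A rough sketch, assuming every page holds the same number of entries; the figure of 300 entries per page is a hypothetical round number, not a PostgreSQL constant.

```python
import math

def btree_levels(n_rows, entries_per_page):
    """Count the levels (leaf level included) needed to index n_rows entries."""
    pages = math.ceil(n_rows / entries_per_page)   # number of leaf pages
    levels = 1
    while pages > 1:                               # add parent levels until one root page remains
        pages = math.ceil(pages / entries_per_page)
        levels += 1
    return levels

print(btree_levels(1_000_000, 300))
print(btree_levels(10**9, 300))
```

Each extra level multiplies the capacity by the branching factor, so the height grows only logarithmically with the row count.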

(Figure: structure of a B-tree index.)

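The very first page of the index is the metapage, which stores information about the root page of the index.

Internal nodes are located below the root, and leaf pages are in the bottom row.

The downward arrows represent references from leaf nodes to table rows (TIDs).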

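Equality search

Consider searching for the value 49 with a condition of the form "indexed-field = expression".

(Figure: equality search for 49.)

The root node has three entries: (4, 32, 64). The search starts at the root; because 32 ≤ 49 < 64, it descends into the child referenced by 32. Repeating the same procedure level by level reaches a leaf page, where the value 49 is found.

In reality the search algorithm is not as simple as it looks. For example, the index may be non-unique, so there can be many entries with the same value, and they may span more than one page. How is the search done then? Returning to the example above, suppose we are at the second-level node (32, 43, 49). Descending through the entry 49 would skip the 49 stored in the preceding leaf page. So once an exactly equal key is found in an internal page, the search descends one position to the left, through the entry 43, and then scans the entries of the bottom level from left to right.

(Another complication is that the tree can change during the search, for example when a page is split.)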
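The "step one position to the left" rule for non-unique indexes can be sketched on a simplified two-level tree. This is an illustration, not PostgreSQL's implementation: the leaf contents are hypothetical, each separator is the smallest key of its leaf, and concurrent page splits are ignored.

```python
from bisect import bisect_left

def find_equal(leaves, key):
    """Return (page, slot) positions of every entry equal to key."""
    separators = [leaf[0] for leaf in leaves]        # smallest key of each leaf page
    # Start one page to the left of the first separator >= key, because
    # duplicates of key may already begin at the end of that earlier page.
    start = max(bisect_left(separators, key) - 1, 0)
    hits = []
    for page in range(start, len(leaves)):           # walk the leaf level left to right
        for slot, value in enumerate(leaves[page]):
            if value == key:
                hits.append((page, slot))
            elif value > key:                        # passed the key: nothing more to find
                return hits
    return hits

# Hypothetical leaf pages; the value 49 spans two pages.
leaves = [[4, 15, 23], [32, 43, 49], [49, 49, 57], [64, 82, 98]]
print(find_equal(leaves, 49))
```

Descending straight through the separator 49 would start at page 2 and miss the entry at the end of page 1.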

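Inequality search

For a condition of the form "indexed-field ≤ expression" (or "indexed-field ≥ expression"), the search first locates the value (if it exists) with the equality condition "indexed-field = expression", and then walks through the leaf pages to the left or to the right.

The figure below illustrates the search for n ≤ 35:

(Figure: search for n ≤ 35.)

The "greater than" and "less than" operators work the same way, except that the value found by the equality search is excluded.

Range search

For a range condition "expression1 ≤ indexed-field ≤ expression2", the search first finds a value with the condition "indexed-field = expression1" and then walks through the leaf pages from left to right while "indexed-field ≤ expression2" holds. Alternatively, it can start from the second expression and walk from right to left until the first expression's condition no longer holds.

The figure below illustrates the search for 23 ≤ n ≤ 64:

(Figure: search for 23 ≤ n ≤ 64.)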
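The range search just described, locate the lower bound and then walk right until the upper bound is exceeded, can be sketched on a flat sorted list that stands in for the linked leaf level. The key values are hypothetical.

```python
from bisect import bisect_left

def range_scan(sorted_keys, low, high):
    """Collect keys with low <= key <= high by one search plus a left-to-right walk."""
    pos = bisect_left(sorted_keys, low)      # plays the role of the descent from the root
    found = []
    while pos < len(sorted_keys) and sorted_keys[pos] <= high:
        found.append(sorted_keys[pos])       # keep walking while the upper bound holds
        pos += 1
    return found

leaf_level = [4, 15, 23, 32, 35, 43, 49, 57, 64, 82]   # hypothetical keys
print(range_scan(leaf_level, 23, 64))
```

Only the first step depends on the size of the index; the rest of the work is proportional to the number of matching entries.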

Example

Here is what query plans look like in practice, using the aircrafts table from the demo database. The table has only 9 rows and fits in a single page, so the planner would not normally use an index on it. It is still useful for illustration.

```
demo=# select * from aircrafts;
 aircraft_code |        model        | range
---------------+---------------------+-------
 773           | Boeing 777-300      | 11100
 763           | Boeing 767-300      |  7900
 SU9           | Sukhoi SuperJet-100 |  3000
 320           | Airbus A320-200     |  5700
 321           | Airbus A321-200     |  5600
 319           | Airbus A319-100     |  6700
 733           | Boeing 737-300      |  4200
 CN1           | Cessna 208 Caravan  |  1200
 CR2           | Bombardier CRJ-200  |  2700
(9 rows)

demo=# create index on aircrafts(range);
demo=# set enable_seqscan = off;
```

(More precisely: create index on aircrafts using btree(range). A B-tree index is built by default when no method is specified.)

Plan for an equality search:

```
demo=# explain(costs off) select * from aircrafts where range = 3000;
                    QUERY PLAN
---------------------------------------------------
 Index Scan using aircrafts_range_idx on aircrafts
   Index Cond: (range = 3000)
(2 rows)
```

Plan for an inequality search:

```
demo=# explain(costs off) select * from aircrafts where range < 3000;
                    QUERY PLAN
---------------------------------------------------
 Index Scan using aircrafts_range_idx on aircrafts
   Index Cond: (range < 3000)
(2 rows)
```

Plan for a range search:

```
demo=# explain(costs off) select * from aircrafts
where range between 3000 and 5000;
                     QUERY PLAN
-----------------------------------------------------
 Index Scan using aircrafts_range_idx on aircrafts
   Index Cond: ((range >= 3000) AND (range <= 5000))
(2 rows)
```

Sorting

To stress it once more: with an index scan or an index-only scan, the btree access method returns data in sorted order (a bitmap scan reads the same ordered index but then visits table rows in physical order). So if a table has an index on the sort condition, the optimizer considers both options: an index scan of the table, or a sequential scan of the table followed by sorting the result.

Sort order

The sort order can be specified explicitly when an index is created. For example, the following creates an index on the range column in descending order:

```
demo=# create index on aircrafts(range desc);
```

In this case larger values appear on the left of the tree and smaller values on the right. Why would this be needed? The reason is multicolumn indexes. Let's create a view on aircrafts that splits the aircraft into three classes by range:

```
demo=# create view aircrafts_v as
select model,
       case
           when range < 4000 then 1
           when range < 10000 then 2
           else 3
       end as class
from aircrafts;

demo=# select * from aircrafts_v;
        model        | class
---------------------+-------
 Boeing 777-300      |     3
 Boeing 767-300      |     2
 Sukhoi SuperJet-100 |     1
 Airbus A320-200     |     2
 Airbus A321-200     |     2
 Airbus A319-100     |     2
 Boeing 737-300      |     2
 Cessna 208 Caravan  |     1
 Bombardier CRJ-200  |     1
(9 rows)
```

Then create an index on the same expression:

```
demo=# create index on aircrafts(
  (case when range < 4000 then 1 when range < 10000 then 2 else 3 end),
  model);
```

Now the index can be used to get the data sorted in ascending order:

```
demo=# select class, model from aircrafts_v order by class, model;
 class |        model
-------+---------------------
     1 | Bombardier CRJ-200
     1 | Cessna 208 Caravan
     1 | Sukhoi SuperJet-100
     2 | Airbus A319-100
     2 | Airbus A320-200
     2 | Airbus A321-200
     2 | Boeing 737-300
     2 | Boeing 767-300
     3 | Boeing 777-300
(9 rows)

demo=# explain(costs off)
select class, model from aircrafts_v order by class, model;
                       QUERY PLAN
--------------------------------------------------------
 Index Scan using aircrafts_case_model_idx on aircrafts
(1 row)
```

Likewise, the data can be fetched in descending order:

```
demo=# select class, model from aircrafts_v order by class desc, model desc;
 class |        model
-------+---------------------
     3 | Boeing 777-300
     2 | Boeing 767-300
     2 | Boeing 737-300
     2 | Airbus A321-200
     2 | Airbus A320-200
     2 | Airbus A319-100
     1 | Sukhoi SuperJet-100
     1 | Cessna 208 Caravan
     1 | Bombardier CRJ-200
(9 rows)

demo=# explain(costs off)
select class, model from aircrafts_v order by class desc, model desc;
                           QUERY PLAN
-----------------------------------------------------------------
 Index Scan BACKWARD using aircrafts_case_model_idx on aircrafts
(1 row)
```

However, this index cannot be used to get the data sorted by one column in ascending order and by the other in descending order; a separate sort is required:

```
demo=# explain(costs off)
select class, model from aircrafts_v order by class ASC, model DESC;
                   QUERY PLAN
-------------------------------------------------
 Sort
   Sort Key: (CASE ... END), aircrafts.model DESC
   ->  Seq Scan on aircrafts
(3 rows)
```

(Note that the planner chose a sequential scan despite the earlier enable_seqscan = off. The setting does not forbid sequential scans; it only assigns them a very high cost. Look at the plan with costs on to see this.)

For the query to use an index, the index must be created with the required sort directions:

```
demo=# create index aircrafts_case_asc_model_desc_idx on aircrafts(
  (case when range < 4000 then 1 when range < 10000 then 2 else 3 end) ASC,
  model DESC);

demo=# explain(costs off)
select class, model from aircrafts_v order by class ASC, model DESC;
                           QUERY PLAN
-----------------------------------------------------------------
 Index Scan using aircrafts_case_asc_model_desc_idx on aircrafts
(1 row)
```
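Why one index serves both the all-ascending and the all-descending order, but not a mixed one, can be seen on a plain sorted list. In this sketch a sorted Python list stands in for the index on (class, model), built from the nine rows of aircrafts_v shown in the article.

```python
rows = [
    (3, "Boeing 777-300"), (2, "Boeing 767-300"), (1, "Sukhoi SuperJet-100"),
    (2, "Airbus A320-200"), (2, "Airbus A321-200"), (2, "Airbus A319-100"),
    (2, "Boeing 737-300"), (1, "Cessna 208 Caravan"), (1, "Bombardier CRJ-200"),
]

index = sorted(rows)       # stored order: class ASC, model ASC
forward = index            # satisfies ORDER BY class, model
backward = index[::-1]     # satisfies ORDER BY class DESC, model DESC

# class ASC, model DESC is neither traversal, so it needs its own ordering.
# Python's sort is stable: sort by model descending, then by class ascending.
mixed = sorted(sorted(rows, key=lambda r: r[1], reverse=True), key=lambda r: r[0])

print(mixed[:3])
```

Reading the list backwards flips both columns at once, which is why a mixed ordering requires an index built with those directions.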

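Order of columns

The question of column order arises with multicolumn indexes. For a B-tree this order matters a great deal: data in the pages is sorted by the first field, then by the second, and so on.

The figure below shows the index built on the range classes and the model column:

(Figure: index on class and model.)

Of course, an index this small fits in a single root page; it is deliberately spread over several pages here for clarity.

The figure makes it clear that searching with predicates such as class = 3 (by the first field only) or class = 3 and model = 'Boeing 777-300' (by both fields) is efficient.

Searching with the predicate model = 'Boeing 777-300', however, is far less efficient: starting from the root, there is no way to tell which child node to descend into, so all of them have to be visited. This does not mean such an index can never be used; the question is its efficiency. For example, if aircrafts had three classes with a great many models in each, about one third of the index would have to be scanned, which might still be cheaper than a full table scan.

However, if the index is created like this:

```
demo=# create index on aircrafts(
  model,
  (case when range < 4000 then 1 when range < 10000 then 2 else 3 end));
```

the order of the fields changes:

(Figure: index on model and class.)

With this index, searching by model = 'Boeing 777-300' is efficient, but searching by class = 3 is not.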
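The effect of column order can be reproduced with a list of (class, model) tuples sorted the way a multicolumn index would store them. This is a simplified sketch: a predicate on the leading column becomes two binary searches, while a predicate on the second column alone has to look at every entry.

```python
from bisect import bisect_left

index = sorted([
    (3, "Boeing 777-300"), (2, "Boeing 767-300"), (1, "Sukhoi SuperJet-100"),
    (2, "Airbus A320-200"), (2, "Airbus A321-200"), (2, "Airbus A319-100"),
    (2, "Boeing 737-300"), (1, "Cessna 208 Caravan"), (1, "Bombardier CRJ-200"),
])

def by_class(cls):
    """Leading column: matching entries form one contiguous run."""
    lo = bisect_left(index, (cls,))       # (cls,) sorts before every (cls, model)
    hi = bisect_left(index, (cls + 1,))
    return index[lo:hi]

def by_model(model):
    """Second column only: matches may be anywhere, so each entry is examined."""
    examined, hits = 0, []
    for entry in index:
        examined += 1
        if entry[1] == model:
            hits.append(entry)
    return hits, examined

print(by_class(3))
print(by_model("Boeing 777-300"))
```

Swapping the tuple order to (model, class) would reverse which of the two predicates gets the binary search.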

NULLs

The PostgreSQL B-tree supports indexing NULLs and searching with the IS NULL and IS NOT NULL conditions.

Consider the flights table, where NULLs occur:

```
demo=# create index on flights(actual_arrival);

demo=# explain(costs off) select * from flights where actual_arrival is null;
                      QUERY PLAN
-------------------------------------------------------
 Bitmap Heap Scan on flights
   Recheck Cond: (actual_arrival IS NULL)
   ->  Bitmap Index Scan on flights_actual_arrival_idx
         Index Cond: (actual_arrival IS NULL)
(4 rows)
```

NULLs are located at one end of the leaf level or the other, depending on how the index was created (NULLS FIRST or NULLS LAST). This matters when a query involves sorting: the index can be used if the ORDER BY clause of the SELECT specifies the same placement of NULLs as the index was built with (NULLS FIRST or NULLS LAST).

In the following example the orders match, so the index can be used:

```
demo=# explain(costs off)
select * from flights order by actual_arrival NULLS LAST;
                       QUERY PLAN
--------------------------------------------------------
 Index Scan using flights_actual_arrival_idx on flights
(1 row)
```

In the following example the orders differ, and the optimizer chooses a sequential scan followed by a sort:

```
demo=# explain(costs off)
select * from flights order by actual_arrival NULLS FIRST;
               QUERY PLAN
----------------------------------------
 Sort
   Sort Key: actual_arrival NULLS FIRST
   ->  Seq Scan on flights
(3 rows)
```

For the index to be used here, it must be created with NULLs at the beginning:

```
demo=# create index flights_nulls_first_idx on flights(actual_arrival NULLS FIRST);

demo=# explain(costs off)
select * from flights order by actual_arrival NULLS FIRST;
                     QUERY PLAN
-----------------------------------------------------
 Index Scan using flights_nulls_first_idx on flights
(1 row)
```

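Issues like this arise because NULLs are not sortable: the result of comparing NULL with any other value is undefined:

```
demo=# \pset null NULL
demo=# select null < 42;
 ?column?
----------
 NULL
(1 row)
```

This runs counter to the concept of a B-tree and does not fit the general pattern. NULLs, however, play such an important role in databases that special handling always has to be provided for them.

Since NULLs can be indexed, an index can be used even when the query imposes no conditions on the table at all (because the index is certain to contain information on every table row). This can make sense when the query needs sorted data and the index provides the required order. In that case the planner may prefer index access to avoid a separate sort.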
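A similar situation exists in Python, where None cannot be ordered against numbers, so a sort has to decide explicitly where the missing values go. This is only an analogy: SQL returns NULL for such a comparison while Python raises an error, and the sample values are hypothetical.

```python
arrivals = [5, None, 2, None, 9]   # hypothetical values; None plays the role of NULL

try:
    sorted(arrivals)
    comparable = True
except TypeError:                  # comparing None with an int is not defined
    comparable = False

# The first tuple element decides on which end the None values land;
# the second orders the real values (None is replaced by a dummy 0).
nulls_last = sorted(arrivals, key=lambda v: (v is None, 0 if v is None else v))
nulls_first = sorted(arrivals, key=lambda v: (v is not None, 0 if v is None else v))

print(comparable, nulls_last, nulls_first)
```

The two results are not mirror images of each other, which mirrors why an index built with NULLS LAST cannot serve an ORDER BY ... NULLS FIRST in the same direction.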

Properties

Here are the properties of the btree access method.

```
 amname |     name      | pg_indexam_has_property
--------+---------------+-------------------------
 btree  | can_order     | t
 btree  | can_unique    | t
 btree  | can_multi_col | t
 btree  | can_exclude   | t
```

As we can see, a B-tree can order data and supports uniqueness. Multicolumn indexes are supported too, although other access methods support them as well. EXCLUDE constraints will be discussed next time.

```
     name      | pg_index_has_property
---------------+-----------------------
 clusterable   | t
 index_scan    | t
 bitmap_scan   | t
 backward_scan | t
```

The btree access method can fetch data in two ways: index scan and bitmap scan. As shown, the tree can be traversed both forward and backward.

```
        name        | pg_index_column_has_property
--------------------+------------------------------
 asc                | t
 desc               | f
 nulls_first        | f
 nulls_last         | t
 orderable          | t
 distance_orderable | f
 returnable         | t
 search_array       | t
 search_nulls       | t
```

The first four properties describe exactly how the values of a particular column are ordered. In this example the values are sorted in ascending order (asc) with NULLs at the end (nulls_last). Other combinations are possible, as shown earlier.

The search_array property indicates support for expressions like this one:

```
demo=# explain(costs off)
select * from aircrafts where aircraft_code in ('733','763','773');
                           QUERY PLAN
-----------------------------------------------------------------
 Index Scan using aircrafts_pkey on aircrafts
   Index Cond: (aircraft_code = ANY ('{733,763,773}'::bpchar[]))
(2 rows)
```

The returnable property indicates support for index-only scans, which is reasonable because the index rows store the indexed values themselves. The next section briefly covers covering indexes based on a B-tree.

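Unique indexes with additional columns

As discussed earlier:

A covering index contains all the values a query needs, so the table itself does not have to be visited.

A unique index can also serve as a covering index.

Suppose we want to add the extra columns a query needs to a unique index. Uniqueness of the new composite values no longer guarantees uniqueness of the original key, so two indexes on the same column would be needed: a unique one to support the integrity constraint and a non-unique one to act as the covering index. This is certainly inefficient.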
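The reasoning about composite keys is easy to check on a tiny data set. A minimal sketch with two hypothetical bookings rows that share a book_ref but differ in book_date:

```python
def is_unique(keys):
    """True when no key value occurs more than once."""
    keys = list(keys)
    return len(set(keys)) == len(keys)

# Hypothetical rows: (book_ref, book_date)
bookings = [
    ("059FC4", "2016-08-13"),
    ("059FC4", "2016-08-14"),
]

composite_ok = is_unique(bookings)                     # unique on (book_ref, book_date)
book_ref_ok = is_unique(ref for ref, _ in bookings)    # unique on book_ref alone

print(composite_ok, book_ref_ok)
```

A unique index over both columns would accept these rows, even though book_ref itself is duplicated, so it cannot replace the constraint on book_ref.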

At our company, Anastasiya Lubennikova (@lubennikovaav) improved btree so that additional non-unique columns can be included in a unique index. We hoped the community would adopt this patch, and it was in fact merged in PostgreSQL 11.

Consider the bookings table:

```
demo=# \d bookings
              Table "bookings.bookings"
    Column    |           Type           | Modifiers
--------------+--------------------------+-----------
 book_ref     | character(6)             | not null
 book_date    | timestamp with time zone | not null
 total_amount | numeric(10,2)            | not null
Indexes:
    "bookings_pkey" PRIMARY KEY, btree (book_ref)
Referenced by:
    TABLE "tickets" CONSTRAINT "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref)
```

In this table the primary key (book_ref, the booking code) is backed by a regular btree index. Let's create a new unique index with an additional column:

```
demo=# create unique index bookings_pkey2 on bookings(book_ref) INCLUDE (book_date);
```

Then replace the existing index with the new one:

```
demo=# begin;
demo=# alter table bookings drop constraint bookings_pkey cascade;
demo=# alter table bookings add primary key using index bookings_pkey2;
demo=# alter table tickets add foreign key (book_ref) references bookings (book_ref);
demo=# commit;
```

The table now looks like this:

```
demo=# \d bookings
              Table "bookings.bookings"
    Column    |           Type           | Modifiers
--------------+--------------------------+-----------
 book_ref     | character(6)             | not null
 book_date    | timestamp with time zone | not null
 total_amount | numeric(10,2)            | not null
Indexes:
    "bookings_pkey2" PRIMARY KEY, btree (book_ref) INCLUDE (book_date)
Referenced by:
    TABLE "tickets" CONSTRAINT "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings(book_ref)
```

The same index now works both as a unique index and as a covering index:

```
demo=# explain(costs off)
select book_ref, book_date from bookings where book_ref = '059FC4';
                    QUERY PLAN
--------------------------------------------------
 Index Only Scan using bookings_pkey2 on bookings
   Index Cond: (book_ref = '059FC4'::bpchar)
(2 rows)
```

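Creating an index

It is well known that when loading data into a large table it is better to load it without indexes and create them afterwards. This is not only faster; the resulting index is also likely to be smaller.

For a B-tree, building the index in one pass is more efficient than inserting values into it row by row. Roughly, all the data in the table is sorted and leaf pages are created from it; internal pages are then built on top of this base until the whole structure converges on the root.

The speed of this process depends on the amount of RAM available, which is limited by the maintenance_work_mem parameter, so increasing it can speed things up. For unique indexes, memory of size work_mem is allocated in addition to maintenance_work_mem.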
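The sort-then-build-upward procedure can be sketched in a few lines. This is a toy model under stated assumptions, not PostgreSQL's build code: pages are plain lists of at most `page_size` keys, each parent entry is the smallest key of its child, and `page_size` must be at least 2.

```python
def build_bottom_up(keys, page_size):
    """Return the tree as a list of levels, leaf level first, root level last."""
    ordered = sorted(keys)                               # 1. sort all keys once
    level = [ordered[i:i + page_size]                    # 2. cut the sorted run into leaf pages
             for i in range(0, len(ordered), page_size)]
    levels = [level]
    while len(level) > 1:                                # 3. add parent levels until one root remains
        minima = [page[0] for page in level]             #    smallest key of each child page
        level = [minima[i:i + page_size]
                 for i in range(0, len(minima), page_size)]
        levels.append(level)
    return levels

for depth, pages in enumerate(build_bottom_up([7, 2, 9, 0, 5, 3, 8, 1, 6, 4], 3)):
    print(depth, pages)
```

No page is ever split in this process, which is one reason it is cheaper than inserting the same keys one at a time.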

Comparison

Earlier we mentioned that PostgreSQL needs to know which hash functions to call for values of different types, and that this association is stored with the hash access method. Likewise, the system must figure out how to order values. This is needed for sorting, grouping (sometimes), merge joins, and so on. PostgreSQL does not bind itself to operator names, because users can define their own data types and give the corresponding operators different names.

For example, here are the comparison operators in the bool_ops operator family:

```
postgres=# select   amop.amopopr::regoperator as opfamily_operator,
         amop.amopstrategy
from     pg_am am,
         pg_opfamily opf,
         pg_amop amop
where    opf.opfmethod = am.oid
and      amop.amopfamily = opf.oid
and      am.amname = 'btree'
and      opf.opfname = 'bool_ops'
order by amopstrategy;
  opfamily_operator  | amopstrategy
---------------------+--------------
 <(boolean,boolean)  |            1
 <=(boolean,boolean) |            2
 =(boolean,boolean)  |            3
 >=(boolean,boolean) |            4
 >(boolean,boolean)  |            5
(5 rows)
```

There are five operators here, but their names should not be relied on. To specify which comparison each operator performs, the concept of a strategy is introduced. Five strategies are defined to describe operator semantics:

1: less

2: less or equal

3: equal

4: greater or equal

5: greater

Some operator families contain several operators for the same strategy. For example, integer_ops contains the following operators for strategy 1:

```
postgres=# select   amop.amopopr::regoperator as opfamily_operator
from     pg_am am,
         pg_opfamily opf,
         pg_amop amop
where    opf.opfmethod = am.oid
and      amop.amopfamily = opf.oid
and      am.amname = 'btree'
and      opf.opfname = 'integer_ops'
and      amop.amopstrategy = 1
order by opfamily_operator;
  opfamily_operator
----------------------
 <(integer,bigint)
 <(smallint,smallint)
 <(integer,integer)
 <(bigint,bigint)
 <(bigint,integer)
 <(smallint,integer)
 <(integer,smallint)
 <(smallint,bigint)
 <(bigint,smallint)
(9 rows)
```

Thanks to this, the optimizer can avoid type casts when comparing values of different types that belong to the same operator family.
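The idea of addressing a comparison by strategy number rather than by operator name can be mimicked with a lookup table. This is a loose analogy, assuming Python's `operator` module functions stand in for the operators of a family; the `compare` helper is hypothetical.

```python
import operator

# Strategy number -> comparison function, following the five btree strategies.
STRATEGIES = {
    1: operator.lt,   # less
    2: operator.le,   # less or equal
    3: operator.eq,   # equal
    4: operator.ge,   # greater or equal
    5: operator.gt,   # greater
}

def compare(strategy, left, right):
    """Apply the comparison selected by strategy number, whatever its name is."""
    return STRATEGIES[strategy](left, right)

print(compare(1, 3, 5), compare(3, 2, 2.0), compare(5, 3, 5))
```

Callers only need to know that strategy 1 means "less"; the function behind it could be given any name.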

Index support for a new data type

The documentation gives an example of creating a new data type for complex numbers together with an operator class for sorting values of that type. That example is written in C, but nothing prevents us from running a comparable experiment in pure SQL.

Create a new composite type with two fields, the real and the imaginary part:

```
postgres=# create type complex as (re float, im float);
```

Create a table with a column of the new type and add some values:

```
postgres=# create table numbers(x complex);
postgres=# insert into numbers values ((0.0, 10.0)), ((1.0, 3.0)), ((1.0, 1.0));
```

Now a question arises: how can complex numbers be sorted if no order relation is defined for them mathematically?

As it turns out, comparison operators are already defined:

```
postgres=# select * from numbers order by x;
   x
--------
 (0,10)
 (1,1)
 (1,3)
(3 rows)
```

By default, a composite type is sorted field by field: the first fields are compared, then the second, and so on, much like text strings are compared character by character. But a different order can be defined. For example, complex numbers can be treated as vectors and ordered by their modulus. To define such an order, we first need a helper function:

```
postgres=# create function modulus(a complex) returns float as $$
    select sqrt(a.re*a.re + a.im*a.im);
$$ immutable language sql;

-- Using this function, systematically define the five comparison functions:
postgres=# create function complex_lt(a complex, b complex) returns boolean as $$
    select modulus(a) < modulus(b);
$$ immutable language sql;

postgres=# create function complex_le(a complex, b complex) returns boolean as $$
    select modulus(a) <= modulus(b);
$$ immutable language sql;

postgres=# create function complex_eq(a complex, b complex) returns boolean as $$
    select modulus(a) = modulus(b);
$$ immutable language sql;

postgres=# create function complex_ge(a complex, b complex) returns boolean as $$
    select modulus(a) >= modulus(b);
$$ immutable language sql;

postgres=# create function complex_gt(a complex, b complex) returns boolean as $$
    select modulus(a) > modulus(b);
$$ immutable language sql;
```

Then create the corresponding operators:

```
postgres=# create operator #<#(leftarg=complex, rightarg=complex, procedure=complex_lt);
postgres=# create operator #<=#(leftarg=complex, rightarg=complex, procedure=complex_le);
postgres=# create operator #=#(leftarg=complex, rightarg=complex, procedure=complex_eq);
postgres=# create operator #>=#(leftarg=complex, rightarg=complex, procedure=complex_ge);
postgres=# create operator #>#(leftarg=complex, rightarg=complex, procedure=complex_gt);
```

Numbers can now be compared:

```
postgres=# select (1.0,1.0)::complex #<# (1.0,3.0)::complex;
 ?column?
----------
 t
(1 row)
```

In addition to these five operators, the btree access method requires one more function: it must return -1, 0, or 1 when the first value is less than, equal to, or greater than the second. Other access methods may require other support functions.

```
postgres=# create function complex_cmp(a complex, b complex) returns integer as $$
    select case when modulus(a) < modulus(b) then -1
                when modulus(a) > modulus(b) then 1
                else 0
           end;
$$ language sql;
```

Create an operator class:

```
postgres=# create operator class complex_ops
default for type complex
using btree as
    operator 1 #<#,
    operator 2 #<=#,
    operator 3 #=#,
    operator 4 #>=#,
    operator 5 #>#,
    function 1 complex_cmp(complex,complex);

-- Sorting now gives:
postgres=# select * from numbers order by x;
   x
--------
 (1,1)
 (1,3)
 (0,10)
(3 rows)

-- The support functions can be listed with this query:
postgres=# select amp.amprocnum,
       amp.amproc,
       amp.amproclefttype::regtype,
       amp.amprocrighttype::regtype
from   pg_opfamily opf,
       pg_am am,
       pg_amproc amp
where  opf.opfname = 'complex_ops'
and    opf.opfmethod = am.oid
and    am.amname = 'btree'
and    amp.amprocfamily = opf.oid;
 amprocnum |   amproc    | amproclefttype | amprocrighttype
-----------+-------------+----------------+-----------------
         1 | complex_cmp | complex        | complex
(1 row)
```
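The same experiment can be repeated outside the database to see how a three-way comparison function changes the sort order. A minimal sketch using Python's `functools.cmp_to_key`, with complex numbers represented as (re, im) tuples as in the numbers table:

```python
from functools import cmp_to_key
from math import hypot

def modulus(z):
    """Length of the vector (re, im)."""
    re, im = z
    return hypot(re, im)

def complex_cmp(a, b):
    """Three-way comparison by modulus: -1, 0 or 1."""
    ma, mb = modulus(a), modulus(b)
    return -1 if ma < mb else (1 if ma > mb else 0)

numbers = [(0.0, 10.0), (1.0, 3.0), (1.0, 1.0)]

default_order = sorted(numbers)                             # field by field, like the default
modulus_order = sorted(numbers, key=cmp_to_key(complex_cmp))  # ordered by modulus

print(default_order)
print(modulus_order)
```

The two orderings match the two `order by x` results in the article: (0,10) comes first field by field, but last by modulus.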

Internals

The internal structure of a B-tree can be explored with the pageinspect extension:

```
demo=# create extension pageinspect;
```

The metapage of the index:

```
demo=# select * from bt_metap('ticket_flights_pkey');
 magic  | version | root | level | fastroot | fastlevel
--------+---------+------+-------+----------+-----------
 340322 |       2 |  164 |     2 |      164 |         2
(1 row)
```

The most interesting value here is the index level: for a table with one million rows, the index needs only 2 levels, not counting the root.

Statistics for the root page, which is page 164:

```
demo=# select type, live_items, dead_items, avg_item_size, page_size, free_size
from bt_page_stats('ticket_flights_pkey',164);
 type | live_items | dead_items | avg_item_size | page_size | free_size
------+------------+------------+---------------+-----------+-----------
 r    |         33 |          0 |            31 |      8192 |      6984
(1 row)
```

The data in that page:

```
demo=# select itemoffset, ctid, itemlen, left(data,56) as data
from bt_page_items('ticket_flights_pkey',164) limit 5;
 itemoffset |  ctid   | itemlen |                           data
------------+---------+---------+----------------------------------------------------------
          1 | (3,1)   |       8 |
          2 | (163,1) |      32 | 1d 30 30 30 35 34 33 32 33 30 35 37 37 31 00 00 ff 5f 00
          3 | (323,1) |      32 | 1d 30 30 30 35 34 33 32 34 32 33 36 36 32 00 00 4f 78 00
          4 | (482,1) |      32 | 1d 30 30 30 35 34 33 32 35 33 30 38 39 33 00 00 4d 1e 00
          5 | (641,1) |      32 | 1d 30 30 30 35 34 33 32 36 35 35 37 38 35 00 00 2b 09 00
(5 rows)
```

The first tuple is a technical entry that specifies the upper bound of the values in the page; the actual data starts from the second tuple. Evidently the leftmost child node is page 163, followed by page 323, and so on. These pages can in turn be examined with the same functions.
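The `data` column is a raw hex dump, and part of it is readable text. A small sketch that extracts the ASCII digits from such a dump; reading the result as the first key column of ticket_flights_pkey is an assumption, and the remaining bytes (header, padding, second column) are not interpreted here.

```python
def ascii_digits(hex_dump):
    """Return the ASCII digit characters contained in a space-separated hex dump."""
    raw = bytes.fromhex(hex_dump)    # bytes.fromhex skips the spaces between byte pairs
    return "".join(chr(b) for b in raw if 0x30 <= b <= 0x39)

item_2 = "1d 30 30 30 35 34 33 32 33 30 35 37 37 31 00 00 ff 5f 00"
item_3 = "1d 30 30 30 35 34 33 32 34 32 33 36 36 32 00 00 4f 78 00"

print(ascii_digits(item_2))
print(ascii_digits(item_3))
```

The extracted strings increase from item to item, consistent with the entries of a page being stored in sorted order.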

PostgreSQL 10 introduced the amcheck extension, which checks the logical consistency of B-tree data and lets us detect faults in advance.

