2003 年,一种访问远程数据的规范,称为外部数据的 SQL 管理 (SQL/MED),被添加到 SQL 标准中。 从 9.1 版开始,PostgreSQL 一直在开发此功能以实现 SQL/MED 的一部分。In 2003, a specification to access remote data, called SQL Management of External Data (SQL/MED), was added to the SQL standard. This feature has been developing by PostgreSQL to realize a portion of SQL/MED since version 9.1. 在 SQL/MED 中,远程服务器上的表称为外表。 PostgreSQL 外部数据包装器 (FDW) 是使用 SQL/MED 来管理类似于本地表的外部表。In SQL/MED, a table on a remote server is called a foreign table. PostgreSQL’s Foreign Data Wrappers (FDW) are that use SQL/MED to manage foreign tables which are similar to local tables.
安装必要的扩展并进行适当的设置后,您可以访问远程服务器上的外部表。 例如,假设有两个远程服务器postgresql 和 mysql,它们分别有 foreign_pg_tbl 表和 foreign_my_tbl 表。 在此示例中,您可以通过发出 SELECT 查询从本地服务器访问外部表,如下所示。After installing the necessary extension and making the appropriate settings, you can access the foreign tables on the remote servers. For example, suppose there are two remote servers, namaly, postgresql and mysql, which have foreign_pg_tbl table and foreign_my_tbl table, respectively. In this exmaple, you can access the foreign tables from the local server by issuing the SELECT queries as shown below.
此外,您可以对存储在不同服务器中的外部表执行连接操作,这些外部表类似于本地表。
Postgres wiki 中已开发并列出了许多 FDW 扩展。 然而,几乎所有的扩展都没有得到妥善维护,除了 postgres_fdw,它是由 PostgreSQL 全球开发组正式开发和维护的,作为访问远程 PostgreSQL 服务器的扩展。
PostgreSQL 的 FDW 将在以下部分详细介绍。 4.1.1 节概述了 PostgreSQL 中的 FDW。 4.1.2 节描述了 postgres_fdw 扩展是如何执行的。
Many FDW extensions have been developed and listed in Postgres wiki. However, almost all extensions are not properly maintained except for postgres_fdw, which is officially developed and maintained by the PostgreSQL Global Development Group as an extension to access a remote PostgreSQL server.
PostgreSQL’s FDW is described in detail in the following sections. Section 4.1.1 presents an overview of the FDW in PostgreSQL. Section 4.1.2 describes how the postgres_fdw extension performs.
Overview
要使用FDW功能,需要安装相应的扩展并执行设置命令,例如CREATE FOREIGN TABLE、CREATE SERVER和CREATE USER MAPPING(具体请参考官方文档)。提供适当的设置后,在查询处理期间调用扩展中定义的函数来访问外部表。图 4.2 简要描述了 FDW 在 PostgreSQL 中的表现。
To use the FDW feature, you need to install the appropriate extension and execute setup commands, such as CREATE FOREIGN TABLE, CREATE SERVER and CREATE USER MAPPING (for details, refer to the official document).
After providing the appropriate setting, the functions defined in the extension are invoked during query processing to access the foreign tables.
Fig.4.2 briefly describes how FDW performs in PostgreSQL.
(1) 分析器/分析器创建输入 SQL 的查询树。
(2) 计划者(或执行者)连接到远程服务器。
(3) 如果 use_remote_estimate 选项打开(默认关闭),则规划器执行 EXPLAIN 命令来估计每个规划路径的成本。
(4) 规划器从内部称为 deparesing 的规划树创建纯文本 SQL 语句。
(5) 执行器将明文SQL语句发送到远程服务器并接收结果。
(1) The analyzer/analyser creates the query tree of the input SQL.
(2) The planner (or executor) connects to the remote server.
(3) If the use_remote_estimate option is on (the default is off), the planner executes EXPLAIN commands for estimating the cost of each plan path.
(4) The planner creates the plain text SQL statement from the plan tree which is internally called deparesing.
(5) The executor sends the plain text SQL statement to the remote server and receives the result.
如有必要,执行器然后处理接收到的数据。 例如,如果执行多表查询,则执行器将接收到的数据与其他表进行连接处理。每个处理的细节在以下部分中描述。
The executor then processes the received data if necessary. For example, if the multi-table query is executed, the executor perfoms the join processing of the received data and other tables.The details of each processing are described in the following sections.
Creating a Query Tree
analyzer/analyser使用外部表的定义创建输入 SQL 的查询树,外部表使用 CREATE FOREIGN TABLE 或 IMPORT FOREIGN SCHEMA 命令存储在 pg_catalog.pg_class 和 pg_catalog.pg_foreign_table 目录中。
The analyzer/analyser creates the query tree of the input SQL using the definitions of the foreign tables, which are stored in the pg_catalog.pg_class and pg_catalog.pg_foreign_table catalogs using the CREATE FOREIGN TABLE or IMPORT FOREIGN SCHEMA command.
Connecting to the Remote Server
要连接到远程服务器,规划器(或执行器)使用特定库连接到远程数据库服务器。 例如,要连接到远程 PostgreSQL 服务器,postgres_fdw 使用 libpq。 EnterpriseDB开发的mysql_fdw使用libmysqlclient连接mysql服务器。连接参数,例如用户名、服务器的 IP 地址和端口号,使用 CREATE USER MAPPING 和 CREATE SERVER 命令存储在 pg_catalog.pg_user_mapping 和 pg_catalog.pg_foreign_server 目录中。
To connect to the remote server, the planner (or executor) uses the specific library to connect to the remote database server. For example, to connect to the remote PostgreSQL server, postgres_fdw uses the libpq. To connect to the mysql server, mysql_fdw, which is developed by EnterpriseDB, uses the libmysqlclient.
The connection parameters, such as username, server’s IP address and port number, are stored in the pg_catalog.pg_user_mapping and pg_catalog.pg_foreign_server catalogs using the CREATE USER MAPPING and CREATE SERVER commands.
Creating a Plan Tree Using EXPLAIN Commands (Optional)
PostgreSQL 的 FDW 支持获取外部表的统计信息以估计查询的计划树的功能,这些功能被一些 FDW 扩展使用,例如 postgres_fdw、mysql_fdw、tds_fdw 和 jdbc2_fdw。如果使用 ALTER SERVER 命令将 use_remote_estimate 选项设置为 on,则计划程序通过执行 EXPLAIN 命令向远程服务器查询计划成本; 否则,默认使用嵌入的常量值。
PostgreSQL’s FDW supports the feature to obtain statistics of the foreign tables to estimate the plan tree of a query, which are used by some FDW extensions, such as postgres_fdw, mysql_fdw, tds_fdw and jdbc2_fdw.
If the use_remote_estimate option is set to on using the ALTER SERVER command, the planner queries the cost of plans to the remote server by executing the EXPLAIN command; otherwise, the embedded constant values are used by default.
虽然,一些扩展使用 EXPLAIN 命令的值,但只有 postgres_fdw 可以反映 EXPLAIN 命令的结果,因为 PostgreSQL 的 EXPLAIN 命令返回启动和总成本。其他 DBMS fdw 扩展无法使用 EXPLAIN 命令的结果进行规划。 例如mysql的EXPLAIN命令只返回估计的行数; 但是,PostgreSQL 的规划器需要更多信息来估算成本,如第 3 章所述。
Although, some extensions use the values of the EXPLAIN command, only postgres_fdw can reflect the results of the EXPLAIN commands because the PostgreSQL’s EXPLAIN command returns both the start-up and total costs.
The results of the EXPLAIN command could not be used by other DBMS fdw extensions for planning. For example, mysql’s EXPLAIN command only returns the estimated number of rows; however, PostgreSQL’s planner needs more information to estimate the cost as described in Chapter 3.
Deparesing
为了生成计划树,计划器从计划树的外部表扫描路径创建纯文本 SQL 语句。 例如,图 4.3 显示了以下 SELECT 语句的计划树。To generate the plan tree, the planner creates the plain text SQL statement from the plan tree’s scan paths of the foreign tables. For example, Fig. 4.3 shows the plan tree of the following SELECT statement.
图 4.3 显示了从 PlannedStmt 的计划树链接的 ForeignScan 节点存储了一个纯 SELECT 文本。 这里,postgres_fdw 从通过解析和分析创建的查询树重新创建一个纯 SELECT 文本,这在 PostgreSQL 中称为 deparsing。Fig.4.3 shows that the ForeignScan node, which is linked from the plan tree of the PlannedStmt, stores a plain SELECT text. Here, postgres_fdw recreates a plain SELECT text from the query tree that has been created by parsing and analysing, which is called deparsing in PostgreSQL.
mysql_fdw 的使用从查询树中为 MySQL 重新创建一个 SELECT 文本。 使用 redis_fdw 或 rw_redis_fdw 创建一个 SELECT 命令。The use of mysql_fdw recreates a SELECT text for MySQL from the query tree. The use of redis_fdw or rw_redis_fdw creates a SELECT command.
Sending SQL Statements and Receiving Result
解析后,执行器将解析后的 SQL 语句发送到远程服务器并接收结果。将 SQL 语句发送到远程服务器的方法取决于每个扩展的开发人员。 例如,mysql_fdw 在不使用事务的情况下发送 SQL 语句。 在 mysql_fdw 中执行 SELECT 查询的典型 SQL 语句序列如下所示(图 4.4)。
(5-1) 将 SQL_MODE 设置为“ANSI_QUOTES”。
(5-2) 向远程服务器发送 SELECT 语句。
(5-3) 从远程服务器接收结果。
在这里,mysql_fdw 将结果转换为 PostgreSQL 可读的数据。所有 FDW 扩展都实现了将结果转换为 PostgreSQL 可读数据的功能。
(5-1) Set the SQL_MODE to ‘ANSI_QUOTES’.
(5-2) Send a SELECT statement to the remote server.
(5-3) Receive the result from the remote server.
Here, mysql_fdw converts the result to readable data by PostgreSQL.All FDW extensions implemented the feature that converts the result to PostgreSQL readable data.
远程服务器的实际日志可以在这里找到; 显示远程服务器收到的语句。在 postgres_fdw 中,SQL 命令的顺序很复杂。 在 postgres_fdw 中执行 SELECT 查询的典型 SQL 语句序列如下所示(图 4.5)。The actual log of the remote server can be found here; the statements received by the remote server are shown.In postgres_fdw, the sequence of SQL commands is complicated. The typical sequence of SQL statements to execute a SELECT query in postgres_fdw is shown below (Fig. 4.5).
(5-1) 启动远程事务。
默认的远程事务隔离级别是 REPEATABLE READ; 如果本地事务的隔离级别设置为 SERIALIZABLE,则远程事务也设置为 SERIALIZABLE。
(5-2)-(5-4) 声明一个游标。
SQL 语句基本上是作为游标执行的。
(5-5) 执行 FETCH 命令获取结果。
默认情况下,FETCH 命令获取 100 行。
(5-6) 从远程服务器接收结果。
(5-7) 关闭光标。
(5-8) 提交远程事务。
(5-1) Start the remote transaction.
The default remote transaction isolation level is REPEATABLE READ; if the isolation level of the local transaction is set to SERIALIZABLE, the remote transaction is also set to SERIALIZABLE.
(5-2)-(5-4) Declare a cursor.
The SQL statement is basically executed as a cursor.
(5-5) Execute FETCH commands to obtain the result.
By default, 100 rows are fetched by the FETCH command.
(5-6) Receive the result from the remote server.
(5-7) Close the cursor.
(5-8) Commit the remote transaction.
postgres_fdw 中的默认远程事务隔离级别。
关于为什么默认的远程事务隔离级别为 REPEATABLE READ 的解释在官方文档中有说明。
当本地事务具有 SERIALIZABLE 隔离级别时,远程事务使用 SERIALIZABLE 隔离级别; 否则它使用 REPEATABLE READ 隔离级别。 此选择可确保如果查询在远程服务器上执行多个表扫描,它将获得所有扫描的快照一致的结果。 结果是单个事务中的连续查询将看到来自远程服务器的相同数据,即使由于其他活动而在远程服务器上发生并发更新。
The explanation for why the default remote transaction isolation level is REPEATABLE READ is provided in the official document.
The remote transaction uses the SERIALIZABLE isolation level when the local transaction has the SERIALIZABLE isolation level; otherwise it uses the REPEATABLE READ isolation level. This choice ensures that if a query performs multiple table scans on the remote server, it will get snapshot-consistent results for all the scans. A consequence is that successive queries within a single transaction will see the same data from the remote server, even if concurrent updates are occurring on the remote server due to other activities.
How the Postgres_fdw Extension Performs
postgres_fdw 扩展是由 PostgreSQL 全球开发组官方维护的特殊模块,其源代码包含在 PostgreSQL 源代码树中。postgres_fdw 逐渐完善。 表 4.1 给出了官方文档中与 postgres_fdw 相关的发行说明。
postgres_fdw extension is a special module that is officially maintained by the PostgreSQL Global Development Group and its source code is included in the PostgreSQL source code tree.postgres_fdw is gradually improved. Table 4.1 presents the release notes related to postgres_fdw from the official document.
鉴于上一节描述了 postgres_fdw 如何处理单表查询,以下小节介绍了 postgres_fdw 如何处理多表查询、排序操作和聚合函数。本小节重点介绍 SELECT 语句; 但是,postgres_fdw 也可以处理其他 DML(INSERT、UPDATE 和 DELETE)语句,如下所示。
Given that the previous section describes how postgres_fdw processes a single-table query the following subsection how postgres_fdw processes a multi-table query, sort operation and aggregate functions.This subsection focuses on the SELECT statement; however, postgres_fdw can also process other DML (INSERT, UPDATE, and DELETE) statements as shown below.PostgreSQL’s FDW does not detect deadlock.
postgres_fdw 和 FDW 特性不支持分布式锁管理器和分布式死锁检测特性。 因此,很容易产生死锁。 例如,如果 Client_A 更新本地表 ‘tbl_local’ 和外部表 ‘tbl_remote’ 并且 Client_B 更新 ‘tbl_remote’ 和 ‘tbl_local’,这两个事务处于死锁状态,但 PostgreSQL 无法检测到。 因此无法提交这些事务。
postgres_fdw and the FDW feature do not support the distributed lock manager and the distributed deadlock detection feature. Therefore, a deadlock can be easily generated. For example, if Client_A updates a local table ‘tbl_local’ and a foreign table ‘tbl_remote’ and Client_B updates ‘tbl_remote’ and ‘tbl_local’, these two transactions are in deadlock but could not be detected by PostgreSQL. Therefore these transactions could not be committed.
Multi-Table Query
要执行多表查询,postgres_fdw 使用单表 SELECT 语句获取每个外部表,然后将它们在本地服务器上进行join。在 9.5 或更早的版本中,即使外部表存储在同一个远程服务器中,postgres_fdw 也会单独获取它们并连接它们。在 9.6 或更高版本中,postgres_fdw 进行了改进,当外部表在同一台服务器上并且 use_remote_estimate 选项打开时,可以在远程服务器上执行远程连接操作。执行细节描述如下。
To execute a multi-table query, postgres_fdw fetches each foreign table using a single-table SELECT statement and then join them on the local server.In version 9.5 or earlier, even if the foreign tables are stored in the same remote server, postgres_fdw fetches them individually and joins them.In version 9.6 or later, postgres_fdw has been improved and can execute the remote join operation on the remote server when the foreign tables are on the same server and the use_remote_estimate option is on.The execution details are described as follows.
Version 9.5 or earlier:
让我们探索 PostgreSQL 如何处理以下连接两个外部表的查询:tbl_a 和 tbl_b。
查询的 EXPLAIN 命令的结果如下所示。
The result shows that the executor selects the merge join and is processed as the following steps:
Line 8: The executor fetches the table tbl_a using the foreign table scan.
Line 6: The executor sorts the fetched rows of tbl_a on the local server.
Line 11: The executor fetches the table tbl_b using the foreign table scan.
Line 9: The executor sorts the fetched rows of tbl_b on the local server.
Line 4: The executor carries out a merge join operation on the local server.
The following describes how the executor fetches the rows (Fig. 4.6).
(5-1) 启动远程事务。
(5-2) 声明游标c1,其SELECT语句如下图:SELECT id,data FROM public.tbl_a WHERE (id < 200)
(5-3) 执行 FETCH 命令获取游标 1 的结果。
(5-4) 声明游标c2,其SELECT语句如下图:SELECT id,data FROM public.tbl_b
。注意原双表查询的WHERE子句为“tbl_a.id = tbl_b.id AND tbl_a.id < 200”; 因此,可以在逻辑上将 WHERE 子句“tbl_b.id < 200”添加到 SELECT 语句中,如前所示。 但是, postgres_fdw 无法执行此推断; 因此,执行程序必须执行不包含任何 WHERE 子句的 SELECT 语句,并且必须获取外部表 tbl_b 的所有行。此过程效率低下,因为必须通过网络从远程服务器读取不必要的行。 此外,必须对接收到的行进行排序以执行合并连接。Note that the WHERE clause of the original double-table query is “tbl_a.id = tbl_b.id AND tbl_a.id < 200”; therefore, a WHERE clause “tbl_b.id < 200” can be logically added to the SELECT statement as shown previously. However, postgres_fdw cannot perform this inference; therefore, the executor has to execute the SELECT statement that does not contain any WHERE clauses and has to fetch all rows of the foreign table tbl_b.This process is inefficient because unnecessary rows must be read from the remote server via the network. Furthermore, the received rows must be sorted to execute the merge join.
(5-5) 执行 FETCH 命令获取游标 2 的结果。
(5-6) 关闭光标c1。
(5-7) 关闭光标c2。
(5-8) 提交事务。
executor收到行后,对收到的tbl_a和tbl_b行进行排序,然后对排序后的行执行merge join操作。
Version 9.6 or later:
如果 use_remote_estimate 选项打开(默认关闭),postgres_fdw 发送几个 EXPLAIN 命令来获取与外表相关的所有计划的成本。为了发送 EXPLAIN 命令,postgres_fdw 发送每个单表查询的 EXPLAIN 命令和 SELECT 语句的 EXPLAIN 命令来执行远程连接操作。 在这个例子中,以下七个 EXPLAIN 命令被发送到远程服务器以获得每个 SELECT 语句的估计成本; 然后计划者选择最便宜的计划。
If the use_remote_estimate option is on (the default is off), postgres_fdw sends several EXPLAIN commands to obtain the costs of all plans related to the foreign tables.To send the EXPLAIN commands, postgres_fdw sends both the EXPLAIN command of each single-table query and the EXPLAIN commands of the SELECT statements to execute remote join operations. In this example, the following seven EXPLAIN commands are sent to the remote server to obtain the estimated costs of each SELECT statement; the planner then selects the cheapest plan.
让我们在本地服务器上执行EXPLAIN命令,观察planner选择了什么plan。Let us execute the EXPLAIN command on the local server to observe what plan is selected by the planner.
结果表明planner选择了在远程服务器上处理的inner join查询,效率很高。下面描述 postgres_fdw 是如何执行的(图 4.7)。
(3-1) 启动远程事务。
(3-2) 执行 EXPLAIN 命令以估计每个计划路径的成本。在此示例中,执行了七个 EXPLAIN 命令。 然后,规划器使用执行的 EXPLAIN 命令的结果选择 SELECT 查询的成本最低的。
(5-1) 声明游标c1,其SELECT语句如下图:
(5-2) 从远程服务器接收结果。
(5-3) 关闭光标c1。
(5-4) 提交事务。
请注意,如果 use_remote_estimate 选项关闭(默认情况下),则很少选择远程连接查询,因为使用非常大的嵌入值估计成本。
Sort Operations
在 9.5 或更早的版本中,排序操作(例如 ORDER BY)在本地服务器上处理,即本地服务器在排序操作之前从远程服务器获取所有目标行。 让我们探讨如何使用 EXPLAIN 命令处理包含 ORDER BY 子句的简单查询。
第 6 行:执行器将以下查询发送到远程服务器,然后获取查询结果。SELECT id, data FROM public.tbl_a WHERE ((id < 200))
第 4 行:执行器对本地服务器上获取的 tbl_a 行进行排序。
在 9.6 或更高版本中,如果可能,postgres_fdw 可以在远程服务器上执行带有 ORDER BY 子句的 SELECT 语句。
第 4 行:执行器将包含 ORDER BY 子句的以下查询发送到远程服务器,然后获取已排序的查询结果。SELECT id, data FROM public.tbl_a WHERE ((id < 200)) ORDER BY id ASC NULLS LAST
Aggregate Functions
在 9.6 或更早版本中,类似于上一小节中提到的排序操作,AVG() 和 cont() 等聚合函数在本地服务器上按以下步骤处理。
第 5 行:执行器将以下查询发送到远程服务器,然后获取查询结果。SELECT id, data FROM public.tbl_a WHERE ((id < 200))
第 4 行:执行器计算本地服务器上获取的 tbl_a 行的平均值。
这个过程成本很高,因为发送大量行会消耗大量网络流量并且需要很长时间。在版本 10 或更高版本中,如果可能,postgres_fdw 在远程服务器上使用聚合函数执行 SELECT 语句。
第 4 行:执行器将包含 AVG() 函数的以下查询发送到远程服务器,然后获取查询结果。SELECT avg(data) FROM public.tbl_a WHERE ((id < 200))
这个过程显然是高效的,因为远程服务器计算平均值并且只发送一行作为结果。
与给定示例类似,下推是本地服务器允许远程服务器处理某些操作(例如聚合过程)的操作。
PostgreSQL之Foreign Data Wrappers使用指南