PostgreSQL 17：在性能、邏輯複製等方面的一大步

教學

PostgreSQL

在2024年8月顺利进行第3次測試後，PostgreSQL開發團隊在9月26日发布了GA版本。最近，我在部落格中撰寫了有关您在PostgreSQL 17中將看到的關鍵邏輯複製功能的文章。在本文中，我將介紹Postgres 17中的一些新的性能特點，以及我在本系列早期的部落格中未曾覆盖的另一項重要的邏輯複製功能。

PostgreSQL在這些年頭以來已經顯著成長，並且隨著每次主要版本發布，它已經成為一個更健壯、可靠和 responsive 的數據庫，無論是對於關鍵任務中午及非關鍵任務的企業應用。全球性和充滿活力的PostgreSQL社區正在為PostgreSQL的有成貢獻，勤勉地確保所有變動在加入項目源代碼之前都被仔細检驗和審核。看到如Microsoft、Google、Apple等大型技術名稱投資於Postgres，通過發展內部專業知識並回饋給開源社區，這也是非常鼓舞人心的。

邏輯複製的改善正在為將分佈式PostgreSQL支持添加到核心功能奠定基礎。分佈式PostgreSQL指的是在分佈式架構中實現PostgreSQL，允許在多個節點上進行 enhanced scalability、fault tolerance以及 Improved performance。

現在，不再多說了，讓我們來討論一些PostgreSQL 17的性能特點。

使用 Materialized CTEs 提高查詢性能

常見表表達式（CTEs）是PostgreSQL中的暂时性結果集，可以在SELECT、INSERT、UPDATE或DELETE語句中引用。它們有助於提高複雜查詢的可讀性和組織性，並且可以遞歸，使其特別適合於分层數據。CTE查詢的基本語法如下：

WITH cte_names AS  
(– QUERY here ) 
Select * from cte_names;

在查詢中包含WITH關鍵詞以創建CTE；母查詢（定義結果集）在CTE名稱之後跟隨著AS語句。在定義CTE後，您可以通過名稱引用CTE的結果集，並在同一個查詢中对結果集進行進一步操作。

PostgreSQL 17繼續提升CTEs的性能和功能，包括在查詢計劃和執行方面的改進。早期版本的Postgres將CTE視為優化圍欄，意味著計劃器不能將谓詞推進它們中。然而，從PostgreSQL 12開始，您可以定義更有效的執行計劃。當性能至关重要的時候，您應該總是分析您的查詢並考慮執行計劃。

性能提示：如果你將多次引用同一結果集，請使用 MATERIALIZED 關鍵詞創建 CTE。當你創建 materialized CTE 時，Postgres 會計算並存放父查询的結果。然後，如果多次引用 CTE，後續查询就不需要多次進行複雜的計算。

從 CTE 引用中提取列統計信息，Postgres 17 改善了 materialized CTE

一個 materialized CTE 基本上作為優化圍欄，意味著一旦選擇了子查询的计划，外层查询不會影響子查询的计划。外层查询可以看到 CTE 結果集的估計宽度和行數，因此將列統計信息從子查询傳播到外层查询的计划是合理的。外层查询可以利用可用的任何信息，使得列統計信息傳播到外层查询计划，但不會下降到 CTE 计划。

這個問題向社區報告包含了一個簡單的測試案例，可以示範這種改進對查询計劃的影響。

示例：比較 Postgres 16 行為與 Postgres 17

首先，在 Postgres 16 中創建我們的和工作區，並對兩個表和索引运行 ANALYZE；

postgres=# create table t1(a int);
CREATE TABLE
postgres=# create table t2(b int);
CREATE TABLE
postgres=# create index my_index on t1 using btree (a);
CREATE INDEX
postgres=# insert into t1 select generate_series(1, 100000) from generate_series(1, 3);
INSERT 0 300000
postgres=# insert into t2 select generate_series(1, 100) from generate_series(1, 10);
INSERT 0 1000
postgres=# analyze t1;
ANALYZE
postgres=# analyze t2;
ANALYZE

Then, we create our materialized CTE:

postgres=# explain analyze with my_cte as materialized (select b from t2) select *
from t1 where t1.a in (select b from my_cte);

The query plan from our Postgres 16 code sample contains:

QUERY PLAN         
----------------------------------------------------------------------
 Nested Loop  (cost=37.92..856.50 rows=2966 width=4) (actual time=0.574..0.722 rows=300 loops=1)
   CTE my_cte
     ->  Seq Scan on t2  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.038..0.161 rows=1000 loops=1)
   ->  HashAggregate  (cost=22.50..24.50 rows=200 width=4) (actual time=0.449..0.461 rows=100 loops=1)
         Group Key: my_cte.b
         Batches: 1  Memory Usage: 40kB
         ->  CTE Scan on my_cte  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.046..0.322 rows=1000 loops=1)
   ->  Index Only Scan using my_index on t1  (cost=0.42..4.06 rows=3 width=4) (actual time=0.002..0.002 rows=3 loops=1
00)
         Index Cond: (a = my_cte.b)
         Heap Fetches: 0
 Planning Time: 1.242 ms
 Execution Time: 1.051 ms
(12 rows)

在查詢計劃中，你可以看到子查詢的200行列統計數據不正確，這影響了整體計劃。

   ->  HashAggregate  (cost=22.50..24.50 rows=200 width=4) (actual time=0.449..0.461 rows=100 loops=1)
         Group Key: my_cte.b

然後，我們對PostgreSQL 17進行了相同的設定和查詢測試：

postgres=# explain analyze with my_cte as materialized (select b from t2) select *
from t1 where t1.a in (select b from my_cte);
                                                            QUERY PLAN                           
                                 
-------------------------------------------------------------------------------------------------
---------------------------------
 Merge Join  (cost=42.25..54.29 rows=302 width=4) (actual time=0.627..0.712 rows=300 loops=1)
   Merge Cond: (t1.a = my_cte.b)
   CTE my_cte
     ->  Seq Scan on t2  (cost=0.00..15.00 rows=1000 width=4) (actual time=0.031..0.134 rows=1000
 loops=1)
   ->  Index Only Scan using my_index on t1  (cost=0.42..7800.42 rows=300000 width=4) (actual tim
e=0.027..0.049 rows=301 loops=1)
         Heap Fetches: 0
   ->  Sort  (cost=26.82..27.07 rows=100 width=4) (actual time=0.598..0.604 rows=100 loops=1)
         Sort Key: my_cte.b
         Sort Method: quicksort  Memory: 25kB
         ->  HashAggregate  (cost=22.50..23.50 rows=100 width=4) (actual time=0.484..0.494 rows=1
00 loops=1)
               Group Key: my_cte.b
               Batches: 1  Memory Usage: 24kB
               ->  CTE Scan on my_cte  (cost=0.00..20.00 rows=1000 width=4) (actual time=0.033..0
.324 rows=1000 loops=1)
 Planning Time: 1.066 ms
 Execution Time: 0.946 ms
(15 rows)

如你所見，在Postgres 17的查詢計劃中，子查詢的列統計數據已正確傳遞到外部查詢的上層計劃器。這幫助PostgreSQL選擇了更好的計劃，從而提高了查詢的執行時間。

這是一個簡單的查詢，但對於更大和更複雜的查詢，這種更改可能會導致重大的性能差異。

從CTE傳遞路徑键到外部查詢

Postgres 17中CTE功能的另一个有趣改進是將路徑鍵從子查詢傳遞到外部查詢。在PostgreSQL中，路徑鍵是 Query Execution Planning 過程的一部分，主要用於對需要有序結果的查询（如帶有 ORDER BY 子句的查询）進行排序和排序行，或在需要排序的其他操作（如合併連接）中。

在Postgres 17之前，即使排序順序由索引扫瞄節點或排序節點保證，物料化的CTE子查詢的排序順序也不会與外部查詢共享。沒有保證的排序順序讓PostgreSQL計劃器選擇較未優化的計劃，而擁有保證的排序順序則使其更可能會選擇優化的計劃。

在 PostgreSQL 17 中，如果一個公用表 expressions (CTE) 被實例化並且有特定的排序順序，規劃器可以重用這個信息在外的查詢中，通過避免重複排序或者启用更高效的 Join 方法來提高性能。如 Tom Lane 在提交注释中所提到的：

“將路徑键 (pathkeys) 提升到外部查詢的代碼已經存在於一般的 RTE_SUBQUERY 子查詢中，但它們尚未用於 CTEs，這可能是因为要保持 CTE 和外部查詢之間的優化圍栏。”

對 Postgres 源代碼的這個簡單修改應該會為涉及複雜 CTEs 的查詢帶來性能提升，特別是那些可以基於 CTE 結果的固有順序進行排序或合併 Join 优化的情況。

以下是一個使用 PostgreSQL 回归數據的示例：

postgres=# CREATE TABLE tenk1 (
postgres(#         unique1         int4,
postgres(#         unique2         int4,
postgres(#         two                     int4,
postgres(#         four            int4,
postgres(#         ten                     int4,
postgres(#         twenty          int4,
postgres(#         hundred         int4,
postgres(#         thousand        int4,
postgres(#         twothousand     int4,
postgres(#         fivethous       int4,
postgres(#         tenthous        int4,
postgres(#         odd                     int4,
postgres(#         even            int4,
postgres(#         stringu1        name,
postgres(#         stringu2        name,
postgres(#         string4         name
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);
CREATE INDEX
postgres=# \copy tenk1 FROM '~/projects/postgres/src/test/regress/data/tenk.data';
COPY 10000
postgres=# VACUUM ANALYZE tenk1;
VACUUM

我們的 Postgres 16 代碼樣本中的查詢計劃如下：

postgres=# explain analyze with x as materialized (select unique1 from tenk1 b order by unique1)
select count(*) from tenk1 a
  where unique1 in (select * from x);
                                                                  QUERY PLAN                                                               
   
-------------------------------------------------------------------------------------------------------------------------------------------
---
 Aggregate  (cost=764.29..764.30 rows=1 width=8) (actual time=21.592..21.593 rows=1 loops=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..306.29 rows=10000 width=4) (actual time=0.046..1.415 rows=10000 loops=
1)
           Heap Fetches: 0
   ->  Nested Loop  (cost=225.28..445.50 rows=5000 width=0) (actual time=7.545..20.911 rows=10000 loops=1)
         ->  HashAggregate  (cost=225.00..227.00 rows=200 width=4) (actual time=7.535..9.051 rows=10000 loops=1)
               Group Key: x.unique1
               Batches: 1  Memory Usage: 929kB
               ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.070..3.933 rows=10000 loops=1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..1.08 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=10000)
               Index Cond: (unique1 = x.unique1)
               Heap Fetches: 0
 Planning Time: 0.806 ms
 Execution Time: 21.890 ms
(14 rows)

我們的 Postgres 17 代碼樣本中的查詢計劃如下：

postgres=# explain analyze with x as materialized (select unique1 from tenk1 b order by unique1)
select count(*) from tenk1 a
  where unique1 in (select * from x);
                                                                    
QUERY PLAN                                                                    
------------------------------------------------------------------------------------------------------------------------------- 聚合  (成本=987.55..987.56 行=1 寬度=8) (實際時間=8.777..8.778 行=1 圈數=1)
   CTE x
     ->  Index Only Scan using tenk1_unique1 on tenk1 b  (cost=0.29..306.29 rows=10000 width=4) (actual time=0.010..1.095 rows=100
00 loops=1)
           Heap Fetches: 0
   ->  Merge Semi Join  (cost=0.31..656.26 rows=10000 width=0) (actual time=0.037..8.024 rows=10000 loops=1)
         Merge Cond: (a.unique1 = x.unique1)
         ->  Index Only Scan using tenk1_unique1 on tenk1 a  (cost=0.29..306.29 rows=10000 width=4) (actual time=0.013..1.262 rows
=10000 loops=1)
               Heap Fetches: 0
         ->  CTE Scan on x  (cost=0.00..200.00 rows=10000 width=4) (actual time=0.016..3.678 rows=10000 loops=1)
 Planning Time: 0.800 ms
 Execution Time: 8.899 ms
(11 rows)

PostgreSQL 16 和 PostgreSQL 17 的查詢计划在這個版本17的增強影響下有顯著的不同。这是一个小例子；你可以在更大的查询中看到性能提升會非常顯著。请注意，这个增強只有在 CTE 子查询中包含 ORDER BY 子句時才有效。

對串列的快速 B-Tree 索引扫瞄

在 PostgreSQL 中，ScalarArrayOpExpr 是执行动态计划时处理涉及 IN 或 ANY 操作数组或列表值的节点类型。它特别适用于比较列与一组值的情况，例如：SELECT * FROM table WHERE column = ANY(ARRAY[1, 2, 3]);。

ScalarArrayOpExpr 允许 PostgreSQL 优化涉及多个使用 IN 或 ANY 的比较查询。PostgreSQL 17 引入了新的性能提升，使得这些操作更加快速。

在 PostgreSQL 17 中，对 B-tree 索引扫瞄进行了重大改进，这优化了性能，特别是对于具有大 IN 列表或 ANY 条件的查询。这些改进减少了系统执行的索引扫瞄次数，从而降低了 CPU 和缓冲区页面的争用，导致查询执行更快。

关键改善之一是在处理标量數组操作表達式（SAOP），這讓對於B-樹索引的遍歷更加高效，特別是對於多維查询。例如，當您有多个索引列（每个都有自己的IN列表）时，PostgreSQL 17现在可以更高效地在单次索引扫描中处理这些操作，而不是像早期版本那样进行多次扫描。这可以在页面访问之前是瓶頸的CPU绑定于工作负载中带来20-30%的性能提升。

此外，PostgreSQL 17引入了更好的内部锁管理，进一步提高了高并发工作负载的性能，尤其是在B-树索引内扫描多个维度时。

我们可以用一个简单的例子来演示这一点。我们将使用与Postgres回归套件中上一个例子相同的tenk1表和数据。

我们的例子，首先在Postgres 16上运行：

CREATE TABLE tenk1 (
postgres(#         unique1         int4,
postgres(#         unique2         int4,
postgres(#         two                     int4,
postgres(#         four            int4,
postgres(#         ten                     int4,
postgres(#         twenty          int4,
postgres(#         hundred         int4,
postgres(#         thousand        int4,
postgres(#         twothousand     int4,
postgres(#         fivethous       int4,
postgres(#         tenthous        int4,
postgres(#         odd                     int4,
postgres(#         even            int4,
postgres(#         stringu1        name,
postgres(#         stringu2        name,
postgres(#         string4         name
postgres(# );
CREATE TABLE
postgres=# CREATE INDEX tenk1_unique1 ON tenk1 USING btree(unique1 int4_ops);
CREATE INDEX
postgres=# \copy tenk1 FROM '~/projects/postgres/src/test/regress/data/tenk.data';
COPY 10000
postgres=# EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 IN (1, 2, 3);
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=14.20..330.12 rows=176 width=244) (actual time=0.138..0.153 rows=3 loops=1)
   Recheck Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
   Heap Blocks: exact=3
   Buffers: shared hit=9
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..14.16 rows=176 width=0) (actual time=0.102..0.102 rows=3 loops=1)
         Index Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
         Buffers: shared hit=6
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.900 ms
 Execution Time: 0.242 ms
(11 rows)
postgres=# SELECT idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE relname = 'tenk1';
 idx_scan | idx_tup_fetch 
----------+---------------
        3 |             3
(1 row)

在之前的查询中，您可以看到IN查询的共享缓冲区命中数为9，并且需要3次索引扫描才能从索引扫描中得到结果。在PostgreSQL中，共享命中这个术语是指与缓冲区管理相关的一种特定的缓存命中。共享命中发生在PostgreSQL从共享缓冲区池访问数据块或页面，而不是从磁盘上，从而提高查询性能。

同一个例子，这次在Postgres 17上运行：

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM tenk1 WHERE unique1 IN (1, 2, 3);
                                                      QUERY PLAN                                                       
-----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on tenk1  (cost=12.88..24.08 rows=3 width=244) (actual time=0.043..0.054 rows=3 loops=1)
   Recheck Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
   Heap Blocks: exact=3
   Buffers: shared hit=5
   ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..12.88 rows=3 width=0) (actual time=0.026..0.026 rows=3 loops=1)
         Index Cond: (unique1 = ANY ('{1,2,3}'::integer[]))
         Buffers: shared hit=2
 Planning:
   Buffers: shared hit=59
 Planning Time: 0.479 ms
 Execution Time: 0.116 ms
(11 rows)

postgres=# SELECT idx_scan, idx_tup_fetch FROM pg_stat_user_tables WHERE relname = 'tenk1';
 idx_scan | idx_tup_fetch 
----------+---------------
        1 |             3
(1 row)

如你所見，在 Postgres 17 中共享緩衝区的命中率減少到 5，最重要的是它只進行一次索引掃瞄（與 Postgres 16 中的 3 次掃瞄相比）。隨著 Postgres 17 的這種改進，單量陣列操作的性能得到了大幅提高，並且 Postgres 能夠選擇更好的優化查詢計劃。

在升級過程中保留邏輯複製槽位和訂閱

在主要升級過程中保留邏輯複製槽位並迁移訂閱依賴是 PostgreSQL 17 新增的另一個邏輯複製功能。請注意，這個功能只有在從 PostgreSQL 17 升級到較新版本時才有效，對於在 Postgres 17 之前的升級是不支持的。複製槽位和複製起源是在建立邏輯複製環境時生成的。然而，此信息是針對節點特定的，以記錄複製狀態、應用狀態和 WAL 傳輸狀態，因此它們不會作為升級過程的一部分進行升級。一旦發布節點升級完成，用戶需要手動構建這些物件。

在 PostgreSQL 17 中，pg_upgrade 過程已經改進以引用和重建這些內置物件；這種功能使得在升級具有邏輯複製的節點時，複製可以自動恢復。 previously, when performing a major version upgrade, users had to drop logical replication slots, requiring them to re-synchronize data with the subscribers after the upgrade. This added complexity and increased downtime during upgrades. 在此之前，當執行主要版本升級時，用戶必須刪除邏輯複製槽位，要求他们在升級後與訂閱者重新對齊數據。這增加了升級过程中的複雜性並延長了宕機時間。

在升級發布節點 clusters 时，您需要按照以下步驟進行：

確保任何對發布節點的訂閱已被暂时禁用，通過執行 ALTER SUBSCRIPTION….DISABLE 命令。這些在升級過程完成後會被啟用。
將新節點的 wal_level 設定為 logical。
新節點上的 max_replication_slots 必須設定為大於或等於舊節點上的复製插槽數。
用作插槽的输出插件必須已在新節點上安裝。
在升級之前，所有旧節點的更改已經複製到目標節點。
舊節點上的所有插槽必須可用；您可以通過檢查 pg_replication_slots 視圖中的衝突列來確保這一點。對於舊節點上的所有插槽，衝突應該為假。
新節點上不应有任何 pg_replication_slots 視圖中 Temporary 列的值為 false 的插槽。新節點上不應該有永久的邏輯複製插槽。

如果未滿足上述的先決條件，則升級複製插槽的 pg_upgrade 過程將導致錯誤。

結論

在 PostgreSQL 17 中，社群的焦点依旧放在讓 PostgreSQL 變得更加性能優化、可伸縮、安全，以及企業級ready。Postgres 17 也通過新增與兼容性相關的功能，以及讓既有功能更強大和健壯，來提升開發者體驗。

在 17 版本以外，PostgreSQL 將繼續成長、進步，並通過添加分片功能來提高水平可伸縮性，以迎合需要更可伸縮數據庫的企業應用程序。這些年來，水平及垂直可伸縮性都有所改進，但肯定还有提升水平能力的空間。我們將看到更多邏輯複製的改進，並且在 DDL 複製或複製缺少的物件（如序列）和更好的節點管理方面還會有所作為。社群也認識到需要讓 PostgreSQL 更兼容，因此 Postgres 17 中有 MERGE 命令的改進，並且计划在 Postgres 17 之後引入更多兼容性功能。

Source:
https://dzone.com/articles/postgresql-17-a-major-step-forward-in-performance