postgres主备切换之文件触发方式详解

原作者: [db:作者] 来自: [db:来源] 收藏邀请

本文测试参考PostgresSQL实战一书。

本文档测试环境：

主库IP：192.168.40.130 主机名：postgres 端口：5442

备库IP: 192.168.40.131 主机名：postgreshot 端口：5442

PostgreSQL9.0版本流复制主备切换只能通过创建触发文件方式进行，这一小节将介绍这种主备切换方式，测试环境为一主一备异步流复制环境，postgres上的数据库为主库，postgreshot上的数据库为备库，文件触发方式的手工主备切换主要步骤如下：

1）配置备库recovery.conf文件trigger_file参数，设置激活备库的触发文件路径和名称。

2）关闭主库，建议使用-m fast模式关闭。

3）在备库上创建触发文件激活备库，如果recovery.conf变成recovery.done表示备库已经切换成主库。

4）这时需要将老的主库切换成备库，在老的主库的$PGDATA目录下创建recovery.conf文件（如果此目录下不存在recovery.conf文件，可以根据$PGHOME/share/recovery.conf.sample模板文件复制一个，如果此目录下存在recovery.done文件，需将recovery.done文件重命名为recovery.conf），配置和老的从库一样，只是primary_conninfo参数中的IP换成对端IP。

5）启动老的主库，这时观察主、备进程是否正常，如果正常表示主备切换成功。

1、首先在备库上配置recovery.conf，如下所示：

[postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1'  # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$

trigger_file可以配置成普通文件或隐藏文件，调整以上参数后需重启备库使配置参数生效。

2、关闭主库，如下所示：

[postgres@postgres pg11]$ pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
[postgres@postgres pg11]$

3、在备库上创建触发文件激活备库，如下所示：

[postgres@postgreshot pg11]$ ll recovery.conf 
-rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.conf
[postgres@postgreshot pg11]$ 
[postgres@postgreshot pg11]$ touch /home/postgres/pg11/trigger
[postgres@postgreshot pg11]$ ll recovery*
-rwx------ 1 postgres postgres 5.9K Mar 26 18:47 recovery.done
[postgres@postgreshot pg11]$

触发器文件名称和路径需和recovery.conf配置文件trigger_file保持一致，再次查看recovery文件时，发现后辍由原来的.conf变成了.done

查看备库数据库日志，如下所示：

2019-03-26 23:30:19.399 EDT [93162] LOG: replication terminated by primary server
2019-03-26 23:30:19.399 EDT [93162] DETAIL: End of WAL reached on timeline 3 at 0/50003D0.
2019-03-26 23:30:19.399 EDT [93162] FATAL: could not send end-of-streaming message to primary: no COPY in progress
2019-03-26 23:30:19.399 EDT [93158] LOG: invalid record length at 0/50003D0: wanted 24, got 0
2019-03-26 23:30:19.405 EDT [125172] FATAL: could not connect to the primary server: server closed the connection unexpectedly
  This probably means the server terminated abnormally
  before or while processing the request.
2019-03-26 23:30:24.410 EDT [125179] FATAL: could not connect to the primary server: could not connect to server: Connection refused
  Is the server running on host "192.168.40.130" and accepting
  TCP/IP connections on port 5442?
2019-03-26 23:31:49.505 EDT [93158] LOG: trigger file found: /home/postgres/pg11/trigger
2019-03-26 23:31:49.506 EDT [93158] LOG: redo done at 0/5000360
2019-03-26 23:31:49.506 EDT [93158] LOG: last completed transaction was at log time 2019-03-26 19:03:11.202845-04
2019-03-26 23:31:49.516 EDT [93158] LOG: selected new timeline ID: 4
2019-03-26 23:31:50.063 EDT [93158] LOG: archive recovery complete
2019-03-26 23:31:50.083 EDT [93157] LOG: database system is ready to accept connections

根据备库以上信息，由于关闭了主库，首先日志显示连接不上主库，接着显示发现了触发文件，之后显示恢复成功，数据库切换成读写模式。

这时根据pg_controldata输出进行验证，如下所示：

[postgres@postgreshot ~]$ pg_controldata | grep cluster
Database cluster state:  in production
[postgres@postgreshot ~]$

以上显示数据库角色已经是主库角色，在postgreshot上创建一张名为test_alived的表并插入数据，如下所示：

postgres=# CREATE TABLE test_alived2(id int4);
CREATE TABLE
postgres=# INSERT INTO test_alived2 VALUES(1);
INSERT 0 1
postgres=#

4、准备将老的主库切换成备库角色，在老的主库上配置recovery.conf，如下所示：

[postgres@postgres pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica application_name=pg2'  # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$

以上配置和postgreshot上的recovery.done配置文件基本一致，只是primary_conninfo参数的host选项配置成对端主机IP。

之后在postgres主机用户家目录创建~/.pgpass文件，如下所示：

[postgres@pghost1 ~]$ touch ~/.pgpass
[postgres@pghost1 ~]$ chmod 600 ~/.pgpass

并在~/.pgpass文件中插入以下内容：

[postgres@postgres ~]$ cat .pgpass
192.168.40.130:5442:replication:replica:replica
192.168.40.131:5442:replication:replica:replica
[postgres@postgres ~]

之后启动postgres上的数据库，如下所示：

[postgres@postgres ~]$ pg_ctl start
waiting for server to start....2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv4 address "0.0.0.0", port 5442
2019-03-26 23:38:50.424 EDT [55380] LOG: listening on IPv6 address "::", port 5442
2019-03-26 23:38:50.443 EDT [55380] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-03-26 23:38:50.465 EDT [55381] LOG: database system was shut down in recovery at 2019-03-26 23:38:20 EDT
2019-03-26 23:38:50.465 EDT [55381] LOG: entering standby mode
2019-03-26 23:38:50.483 EDT [55381] LOG: consistent recovery state reached at 0/50003D0
2019-03-26 23:38:50.483 EDT [55381] LOG: invalid record length at 0/50003D0: wanted 24, got 0
2019-03-26 23:38:50.483 EDT [55380] LOG: database system is ready to accept read only connections
 done
server started
[postgres@postgres ~]$ 2019-03-26 23:38:50.565 EDT [55385] LOG: fetching timeline history file for timeline 4 from primary server
2019-03-26 23:38:50.588 EDT [55385] LOG: started streaming WAL from primary at 0/5000000 on timeline 3
2019-03-26 23:38:50.589 EDT [55385] LOG: replication terminated by primary server
2019-03-26 23:38:50.589 EDT [55385] DETAIL: End of WAL reached on timeline 3 at 0/50003D0.
2019-03-26 23:38:50.592 EDT [55381] LOG: new target timeline is 4
2019-03-26 23:38:50.594 EDT [55385] LOG: restarted WAL streaming at 0/5000000 on timeline 4
2019-03-26 23:38:50.717 EDT [55381] LOG: redo starts at 0/50003D0
 
[postgres@postgres ~]$ pg_controldata | grep cluster
Database cluster state:  in archive recovery
[postgres@postgres ~]$ 
 
postgres=# select * from test_alived2;
 id 
----
 1
(1 row)
 
postgres=#

同时，postgres上已经有了WAL接收进程，postgreshot上有了WAL发送进程，说明老的主库已经成功切换成备库，以上是主备切换的所有步骤。

为什么在步骤2中需要干净地关闭主库？数据库关闭时首先做一次checkpoint，完成之后通知WAL发送进程要关闭了，WAL发送进程会将截止此次checkpoint的WAL日志流发送给备库的WAL接收进程，备节点接收到主库最后发送来的WAL日志流后应用WAL，从而达到了和主库一致的状态。

另一个需要注意的问题是假如主库主机异常宕机了，如果激活备库，备库的数据完全和主库一致吗？此环境为一主一备异步流复制环境，备库和主库是异步同步方式，存在延时，这时主库上已提交事务的WAL有可能还没来得及发送给备库，主库主机就已经宕机了，因此异步流复制备库可能存在事务丢失的风险。

主备切换之pg_ctl promote方式

上面介绍了以文件触发方式进行主备切换，PostgreSQL9.1版本开始支持pg_ctl promote触发方式，相比文件触发方式操作更方便，promote命令语法如下：

pg_ctl promote [-D datadir]

-D是指数据目录，如果不指定会使用环境变量$PGDATA设置的值。promote命令发出后，运行中的备库将停止恢复模式并切换成读写模式的主库。

pg_ctl promote主备切换步骤和文件触发方式大体相同，只是步骤1中不需要配置recovery.conf配置文件中的trigger_file参数，并且步骤3中换成以pg_ctl promote方式进行主备切换，如下：

1）关闭主库，建议使用-m fast模式关闭。

2）在备库上执行pg_ctl promote命令激活备库，如果recovery.conf变成recovery.done表示备库已切换成为主库。

3）这时需要将老的主库切换成备库，在老的主库的$PGDATA目录下创建recovery.conf文件（如果此目录下不存在recovery.conf文件，可以根据$PGHOME/share/recovery.conf.sample模板文件复制一个，如果此目录下存在recovery.done文件，需将recovery.done文件重命名为recovery.conf），配置和老的从库一样，只是primary_conninfo参数中的IP换成对端IP。

4）启动老的主库，这时观察主、备进程是否正常，如果正常表示主备切换成功。以上是pg_ctl promote主备切换的主要步骤，这一小节不进行演示了，下一小节介绍pg_rewind工具时会给出使用pg_ctl promote进行主备切换的示例

pg_rewind

pg_rewind是流复制维护时一个非常好的数据同步工具，在上一节介绍流复制主备切换内容中讲到了主要有五个步骤进行主备切换，其中步骤2是在激活备库前先关闭主库，如果不做步骤2会出现什么样的情况？下面我们举例进行演示，测试环境为一主一备异步流复制环境，postgres上的数据库为主库，postgreshot上的数据库为备库。

主备切换

--备节点 recovery.conf 配置: postgreshot 上操作

备库recovery.conf配置如下所示：

[postgres@postgreshot pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.130 port=5442 user=replica application_name=pg1'  # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgreshot pg11]$

--激活备节点: postgreshot 上操作

检查流复制状态，确保正常后在备库主机上执行以下命令激活备库，如下所示

[postgres@postgreshot pg11]$ pg_ctl promote -D $PGDATA
waiting for server to promote.... done
server promoted
[postgres@postgreshot pg11]$ 
[postgres@postgreshot pg11]$

查看备库数据库日志，能够看到数据库正常打开接收外部连接的信息，这说明激活成功，检查postgreshot上的数据库角色，如下所示:

[postgres@postgreshot pg11]$ pg_controldata | grep cluster
Database cluster state:  in production
[postgres@postgreshot pg11]$

从pg_controldata输出也可以看到postgreshot上的数据库已成为主库，说明postgreshot上的数据库已经切换成主库，这时老的主库（postgres上的数据库）依然还在运行中，我们计划将postgres上的角色转换成备库，先查看postgres上的数据库角色，如下所示

[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state:  in production
[postgres@postgres pg11]$

--备节点激活后，创建一张测试表并插入数据

postgres=# create table test_1(id int4);
CREATE TABLE
postgres=# insert into test_1(id) select n from generate_series(1,10) n;
INSERT 0 10
postgres=#

--停原来主节点: postgres 上操作

[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state:  in production
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$ pg_ctl stop -m fast -D $PGDATA
2019-03-27 01:10:46.714 EDT [64858] LOG: received fast shutdown request
waiting for server to shut down....2019-03-27 01:10:46.716 EDT [64858] LOG: aborting any active transactions
2019-03-27 01:10:46.717 EDT [64858] LOG: background worker "logical replication launcher" (PID 64865) exited with exit code 1
2019-03-27 01:10:46.718 EDT [64860] LOG: shutting down
2019-03-27 01:10:46.731 EDT [64858] LOG: database system is shut down
 done
server stopped
[postgres@postgres pg11]$

--pg_rewind: postgres 上操作

[postgres@postgreshot pg11]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.40.131 port=5442 user=replica password=replica'
 
target server needs to use either data checksums or " = on"
Failure, exiting
[postgres@postgreshot pg11]$

备注：数据库在 initdb 时需要开启 checksums 或者设置 "wal_log_hints = on"，接着设置主，备节点的 wal_log_hints 参数并重启数据库。

[postgres@postgres pg11]$ pg_rewind --target-pgdata $PGDATA --source-server='host=192.168.40.131 port=5442 user=replica password=replica'
servers diverged at WAL location 0/70001E8 on timeline 5
rewinding from last common checkpoint at 0/6000098 on timeline 5
Done!
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$

备注：pg_rewind 成功。

--调整 recovery.conf 文件: postgres 操作

[postgres@postgres pg11]$ mv recovery.done recovery.conf
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$ cat recovery.conf | grep -v '^#'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.40.131 port=5442 user=replica application_name=pg2'  # e.g. 'host=localhost port=5432'
trigger_file = '/home/postgres/pg11/trigger'
[postgres@postgres pg11]$

--启动原主库， postgres 上操作

[postgres@postgres pg11]$ pg_ctl start -D $PGDATA
waiting for server to start....2019-03-27 01:14:48.028 EDT [66323] LOG: listening on IPv4 address "0.0.0.0", port 5442
2019-03-27 01:14:48.028 EDT [66323] LOG: listening on IPv6 address "::", port 5442
2019-03-27 01:14:48.031 EDT [66323] LOG: listening on Unix socket "/tmp/.s.PGSQL.5442"
2019-03-27 01:14:48.045 EDT [66324] LOG: database system was interrupted while in recovery at log time 2019-03-27 01:08:08 EDT
2019-03-27 01:14:48.045 EDT [66324] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2019-03-27 01:14:48.084 EDT [66324] LOG: entering standby mode
2019-03-27 01:14:48.089 EDT [66324] LOG: redo starts at 0/6000060
2019-03-27 01:14:48.091 EDT [66324] LOG: invalid record length at 0/7024C98: wanted 24, got 0
2019-03-27 01:14:48.096 EDT [66331] LOG: started streaming WAL from primary at 0/7000000 on timeline 6
2019-03-27 01:14:48.109 EDT [66324] LOG: consistent recovery state reached at 0/7024CD0
2019-03-27 01:14:48.110 EDT [66323] LOG: database system is ready to accept read only connections
 done
server started
[postgres@postgres pg11]$ 
[postgres@postgres pg11]$ pg_controldata | grep cluster
Database cluster state:  in archive recovery
[postgres@postgres pg11]$

--数据验证, postgres 上操作

[postgres@postgres pg11]$ p
psql (11.1)
Type "help" for help.
 
postgres=# select count(*) from test_1;
 count 
-------
 10
(1 row)
 
postgres=#

备注：pg_rewind 成功，原主库现在是以备库角色启动，而且数据表 test_1 也同步过来了。

pg_rewind 原理

The basic idea is to copy everything from the new cluster to the old cluster, except for the blocks that we know to be the same.

1)Scan the WAL log of the old cluster, starting from the last checkpoint before the point where the new cluster's timeline history forked off from the old cluster. For each WAL record, make a note of the data blocks that were touched. This yields a list of all the data blocks that were changed in the old cluster, after the new cluster forked off.

2)Copy all those changed blocks from the new cluster to the old cluster.

3)Copy all other files like clog, conf files etc. from the new cluster to old cluster. Everything except the relation files.

4) Apply the WAL from the new cluster, starting from the checkpoint created at failover. (Strictly speaking, pg_rewind doesn't apply the WAL, it just creates a backup label file indicating that when PostgreSQL is started, it will start replay from that checkpoint and apply all the required WAL.)

补充：postgres主备搭建时踩坑点

搭建pg主备流复制时的踩坑集锦

1: socket 路径问题报错如下

你好！这是你第一次使用 **Markdown编辑器** 所展示的欢迎页。如果你想学习如何使用Markdown编辑器,仔细阅读这篇文章，了解一下Markdown的基本语法知识。解决方法：修改postgres.conf中unix_socket_permissions = ‘*' 路径修改为上述报错中的路径重启即可

2:搭建主备时备库的data目录一定一定一定要使用主库基础备份出来的数据。可采用pg_basebackup 的方式，也可以采用tar包打包解包的方式进行基础备份

如果备库不小心已经初始化过请删除data目录下的* 并使用主库的基础备份重新启动

3:备库启动时报错 FATAL: no pg_hba.conf entry for replication connection from host “172.20.0.16”, user “repl” 之类的问题

例如 master：IP： *.1 standby：IP *.2 主备账号repl

那么在pg_hba.cnf中单单指明 host replication repl *.2 md5 是不行的

还需在此条记录前面添加 host all all *.2 md5

首先要能访问主库才会资格使用repl账号进行同步的步骤

以上为个人经验，希望能给大家一个参考，也希望大家多多支持极客世界。如有错误或未考虑完全的地方，望不吝赐教。