在线时间:8:00-16:00
迪恩网络APP
随时随地掌握行业动态
扫描二维码
关注迪恩网络微信公众号
PostgreSQL流复制实现HA主备切换环境说明和主机规划
基础环境配置(所有主机操作) 配置HOSTS
配置统一的时间(若已配置,请忽略) yum install -y ntpdate && ntpdate ntp1.aliyun.com echo -e "# sync time from ntp1.aliyun.com\n5 * * * * /usr/sbin/ntpdate ntp1.aliyun.com > /dev/null 2>&1 " >> /var/spool/cron/root # 写入定时任务,执行一次即可 创建postgres用户
配置免密钥登陆 su - postgres ssh-keygen -t rsa -f /home/postgres/.ssh/id_rsa -P "" cd ~/.ssh/ ssh-copy-id postgres@master # 三台主机执行 scp authorized_keys postgres@slave:~/.ssh # 只在master主机执行 scp authorized_keys postgres@pool:~/.ssh # 只在master主机执行 安装Postgresql数据库(PG9.6)
创建统一的目录结构
修改系统变量 vi /etc/profile #增加以下内容 export PGHOME=/usr/pgsql-9.6/ export PGDATA=/data1/pg_data export PGPORT=54321 export PATH=$PATH:$PGHOME/bin # 生效 source /etc/profile PostgreSQL流复制结构(master和slave主机操作)master主机操作 初始化系统
修改postgresql-9.6.service 内容如下: # Include the default config: .include /usr/lib/systemd/system/postgresql-9.6.service [Service] Environment=PGDATA=/data1/pg_data 重启PG服务 systemctl daemon-reload su - postgres -c '/usr/pgsql-9.6/bin/initdb -D /data1/pg_data' systemctl restart postgresql-9.6 systemctl enable postgresql-9.6.service 修改系统配置(以下用postgres用户操作) cp /data1/pg_data/pg_hba.conf{,.bak} cat >/data1/pg_data/pg_hba.conf<<EOF local all all trust host all all 10.0.0.11/32 trust host all all 10.0.0.12/32 trust host all all 0.0.0.0/0 md5 host all all ::1/128 trust host replication stream_replication 0.0.0.0/0 md5 EOF #host replication stream_replication 0.0.0.0/0 md5 为流复制用户 64G cp /data1/pg_data/postgresql.conf{,.bak} cat >/data1/pg_data/postgresql.conf<<EOF listen_addresses = '*' port = 54321 max_connections = 256 shared_buffers = 16GB effective_cache_size = 48GB work_mem = 64MB maintenance_work_mem = 2GB min_wal_size = 2GB max_wal_size = 4GB checkpoint_completion_target = 0.9 wal_buffers = 16MB default_statistics_target = 100 wal_level = hot_standby wal_log_hints = on max_wal_senders = 1 hot_standby = on logging_collector = on log_directory = 'pg_log' EOF #操作完记得重启 pg_ctl restart 128G listen_addresses = '*' port = 54321 max_connections = 256 shared_buffers = 32GB effective_cache_size = 96GB work_mem = 128MB maintenance_work_mem = 2GB min_wal_size = 2GB max_wal_size = 4GB checkpoint_completion_target = 0.9 wal_buffers = 16MB default_statistics_target = 100 wal_level = hot_standby wal_log_hints = on max_wal_senders = 1 hot_standby = on logging_collector = on log_directory = 'pg_log' 在主库中创建流复制用户(stream_replication)和PGPool用户(srcheck)
修改主库pg_hba.conf文件(已操作见cat >/data1/pg_data/pg_hba.conf<<EOF)
slave主机操作 初始化系统
修改postgresql-9.6.service 内容如下: # Include the default config: .include /usr/lib/systemd/system/postgresql-9.6.service [Service] Environment=PGDATA=/data1/pg_data 重启PG服务
基础备份复制到备库服务器
修改备库配置信息
增加以下内容 standby_mode='on' primary_conninfo = 'host=master port=54321 user=stream_replication password=your_password' restore_command = '' recovery_target_timeline = 'latest' # 重启PG服务 systemctl restart postgresql-9.6 systemctl enable postgresql-9.6.service 验证 主节点执行 create table test (id int4, create_time timestamp(0) without time zone); insert into test values (1, now()); select * from test; 备节点执行
其他查询 进入测试数据库test,主库上执行如下命令返回f,备库上返回t。 select pg_is_in_recovery(); 执行如下命令查看快照,它返回主库记录点、备库记录点;主库每增加一条写入,记录点的值就会加1。
执行如下命令可以查看主备同步状态。
字段state显示的同步状态有:startup(连接中)、catchup(同步中)、streaming(同步);字段sync_state显示的模式有:async(异步)、sync(同步)、potential(虽然现在是异步模式,但是有可能升级到同步模式)。 主备切换假设主库崩溃了,备库如何从只读状态切换为读写状态呢?只要把备库的postgresql.conf中hot_standby修改为off,并且删除recovery.conf,然后重启库就可以提供服务了。 PGPool2(pool主机操作) 安装PGPool2 yum install -y http://www.pgpool.net/yum/rpms/3.6/redhat/rhel-7-x86_64/pgpool-II-release-3.6-1.noarch.rpm yum -y install pgpool-II-pg96 pgpool-II-pg96-debuginfo pgpool-II-pg96-devel pgpool-II-pg96-extensions systemctl enable pgpool.service #开启自动启动 添加Pgpool-II运行用户 useradd postgres # 环境准备时已操作 chown -R postgres.postgres /etc/pgpool-II chown -R postgres.postgres /var/run/pgpool/ 配置pool_hba.conf
增加内容
配置pcp.conf 主节点登陆后执行: postgres=# select rolname,rolpassword from pg_authid; rolname | rolpassword --------------------+------------------------------------- pg_signal_backend | srcheck | md5662c10f61b27a9ab38ce69157186b25f postgres | md5d3612d57ee8d4c147cf27b11e3a0974d stream_replication | md59279ef6b904bc483e4f85e6d44cfc0ed (4 rows)
增加SQL执行结果的内容,形式为$rolname:$rolpassword例如:
或者:
配置pgpool.conf
内容如下: # CONNECTIONS listen_addresses = '*' port = 54321 socket_dir = '/var/run/pgpool' pcp_listen_addresses = '*' pcp_port = 9898 pcp_socket_dir = '/var/run/pgpool' # - Backend Connection Settings - backend_hostname0 = 'master' backend_port0 = 54321 backend_weight0 = 1 backend_data_directory0 = '/data1/pg_data' backend_flag0 = 'ALLOW_TO_FAILOVER' backend_hostname1 = 'slave' backend_port1 = 54321 backend_weight1 = 1 backend_data_directory1 = '/data1/pg_data' backend_flag1 = 'ALLOW_TO_FAILOVER' # - Authentication - enable_pool_hba = on pool_passwd = 'pool_passwd' # FILE LOCATIONS pid_file_name = '/var/run/pgpool/pgpool.pid' logdir = '/data1/pg_logs' replication_mode = off load_balance_mode = on master_slave_mode = on master_slave_sub_mode = 'stream' sr_check_period = 5 sr_check_user = 'srcheck' sr_check_password = '123456' sr_check_database = 'postgres' # HEALTH CHECK 健康检查 health_check_period = 10 health_check_timeout = 20 health_check_user = 'srcheck' health_check_password = '123456' health_check_database = 'postgres' # FAILOVER AND FAILBACK failover_command = '/data1/pg_bin/failover_stream.sh %H' failover_stream.sh脚本 vim /data1/pg_bin/failover_stream.sh chmod 777 /data1/pg_bin/failover_stream.sh chmod u+s /sbin/ifconfig chmod u+s /usr/sbin pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 & ## 启动 pgpool -m fast stop ## 关闭 failover_stream.sh内容: #! /bin/sh # Failover command for streaming replication. # Arguments: $1: new master hostname. new_master=$1 trigger_command="$PGHOME/bin/pg_ctl promote -D $PGDATA" # Prompte standby database. /usr/bin/ssh -T $new_master $trigger_command exit 0; 登陆设置 当执行pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &后可查看集群状态: [postgres@pool pgpool-II]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres postgres=# show pool_nodes; node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+----------+-------+--------+-----------+---------+------------+-------------------+------------------- 0 | master | 54321 | up | 0.500000 | primary | 0 | false | 0 1 | slave | 54321 | up | 0.500000 | standby | 0 | true | 0 (2 rows) 如果未发现集群状态,请在master和slave主机分别执行以下操作: [postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 0 [postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 9898 -n 1 #详情查询命令pcp_attach_node HA切换模拟master主机宕机 Master端: [postgres@master ~]$ pg_ctl stop waiting for server to shut down.... done server stopped 当前集群状态 [postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres psql (9.6.1) Type "help" for help. postgres=# show pool_nodes; node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+----------+------+--------+-----------+---------+------------+-------------------+------------------- 0 | master | 5432 | down| 0.500000 | standby | 0 | false | 0 1 | slave | 5432 | up | 0.500000 | primary | 0 | true | 0 (2 rows) 发现master已经是standby了,且down机了 修改master,启动 当master主机宕机后,此时slave主机PG数据库成为主库,修改master成为slave的从库即可 [postgres@master ~]$ vim recovery.conf standby_mode='on' primary_conninfo = 'host=slave port=54321 user=stream_replication password=your_password' restore_command = '' recovery_target_timeline = 'latest' 同步时间线 #如果报时间线冲突落后,先停掉pg服务,然后执行同步时间线,否知直接看状态 [postgres@master ~]$ pg_rewind --target-pgdata=/data1/pg_data --source-server='host=slave port=54321 user=postgres dbname=postgres' servers diverged at WAL position 0/5000098 on timeline 1 rewinding from last common checkpoint at 0/5000028 on timeline 1 Done! # 重新启动数据库 [postgres@master ~]$ pg_ctl start 再次查看当前状态 [postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres postgres=# show pool_nodes; node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+----------+------+--------+-----------+---------+------------+-------------------+------------------- 0 | master | 5432 | down| 0.500000 | standby | 0 | false | 0 1 | slave | 5432 | up | 0.500000 | primary | 0 | true | 0 (2 rows) #注意虽然master已经启动了,但是还是down,需要手动将master节点添加进pgpool,master的node_id是0,所以-n 0 [postgres@pool ~]$ pcp_attach_node -d -U postgres -h pool -p 54321 -n 0 #提示输入密码,输入pcp管理密码 #查看当前状态 [postgres@pool ~]$ psql -p 54321 -h 10.0.0.13 -U srcheck -d postgres postgres=# show pool_nodes; node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay ---------+----------+------+--------+-----------+---------+------------+-------------------+------------------- 0 | master | 5432 | up | 0.500000 | standby | 0 | false | 0 1 | slave | 5432 | up | 0.500000 | primary | 0 | true | 0 (2 rows) 现在两个节点都是up了。 主从两节点pgpool健康检查脚本(pgpool_check.sh) 说明:此脚本是基于PGpool只安装到master和slave两个主机上的情况下使用,在master主机有了pgpool进程后,可在slave主机执行sh pgpool_check.sh & 即可 #! /bin/bash # Check Master host pgpool-process while true do pgcount=$(nmap 10.0.0.11|egrep '9898|9999'|wc -l) if [ $pgcount -eq 2 ] ; then echo 'Master host pgpool is GOOD!!!' > /dev/null 2>&1 else echo -e "Master host pgpool is \033[31m BAD!!! \033[0m" echo -e "Master host pgpool is \033[31m BAD!!! \033[0m" echo -e "Master host pgpool is \033[31m BAD!!! \033[0m" echo -e "SYSTEM WILL DO THE SHELL : \033[34m su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &' \033[0m" su - postgres -c 'pgpool -n -d -D > /data1/pg_logs/pgpool.log 2>&1 &' pgport=$(netstat -lntup|egrep '9898|9999'|wc -l) [ $pgport -gt 0 ] && echo -e "Slave host pgpool is \033[32m RUNNING!!! \033[0m" exit 0 fi done 以上为个人经验,希望能给大家一个参考,也希望大家多多支持极客世界。如有错误或未考虑完全的地方,望不吝赐教。 |
请发表评论