0

postgresql高可用测试系列之patroni etcd haproxy keepalived 离线部署(四) -利来国际app

张玉龙 2021-10-31
294


说明:本系列文章仅用于共享我的学习成果,未经过生产系统考验,对于知识点和一些组件的使用会不定时更新,仅供参考,如有错误的地方,欢迎留言共同学习。

本高可用系列测试不说理论知识,如有需要自行百度,因生产环境大多数是内网环境,无法连接互联网,为模拟生产环境安装,postgresql高可用测试均采用离线部署

所需软件包均以打包上传百度网盘,如有需要自行下载: 提取码:n9w2 文件名:postgresql_ha.tar.gz

第一章: 介绍测试环境
第二章: postgresql replication 部署
第三章: etcd 部署和管理
第四章: patroni 部署和管理
第五章: haproxy keepalived 部署和管理
第六章: 高可用模拟故障测试用例
第七章: prometheus grafana 监控部署
第八章: 高可用管理

  • patroni 基于 python 开发的模板,结合 dcs (例如 zookeeper, etcd, consul )可以定制 postgresql 高可用方案。
  • patroni 并不是一套拿来即用的 postgresql 高可用组件,涉及较多的配置和定制工作。
  • patroni 接管 postgresql 数据库的启停,同时监控本地的 postgresql 数据库,并将本地的 postgresql 数据库信息写入dcs。
  • patroni 的主备端是通过是否能获得 leader key 来控制的,获取到了 leader key 的 patroni 为主节点,其它的为备节点。
  • patroni 支持级联复制,支持同步和异步模式,支持 failover、switchovers、重新初始化集群等。
  • patroni 官方文档:

1. 所有节点安装python3

patroni 基于 python 开发的模板,需要运行在 python 环境下。

# 解压并安装python3 [[email protected] ~]# cd /enmo/soft [[email protected] soft]# tar -zxvf python3-rpm.tar.gz # 配置yum源 [[email protected] ~]# cat > /etc/yum.repos.d/enmo.repo << eof [server] name=server baseurl=file:///media/cdrom enabled=yes gpgcheck=0 [python3] name=python3 baseurl=file:///enmo/soft/python3-rpm enabled=yes gpgcheck=0 eof # 执行yum安装 [[email protected] ~]# yum clean all [[email protected] ~]# yum install python3 python3-devel -y # 修改软链接 [[email protected] ~]# rm -f /usr/bin/python [[email protected] ~]# ln -s /usr/bin/python3 /usr/bin/python # 查看版本,确认安装成功 [[email protected] ~]# python -v python 3.6.8 # 全词匹配替换,python3安装后’yum’命令执行会报错,需要修改以下配置 [[email protected] ~]# sed -i "s:\:python2:g" /usr/bin/yum [[email protected] ~]# sed -i "s:\:python2:g" /usr/libexec/urlgrabber-ext-down

2. 所有节点使用pip3安装patroni

在线连接互联网安装

# pip3 install patroni[etcd] -i https://mirrors.aliyun.com/pypi/simple/ # pip3 install psycopg2-binary -i https://mirrors.aliyun.com/pypi/simple/ # pip3 install psycopg2 -i https://mirrors.aliyun.com/pypi/simple/

以下使用离线安装
软件包下载,网盘中postgresql_ha.tar.gz文件中已包含以下安装包(patroni_etcd_2.1.1.tar.gz)

psutil-5.8.0.tar.gz https://mirrors.aliyun.com/pypi/simple/psutil/ ydiff-1.2.tar.gz https://mirrors.aliyun.com/pypi/simple/ydiff/ click-8.0.3-py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/click/ dnspython-2.1.0-py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/dnspython/ importlib_metadata-4.8.1-py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/importlib-metadata/ patroni-2.1.1-py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/patroni/ prettytable-2.2.1-py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/prettytable/ python_dateutil-2.8.2-py2.py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/python-dateutil/ pyyaml-6.0-cp36-cp36m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl https://mirrors.aliyun.com/pypi/simple/pyyaml/ six-1.16.0-py2.py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/six/ typing_extensions-3.10.0.2-py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/typing-extensions/ urllib3-1.26.7-py2.py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/urllib3/ wcwidth-0.2.5-py2.py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/wcwidth/ zipp-3.6.0-py3-none-any.whl https://mirrors.aliyun.com/pypi/simple/zipp/ python-etcd-0.4.5.tar.gz https://mirrors.aliyun.com/pypi/simple/python-etcd/ psycopg2-binary-2.9.1.tar.gz https://mirrors.aliyun.com/pypi/simple/psycopg2-binary/ psycopg2-2.9.1.tar.gz https://mirrors.aliyun.com/pypi/simple/psycopg2/

解压安装包并使用pip3安装patroni

[[email protected] ~]# cd /enmo/soft [[email protected] soft]# tar -zxvf patroni_etcd_2.1.1.tar.gz [[email protected] soft]# cd patroni_etcd_2.1.1 # 按顺序执行安装 [[email protected] patroni_etcd_2.1.1]# pip3 install psutil-5.8.0.tar.gz [[email protected] patroni_etcd_2.1.1]# pip3 install ydiff-1.2.tar.gz [[email protected] patroni_etcd_2.1.1]# pip3 install *.whl [[email protected] patroni_etcd_2.1.1]# pip3 install python-etcd-0.4.5.tar.gz [[email protected] patroni_etcd_2.1.1]# pip3 install psycopg2-binary-2.9.1.tar.gz [[email protected] patroni_etcd_2.1.1]# pip3 install psycopg2-2.9.1.tar.gz # 查看已安装的包 [[email protected] patroni_etcd_2.1.1]# pip3 list deprecation: the default format will switch to columns in the future. you can use --format=(legacy|columns) (or define a format=(legacy|columns) in your pip.conf under the [list] section) to disable this warning. click (8.0.3) dnspython (2.1.0) importlib-metadata (4.8.1) patroni (2.1.1) pip (9.0.3) prettytable (2.2.1) psutil (5.8.0) psycopg2 (2.9.1) psycopg2-binary (2.9.1) python-dateutil (2.8.2) python-etcd (0.4.5) pyyaml (6.0) setuptools (39.2.0) six (1.16.0) typing-extensions (3.10.0.2) urllib3 (1.26.7) wcwidth (0.2.5) ydiff (1.2) zipp (3.6.0)

2. 所有节点配置patroni的参数文件

2.1 创建参数文件和日志文件的存放路径

[[email protected] ~]# mkdir /enmo/app/patroni

2.2 主节点创建文件 patroni_config.yml

需要注意python的yml文件格式,有严格的缩进要求,且以空格进行缩进,不要使用tab键,缩进控制不好的话,参数配置将出现各种问题。

[[email protected] ~]# vi /enmo/app/patroni/patroni_config.yml scope: pg_cluster namespace: /service name: pgtest1 log: level: info traceback_level: error dir: /enmo/app/patroni file_num: 10 file_size: 104857600 restapi: listen: 192.168.58.10:8008 connect_address: 192.168.58.10:8008 etcd: host: 192.168.58.10:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 max_timelines_history: 0 master_start_timeout: 300 master_stop_timeout: 0 synchronous_mode: false postgresql: use_pg_rewind: true use_slots: true parameters: max_connections: 3000 superuser_reserved_connections: 100 max_locks_per_transaction: 64 max_worker_processes: 2 max_prepared_transactions: 0 wal_level: logical wal_log_hints: on track_commit_timestamp: off max_wal_senders: 10 max_replication_slots: 10 wal_keep_size: 4096mb # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32mb hot_standby: "on" listen_addresses: "*" port: 5432 cluster_name: "pg_cluster" archive_mode: on archive_command: "cp %p /enmo/pgarch/%f" postgresql: listen: 0.0.0.0:5432 connect_address: 192.168.58.10:5432 data_dir: /enmo/pgdata pgpass: /home/postgres/.pgpass pg_ctl_timeout: 60 use_pg_rewind: true remove_data_directory_on_rewind_failure: false remove_data_directory_on_diverged_timelines: true authentication: replication: username: replica password: replica superuser: username: postgres password: postgres tags: nofailover: false noloadbalance: false clonefrom: false nosync: false

2.3 备节点1创建文件 patroni_config.yml

[[email protected] ~]# vi /enmo/app/patroni/patroni_config.yml scope: pg_cluster namespace: /service name: pgtest2 log: level: info traceback_level: error dir: /enmo/app/patroni file_num: 10 file_size: 104857600 restapi: listen: 192.168.58.11:8008 connect_address: 192.168.58.11:8008 etcd: host: 192.168.58.11:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 max_timelines_history: 0 master_start_timeout: 300 master_stop_timeout: 0 synchronous_mode: false postgresql: use_pg_rewind: true use_slots: true parameters: max_connections: 3000 superuser_reserved_connections: 100 max_locks_per_transaction: 64 max_worker_processes: 2 max_prepared_transactions: 0 wal_level: logical wal_log_hints: on track_commit_timestamp: off max_wal_senders: 10 max_replication_slots: 10 wal_keep_size: 4096mb # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32mb hot_standby: "on" listen_addresses: "*" port: 5432 cluster_name: "pg_cluster" archive_mode: on archive_command: "cp %p /enmo/pgarch/%f" postgresql: listen: 0.0.0.0:5432 connect_address: 192.168.58.11:5432 data_dir: /enmo/pgdata pgpass: /home/postgres/.pgpass pg_ctl_timeout: 60 use_pg_rewind: true remove_data_directory_on_rewind_failure: false remove_data_directory_on_diverged_timelines: true authentication: replication: username: replica password: replica superuser: username: postgres password: postgres tags: nofailover: false noloadbalance: false clonefrom: false nosync: false

2.4 备节点2创建文件 patroni_config.yml

[[email protected] ~]# vi /enmo/app/patroni/patroni_config.yml scope: pg_cluster namespace: /service name: pgtest3 log: level: info traceback_level: error dir: /enmo/app/patroni file_num: 10 file_size: 104857600 restapi: listen: 192.168.58.12:8008 connect_address: 192.168.58.12:8008 etcd: host: 192.168.58.12:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 max_timelines_history: 0 master_start_timeout: 300 master_stop_timeout: 0 synchronous_mode: false postgresql: use_pg_rewind: true use_slots: true parameters: max_connections: 3000 superuser_reserved_connections: 100 max_locks_per_transaction: 64 max_worker_processes: 2 max_prepared_transactions: 0 wal_level: logical wal_log_hints: on track_commit_timestamp: off max_wal_senders: 10 max_replication_slots: 10 wal_keep_size: 4096mb # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32mb hot_standby: "on" listen_addresses: "*" port: 5432 cluster_name: "pg_cluster" archive_mode: on archive_command: "cp %p /enmo/pgarch/%f" postgresql: listen: 0.0.0.0:5432 connect_address: 192.168.58.12:5432 data_dir: /enmo/pgdata pgpass: /home/postgres/.pgpass pg_ctl_timeout: 60 use_pg_rewind: true remove_data_directory_on_rewind_failure: false remove_data_directory_on_diverged_timelines: true authentication: replication: username: replica password: replica superuser: username: postgres password: postgres tags: nofailover: false noloadbalance: false clonefrom: false nosync: false

2.5 建议由 patroni 托管的 postgresql 参数:

  1. 主库和备库必须保持相同的值的参数,对于那些,在本地 patoni 配置文件中或通过环境变量设置的值不起作用。要更改或设置它们的值,必须更改 dcs 中的共享配置。
max_connections: 3000 max_locks_per_transaction: 64 max_worker_processes: 2 max_prepared_transactions: 0 wal_level: logical wal_log_hints: on track_commit_timestamp: off
  1. 对于下面的参数,postgresql 不要求主库和备库必须保持相同的值。但是,考虑到一个备库随时可能成为主库的可能性,将它们设置为不同的值实际上没有意义;因此,patroni 将其值设置为动态配置。
max_wal_senders: 10 max_replication_slots: 10 wal_keep_segments: 8 # postgresql 13 版本将 wal_keep_segments 重新定义为 wal_keep_size,决定了为备库保留的wal量。版本13采用字节大小表示,不再采用保留文件的个数,可通过下述公式换算:wal_keep_size = wal_keep_segments * wal_segment_size wal_keep_size: 4096mb
  1. 还有一些其他 postgres 参数由 patroni 控制
listen_addresses: "*" port: 5432 cluster_name: "pg_cluster" hot_standby: "on"

为了安全起见,上述列表中的参数不会写入 postgresql.conf,而是作为参数列表传递给 pg_ctl start,这赋予它们最高优先级,甚至高于 alter system

参数文件的优先顺序

  • the parameters would be applied in the following order (run-time are given the highest priority):
  • load parameters from file postgresql.base.conf (or from a custom_conf file, if set)
  • load parameters from file postgresql.conf
  • load parameters from file postgresql.auto.conf
  • run-time parameter using -o –name=value

2.6 修改目录权限

# chown -r postgres.postgres /enmo/app/patroni

2.7 参数解释

参考官方文档:

[[email protected] patroni]# cat patroni_config.yml # 集群名称 scope: pg_cluster # 集群名称 namespace: /service # patroni 将在其中保存有关集群的信息,这个路径是 etcd 存储数据的路径, default value: "/service" name: pgtest1 # 主机名,对于集群必须是唯一的 log: level: info # 日志级别 traceback_level: error dir: /enmo/app/patroni/ # 日志写入的目录 file_num: 10 # 要保留的日志数量 file_size: 104857600 # 触发​​日志滚动的 patoni.log 文件的大小(以字节为单位) restapi: listen: 192.168.58.10:8008 connect_address: 192.168.58.10:8008 etcd: # provide host to do the initial discovery of the cluster topology: # 必须指定host、hosts、url、proxy或 srv 之一 host: 192.168.58.10:2379 # etcd 端点的 host:port bootstrap: # this section will be written into etcd:///config after initializing new cluster # and all other cluster members will use it as a `global configuration` dcs: # 动态配置(dynamic configuration)的参数设置,动态配置存储在 dcs(分布式配置存储)中并应用于所有集群节点 ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 # 如果从库落后于主库超过一定数量的字节,则此设置可确保不会发生故障转移 maximum_lag_on_syncnode: -1 max_timelines_history: 0 # dcs 中保存的最大时间线历史项目数。默认值:0。当设置为 0 时,它会在 dcs 中保留完整的历史记录。 master_start_timeout: 300 # 在触发 failover 之前允许主服务器从故障中恢复的时间(单位:秒) # master 故障的最坏情况故障转移时间是: loop_wait master_start_timeout loop_wait master_stop_timeout: 0 # patroni 停止 postgres 时允许等待的秒数,仅在启用 synchronous_mode 时有效。超过参数值,则 patroni 会向 postmaster 发送 sigkill。 synchronous_mode: false # 打开同步复制模式。在此模式下,一个从库将被选择为同步模式的从库,只有最新的领导者和同步从库才能参与领导者选举。 synchronous_mode_strict # 如果没有可用的同步副本,则防止禁用同步复制,从而阻止所有客户端写入主服务器。 # https://patroni.readthedocs.io/en/latest/replication_modes.html postgresql: use_pg_rewind: true # 是否使用 pg_rewind use_slots: true # 是否使用复制槽 parameters: max_connections: 3000 superuser_reserved_connections: 100 # patroni 需要使用超级用户访问数据库才能正常运行 max_locks_per_transaction: 64 max_worker_processes: 2 max_prepared_transactions: 0 wal_level: logical wal_log_hints: on track_commit_timestamp: off max_wal_senders: 10 max_replication_slots: 10 wal_keep_size: 4096mb # wal_keep_size = wal_keep_segments * wal_segment_size = 128 * 32mb hot_standby: "on" listen_addresses: "*" port: 5432 cluster_name: "pg_cluster" archive_mode: on archive_command: "cp %p /enmo/pgarch/%f" # standby_cluster: # 如果定义了这个部分,我们想要引导一个备用集群 # host: 127.0.0.1 # 远程主机的地址 # port: 1111 # 远程主机的端口 # primary_slot_name: patroni # 远程主服务器上用于复制的插槽。此参数是可选的,默认值来自实例名称 # create_replica_methods # restore_command # archive_cleanup_command # recovery_min_apply_delay initdb: # 列出要传递给 initdb 的选项 - data-checksums # 在9.3上需要 pg_rewind 时必须启用 - encoding: utf8 # 新数据库的默认编码 - locale: utf8 # 新数据库的默认语言环境 - wal-segsize: 32 - allow-group-access - pgdata: /enmo/pgdata - waldir: /enmo/pgwal pg_hba: # 应该添加到 pg_hba.conf 的行列表 - host all all 0.0.0.0/0 md5 - host replication replicator 127.0.0.1/32 md5 users: # 初始化新集群后需要创建的一些额外用户 admin: # 用户名是admin password: zalando options: # create user 语句的选项列表 - createrole - createdb postgresql: listen: 0.0.0.0:5432 # postgres 监听的 ip 地址 端口 connect_address: 192.168.58.10:5432 data_dir: /enmo/pgdata # postgres 数据目录的位置 # bin_dir: /software/pgsql/bin # postgresql 二进制文件的路径,默认值是一个空字符串,这意味着 path 环境变量将用于查找可执行文件 # config_dir: /software/pgsql/data # postgres 配置目录的位置,默认为 data_dir pgpass: /home/postgres/.pgpass # 密码文件的路径 pg_ctl_timeout: 60 # pg_ctl 执行 start,stop 或 restart 时应等待多长时间. 默认值为 60 秒。 use_pg_rewind: true # 当它作为副本加入集群时,尝试在前领导者上使用 pg_rewind remove_data_directory_on_rewind_failure: false # 如果启用此选项,patroni 将删除 postgresql 数据目录并重新创建副本。否则它会尝试跟随新的领导者。默认值为false remove_data_directory_on_diverged_timelines: false # 如果 patroni 注意到时间线正在发散并且以前的 master 无法从新 master 开始流式传输,则 patroni 将删除 postgresql 数据目录并重新创建副本。此选项在 pg_rewind 无法使用时很有用。默认值为false。 authentication: # 用户验证 replication: # 复制用户 username: replica password: replica superuser: # 超级用户 username: postgres password: postgres # rewind: # pg_rewind 用户 # username: # password: tags: nofailover: false # 控制是否允许此节点参与领导者竞赛并成为领导者 noloadbalance: false clonefrom: false nosync: false

3. 所有节点创建 patroni 服务并启动

# 创建服务,注意需要配置环境变量,否则启动服务会出现这种报错(fatal: patroni requires psycopg2>=2.5.4 or psycopg2-binary) # requires,强制依赖 etcd.service 启动成功,可以视情况而定,etcd不启动,patroni起来后不会自动拉起数据库 [[email protected] ~]# vi /usr/lib/systemd/system/patroni.service [unit] description=patroni after=network.target remote-fs.target nss-lookup.target etcd.service requires=etcd.service [service] type=forking user=postgres group=postgres environment="pghome=/enmo/app/pgsql/13.3" environment="pgdata=/enmo/pgdata" environment="pgport=5432" environment="ld_library_path=/enmo/app/pgsql/13.3/lib" environment="path=/enmo/app/pgsql/13.3/bin:/usr/local/bin" execstart=/bin/bash -c "patroni /enmo/app/patroni/patroni_config.yml >> /enmo/app/patroni/patroni.log 2>&1 &" execreload=/bin/kill -s hup $mainpid execstop=/usr/bin/killall patroni killmode=process timeoutsec=30 restart=no [install] wantedby=multi-user.target # 启动服务 [[email protected] ~]# systemctl daemon-reload [[email protected] ~]# systemctl restart patroni.service [[email protected] ~]# systemctl enable patroni.service # 因为 patroni 会检测 postgresql 数据库是否正常运行,如果没有运行,会自动启动postgresql 数据库,所以可以考虑禁用 postgresql 服务,由 patroni 托管pg的启停 [[email protected] ~]# systemctl disable postgres-13.service

注意:kill patroni 进程是否会宕库?

# 1. kill -9 patroni进程,不会宕库 [[email protected] ~]# ps -ef |grep patroni root 22695 22009 0 15:09 pts/1 00:00:00 tail -f /enmo/app/patroni/patroni.log postgres 22754 1 0 15:13 ? 00:00:00 /usr/bin/python3 /usr/local/bin/patroni /enmo/app/patroni/patroni_config.yml root 22790 22720 0 15:18 pts/3 00:00:00 grep --color=auto patroni [[email protected] ~]# kill -9 22754 # 2. killall patroni进程,会宕节点所在的数据库 [[email protected] ~]# /usr/bin/killall patroni # 3. 关闭 auto failover 功能,killall patroni进程,不会宕库 [[email protected] ~]# patronictl -c /enmo/app/patroni/patroni_config.yml pause success: cluster management is paused [[email protected] ~]# /usr/bin/killall patroni

在启动patroni之后,建议先使用patronictl禁掉auto failover功能,当启动完毕调试正常后再选择性的启动auto failover功能,因为启用auto failover功能后,killall patroni进程,会导致当前节点的数据库宕掉,如果主库处于生产状态,后果不堪设想。

# disable auto failover # 如果没有启动 patroni,执行patronictl pause 会失败 [[email protected] ~]# patronictl -c /enmo/app/patroni/patroni_config.yml pause success: cluster management is paused # resume auto failover [[email protected] ~]# patronictl -c /enmo/app/patroni/patroni_config.yml resume success: cluster management is resumed

4. 所有节点设置patronictl别名,方便维护

[[email protected] ~]# cat >> /etc/profile << eof alias patronictl='patronictl -c /enmo/app/patroni/patroni_config.yml' eof [[email protected] ~]# source /etc/profile

5. 查询 patroni 集群状态

[[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- ----------------- | member | host | role | state | tl | lag in mb | pending restart | cluster: pg_cluster (7025023477017500881) -- ---- ----------- ----------------- | pgtest1 | 192.168.58.10 | leader | running | 5 | | | | pgtest2 | 192.168.58.11 | replica | running | 4 | 64 | * | | pgtest3 | 192.168.58.12 | replica | running | 4 | 64 | * | --------- --------------- --------- --------- ---- ----------- -----------------

至此,patroni部署完成,后面是patroni的管理部分,可选择性阅读

6. patroni 动态配置参数调整

6.1 使用 etcdctl 查看存储在etcd中的 patroni 动态配置参数

etcdctl 返回的结果是json格式,为了方便查看,可以下载安装jq命令
jq 的下载地址: jq 1.6 binaries for 64-bit or 32-bit.
上传服务器,修改可执行权限,chmod x /enmo/soft/jq-linux64

[[email protected] ~]# export etcdctl_api=2 [[email protected] ~]# etcdctl get /service/pg_cluster/config |/enmo/soft/jq-linux64 . { "ttl": 30, "loop_wait": 10, "retry_timeout": 10, "maximum_lag_on_failover": 1048576, "max_timelines_history": 0, "master_start_timeout": 300, "master_stop_timeout": 0, "synchronous_mode": false, "postgresql": { "use_pg_rewind": true, "use_slots": true, "parameters": { "max_connections": 3000, "superuser_reserved_connections": 100, "max_locks_per_transaction": 64, "max_worker_processes": 2, "max_prepared_transactions": 0, "wal_level": "logical", "wal_log_hints": true, "track_commit_timestamp": false, "max_wal_senders": 10, "max_replication_slots": 10, "wal_keep_size": "4096mb", "hot_standby": "on", "listen_addresses": "*", "port": 5432, "cluster_name": "pg_cluster", "archive_mode": true, "archive_command": "cp %p /enmo/pgarch/%f" } } }

6.2 patronictl 查看 patroni 动态配置参数

[[email protected] ~]# patronictl show-config loop_wait: 10 master_start_timeout: 300 master_stop_timeout: 0 max_timelines_history: 0 maximum_lag_on_failover: 1048576 postgresql: parameters: archive_command: cp %p /enmo/pgarch/%f archive_mode: true cluster_name: pg_cluster hot_standby: 'on' listen_addresses: '*' max_connections: '3000' max_locks_per_transaction: 64 max_prepared_transactions: 0 max_replication_slots: 10 max_wal_senders: 10 max_worker_processes: 2 port: 5432 superuser_reserved_connections: 100 track_commit_timestamp: false wal_keep_size: 4096mb wal_level: logical wal_log_hints: true use_pg_rewind: true use_slots: true retry_timeout: 10 synchronous_mode: false ttl: 30

6.3 通过 patronictl 调整配置参数,在其中一个节点调整,其他节点也会自动调整,并且 patroni 自动进行 reload 操作

[[email protected] ~]$ patronictl edit-config # 编辑文本 apply these changes? [y/n]: y configuration changed

6.4 对于需要重启数据库生效的参数,为了减少对生产的影响,可以逐个节点重启,也可以在停机窗口通过 patronictl restart 对整个集群进行重启

# 仅重启当前节点(--any) [[email protected] ~]# patronictl restart pg_cluster --any --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | --------- --------------- --------- --------- ---- ----------- when should the restart take place (e.g. 2021-10-31t21:00) [now]: are you sure you want to restart members pgtest1, pgtest2, pgtest3? [y/n]: y restart if the postgresql version is less than provided (e.g. 9.5.2) []: success: restart on member pgtest1 # 如果节点是 pending 状态的,才会执行重启操作 [[email protected] ~]# patronictl restart pg_cluster --any --pending --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | --------- --------------- --------- --------- ---- ----------- when should the restart take place (e.g. 2021-10-31t21:01) [now]: are you sure you want to restart members pgtest1, pgtest2, pgtest3? [y/n]: y restart if the postgresql version is less than provided (e.g. 9.5.2) []: failed: restart for member pgtest2, status code=503, (restart conditions are not satisfied) # 重启所有成员 [[email protected] ~]# patronictl restart pg_cluster --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | --------- --------------- --------- --------- ---- ----------- when should the restart take place (e.g. 2021-10-31t21:02) [now]: are you sure you want to restart members pgtest1, pgtest2, pgtest3? [y/n]: y restart if the postgresql version is less than provided (e.g. 9.5.2) []: success: restart on member pgtest1 success: restart on member pgtest2 success: restart on member pgtest3

6.5 如何仅调整指定节点参数

上述提到了postgresql的配置参数统一由patroni管理,单独修改节点的 postgresql.conf 参数不再生效。
如果想差异化配置其中一个节点的参数,可通过 alter system 命令动态配置,如下:

[[email protected] ~]# psql postgres=# show wal_keep_segments ; postgres=# alter system set wal_keep_segments =200; postgres=# select pg_reload_conf(); postgres=# show wal_keep_segments ;

7. 调用浏览器网页查看集群状态

linux curl命令需要提前安装 jq 命令格式化显示结果,windows 谷歌浏览器需要安装扩展插件 jsonview,否则显示结果不好看
jsonview 的下载地址
jsonview 参考





获取当前版本的动态配置:

[[email protected] ~]# curl -s "http://192.168.58.10:8008/cluster" |/enmo/soft/jq-linux64 . { "members": [ { "name": "pgtest1", "role": "leader", "state": "running", "api_url": "http://192.168.58.10:8008/patroni", "host": "192.168.58.10", "port": 5432, "timeline": 5 }, { "name": "pgtest2", "role": "replica", "state": "running", "api_url": "http://192.168.58.11:8008/patroni", "host": "192.168.58.11", "port": 5432, "timeline": 5, "pending_restart": true, "lag": 0 }, { "name": "pgtest3", "role": "replica", "state": "running", "api_url": "http://192.168.58.12:8008/patroni", "host": "192.168.58.12", "port": 5432, "timeline": 5, "pending_restart": true, "lag": 0 } ] }

8. patroni rest api

8.1 监控 monitoring endpoint

[[email protected] ~]# curl -s "http://192.168.58.10:8008/patroni" |/enmo/soft/jq-linux64 .

8.2 集群状态 cluster status endpoints

[[email protected] ~]# curl -s "http://192.168.58.10:8008/cluster" |/enmo/soft/jq-linux64 . [[email protected] ~]# curl -s "http://192.168.58.10:8008/history" |/enmo/soft/jq-linux64 .

8.3 配置 config endpoint

  1. 获取当前版本的动态配置
[[email protected] ~]# curl -s "http://192.168.58.10:8008/config" |/enmo/soft/jq-linux64 .
  1. 更改现有配置
[[email protected] ~]# curl -s -xpatch -d '{"loop_wait":9,"ttl":25,"postgresql":{"parameters":{"max_connections":"2500"}}}' http://192.168.58.10:8008/config |/enmo/soft/jq-linux64 . { "ttl": 25, "loop_wait": 9, "retry_timeout": 10, "maximum_lag_on_failover": 1048576, "max_timelines_history": 0, "master_start_timeout": 300, "master_stop_timeout": 0, "synchronous_mode": false, "postgresql": { "use_pg_rewind": true, "use_slots": true, "parameters": { "max_connections": "2500", "superuser_reserved_connections": 100, "max_locks_per_transaction": 64, "max_worker_processes": 2, "max_prepared_transactions": 0, "wal_level": "logical", "wal_log_hints": true, "track_commit_timestamp": false, "max_wal_senders": 10, "max_replication_slots": 10, "wal_keep_size": "4096mb", "hot_standby": "on", "listen_addresses": "*", "port": 5432, "cluster_name": "pg_cluster", "archive_mode": true, "archive_command": "cp %p /enmo/pgarch/%f" } } }

此时 patronictl 显示有需要重启生效的参数被修改(pending restart)

[[email protected] ~]# patronictl restart pg_cluster --------- --------------- --------- --------- ---- ----------- ----------------- | member | host | role | state | tl | lag in mb | pending restart | cluster: pg_cluster (7025023477017500881) -- ---- ----------- ----------------- | pgtest1 | 192.168.58.10 | leader | running | 5 | | * | | pgtest2 | 192.168.58.11 | replica | running | 5 | 0 | * | | pgtest3 | 192.168.58.12 | replica | running | 5 | 0 | * | --------- --------------- --------- --------- ---- ----------- -----------------
  1. 重启数据库使参数生效
[[email protected] ~]# patronictl restart pg_cluster --------- --------------- --------- --------- ---- ----------- ----------------- | member | host | role | state | tl | lag in mb | pending restart | cluster: pg_cluster (7025023477017500881) -- ---- ----------- ----------------- | pgtest1 | 192.168.58.10 | leader | running | 5 | | * | | pgtest2 | 192.168.58.11 | replica | running | 5 | 0 | * | | pgtest3 | 192.168.58.12 | replica | running | 5 | 0 | * | --------- --------------- --------- --------- ---- ----------- ----------------- when should the restart take place (e.g. 2021-10-31t18:55) [now]: are you sure you want to restart members pgtest1, pgtest2, pgtest3? [y/n]: y restart if the postgresql version is less than provided (e.g. 9.5.2) []: success: restart on member pgtest1 success: restart on member pgtest2 success: restart on member pgtest3 [[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | leader | running | 5 | | | pgtest2 | 192.168.58.11 | replica | running | 5 | 0 | | pgtest3 | 192.168.58.12 | replica | running | 5 | 0 | --------- --------------- --------- --------- ---- -----------
  1. 删除(重置)某些设置 (null)
$ curl -s -xpatch -d '{"postgresql":{"parameters":{"max_connections":null}}}' http://192.168.58.10:8008/config |/enmo/soft/jq-linux64 .

8.4 switchover and failover endpoints

  • failover endpoints 允许在没有健康节点时执行手动 failover ,但同时它不允许 switchover 。
  • switchover endpoint 则相反。它仅在集群健康(有leader)时才起作用,并允许在指定时间安排切换。
  1. 执行到特定节点的 failover,在节点都正常的情况下,执行 failover 实际上和执行 switchover 一样
[[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | leader | running | 5 | | | pgtest2 | 192.168.58.11 | replica | running | 5 | 0 | | pgtest3 | 192.168.58.12 | replica | running | 5 | 0 | --------- --------------- --------- --------- ---- ----------- [[email protected] ~]# curl -s http://192.168.58.10:8008/failover -xpost -d '{"candidate":"pgtest2"}' successfully failed over to "pgtest2" [[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | --------- --------------- --------- --------- ---- ----------- [[email protected] ~]# psql psql (13.3) type "help" for help. postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)
  1. 在指定时间从 leader 到集群中任何其他健康节点的 switchover
[[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | --------- --------------- --------- --------- ---- ----------- # switchover 必须至少指定 leader 或 candidate 字段和可选的 scheduled_at 字段。 # 2021-10-31t18:42 08 年-月-日t时:分 时区( 08) [[email protected] ~]# curl -s http://192.168.58.10:8008/switchover -xpost -d '{"leader":"pgtest2","scheduled_at":"2021-10-31t18:42 08"}' switchover scheduled # patronictl list 会显示 switchover scheduled at [[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | --------- --------------- --------- --------- ---- ----------- switchover scheduled at: 2021-10-31t18:42:00 00:00 from: pgtest2 # patroni 日志里也有倒计时提醒 2021-10-31 18:40:51,002 info: no action. i am (pgtest2) the leader with the lock 2021-10-31 18:40:59,995 info: lock owner: pgtest2; i am pgtest2 2021-10-31 18:40:59,997 info: awaiting failover at 2021-10-31t18:42:00 00:00 (in 28860 seconds) 2021-10-31 18:41:00,000 info: no action. i am (pgtest2) the leader with the lock 2021-10-31 18:41:08,996 info: lock owner: pgtest2; i am pgtest2 2021-10-31 18:41:08,998 info: awaiting failover at 2021-10-31t18:42:00 00:00 (in 28851 seconds) # 删除定时切换任务 [[email protected] ~]# curl -s http://192.168.58.10:8008/switchover -xdelete scheduled switchover deleted [[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- | member | host | role | state | tl | lag in mb | cluster: pg_cluster (7025023477017500881) -- ---- ----------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | --------- --------------- --------- --------- ---- -----------

post /switchover 和 post /failover 分别对应 patientictl switchoverpatonictl failover 命令。delete /switchover 对应 patronictl flush switchover 命令。

# switchover [[email protected] ~]$ patronictl switchover master [pgtest1]: candidate ['pgtest2', 'pgtest3'] []: pgtest2 when should the switchover take place (e.g. 2021-10-28t04:45 ) [now]: current cluster topology ... ... are you sure you want to switchover cluster pg_cluster, demoting current master pgtest1? [y/n]: y 2021-10-28 03:45:35.91763 successfully switched over to "pgtest2" ... ... # failover [[email protected] ~]$ patronictl failover candidate ['pgtest1', 'pgtest3'] []: pgtest1 current cluster topology ... ... are you sure you want to failover cluster pg_cluster, demoting current master pgtest2? [y/n]: y 2021-10-28 03:47:56.13486 successfully failed over to "pgtest1" ... ...

8.5 重启 restart endpoint

  1. post /restart 提供了几个参数
restart_pending: boolean, if set to true patroni will restart postgresql only when restart is pending in order to apply some changes in the postgresql config. role: perform restart only if the current role of the node matches with the role from the post request. postgres_version: perform restart only if the current version of postgres is smaller than specified in the post request. timeout: how long we should wait before postgresql starts accepting connections. overrides master_start_timeout. schedule: timestamp with time zone, schedule the restart somewhere in the future.
  1. post /restart 仅重启当前节点的数据库
[[email protected] ~]# curl -s http://192.168.58.10:8008/restart -xpost restarted successfully
  1. post /restart 定时重启当前节点的数据库
[[email protected] ~]# curl -s http://192.168.58.10:8008/restart -xpost -d '{"schedule":"2021-10-31t19:18 08"}' restart scheduled [[email protected] ~]# patronictl list --------- --------------- --------- --------- ---- ----------- --------------------------- | member | host | role | state | tl | lag in mb | scheduled restart | cluster: pg_cluster (7025023477017500881) -- ---- ----------- --------------------------- | pgtest1 | 192.168.58.10 | replica | running | 6 | 0 | 2021-10-31t19:18:00 08:00 | | pgtest2 | 192.168.58.11 | leader | running | 6 | | | | pgtest3 | 192.168.58.12 | replica | running | 6 | 0 | | --------- --------------- --------- --------- ---- ----------- ---------------------------
  1. delete /restart 删除定时重启任务
[[email protected] ~]# curl -s http://192.168.58.10:8008/restart -xdelete

post /restart and delete /restart endpoints are used by patronictl restart and patronictl flush restart respectively.

8.6 重新加载 reload endpoint

post /reload 让 patroni 重新读取和应用配置文件。这相当于向 patroni 进程发送 sighup 信号。如果您更改了一些需要重新启动 postgres 的参数(如 shared_buffers),您仍然必须通过调用 post /restart 或使用 patriotictl restart 明确地重新启动postgres。

[[email protected] ~]# curl -s http://192.168.58.10:8008/reload -xpost reload scheduled

the reload endpoint is used by patronictl reload

[[email protected] ~]$ patronictl reload pg_cluster ... ... are you sure you want to reload members pgtest3, pgtest2, pgtest1? [y/n]: y reload request received for member pgtest3 and will be processed within 10 seconds reload request received for member pgtest2 and will be processed within 10 seconds reload request received for member pgtest1 and will be processed within 10 seconds

8.7 重新初始化 reinitialize endpoint

只允许在从节点上执行,一旦调用,它将删除数据目录并启动 pg_basebackup 重新初始化指定节点上的 postgresql 数据目录。
the call might fail if patroni is in a loop trying to recover (restart) a failed postgres. in order to overcome this problem one can specify {“force”:true} in the request body.

[[email protected] ~]# curl -s http://192.168.58.10:8008/reinitialize -xpost reinitialize started

the reinitialize endpoint is used by patronictl reinit.

[[email protected] ~]$ patronictl reinit pg_cluster ... ... which member do you want to reinitialize [pgtest3, pgtest2, pgtest1]? []: pgtest3 are you sure you want to reinitialize members pgtest3? [y/n]: y success: reinitialize for member pgtest3

9. 使用 patronictl 执行数据库查询操作

[[email protected] ~]$ cat aa.sql select * from test_1; [[email protected] ~]$ patronictl -c /enmo/app/patroni/pg_test01.yml query -f aa.sql --password password: id create_time 1 2021-10-16 17:47:34 2 2021-10-16 17:55:06 [[email protected] ~]$ patronictl -c /enmo/app/patroni/pg_test01.yml query -c "select * from test_1;" --password password: id create_time 1 2021-10-16 17:47:34 2 2021-10-16 17:55:06

10. 其他参考官方文档

patroni 官方文档:

10.1 将独立设备转换为 patroni 集群

10.2 postgresql 版本重大升级

最后修改时间:2021-11-04 03:22:59
「喜欢文章,快来给作者赞赏墨值吧」
【利来手机国际的版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:[email protected]进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论