Re: BUG #16029: pg_basebackup - At least one WAL file is missing

Поиск
Список
Период
Сортировка
От Georg H.
Тема Re: BUG #16029: pg_basebackup - At least one WAL file is missing
Дата
Msg-id 00a1572e-e372-6f87-8902-effbe7a33640@arcor.de
обсуждение исходный текст
Ответ на BUG #16029: pg_basebackup - At least one WAL file is missing  (PG Bug reporting form <noreply@postgresql.org>)
Список pgsql-bugs
Hi,

is there any missing information or things I could check to get this 
issue solved?
We are planning to migrate our production clusters in the second week of 
November and I'm worried about to run into the same issue with them.
Any hint would be appeciated

kind regards

Georg

Am 27.09.2019 um 14:11 schrieb PG Bug reporting form:
> The following bug has been logged on the website:
>
> Bug reference:      16029
> Logged by:          Georg H.
> Email address:      silentx@arcor.de
> PostgreSQL version: 11.5
> Operating system:   CentOS Linux release 7.7.1908
> Description:
>
> Hello,
>
> I've upgraded 2 clusters (single, no HA / replication except BARMANs
> synchronous WAL streaming) as described below from v10.10 to v11.5.
> Both clusters are on the same server, using same file systems, having
> identical postgresql.conf settings (except port / cluster name / folders)
> and have more or less same content (2 development stages)
> We take sequential backups of both clusters using BARMAN with backup_method
> = postgres (If set to postgres barman will use the pg_basebackup command to
> execute the backup).
> BARMAN conf for both clusters is also identical and using same PostgreSQL
> 11.5 binaries out of the same repo as the DB-server uses.
>
> Since the upgrade the backup of one of both clusters is failing 2-3 times a
> week with "error=At least one WAL file is missing. The first missing WAL
> file is xxx".
> In fact the "missing" WAL is archived to the backup server 2 seconds after
> "pg_stop_backup complete" (based on postgresql.log).
> The second cluster doesn't have this problem at all.
> I'm not sure if this is a bug or just a configuration problem.
>
> Noticeable is a difference between backup.info file (end_wal) and backup
> history file (STOP WAL LOCATION) contained in the basebackup.
> When failing, the backup history file is the one of STOP WAL LOCATION and
> the next WAL is in the error marked as missing as it's set as end_wal in
> backup.info
> e.g. (see full content below):
> backup history file is
> 000000010000000100000083.00000028.backup saying  STOP WAL LOCATION:
> 1/83000130 (file 000000010000000100000083)
> and missing WAL = end_wal = 000000010000000100000084
>
> I would expect, that "end_wal = STOP WAL LOCATION", so pointing to the same
> WAL as it does when backup succeed.
>
> --backup.info
> backup_label=None
> begin_offset=40
> begin_time=2019-09-26 23:35:02+02:00
> begin_wal=000000010000000100000083
> begin_xlog=1/83000028
> config_file=/pg_data/data/i02mspdev1/postgresql.conf
> copy_stats={'total_time': 23.154973, 'copy_time': 23.154973}
> deduplicated_size=2038909210
> end_offset=96
> end_time=2019-09-26 23:35:25.052851+02:00
> end_wal=000000010000000100000084
> end_xlog=1/84000060
> error=At least one WAL file is missing. The first missing WAL file is
> 000000010000000100000084
> hba_file=/pg_data/data/i02mspdev1/pg_hba.conf
> ident_file=/pg_data/data/i02mspdev1/pg_ident.conf
> included_files=None
> mode=postgres
> pgdata=/pg_data/data/i02mspdev1
> server_name=dev_server_5433
> size=2038909210
> status=FAILED
> tablespaces=[('indexspace', 16400, '/pg_data/indexts/i02mspdev1')]
> timeline=1
> version=110005
> xlog_segment_size=16777216
> --backup.info end
>
> --backup history
> START WAL LOCATION: 1/83000028 (file 000000010000000100000083)
> STOP WAL LOCATION: 1/83000130 (file 000000010000000100000083)
> CHECKPOINT LOCATION: 1/83000060
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2019-09-26 23:35:02 CEST
> LABEL: pg_basebackup base backup
> START TIMELINE: 1
> STOP TIME: 2019-09-26 23:35:21 CEST
> STOP TIMELINE: 1
> --backup history end
>
> --excerpt out of postgresql.log (Level DEBUG1) :
> 2019-09-26 23:35:01.732 CEST [166017] [unknown] streaming_barman 10.x.x.101
> LOG:  received replication command: IDENTIFY_SYSTEM
> 2019-09-26 23:35:02.010 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG:  received replication command: SHOW data_directory_mode
> 2019-09-26 23:35:02.011 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG:  received replication command: SHOW wal_segment_size
> 2019-09-26 23:35:02.023 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG:  received replication command: IDENTIFY_SYSTEM
> 2019-09-26 23:35:02.038 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG:  received replication command: BASE_BACKUP LABEL 'pg_basebackup base
> backup'   FAST
> 2019-09-26 23:35:02.057 CEST [97464]    DEBUG:  performing replication slot
> checkpoint
> 2019-09-26 23:35:05.090 CEST [97468]    DEBUG:  archived write-ahead log
> file "000000010000000100000082"
> 2019-09-26 23:35:13.901 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  contents of directory "pg_dynshmem" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  contents of directory "pg_notify" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  contents of directory "pg_serial" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  contents of directory "pg_snapshots" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  contents of directory "pg_subtrans" excluded from backup
> 2019-09-26 23:35:13.916 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:14.060 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:14.120 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:17.744 CEST [166068]    DEBUG:  autovacuum: processing
> database "postgres"
> 2019-09-26 23:35:21.325 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  contents of directory "pg_replslot" excluded from backup
> 2019-09-26 23:35:21.325 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  contents of directory "pg_stat_tmp" excluded from backup
> 2019-09-26 23:35:21.421 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  file "postmaster.opts" excluded from backup
> 2019-09-26 23:35:21.421 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG:  file "postmaster.pid" excluded from backup
> 2019-09-26 23:35:22.682 CEST [97468]    DEBUG:  archived write-ahead log
> file "000000010000000100000083"
> 2019-09-26 23:35:23.569 CEST [97468]    DEBUG:  archived write-ahead log
> file "000000010000000100000083.00000028.backup"
> 2019-09-26 23:35:24.449 CEST [166018] [unknown] streaming_barman 10.x.x.101
> NOTICE:  pg_stop_backup complete, all required WAL segments have been
> archived
> 2019-09-26 23:35:25.547 CEST [166098] postgres barman 10.x.x.101 LOG:
> restore point "barman_20190926T233501" created at 1/85000090
> 2019-09-26 23:35:25.547 CEST [166098] postgres barman 10.x.x.101 STATEMENT:
> SELECT pg_create_restore_point('barman_20190926T233501')
> 2019-09-26 23:35:26.201 CEST [97468]    DEBUG:  archived write-ahead log
> file "000000010000000100000084"
>
>
>
>
> OS:
> CentOS Linux release 7.7.1908 (Core) on a virtual machine (on a HyperV host)
> with 12 GB RAM
> uname -a:
> Linux dev_server 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> We are using
> PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623
> (Red Hat 4.8.5-36), 64-bit
>
> installed from repo
> https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-$releasever-$basearch
>
> Upgraded from PostgreSQL 10.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
> using pg_dumpall -> initdb -> psql load of dump (psql -f) -> vacuumdb -a -f
> -z
> initdb:
> /usr/pgsql-11/bin/initdb --pgdata=/pg_data/data/i02mspdev1/
> --waldir=/pg_data/wals/i02mspdev01/11 --pwprompt --encoding=UTF8
> --locale=en_US.UTF-8 --data-checksums
>
> barman -v
> 2.9
> Barman by 2ndQuadrant (www.2ndQuadrant.com)
> server config: path_prefix = "/usr/pgsql-11/bin"
>
>
> --changed postgresql.conf settings:
>
> 1    "listen_addresses"    "*"
> 2    "port"    "5433"
> 3    "max_connections"    "120"
> 4    "superuser_reserved_connections"    "5"
> 5    "unix_socket_directories"    "/var/run/modispostgresql, /tmp"
> 6    "tcp_keepalives_idle"    "4"
> 7    "tcp_keepalives_interval"    "4"
> 8    "tcp_keepalives_count"    "2"
> 9    "shared_buffers"    "2GB"
> 10    "temp_buffers"    "512MB"
> 11    "work_mem"    "16MB"
> 12    "maintenance_work_mem"    "512MB"
> 13    "dynamic_shared_memory_type"    "posix"
> 14    "effective_io_concurrency"    "200"
> 15    "wal_level"    "replica"
> 16    "wal_sync_method"    "fdatasync"
> 17    "wal_compression"    "on"
> 18    "wal_buffers"    "16MB"
> 19    "max_wal_size"    "2GB"
> 20    "min_wal_size"    "1GB"
> 21    "checkpoint_completion_target"    "0.9"
> 22    "archive_mode"    "on"
> 23    "archive_command"    "barman-wal-archive barman.server dev_server_5433 %p &&
> test ! -f /pg_data/walarch/i02mspdev1/11/%f && cp %p
> /pg_data/walarch/i02mspdev1/11/%f "
> 24    "wal_keep_segments"    "4"
> 25    "track_commit_timestamp"    "on"
> 26    "synchronous_standby_names"    "barman_receive_wal"
> 27    "hot_standby"    "off"
> 28    "random_page_cost"    "1.0"
> 29    "min_parallel_table_scan_size"    "4MB"
> 30    "from_collapse_limit"    "12"
> 31    "join_collapse_limit"    "16"
> 32    "log_destination"    "stderr"
> 33    "logging_collector"    "on"
> 34    "log_directory"    "log"
> 35    "log_filename"    "postgresql-%Y-%m-%d_%H%M%S.log"
> 36    "log_truncate_on_rotation"    "on"
> 37    "log_rotation_age"    "1d"
> 38    "log_rotation_size"    "10MB"
> 39    "syslog_ident"    "postgres_i02mspdev1"
> 40    "log_min_messages"    "debug1"
> 41    "log_line_prefix"    "%m [%p] %d %u %h "
> 42    "log_lock_waits"    "on"
> 43    "log_statement"    "ddl"
> 44    "log_replication_commands"    "on"
> 45    "log_timezone"    "Europe/Berlin"
> 46    "cluster_name"    "i02mspdev1"
> 47    "track_functions"    "pl"
> 48    "temp_tablespaces"    "/pg_tempts/i02mspdev1"
> 49    "datestyle"    "iso, mdy"
> 50    "timezone"    "Europe/Berlin"
> 51    "lc_messages"    "en_US.UTF-8"
> 52    "lc_monetary"    "en_US.UTF-8"
> 53    "lc_numeric"    "en_US.UTF-8"
> 54    "lc_time"    "en_US.UTF-8"
> 55    "default_text_search_config"    "pg_catalog.english"
> 56    "shared_preload_libraries"    "plugin_debugger,pg_stat_statements"
>
>
> kind regards
> Georg
>



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pg_restore creates duplicate records when used with --role parameter
Следующее
От: Ivan Pantić
Дата:
Сообщение: Re: pg_restore creates duplicate records when used with --roleparameter