Re: BUG #16029: pg_basebackup - At least one WAL file is missing

From: "Georg H(dot)" <silentx(at)arcor(dot)de>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16029: pg_basebackup - At least one WAL file is missing
Date: 2019-10-18 10:14:49
Message-ID: 00a1572e-e372-6f87-8902-effbe7a33640@arcor.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

is there any missing information or things I could check to get this
issue solved?
We are planning to migrate our production clusters in the second week of
November and I'm worried about to run into the same issue with them.
Any hint would be appeciated

kind regards

Georg

Am 27.09.2019 um 14:11 schrieb PG Bug reporting form:
> The following bug has been logged on the website:
>
> Bug reference: 16029
> Logged by: Georg H.
> Email address: silentx(at)arcor(dot)de
> PostgreSQL version: 11.5
> Operating system: CentOS Linux release 7.7.1908
> Description:
>
> Hello,
>
> I've upgraded 2 clusters (single, no HA / replication except BARMANs
> synchronous WAL streaming) as described below from v10.10 to v11.5.
> Both clusters are on the same server, using same file systems, having
> identical postgresql.conf settings (except port / cluster name / folders)
> and have more or less same content (2 development stages)
> We take sequential backups of both clusters using BARMAN with backup_method
> = postgres (If set to postgres barman will use the pg_basebackup command to
> execute the backup).
> BARMAN conf for both clusters is also identical and using same PostgreSQL
> 11.5 binaries out of the same repo as the DB-server uses.
>
> Since the upgrade the backup of one of both clusters is failing 2-3 times a
> week with "error=At least one WAL file is missing. The first missing WAL
> file is xxx".
> In fact the "missing" WAL is archived to the backup server 2 seconds after
> "pg_stop_backup complete" (based on postgresql.log).
> The second cluster doesn't have this problem at all.
> I'm not sure if this is a bug or just a configuration problem.
>
> Noticeable is a difference between backup.info file (end_wal) and backup
> history file (STOP WAL LOCATION) contained in the basebackup.
> When failing, the backup history file is the one of STOP WAL LOCATION and
> the next WAL is in the error marked as missing as it's set as end_wal in
> backup.info
> e.g. (see full content below):
> backup history file is
> 000000010000000100000083.00000028.backup saying STOP WAL LOCATION:
> 1/83000130 (file 000000010000000100000083)
> and missing WAL = end_wal = 000000010000000100000084
>
> I would expect, that "end_wal = STOP WAL LOCATION", so pointing to the same
> WAL as it does when backup succeed.
>
> --backup.info
> backup_label=None
> begin_offset=40
> begin_time=2019-09-26 23:35:02+02:00
> begin_wal=000000010000000100000083
> begin_xlog=1/83000028
> config_file=/pg_data/data/i02mspdev1/postgresql.conf
> copy_stats={'total_time': 23.154973, 'copy_time': 23.154973}
> deduplicated_size=2038909210
> end_offset=96
> end_time=2019-09-26 23:35:25.052851+02:00
> end_wal=000000010000000100000084
> end_xlog=1/84000060
> error=At least one WAL file is missing. The first missing WAL file is
> 000000010000000100000084
> hba_file=/pg_data/data/i02mspdev1/pg_hba.conf
> ident_file=/pg_data/data/i02mspdev1/pg_ident.conf
> included_files=None
> mode=postgres
> pgdata=/pg_data/data/i02mspdev1
> server_name=dev_server_5433
> size=2038909210
> status=FAILED
> tablespaces=[('indexspace', 16400, '/pg_data/indexts/i02mspdev1')]
> timeline=1
> version=110005
> xlog_segment_size=16777216
> --backup.info end
>
> --backup history
> START WAL LOCATION: 1/83000028 (file 000000010000000100000083)
> STOP WAL LOCATION: 1/83000130 (file 000000010000000100000083)
> CHECKPOINT LOCATION: 1/83000060
> BACKUP METHOD: streamed
> BACKUP FROM: master
> START TIME: 2019-09-26 23:35:02 CEST
> LABEL: pg_basebackup base backup
> START TIMELINE: 1
> STOP TIME: 2019-09-26 23:35:21 CEST
> STOP TIMELINE: 1
> --backup history end
>
> --excerpt out of postgresql.log (Level DEBUG1) :
> 2019-09-26 23:35:01.732 CEST [166017] [unknown] streaming_barman 10.x.x.101
> LOG: received replication command: IDENTIFY_SYSTEM
> 2019-09-26 23:35:02.010 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG: received replication command: SHOW data_directory_mode
> 2019-09-26 23:35:02.011 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG: received replication command: SHOW wal_segment_size
> 2019-09-26 23:35:02.023 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG: received replication command: IDENTIFY_SYSTEM
> 2019-09-26 23:35:02.038 CEST [166018] [unknown] streaming_barman 10.x.x.101
> LOG: received replication command: BASE_BACKUP LABEL 'pg_basebackup base
> backup' FAST
> 2019-09-26 23:35:02.057 CEST [97464] DEBUG: performing replication slot
> checkpoint
> 2019-09-26 23:35:05.090 CEST [97468] DEBUG: archived write-ahead log
> file "000000010000000100000082"
> 2019-09-26 23:35:13.901 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: contents of directory "pg_dynshmem" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: contents of directory "pg_notify" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: contents of directory "pg_serial" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: contents of directory "pg_snapshots" excluded from backup
> 2019-09-26 23:35:13.915 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: contents of directory "pg_subtrans" excluded from backup
> 2019-09-26 23:35:13.916 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:14.060 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:14.120 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: file "pg_internal.init" excluded from backup
> 2019-09-26 23:35:17.744 CEST [166068] DEBUG: autovacuum: processing
> database "postgres"
> 2019-09-26 23:35:21.325 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: contents of directory "pg_replslot" excluded from backup
> 2019-09-26 23:35:21.325 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: contents of directory "pg_stat_tmp" excluded from backup
> 2019-09-26 23:35:21.421 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: file "postmaster.opts" excluded from backup
> 2019-09-26 23:35:21.421 CEST [166018] [unknown] streaming_barman 10.x.x.101
> DEBUG: file "postmaster.pid" excluded from backup
> 2019-09-26 23:35:22.682 CEST [97468] DEBUG: archived write-ahead log
> file "000000010000000100000083"
> 2019-09-26 23:35:23.569 CEST [97468] DEBUG: archived write-ahead log
> file "000000010000000100000083.00000028.backup"
> 2019-09-26 23:35:24.449 CEST [166018] [unknown] streaming_barman 10.x.x.101
> NOTICE: pg_stop_backup complete, all required WAL segments have been
> archived
> 2019-09-26 23:35:25.547 CEST [166098] postgres barman 10.x.x.101 LOG:
> restore point "barman_20190926T233501" created at 1/85000090
> 2019-09-26 23:35:25.547 CEST [166098] postgres barman 10.x.x.101 STATEMENT:
> SELECT pg_create_restore_point('barman_20190926T233501')
> 2019-09-26 23:35:26.201 CEST [97468] DEBUG: archived write-ahead log
> file "000000010000000100000084"
>
>
>
>
> OS:
> CentOS Linux release 7.7.1908 (Core) on a virtual machine (on a HyperV host)
> with 12 GB RAM
> uname -a:
> Linux dev_server 3.10.0-1062.1.1.el7.x86_64 #1 SMP Fri Sep 13 22:55:44 UTC
> 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> We are using
> PostgreSQL 11.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623
> (Red Hat 4.8.5-36), 64-bit
>
> installed from repo
> https://download.postgresql.org/pub/repos/yum/11/redhat/rhel-$releasever-$basearch
>
> Upgraded from PostgreSQL 10.10 on x86_64-pc-linux-gnu, compiled by gcc (GCC)
> 4.8.5 20150623 (Red Hat 4.8.5-36), 64-bit
> using pg_dumpall -> initdb -> psql load of dump (psql -f) -> vacuumdb -a -f
> -z
> initdb:
> /usr/pgsql-11/bin/initdb --pgdata=/pg_data/data/i02mspdev1/
> --waldir=/pg_data/wals/i02mspdev01/11 --pwprompt --encoding=UTF8
> --locale=en_US.UTF-8 --data-checksums
>
> barman -v
> 2.9
> Barman by 2ndQuadrant (www.2ndQuadrant.com)
> server config: path_prefix = "/usr/pgsql-11/bin"
>
>
> --changed postgresql.conf settings:
>
> 1 "listen_addresses" "*"
> 2 "port" "5433"
> 3 "max_connections" "120"
> 4 "superuser_reserved_connections" "5"
> 5 "unix_socket_directories" "/var/run/modispostgresql, /tmp"
> 6 "tcp_keepalives_idle" "4"
> 7 "tcp_keepalives_interval" "4"
> 8 "tcp_keepalives_count" "2"
> 9 "shared_buffers" "2GB"
> 10 "temp_buffers" "512MB"
> 11 "work_mem" "16MB"
> 12 "maintenance_work_mem" "512MB"
> 13 "dynamic_shared_memory_type" "posix"
> 14 "effective_io_concurrency" "200"
> 15 "wal_level" "replica"
> 16 "wal_sync_method" "fdatasync"
> 17 "wal_compression" "on"
> 18 "wal_buffers" "16MB"
> 19 "max_wal_size" "2GB"
> 20 "min_wal_size" "1GB"
> 21 "checkpoint_completion_target" "0.9"
> 22 "archive_mode" "on"
> 23 "archive_command" "barman-wal-archive barman.server dev_server_5433 %p &&
> test ! -f /pg_data/walarch/i02mspdev1/11/%f && cp %p
> /pg_data/walarch/i02mspdev1/11/%f "
> 24 "wal_keep_segments" "4"
> 25 "track_commit_timestamp" "on"
> 26 "synchronous_standby_names" "barman_receive_wal"
> 27 "hot_standby" "off"
> 28 "random_page_cost" "1.0"
> 29 "min_parallel_table_scan_size" "4MB"
> 30 "from_collapse_limit" "12"
> 31 "join_collapse_limit" "16"
> 32 "log_destination" "stderr"
> 33 "logging_collector" "on"
> 34 "log_directory" "log"
> 35 "log_filename" "postgresql-%Y-%m-%d_%H%M%S.log"
> 36 "log_truncate_on_rotation" "on"
> 37 "log_rotation_age" "1d"
> 38 "log_rotation_size" "10MB"
> 39 "syslog_ident" "postgres_i02mspdev1"
> 40 "log_min_messages" "debug1"
> 41 "log_line_prefix" "%m [%p] %d %u %h "
> 42 "log_lock_waits" "on"
> 43 "log_statement" "ddl"
> 44 "log_replication_commands" "on"
> 45 "log_timezone" "Europe/Berlin"
> 46 "cluster_name" "i02mspdev1"
> 47 "track_functions" "pl"
> 48 "temp_tablespaces" "/pg_tempts/i02mspdev1"
> 49 "datestyle" "iso, mdy"
> 50 "timezone" "Europe/Berlin"
> 51 "lc_messages" "en_US.UTF-8"
> 52 "lc_monetary" "en_US.UTF-8"
> 53 "lc_numeric" "en_US.UTF-8"
> 54 "lc_time" "en_US.UTF-8"
> 55 "default_text_search_config" "pg_catalog.english"
> 56 "shared_preload_libraries" "plugin_debugger,pg_stat_statements"
>
>
> kind regards
> Georg
>

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Ivan Pantić 2019-10-18 11:17:26 Re: pg_restore creates duplicate records when used with --role parameter
Previous Message Tom Lane 2019-10-18 09:13:36 Re: pg_restore creates duplicate records when used with --role parameter