BUG #15636: PostgreSQL 11.1 pg_basebackup backup to a CIFS destination throws fsync error at end of backup

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: jk7255(at)gmail(dot)com
Subject: BUG #15636: PostgreSQL 11.1 pg_basebackup backup to a CIFS destination throws fsync error at end of backup
Date: 2019-02-14 20:40:07
Message-ID: 15636-d380890dafd78fc6@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15636
Logged by: John Klann
Email address: jk7255(at)gmail(dot)com
PostgreSQL version: 11.1
Operating system: Red Hat Enterprise Linux Server release 7.5
Description:

Issue:
- PostgreSQL 11.1 pg_basebackup and parallel pg_dump backups to a CIFS
destination throw an fsync error at the very end of the backup.
- Command: pg_basebackup -D /cifs/backups/<backupDirectoryName> -U
backupuser -Ft -Z 1 -X fetch -p 5432
- Error: pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>": Invalid argument

Details:
We are preparing to move from PostgreSQL 9.3.x to 11.1. The move will be a
complete rebuild on new hardware. After setting up the new hardware,
installing and configuring RHEL 7.5 (updated), and installing and
configuring PostgreSQL 11.1, I tested our migration process: parallel dump
--> parallel restore of the databases worked without issue.

I then started testing backups, and that is when I came across the fsync
error. I have seen references saying that PostgreSQL does not support
storing the data directory on CIFS because of similar issues, but I have
not found any reference saying that backing up to CIFS is unsupported. I am
able to fully restore and recover from these backups without issue. Based
on the research I have done, I suspect that CIFS does not support the fsync
call on the containing directory. Below are examples of all the testing I
have performed, which also led me to this conclusion.
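
The suspicion above, that CIFS rejects fsync on a directory, can be checked
without running a full backup. The following is a minimal sketch, assuming
Python 3 is available on the database host; try_fsync_dir is a hypothetical
helper name, not part of PostgreSQL. It opens a path read-only, calls fsync
on the descriptor, and reports the errno name on failure.

```python
import errno
import os


def try_fsync_dir(path):
    """Open `path` read-only and fsync it.

    Returns None on success, or the symbolic errno name (for example
    "EINVAL") if open() or fsync() fails.
    """
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        os.fsync(fd)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


if __name__ == "__main__":
    import sys

    # Each command-line argument is a path to probe.
    for target in sys.argv[1:]:
        result = try_fsync_dir(target)
        print(f"{target}: {'ok' if result is None else result}")
```

Probing one directory on the CIFS mount and one on local storage should
show whether the "Invalid argument" (EINVAL) comes from the filesystem
itself rather than from pg_basebackup.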

Environment:
- Server
○ Model: Dell R740
○ RAM: 768 GB RDIMM 2666MT/s
○ Processor: Intel Xeon Gold 6146 3.2G 24.75MB
§ 2 Nodes, 12 cores each, HT = total cores of 48
○ Storage:
§ OS: local ssd in raid
§ pgdata, pgwal, pglog: each on their own dedicated EMC XIO luns attached
via 16 8Gbps paths (2x QLogic 2562, Dual Port 8Gb Optical Fibre Channel
HBAs)
□ XIO is XtremIO V1 in two brick clustered configuration
○ OS:
§ uname -a:
□ Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Fri Sep 21 09:07:21 UTC 2018
x86_64 x86_64 x86_64 GNU/Linux
§ cat /etc/redhat-release:
□ Red Hat Enterprise Linux Server release 7.5 (Maipo)
○ PostgreSQL
§ PostgreSQL 11.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-28), 64-bit
§ Custom Configs:
[postgres(at)servername data1]$ cat postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
port = '5432'
listen_addresses = '0.0.0.0'
work_mem = '8MB'
maintenance_work_mem = '1GB'
random_page_cost = '1.0'
track_functions = 'all'
wal_buffers = '-1'
checkpoint_timeout = '10min'
checkpoint_completion_target = '0.9'
checkpoint_warning = '30s'
log_destination = 'csvlog'
log_directory = '/dbalog/data1'
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_min_messages = 'error'
log_min_error_statement = 'error'
log_line_prefix = '[%u] [%d] [%h] [%m] [%p]:>'
log_rotation_size = '10MB'
log_statement = 'ddl'
shared_buffers = '8GB'
max_connections = '500'
effective_cache_size = '589824 MB'
wal_level = 'replica'
max_wal_senders = '2'
archive_mode = 'on'
archive_command = 'test ! -f /pgxlog1/data1/%f || cp /pgxlog1/data1/%f
/cifs/backups/<backupDirectoryName>/dmp/archive/%f'
log_disconnections = 'on'
standard_conforming_strings = 'off'
§ Databases
□ 4 dbs
® 3 - very small < 3 GB total
® 1 - 964.67 GB
□ Load: OLTP, DW mix
- CIFS (Backup destination)
○ Windows 2012 R2 (last patched September/2018)
○ VNX Block Storage lun configured for CIFS using NTFS

Testing/Reproduction:
- T1:
○ basebackup to Windows CIFS
§ Command:
□ pg_basebackup -D /cifs/backups/<backupDirectoryName> -U backupuser -Ft
-Z 1 -X fetch -p 5432
§ Error: pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>": Invalid argument
- T2:
○ Same backup as T1 with much less data - same result
- T3:
○ Same as T2 with -N (--no-sync) option - no error (fairly obvious why)
- T4:
○ basebackup of the same dataset as T2, no tar, no compression, going to
Windows CIFS
§ Command:
□ pg_basebackup -D /cifs/backups/dbadb1linbos/5432/dmp/basebkp -U
backupuser -X fetch -p 5432
§ Same error seems to happen on all directories:
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/base/1": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/base/13877": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/base/13878": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/base/16397": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/base/16660": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/base": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/global": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/log": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_commit_ts": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_dynshmem": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_logical/mappings": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_logical/snapshots": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_logical": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_multixact/members": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_multixact/offsets": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_multixact": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_notify": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_replslot": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_serial": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_snapshots": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_stat": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_stat_tmp": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_subtrans": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_tblspc": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_twophase": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_wal/archive_status": Invalid
argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_wal": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_xact": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp": Invalid argument
pg_basebackup: could not fsync file
"/cifs/backups/<backupDirectoryName>/basebkp/pg_tblspc": Invalid argument
- T5:
○ pg_dump single-threaded backup of a database to Windows CIFS - no
error
- T6:
○ pg_dump multithreaded backup of a database to Windows CIFS - same fsync
error on containing directory
§ Command: pg_dump -p 5432 -j 16 -f
/cifs/backups/<backupDirectoryName>/dmp/DBA_02132019_133120 -U backupuser
-Fd -d DBA
§ Error:
□ pg_dump: could not fsync file
"/cifs/backups/<backupDirectoryName>/dmp/DBA_02132019_133120": Invalid
argument
- T7:
○ Same as T6 but with --no-sync option - no error
- T8:
○ Same as T1 but to local storage (XIO SAN-attached LUN, ext4)
§ No error
- T9:
○ Same as T1 but to a Linux CIFS share on an XIO SAN-attached LUN, ext4
(same version of Linux)
§ Same fsync error
- T10:
○ Same as T1 but to a Linux NFS share on an XIO SAN-attached LUN, ext4
(same version of Linux)
§ No error
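
In the tests above, every reported failure names a directory, never a
regular file. One possible stopgap, sketched here as an assumption and not
as a supported procedure, is to run the backup with --no-sync and then
flush the output separately, tolerating EINVAL on directories only.
sync_tree is a hypothetical helper; skipping the directory fsync means the
directory entries themselves are not guaranteed durable by this script.

```python
import errno
import os


def sync_tree(root):
    """fsync every regular file under `root`, then try each directory.

    Returns the list of directories whose fsync failed with EINVAL.
    Any other error is raised to the caller.
    """
    skipped_dirs = []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fd = os.open(os.path.join(dirpath, name), os.O_RDONLY)
            try:
                os.fsync(fd)  # a failure on a regular file is fatal
            finally:
                os.close(fd)
        fd = os.open(dirpath, os.O_RDONLY)
        try:
            os.fsync(fd)
        except OSError as exc:
            if exc.errno != errno.EINVAL:
                raise
            skipped_dirs.append(dirpath)  # directory fsync unsupported here
        finally:
            os.close(fd)
    return skipped_dirs
```

A non-empty return value lists the directories that could not be synced, so
the caller can log them instead of silently ignoring the condition.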

Questions:
- Is backing up to CIFS supported?
- Research and other reported issues suggest that the error may mean CIFS
handles this call differently or already performs this action itself.
Should this error bring the integrity of the backup into question?
○ In what scenarios would it bring the integrity into question?
○ Issue reference, see bug #6372 thread:
https://www.postgresql.org/message-id/1149.1325535272%40sss.pgh.pa.us
- Is there a workaround or configuration that would let us keep using fsync
with our current Windows CIFS setup?
- Is there a better way to check the integrity of a backup than restoring
it and performing a dump?
- Recommended steps forward?
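
As a cheaper first check than a full restore, the compressed tar output of
-Ft -Z can at least be read end to end. The sketch below assumes Python 3
and a gzip-compressed tar file; check_tar_gz is a hypothetical helper. It
only validates the gzip stream and the tar structure. It says nothing about
PostgreSQL-level consistency, so it does not replace a restore test.

```python
import gzip
import tarfile


def check_tar_gz(path, chunk_size=1 << 20):
    """Read a .tar.gz end to end.

    Returns (member_count, uncompressed_bytes). Raises an exception (for
    example OSError, EOFError, or tarfile.ReadError) if the gzip stream is
    corrupt or truncated, or if the tar structure cannot be parsed.
    """
    # Pass 1: decompress everything so gzip checks its CRC and length trailer.
    total = 0
    with gzip.open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    # Pass 2: walk the tar headers to confirm the archive structure parses.
    with tarfile.open(path, "r:gz") as archive:
        members = sum(1 for _ in archive)
    return members, total
```

Both passes read the whole file, so on a backup close to 1 TB this takes
roughly as long as two sequential reads of the archive over the share.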
