Bug in slot.c and are replication slots ever used at Window?

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Bug in slot.c and are replication slots ever used at Window?
Date: 2018-08-30 08:00:43
Message-ID: 9eb1a6d5-b66f-2640-598d-c5ea46b8f68a@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi hackers,

I am really confused.  If my conclusions are correct, then nobody ever
tried to use replication slots at Windows!
The function RestoreSlotFromDisk in slot.c contains the following code:

static void
RestoreSlotFromDisk(const char *name)
    ReplicationSlotOnDisk cp;
    int            i;
    char        path[MAXPGPATH + 22];
    int            fd;
    bool        restored = false;
    int            readBytes;
    pg_crc32c    checksum;

    /* no need to lock here, no concurrent access allowed yet */

    /* delete temp file if it exists */
    sprintf(path, "pg_replslot/%s/state.tmp", name);
    if (unlink(path) < 0 && errno != ENOENT)
                 errmsg("could not remove file \"%s\": %m", path)));

    sprintf(path, "pg_replslot/%s/state", name);

    elog(DEBUG1, "restoring replication slot from \"%s\"", path);

    fd = OpenTransientFile(path, O_RDWR | PG_BINARY);

     * We do not need to handle this as we are rename()ing the
directory into
     * place only after we fsync()ed the state file.
    if (fd < 0)
                 errmsg("could not open file \"%s\": %m", path)));

     * Sync state file before we're reading from it. We might have crashed
     * while it wasn't synced yet and we shouldn't continue on that basis.
    if (pg_fsync(fd) != 0)
        int            save_errno = errno;

        errno = save_errno;
                 errmsg("could not fsync file \"%s\": %m",

    /* Also sync the parent directory */
    fsync_fname(path, true);


Please notice that fsync_fname with comment "also sync parent directory"
is called for path of the file!
fsync_fname in turn does the following:

 * fsync_fname -- Try to fsync a file or directory
 * Ignores errors trying to open unreadable files, or trying to fsync
 * directories on systems where that isn't allowed/required. Reports
 * other errors non-fatally.
fsync_fname(const char *fname, bool isdir, const char *progname)
    int            fd;
    int            flags;
    int            returncode;

     * Some OSs require directories to be opened read-only whereas other
     * systems don't allow us to fsync files opened read-only; so we
need both
     * cases here.  Using O_RDWR will cause us to fail to fsync files
that are
     * not writable by our userid, but we assume that's OK.
    flags = PG_BINARY;
    if (!isdir)
        flags |= O_RDWR;
        flags |= O_RDONLY;

     * Open the file, silently ignoring errors about unreadable files (or
     * unsupported operations, e.g. opening a directory under Windows), and
     * logging others.
    fd = open(fname, flags);
    if (fd < 0)
        if (errno == EACCES || (isdir && errno == EISDIR))
            return 0;
        fprintf(stderr, _("%s: could not open file \"%s\": %s\n"),
                progname, fname, strerror(errno));
        return -1;

    returncode = fsync(fd);

So if "isdir" is true (and it is true in this case), it sets O_RDONLY flag.
Then fsync_fname successfully opens slot file in readonly mode and calls
fsync() which at windows
is substituted with _commit() which in turn is wrapper for FlushFileBuffers.
Finally FlushFileBuffers returns ERROR_ACCESS_DENINED which cause
assertion failure in _commit:

if ( !FlushFileBuffers((HANDLE)_get_osfhandle(filedes)) ) {
retval = GetLastError();
else {
retval = 0; /* return success */

/* map the OS return code to C errno value and return code */
if (retval == 0)
goto good;

_doserrno = retval;


errno = EBADF;
retval = -1;

_ASSERTE(("Invalid file descriptor. File possibly closed by a different thread",0));

I think that the problem happen only with debug version of postgres.
Release version will just return error in this case which is silently ignored by RestoreSlotFromDisk function.

I think that bug fix is trivial: we just need to use fsync_parent_path instead of fsync_fname in RestoreSlotFromDisk.

Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Banck 2018-08-30 08:13:31 Re: pg_verify_checksums -d option (was: Re: pg_verify_checksums -r option)
Previous Message Fabien COELHO 2018-08-30 07:33:41 Re[2]: doc - improve description of default privileges