Re: logical replication: could not create file "state.tmp": File exists

From: Dmitry Vasiliev <dmitry(dot)vasiliev(at)coins(dot)ph>
To: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: logical replication: could not create file "state.tmp": File exists
Date: 2019-12-02 17:35:53
Message-ID: CANCe5h0su4Jn7giDhWs0He=QSSXGEAWzijApet5K2PMSO9j5dQ@mail.gmail.com
Lists: pgsql-bugs

Here's what happened, from the publisher's and the subscriber's points of view:

publisher: (some query) ERROR: could not write to tuplestore temporary
file: No space left on device
subscriber: db =, user =, app =, client = ERROR: could not receive data
from WAL stream: ERROR: could not write to file
"pg_logical/snapshots/2AE-F3E52FB8.snap.27574.tmp": No space left on device
subscriber: db =, user =, app =, client = LOG: background worker "logical
replication worker" (PID 23114) exited with exit code 1
subscriber: db =, user =, app =, client = LOG: logical replication apply
worker for subscription "<name> _sub" has started
publisher: LOG: received replication command: IDENTIFY_SYSTEM
publisher: LOG: received replication command: START_REPLICATION SLOT
"<name> _sub" LOGICAL 2AE/F3C0B920 (proto_version '1', publication_names
'"<name>_pub"')
publisher: ERROR: could not create file
"pg_replslot/<name>_sub/state.tmp": File exists

I think some publisher logs may be missing because of the out-of-space
problem.

On Mon, Dec 2, 2019 at 7:54 PM Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>
wrote:

>
> On 12/2/19 7:12 PM, Andres Freund wrote:
> > Hi,
> >
> > On 2019-11-30 15:09:39 +0300, Grigory Smolkin wrote:
> >> One of my colleagues encountered an out of space condition, which broke
> >> his logical replication setup.
> >> It manifested with the following errors:
> >>
> >> ERROR: could not receive data from WAL stream: ERROR: could not create
> >> file "pg_replslot/some_sub/state.tmp": File exists
> > Hm. What was the log output leading to this state? Some cases of this
> > would end up in a PANIC, which'd remove the .tmp file during
> > recovery. But there's some where we won't - it seems the right fix for
> > this would be to unlink the tmp file in that case?
> >
> >
> >> I've dug a bit into this problem, and it turned out that in
> >> SaveSlotToPath() the temp file for the replication slot is opened with
> >> 'O_CREAT | O_EXCL' flags, which makes this routine not very reentrant.
> >>
> >> Since an exclusive lock is taken before temp file creation, I think it
> >> should be safe to replace O_EXCL with O_TRUNC.
> > I'm very doubtful about this. I think it's a good safety measure to
> > ensure that there's no previous state file that we're somehow
> > overwriting.
> Is that possible with an exclusive lock taken before that?
> >
> >
> >> Script to reproduce and patch are attached.
> > Well:
> >
> >> # Imitate out_of_space/write_operation_error
> >> touch ${PGDATA_PUB}/pg_replslot/mysub/state.tmp
> > Doesn't really replicate how we got into this state...
>
> But it replicates exactly the same state we would get if write() to the
> temp file had failed with out of space.
>
>
> >
> > Greetings,
> >
> > Andres Freund
> >
> >
> --
> Grigory Smolkin
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
>
>
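
For illustration, here is a minimal sketch of the failure mode discussed in
the quoted thread, and of the "unlink the tmp file on error" idea. It is not
PostgreSQL code: save_state(), its parameters, and the simulated write
failure are all hypothetical stand-ins for the temp-file step of a
"write state.tmp, then rename" routine. It assumes a POSIX system.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a "write tmp file, then rename" save routine.
 * Returns 0 on success, or an errno value on failure.
 *
 * simulate_write_failure: pretend write() failed with ENOSPC after the
 *                         tmp file was already created.
 * cleanup_on_error:       if nonzero, unlink the tmp file when the write
 *                         step fails, so a later retry can create it again.
 */
static int
save_state(const char *tmp_path, const char *final_path,
           const char *payload,
           int simulate_write_failure, int cleanup_on_error)
{
    int err = 0;

    /* O_EXCL makes open() fail with EEXIST if a stale tmp file is present. */
    int fd = open(tmp_path, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        return errno;

    if (simulate_write_failure)
    {
        err = ENOSPC;
    }
    else
    {
        size_t  len = strlen(payload);
        ssize_t written = write(fd, payload, len);

        if (written < 0)
            err = errno;
        else if ((size_t) written != len)
            err = ENOSPC;   /* treat a short write as out of space */
    }

    close(fd);

    if (err != 0)
    {
        /* Without this unlink, the next call fails with EEXIST. */
        if (cleanup_on_error)
            unlink(tmp_path);
        return err;
    }

    /* Publish the new state by renaming it over the final name. */
    if (rename(tmp_path, final_path) != 0)
        return errno;

    return 0;
}
```

Calling save_state() once with simulate_write_failure set and
cleanup_on_error cleared leaves the tmp file behind, so the next call
returns EEXIST, which matches the "File exists" error above. With
cleanup_on_error set, the retry succeeds. The other option raised in the
thread, replacing O_EXCL with O_TRUNC, would also let the retry succeed, at
the cost of no longer detecting an unexpected pre-existing tmp file.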
