Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?
Date: 2015-01-26 21:03:03
Message-ID: 20150126210303.GD5568@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2015-01-22 19:56:07 +0100, Andres Freund wrote:
> Hi,
>
> On 2015-01-20 16:28:19 +0100, Andres Freund wrote:
> > I'm analyzing a problem in which a customer had a pg_basebackup (from
> > standby) created 9.2 cluster that failed with "WAL contains references to
> > invalid pages". The failed record was a "xlog redo visible"
> > i.e. XLOG_HEAP2_VISIBLE.
> >
> > First I thought there might be another bug along the line of
> > 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the
> > case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't
> > seem to have any problems.
> >
> > Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when
> > the basebackup was started and finished *before* pg_basebackup finished.
> >
> > movedb() basically works in these steps:
> > 1) lock out users of the database
> > 2) RequestCheckpoint(IMMEDIATE|WAIT)
> > 3) DropDatabaseBuffers()
> > 4) copydir()
> > 5) XLogInsert(XLOG_DBASE_CREATE)
> > 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
> > 7) rmtree(src_dbpath)
> > 8) XLogInsert(XLOG_DBASE_DROP)
> > 9) unlock database
> >
> > If a basebackup starts while 4) is in progress and continues until 7)
> > happens I think a pretty wide race opens: The basebackup can end up with
> > a partial copy of the database in the old tablespace because the
> > rmtree(old_path) concurrently was in progress. Normally such races are
> > fixed during replay. But in this case, the replay of the
> > XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new);,
> > fixing nothing.
> >
> > Besides making AD .. ST use sane WAL logging, which doesn't seem
> > backpatchable, I don't see what could be done against this except
> > somehow making basebackups fail if an AD .. ST is in progress. Which
> > doesn't look entirely trivial either.
>
> I basically have two ideas to fix this.
>
> 1) Make do_pg_start_backup() acquire a SHARE lock on
> pg_database. That'll prevent it from starting while a movedb() is
> still in progress. Then additionally add pg_backup_in_progress()
> function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
> XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
> movedb() to error out if a backup is in progress.

Attached is a patch trying to do this. It doesn't look too bad, and it led
me to discover missing recovery conflicts during an AD ST.
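
For readers without the patch at hand, the check from idea 1 can be
sketched roughly as follows. This is a standalone toy model, not the
attached patch: BackupState is an assumed stand-in for the
exclusiveBackup / nonExclusiveBackups fields of XLogCtl->Insert, and
movedb_allowed() is a hypothetical helper name used only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in (assumption) for the two backup fields of XLogCtl->Insert. */
typedef struct BackupState
{
    bool exclusiveBackup;      /* an exclusive base backup is running */
    int  nonExclusiveBackups;  /* number of running non-exclusive backups */
} BackupState;

/* Same condition as quoted above: true if any kind of backup is running. */
bool
pg_backup_in_progress(const BackupState *state)
{
    assert(state != NULL);
    return state->exclusiveBackup || state->nonExclusiveBackups != 0;
}

/*
 * Hypothetical helper: createdb() and movedb() would error out
 * whenever this returns false.
 */
bool
movedb_allowed(const BackupState *state)
{
    return !pg_backup_in_progress(state);
}
```

In the real server the fields would have to be read under the
appropriate lock; the sketch leaves that out entirely.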

But: It doesn't actually work on standbys, because lock.c prevents any
lock stronger than RowExclusive from being acquired there. And we need a
lock that can conflict with WAL replay of DBASE_CREATE, to handle base
backups that are executed on the primary. Those obviously can't detect
whether any standby is currently doing a base backup...
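
To illustrate why the SHARE lock is refused, here is a simplified model
of that restriction. The lock mode numbering follows PostgreSQL's usual
ordering; lock_allowed_in_recovery() is a hypothetical name, and the
model deliberately ignores the lock tag type and any special treatment
of the startup process, so take it as a sketch of the rule rather than
the code in lock.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Heavyweight lock modes, weakest to strongest. */
typedef enum LockMode
{
    NoLock = 0,
    AccessShareLock = 1,
    RowShareLock = 2,
    RowExclusiveLock = 3,
    ShareUpdateExclusiveLock = 4,
    ShareLock = 5,
    ShareRowExclusiveLock = 6,
    ExclusiveLock = 7,
    AccessExclusiveLock = 8
} LockMode;

/*
 * Hypothetical, simplified version of the rule: during recovery a
 * regular backend may not take anything stronger than RowExclusiveLock.
 */
bool
lock_allowed_in_recovery(bool recovery_in_progress, LockMode mode)
{
    assert(mode >= NoLock && mode <= AccessExclusiveLock);

    if (recovery_in_progress && mode > RowExclusiveLock)
        return false;
    return true;
}
```

With this model, ShareLock on pg_database is accepted on a primary but
rejected on a standby, which is the case the first patch runs into.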

I currently don't have a good idea how to mangle lock.c to allow
this. I've played with doing it like in the second patch, but that
doesn't actually work because of some asserts around ProcSleep - leading
to locks on database objects not working in the startup process (despite
already being used).

The easiest thing would be to just use an lwlock instead of a heavyweight
lock - but those aren't cancelable...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
