|From:||Andres Freund <andres(at)2ndquadrant(dot)com>|
|Subject:||Re: basebackups during ALTER DATABASE ... SET TABLESPACE ... not safe?|
On 2015-01-22 19:56:07 +0100, Andres Freund wrote:
> On 2015-01-20 16:28:19 +0100, Andres Freund wrote:
> > I'm analyzing a problem in which a customer's 9.2 cluster, created with
> > pg_basebackup (from a standby), failed with "WAL contains references to
> > invalid pages". The failed record was a "xlog redo visible"
> > i.e. XLOG_HEAP2_VISIBLE.
> > First I thought there might be another bug along the lines of
> > 17fa4c321cc. Looking at the code and the WAL that didn't seem to be the
> > case (man, I miss pg_xlogdump). Other, slightly older, standbys, didn't
> > seem to have any problems.
> > Logs show that an ALTER DATABASE ... SET TABLESPACE ... was running when
> > the basebackup was started and finished *before* pg_basebackup finished.
> > movedb() basically works in these steps:
> > 1) lock out users of the database
> > 2) RequestCheckpoint(IMMEDIATE|WAIT)
> > 3) DropDatabaseBuffers()
> > 4) copydir()
> > 5) XLogInsert(XLOG_DBASE_CREATE)
> > 6) RequestCheckpoint(CHECKPOINT_IMMEDIATE)
> > 7) rmtree(src_dbpath)
> > 8) XLogInsert(XLOG_DBASE_DROP)
> > 9) unlock database
> > If a basebackup starts while 4) is in progress and continues until 7)
> > happens I think a pretty wide race opens: The basebackup can end up with
> > a partial copy of the database in the old tablespace because the
> > rmtree(old_path) concurrently was in progress. Normally such races are
> > fixed during replay. But in this case, the replay of the
> > XLOG_DBASE_CREATE will just try to do a rmtree(new); copydir(old, new);,
> > fixing nothing.
> > Besides making AD .. ST use sane WAL logging, which doesn't seem
> > backpatchable, I don't see what could be done against this except
> > somehow making basebackups fail if an AD .. ST is in progress. Which
> > doesn't look entirely trivial either.
> I basically have two ideas to fix this.
> 1) Make do_pg_start_backup() acquire a SHARE lock on
> pg_database. That'll prevent it from starting while a movedb() is
> still in progress. Then additionally add a pg_backup_in_progress()
> function to xlog.c that checks (XLogCtl->Insert.exclusiveBackup ||
> XLogCtl->Insert.nonExclusiveBackups != 0). Use that in createdb() and
> movedb() to error out if a backup is in progress.
Attached is a patch trying to do this. It doesn't look too bad, and it led me
to discover missing recovery conflicts during an AD ST.
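
For readers without the attachment, the check from idea 1) can be sketched
roughly as below. This is a standalone mock and not the attached patch:
MockXLogCtlInsert, mock_backup_in_progress() and mock_movedb_allowed() are
hypothetical names, and the struct only mirrors the two fields named above
(exclusiveBackup and nonExclusiveBackups). The mock also omits whatever
locking the real shared-memory fields need.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the backup-related fields of XLogCtl->Insert. */
typedef struct MockXLogCtlInsert
{
    bool exclusiveBackup;       /* is an exclusive backup running? */
    int  nonExclusiveBackups;   /* count of running non-exclusive backups */
} MockXLogCtlInsert;

/*
 * The predicate quoted above: a backup is in progress if either the
 * exclusive flag is set or the non-exclusive counter is nonzero.
 */
static bool
mock_backup_in_progress(const MockXLogCtlInsert *insert)
{
    assert(insert->nonExclusiveBackups >= 0);
    return insert->exclusiveBackup || insert->nonExclusiveBackups != 0;
}

/*
 * Assumed caller-side use: createdb()/movedb() would error out when this
 * returns false. The mock reports the decision instead of raising an error.
 */
static bool
mock_movedb_allowed(const MockXLogCtlInsert *insert)
{
    return !mock_backup_in_progress(insert);
}
```

With no backup running the mock lets movedb() proceed; with either kind of
backup running it refuses.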
But: It doesn't actually work on standbys, because lock.c prevents any
stronger lock than RowExclusive from being acquired. And we need a
lock that can conflict with WAL replay of DBASE_CREATE, to handle base
backups that are executed on the primary. Those obviously can't detect
whether any standby is currently doing a base backup...
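
To make the standby restriction concrete, here is a toy model of it. This is
not the lock.c code: MockLockMode and mock_lock_allowed() are hypothetical,
and the only rule modeled is the one stated above, namely that during
recovery nothing stronger than RowExclusive may be acquired. The enum values
follow the usual ordering of the heavyweight lock modes.

```c
#include <stdbool.h>

/* Heavyweight lock modes in increasing strength (mock copy for illustration). */
typedef enum MockLockMode
{
    MOCK_ACCESS_SHARE = 1,
    MOCK_ROW_SHARE,
    MOCK_ROW_EXCLUSIVE,
    MOCK_SHARE_UPDATE_EXCLUSIVE,
    MOCK_SHARE,
    MOCK_SHARE_ROW_EXCLUSIVE,
    MOCK_EXCLUSIVE,
    MOCK_ACCESS_EXCLUSIVE
} MockLockMode;

/*
 * Assumed rule: while recovery is in progress, a backend may not take any
 * lock stronger than RowExclusive. Outside recovery every mode is allowed.
 */
static bool
mock_lock_allowed(bool recovery_in_progress, MockLockMode mode)
{
    if (recovery_in_progress && mode > MOCK_ROW_EXCLUSIVE)
        return false;
    return true;
}
```

Under this rule the SHARE lock that do_pg_start_backup() would want is fine
on the primary but rejected on a standby, which is the problem described
above.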
I currently don't have a good idea how to mangle lock.c to allow
this. I've played with doing it like in the second patch, but that
doesn't actually work because of some asserts around ProcSleep - leading
to locks on database objects not working in the startup process (despite
already being used).
The easiest thing would be to just use a lwlock instead of a heavyweight
lock - but those aren't cancelable...
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services