Re: Report: race conditions in WAL replay routines

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Report: race conditions in WAL replay routines
Date: 2012-02-05 20:55:20
Message-ID: CA+U5nMJh_HCTzUdHR9mE5ncOhzjMGCfmxgctKdW0bHVTjWoHGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 5, 2012 at 7:18 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pursuant to the recent discussion about bugs 6200 and 6245, I went
> trawling through all the WAL redo routines looking for bugs such as
> inadequate locking or transiently trashing shared buffers.  Here's
> what I found:
>
> * As we already knew, RestoreBkpBlocks is broken because it transiently
> trashes a shared buffer, which another process could be accessing while
> holding only a pin.

Agreed

> * seq_redo() has the same disease, since we allow SELECT * FROM
> sequences.

Why do we do that?

> * Everything else seems to be free of that specific issue; in particular
> the index-related replay routines are at fairly low risk since we don't
> have any coding rules allowing index pages to be examined without
> holding a buffer lock.

Yep

> * There are assorted replay routines that assume they can whack fields
> of ShmemVariableCache around without any lock.  However, it's pretty
> inconsistent; about half do it like that, while the other half assume
> that they can read ShmemVariableCache without lock but should acquire
> lock to modify it.  I think the latter coding rule is a whole lot safer
> in the presence of Hot Standby and should be adopted throughout.

Agreed

> * Same goes for MultiXactSetNextMXact and MultiXactAdvanceNextMXact.
> It's not entirely clear to me that no read-only transaction can ever
> examine the shared-memory variables they change.  In any case, if there
> is in fact no other process examining those variables, there can be no
> contention for the lock so it should be cheap to get.

Row locking requires a WAL record to be written, so that whole path is
dead during HS.

> * Not exactly a race condition, but: tblspc_redo does ereport(ERROR)
> if it fails to clean out tablespace directories.  This seems to me to be
> the height of folly, especially when the failure is more or less an
> expected case.  If the error occurs the database is dead in the water,
> because that error is actually a PANIC and will recur on subsequent
> restart attempts.  Therefore there is no way to recover short of manual
> intervention to clean out the non-empty directory.  And why are we
> pulling the fire alarm like this?  Well, uh, it's because we might fail
> to recover some disk space in the dropped tablespace.  Seems to me to be
> a lot better to just elog(LOG) and move on.  This is quite analogous to
> the case of failing to unlink a file after commit --- wasting disk space
> might be bad, but it's very much the lesser evil compared to this.

If the sysadmin is managing the db properly then this shouldn't ever
happen - the only cause is if the tablespace being dropped is being
used as a temp tablespace on the standby.

The ERROR is appropriate because we first try to remove the files. If
they won't go we then raise an all-session conflict and then try
again. Only when we fail the second time does it ERROR, which seems
OK.

If you just LOG, when exactly would we get rid of the tablespace?

> Barring objections I'm going to fix all this stuff and back-patch as
> far as 9.0 where hot standby was added.

Please post the patch rather than fixing directly. There's some subtle
stuff there and it would be best to discuss first.

Thanks

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-02-05 21:03:33 Re: Report: race conditions in WAL replay routines
Previous Message Simon Riggs 2012-02-05 20:42:47 Re: [BUGS] BUG #6425: Bus error in slot_deform_tuple