Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Marko Tiikkaja <pgmail(at)joh(dot)to>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "devrim(at)gunduz(dot)org" <devrim(at)gunduz(dot)org>
Subject: Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Date: 2012-09-21 14:41:12
Message-ID: 201209211641.13259.andres@2ndquadrant.com
Lists: pgsql-committers pgsql-hackers

On Friday, September 21, 2012 03:30:31 PM Marko Tiikkaja wrote:
> On 9/20/12 11:55 PM, Andres Freund wrote:
> > On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> >> OK, that explains why we've not seen a blizzard of trouble reports.
> >> Still seems like a good idea to fix it ASAP, though.
> >
> > Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user
> > in the IRC Channel that had symptoms matching this bug.
>
> Another such user reporting in. :-(
>
> Our slave started accumulating WAL files and ran out of disk space
> yesterday. After investigation from Andres and Andrew, it turns out
> that we were most likely hit by this very same bug.
>
> Here's what they have to say:
> "If the db crashes between logging the split and the parent-node insert,
> then in recovery, since relpersistence is not initialized correctly,
> when the recovery process tries to complete the operation, no xlog
> record is written for the insert. If there's a slave server, then the
> missing xlog record for the insert means that the slave's
> incomplete_actions queue never becomes empty, therefore the slave can no
> longer do recovery restartpoints."
>
> Some relevant information:
>
> [cur:92/314BC870, xid:76872047, rmid:10(Heap), ... insert: ...
> [cur:92/314BC8F0, xid:76872047, rmid:11(Btree), ... split_r: ...
> [cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0,
> prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; ... shutdown
> ... "redo done at 92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""

This means that an insert into the heap triggered a btree split, and the
database crashed at that point: the split_r record at 92/314BC8F0 is the last
one replayed ("redo done at 92/314BC8F0"). During recovery, the btree cleanup
code was supposed to finish the split.

> And apparently the relpersistence check in RelationNeedsWAL() call in
> _bt_insertonpg had a role in this as well.
When it detects an incomplete split, the nbtree cleanup code calls
_bt_insert_parent, which calls _bt_insertonpg, which finishes the split. BUT:
it doesn't log that it finished, because RelationNeedsWAL() says it doesn't
need to.

That means:
* indexes on standbys will *definitely* be corrupted
* a standby won't perform any restartpoints anymore until it is restarted
* if the primary crashes, corruption is likely.
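
To make the mechanism concrete, here is a toy C model of the failure. It is
only a sketch, not the actual PostgreSQL code: every toy_/Toy name is made up
for illustration, and the only things borrowed from the source tree are the
idea that RelationNeedsWAL() tests relpersistence against the "permanent"
marker and the assumption that a fake relcache entry which never sets that
field is left with a zero byte there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Marker for ordinary, WAL-logged relations (assumed to be 'p'). */
#define RELPERSISTENCE_PERMANENT 'p'

/* Hypothetical stand-in for a relcache entry; only the relevant field. */
typedef struct ToyRelation
{
    char relpersistence;
} ToyRelation;

/* Models the RelationNeedsWAL() check: only permanent relations are logged. */
static bool
toy_relation_needs_wal(const ToyRelation *rel)
{
    assert(rel != NULL);
    return rel->relpersistence == RELPERSISTENCE_PERMANENT;
}

/*
 * Models building a fake relcache entry during recovery.
 * set_persistence = false reproduces the bug (field left zeroed),
 * set_persistence = true models the fix.
 */
static ToyRelation
toy_create_fake_entry(bool set_persistence)
{
    ToyRelation rel = {0};

    if (set_persistence)
        rel.relpersistence = RELPERSISTENCE_PERMANENT;
    return rel;
}

/*
 * Models finishing an incomplete split at the end of recovery.
 * The parent-node insert always happens locally; the return value is the
 * number of WAL records emitted for it (0 or 1).
 */
static int
toy_finish_split(const ToyRelation *rel)
{
    return toy_relation_needs_wal(rel) ? 1 : 0;
}

/* Standby side: pending incomplete actions left after replaying inserts. */
static int
toy_standby_pending(int pending_splits, int insert_records_replayed)
{
    int left = pending_splits - insert_records_replayed;

    return left < 0 ? 0 : left;
}

/* A restartpoint is only possible once nothing is pending. */
static bool
toy_standby_can_restartpoint(int pending)
{
    return pending == 0;
}
```

With toy_create_fake_entry(false), toy_finish_split() emits no record, so a
standby that has seen one split keeps one pending action and
toy_standby_can_restartpoint() stays false. With toy_create_fake_entry(true)
the insert is logged, the standby's queue drains, and restartpoints resume.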

Hrm. I retract my earlier statement about the low likelihood of corruption due
to this.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
