Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From: Marko Tiikkaja <pgmail(at)joh(dot)to>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "devrim(at)gunduz(dot)org" <devrim(at)gunduz(dot)org>
Subject: Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Date: 2012-09-21 13:30:31
Message-ID: 505C6BF7.3090808@joh.to
Lists: pgsql-committers pgsql-hackers

On 9/20/12 11:55 PM, Andres Freund wrote:
> On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
>> OK, that explains why we've not seen a blizzard of trouble reports.
>> Still seems like a good idea to fix it ASAP, though.
> Btw, I think RhodiumToad/Andrew Gierth and I some time ago helped a user in the
> IRC Channel that had symptoms matching this bug.

Another such user reporting in. :-(

Our slave started accumulating WAL files and ran out of disk space
yesterday. After investigation from Andres and Andrew, it turns out
that we were most likely hit by this very same bug.

Here's what they have to say:
"If the db crashes between logging the split and the parent-node insert,
then in recovery, since relpersistence is not initialized correctly,
when the recovery process tries to complete the operation, no xlog
record is written for the insert. If there's a slave server, then the
missing xlog record for the insert means that the slave's
incomplete_actions queue never becomes empty, therefore the slave can no
longer do recovery restartpoints."
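
The failure sequence described in the quote can be sketched as a toy model. This is not PostgreSQL code: the names `needs_wal`, `finish_split_in_recovery`, `Standby` and the block number are hypothetical, and the model assumes only what the quote states, namely that the WAL decision hinges on a relpersistence field that was left uninitialized during recovery.

```python
from dataclasses import dataclass, field

PERMANENT = "p"   # assumed marker for a relation whose changes are WAL-logged
UNSET = "\0"      # assumed value of an uninitialized relpersistence field


def needs_wal(relpersistence):
    """Toy stand-in for the relpersistence check (hypothetical name)."""
    return relpersistence == PERMANENT


def finish_split_in_recovery(relpersistence, block):
    """WAL records emitted when recovery completes an interrupted split."""
    if needs_wal(relpersistence):
        return [("parent_insert", block)]
    return []  # the insert still happens locally, but nothing is logged


@dataclass
class Standby:
    # Splits seen in WAL whose parent-node insert has not been replayed yet.
    incomplete_actions: set = field(default_factory=set)

    def replay(self, record):
        kind, block = record
        if kind == "split":
            self.incomplete_actions.add(block)
        elif kind == "parent_insert":
            self.incomplete_actions.discard(block)

    def can_restartpoint(self):
        # A restartpoint is only possible once no action is left incomplete.
        return not self.incomplete_actions


def simulate(relpersistence, block=42):  # block number is arbitrary
    wal = [("split", block)]             # logged before the crash
    wal += finish_split_in_recovery(relpersistence, block)
    standby = Standby()
    for record in wal:
        standby.replay(record)
    return standby.can_restartpoint()


if __name__ == "__main__":
    print("relpersistence set:  ", simulate(PERMANENT))
    print("relpersistence unset:", simulate(UNSET))
```

With the field set correctly the standby sees the parent insert and its queue drains; with the field unset the split stays pending forever, which in this model is why restartpoints stop and WAL accumulates.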

Some relevant information:

[cur:92/314BC870, xid:76872047, rmid:10(Heap), len/tot_len:91/123,
info:0, prev:92/314BB890] insert: s/d/r:1663/408841/415746
blk/off:13904/65 header: t_infomask2 8 t_infomask 2050 t_hoff 24
[cur:92/314BC8F0, xid:76872047, rmid:11(Btree), len/tot_len:702/734,
info:64, prev:92/314BC870] split_r: s/d/r:1663/408841/475676 leftsib 2896
[cur:92/314BCBD0, xid:0, rmid:0(XLOG), len/tot_len:56/88, info:0,
prev:92/314BC8F0] checkpoint: redo 146/314BCBD0; tli 1; nextxid
76872048; nextoid 764990; nextmulti 62062; nextoffset 132044; shutdown
at 2012-09-11 14:26:26 CEST

2012-09-11 14:26:26.719 CEST,,,44620,,504f2df2.ae4c,5,,2012-09-11
14:26:26 CEST,,0,LOG,00000,"redo done at
92/314BC8F0",,,,,,,,"StartupXLOG, xlog.c:6641",""

Apparently the relpersistence check made by the RelationNeedsWAL() call in
_bt_insertonpg played a role in this as well.

Regards,
Marko Tiikkaja
