Re: 9.2 recovery/startup problems

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.2 recovery/startup problems
Date: 2014-11-26 21:03:28
Message-ID: 20141126210328.GA19004@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2014-11-26 11:29:09 -0800, Jeff Janes wrote:
> On Wed, Nov 26, 2014 at 5:06 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
> wrote:
>
> > Jeff Janes wrote:
> >
> > > This is what I get in the log from the attempted restart:
> > >
> > > PST LOG: database system was interrupted; last known up at 2014-11-25
> > > 15:40:33 PST
> > > PST LOG: database system was not properly shut down; automatic recovery
> > in
> > > progress
> > > PST LOG: redo starts at 84/EF000080
> > > PST LOG: record with zero length at 84/EF09AE18
> > > PST LOG: redo done at 84/EF09AD28
> > > PST LOG: last completed transaction was at log time 2014-11-25
> > > 15:42:09.173599-08
> > > PST LOG: checkpoint starting: end-of-recovery immediate
> > > PST LOG: checkpoint complete: wrote 103 buffers (0.2%); 0 transaction
> > log
> > > file(s) added, 246 removed, 7 recycled; write=0.002 s, sync=0.020 s,
> > > total=0.526 s; sync files=51, longest=0.003 s, average=0.000 s
> > > PST FATAL: could not create file "base/16416/59288": File exists

Any chance you can reproduce this in a reproducible fashion?

> > Maybe fire up pg_xlogdump to see what xlog record references that
> > relfilenode.
> >
>
> pg_xlogdump doesn't exist yet in 9.2 (or can I use a newer one against the
> older files?).

Nope, you can't :(

> Immediately after the reboot, base/16416/59288 exists but is empty.
>
> After the failed start up attempt, base/16416/59288 and
> base/16416/59288_init both exist and are empty, with base/16416/59288
> having its pre-startup-attempt timestamp.
>
> So it seems like the code that copies the init fork over the main fork at
> the end of crash recovery is doing something wrong in this case. But I
> don't understand why it then succeeds at starting up the next time I try it.
>
>
> If I use Snaga's xlogdump (which I'm not sure is entirely correct under
> 9.2), the end of the stream looks like this:
>
> [cur:85/1008BD88, xid:4547590, rmid:10(Heap), len/tot_len:174/206, info:64,
> prev:85/1008BD58] hot_update: s/d/r:1663/16416/12636 block 16 off 41 to
> block 16 off 43
> [cur:85/1008BE58, xid:4547590, rmid:2(Storage), len/tot_len:16/48, info:16,
> prev:85/1008BD88] create rel: s/d/r:1663/16416/59288
> [cur:85/1008BE88, xid:4547590, rmid:8(Standby), len/tot_len:16/48, info:0,
> prev:85/1008BE58] standby
> [page:70, xlp_info:5, xlp_tli:2, xlp_pageaddr:85/1008C000]
> XLP_FIRST_IS_CONTRECORD XLP_BKP_REMOVABLE
> [cur:85/1008BEB8, xid:4547590, rmid:10(Heap), len/tot_len:28/7160, info:72,
> prev:85/1008BE88] hot_update: s/d/r:1663/16416/12636 block 30 off 32 to
> block 30 off 33
> [cur:85/1008BEB8, xid:4547590, rmid:10(Heap), len/tot_len:28/7160, info:72,
> prev:85/1008BE88] bkpblock[1]: s/d/r:1663/16416/12636 blk:30
> hole_off/len:156/1116

Hm. It looks like the problem here might be an ordering one. Notice that
there seems to be a record for the main relation, but not the init fork
in the excerpt.

It wonder if ExcecuteTruncate() does things in the wrong order.
/*
* Need the full transaction-safe pushups.
*
* Create a new empty storage file for the relation, and assign it
* as the relfilenode value. The old storage file is scheduled for
* deletion at commit.
*/
RelationSetNewRelfilenode(rel, RecentXmin, minmulti);
if (rel->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED)
heap_create_init_fork(rel);

Arguably that can cause problems if the node is promoted between the
RelationSetNewRelfilenode() and the heap_create_init_fork(). On the
other hand, the fork should essentially be 'invisible' in that case as
the transaction won't commit...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Boguk 2014-11-26 21:19:32 Re: BUG #12071: Stat collector went crasy (50MB/s constant writes)
Previous Message Tom Lane 2014-11-26 20:48:42 Re: bug in json_to_record with arrays