Re: Re: [BUGS] BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Date: 2014-06-26 04:45:19
Message-ID: 20140626044519.GJ7340@eldon.alvh.no-ip.org
Lists: pgsql-bugs pgsql-hackers

Andres Freund wrote:
> On 2014-06-20 17:38:16 -0400, Alvaro Herrera wrote:

> > It seems to me that we need to keep the offsets files around until a
> > checkpoint has written the "oldest" number to WAL. In other words we
> > need additional state in shared memory: (a) what we currently store
> > which is the oldest number as computed by vacuum (not safe to delete,
> > but it's the number that the next checkpoint must write), and (b) the
> > oldest number that the last checkpoint wrote (the safe deletion point).
>
> Why not just WAL log truncations? If we'd emit the WAL record after
> determining the offsets page we should be safe I think? That seems like
> easier and more robust fix? And it's what e.g. the clog does.

Yes, I think this whole thing would be simpler if we just WAL-logged the
truncations, like pg_clog does. But I would like to avoid doing that
for now, and make that change only in 9.5. As a fix that can be
backpatched to 9.4/9.3, I propose we do the following:

1. Have vacuum update MultiXactState->oldestMultiXactId based on the
minimum value of pg_database->datminmxid. Since this value is saved in
pg_control, it is restored from checkpoint replay during recovery.

2. Keep track of a new value, MultiXactState->lastCheckpointedOldest.
This value is updated by CreateCheckPoint in a primary server after the
checkpoint record has been flushed, and by xlog_redo in a hot standby, to
be the MultiXactState->oldestMultiXactId value that was last flushed.

3. TruncateMultiXact() no longer receives a parameter. Files are
removed based on MultiXactState->lastCheckpointedOldest instead.

4. Call TruncateMultiXact at checkpoint time, after the checkpoint WAL
record has been flushed, and at restartpoint time (just like today).
This means we only remove files that a prior checkpoint has already
registered as being no longer necessary. Also, if recovery is
interrupted before the end of WAL (recovery target), the files are
still present. As a consequence, we no longer truncate during vacuum.
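
To make the four steps concrete, here is a minimal standalone sketch of
the two-value bookkeeping. It is not the actual patch: every sketch_*
name and the SketchMultiXactState struct are hypothetical stand-ins for
the real shared-memory state, and locking, pg_control and the actual
file removal are left out. The wraparound-aware comparison is assumed
to be the usual signed-difference test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Hypothetical stand-in for the shared-memory state in steps 1 and 2. */
typedef struct SketchMultiXactState
{
    MultiXactId oldestMultiXactId;      /* computed by vacuum; not yet safe */
    MultiXactId lastCheckpointedOldest; /* written by last checkpoint; safe */
} SketchMultiXactState;

/* Wraparound-aware "a is older than b" (assumed signed-difference test). */
static bool
sketch_mxid_precedes(MultiXactId a, MultiXactId b)
{
    return (int32_t) (a - b) < 0;
}

/* Step 1: vacuum advances the oldest value from min(datminmxid). */
static void
sketch_vacuum_set_oldest(SketchMultiXactState *state,
                         MultiXactId min_datminmxid)
{
    /* Never move the value backwards. */
    if (sketch_mxid_precedes(state->oldestMultiXactId, min_datminmxid))
        state->oldestMultiXactId = min_datminmxid;
}

/* Step 2: called once the checkpoint record has been flushed. */
static void
sketch_checkpoint_flushed(SketchMultiXactState *state)
{
    state->lastCheckpointedOldest = state->oldestMultiXactId;
}

/* Steps 3 and 4: the cutoff comes from state, not from a parameter. */
static MultiXactId
sketch_truncate_cutoff(const SketchMultiXactState *state)
{
    /* The safe point must never be ahead of what vacuum computed. */
    assert(!sketch_mxid_precedes(state->oldestMultiXactId,
                                 state->lastCheckpointedOldest));
    return state->lastCheckpointedOldest;
}
```

In this sketch, a vacuum that advances the oldest value does not move
the truncation cutoff; the cutoff only moves once a later checkpoint
has flushed that value.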

Another consideration for (4) is that right now we're only invoking
multixact truncation in a primary when we're able to advance
pg_database.datminmxid (see vac_update_datfrozenxid). The problem is
that after a crash and subsequent recovery, pg_database might be updated
without removing pg_multixact files; this would mean that the next
opportunity to remove files would be far in the future, when the minimum
datminmxid is advanced again. One way to fix that would be to have
every single call to vac_update_datfrozenxid() attempt multixact
truncation, but that seems wasteful since I expect vacuuming is more
frequent than checkpointing.
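
The difference between the two triggers can be illustrated with a toy
model. This is only a sketch under simplifying assumptions: the
SketchCluster struct and both sketch_* functions are hypothetical,
multixact wraparound is ignored, and "files on disk" is reduced to a
single number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the relevant cluster state. */
typedef struct SketchCluster
{
    uint32_t min_datminmxid; /* minimum pg_database.datminmxid */
    uint32_t oldest_on_disk; /* oldest multixact whose files still exist */
} SketchCluster;

/* Today's trigger: truncate only when vacuum advances datminmxid. */
static bool
sketch_vacuum_triggered_truncate(SketchCluster *c, uint32_t new_min)
{
    if (new_min <= c->min_datminmxid)
        return false;       /* no advance, so no truncation attempt */
    c->min_datminmxid = new_min;
    c->oldest_on_disk = new_min;
    return true;
}

/* Proposed trigger: every checkpoint attempts truncation. */
static bool
sketch_checkpoint_triggered_truncate(SketchCluster *c,
                                     uint32_t last_checkpointed_oldest)
{
    if (last_checkpointed_oldest <= c->oldest_on_disk)
        return false;       /* nothing older than the safe point remains */
    c->oldest_on_disk = last_checkpointed_oldest;
    return true;
}
```

Starting from the post-crash state described above (datminmxid already
advanced, old files still present), the vacuum-driven trigger does
nothing until datminmxid advances again, while the checkpoint-driven
trigger removes the leftover files at the next checkpoint.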

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
