Re: [PATCHES] [WIP] shared locks

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] [WIP] shared locks
Date: 2005-04-27 23:05:40
Message-ID: 3122.1114643140@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Found another interesting thing while testing this. I got a core dump
from the Assert in GetMultiXactIdMembers, complaining that it was being
asked about a MultiXactId >= nextMXact. Sure enough, there was a
multixact on disk, left over from a previous core-dumped test, that was
larger than the nextMXact the current postmaster had started with.

My interpretation of this is that the MultiXact code is violating the
fundamental WAL rule, namely it is allowing data (multixact IDs in data
pages) to reach disk before the relevant WAL record (here the NEXTMULTI
record that should have advanced nextMXact) got to disk. It is very
easy for this to happen in the current system if the buffer page LSNs
aren't updated properly, because the bgwriter will be industriously
dumping dirty pages in the background.
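
To make the failure mode concrete, here is a toy model of the WAL rule. It is
not PostgreSQL code: ToyWal, ToyPage and the toy_* functions are all
hypothetical names, and an LSN is reduced to a simple counter. The writer
flushes WAL only as far as the page's own LSN, so a page whose LSN was never
stamped with the relevant record can reach disk ahead of that record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; every name here is hypothetical, not a PostgreSQL API. */
typedef uint64_t Lsn;

typedef struct {
    Lsn inserted;   /* end of the last WAL record generated */
    Lsn flushed;    /* end of the last WAL record known to be on disk */
} ToyWal;

typedef struct {
    Lsn  page_lsn;  /* LSN of the last WAL record said to cover this page */
    bool dirty;
} ToyPage;

/* Append one record and return its end LSN; nothing is flushed yet. */
static Lsn toy_wal_insert(ToyWal *wal)
{
    return ++wal->inserted;
}

/* Make WAL durable up to 'upto' (never past what has been inserted). */
static void toy_wal_flush(ToyWal *wal, Lsn upto)
{
    if (upto > wal->inserted)
        upto = wal->inserted;
    if (upto > wal->flushed)
        wal->flushed = upto;
}

/* Background write: WAL-before-data is enforced from the page LSN alone. */
static void toy_write_page(ToyWal *wal, ToyPage *page)
{
    toy_wal_flush(wal, page->page_lsn);
    page->dirty = false;    /* the page is now "on disk" */
}

/*
 * Returns true if the page reached disk while the record that should
 * precede it (think NEXTMULTI) was still unflushed.
 */
static bool toy_rule_violated(bool stamp_page_lsn)
{
    ToyWal  wal = {0, 0};
    ToyPage page = {0, false};
    Lsn     rec = toy_wal_insert(&wal);

    page.dirty = true;              /* a multi ID lands in the page */
    if (stamp_page_lsn)
        page.page_lsn = rec;        /* the step that is missing today */
    toy_write_page(&wal, &page);
    return wal.flushed < rec;
}
```

In this model toy_rule_violated(false) reports a violation and
toy_rule_violated(true) does not, which is the whole difference between the
two cases discussed here.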

AFAICS there isn't any very convenient way of propagating the true
location of the NEXTMULTI record into the page LSNs of the buffers that
heap_lock_tuple might stick relevant multi IDs into. What's probably
the easiest solution is for XLogPutNextMultiXactId to XLogFlush the
NEXTMULTI record before it returns. This is a mite annoying for
concurrency (because we'll have to hold MultiXactGenLock while flushing
xlog) but it should occur rarely enough to not be a huge deal.
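
A rough sketch of the flush-before-return idea, again as a toy model rather
than the real XLogPutNextMultiXactId. ToyGen, TOY_PREFETCH and the toy_*
functions are hypothetical, the lock is omitted, and "writing" and "flushing"
a record are reduced to updating two fields. The point is only that with the
flush, the value recovered after a crash is always ahead of every ID already
handed out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical batch size: how far ahead each logged record reaches. */
#define TOY_PREFETCH 8192

typedef struct {
    uint32_t next;           /* next ID to hand out */
    uint32_t logged_limit;   /* IDs below this are covered by a WAL record */
    uint32_t durable_limit;  /* limit carried by the last *flushed* record */
    bool     flush_on_log;   /* true = flush the record before returning */
} ToyGen;

/* Hand out one ID, logging a new limit whenever the old one is used up. */
static uint32_t toy_get_new_id(ToyGen *g)
{
    if (g->next >= g->logged_limit) {
        /* "Write" a record advancing the limit by one batch. */
        g->logged_limit = g->next + TOY_PREFETCH;
        /* The proposed fix: make that record durable right away. */
        if (g->flush_on_log)
            g->durable_limit = g->logged_limit;
    }
    return g->next++;
}

/* After a crash only flushed records are replayed. */
static uint32_t toy_next_after_crash(const ToyGen *g)
{
    return g->durable_limit;
}
```

With flush_on_log set, any ID returned by toy_get_new_id is below
toy_next_after_crash; without it, the very first ID handed out can already be
at or beyond the recovered value, which is the situation that tripped the
Assert.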

At this point you're probably wondering why OID generation hasn't got
exactly the same problem, seeing that you borrowed all this logic from
the OID generator. The answer is that it would have the same problem,
except that an OID can only get onto disk as part of a tuple insert or
update, and all such events generate xlog records that must follow any
relevant NEXTOID record. Those records *will* get into the page LSNs,
and so the WAL rule is enforced.

So the problem would go away if heap_lock_tuple were generating any xlog
record of its own, which it might be doing by the time the 2PC dust
settles.

Plan B would be to decide that a multi ID that's >= nextMXact isn't
worthy of an Assert failure, but ought to be treated as just a dead
multixact. I'm kind of inclined to do that anyway, because I am not
convinced that this code guarantees no wraparound of multi IDs.
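
Plan B could look roughly like the following. This is a sketch under stated
assumptions, not the actual GetMultiXactIdMembers: the member table and the
function name are made up for illustration, and the plain >= comparison
deliberately ignores wraparound, which is a separate question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t ToyMultiXactId;

/* Hypothetical stand-in for stored member counts, indexed by multi ID. */
static const int toy_member_counts[] = {0, 2, 3, 2};

/*
 * Return the number of members of 'multi', or 0 if it has none.
 * A multi at or beyond next_mxact is treated as dead instead of
 * tripping an assertion.
 */
static int toy_get_member_count(ToyMultiXactId multi,
                                ToyMultiXactId next_mxact)
{
    size_t known = sizeof(toy_member_counts) / sizeof(toy_member_counts[0]);

    if (multi >= next_mxact)
        return 0;               /* Plan B: dead multixact, no Assert */
    if (multi >= known)
        return 0;               /* nothing stored for this ID */
    return toy_member_counts[multi];
}
```

Here toy_get_member_count(2, 4) returns 3, while toy_get_member_count(7, 4)
returns 0 rather than failing.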

Thoughts?

regards, tom lane
