Re: patch: avoid heavyweight locking on hash metapage

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: patch: avoid heavyweight locking on hash metapage
Date: 2012-06-15 17:58:44
Message-ID: CAMkU=1zsjKXLian2OCqX0wt6+VUMuSTcyNDBKB_ZVaFj0SA6xg@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 30, 2012 at 3:14 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I developed the attached patch to avoid taking a heavyweight lock on
> the metapage of a hash index.  Instead, an exclusive buffer content
> lock is viewed as sufficient permission to modify the metapage, and a
> shared buffer content lock is used when such modifications need to be
> prevented.  For the most part this is a trivial change, because we
> were already taking these locks: we were just taking the heavyweight
> locks in addition.  The only sticking point is that, when we're
> searching or inserting, we previously locked the bucket before
> releasing the heavyweight metapage lock, which is unworkable when
> holding only a buffer content lock because (1) we might deadlock and
> (2) buffer content locks can't be held for long periods of time even
> when there's no deadlock risk.  To fix this, I implemented a simple
> loop-and-retry system: we release the metapage content lock, acquire
> the heavyweight lock on the target bucket, and then reacquire the
> metapage content lock and check that the bucket mapping has not
> changed.   Normally it hasn't, and we're done.  But if by chance it
> has, we simply unlock the metapage, release the heavyweight lock we
> acquired previously, lock the new bucket, and loop around again.  Even
> in the worst case we cannot loop very many times here, since we don't
> split the same bucket again until we've split all the other buckets,
> and 2^N gets big pretty fast.
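
If I'm reading the patch correctly, the search/insert path now looks
roughly like this (condensed sketch; details may differ from the actual
patch):

	metabuf = _hash_getbuf(rel, HASH_METAPAGE, HASH_READ, LH_META_PAGE);
	metap = HashPageGetMeta(BufferGetPage(metabuf));

	oldblkno = InvalidBlockNumber;
	retry = false;

	for (;;)
	{
		/* Compute the target bucket from the current metapage contents. */
		bucket = _hash_hashkey2bucket(hashkey,
									  metap->hashm_maxbucket,
									  metap->hashm_highmask,
									  metap->hashm_lowmask);
		blkno = BUCKET_TO_BLKNO(metap, bucket);

		/* Release the metapage content lock, but keep the pin. */
		_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

		/* If we already hold the heavy lock on the right bucket, done. */
		if (retry)
		{
			if (oldblkno == blkno)
				break;
			_hash_droplock(rel, oldblkno, HASH_SHARE);
		}

		/* Heavyweight-lock the bucket, then relock the metapage and recheck. */
		_hash_getlock(rel, blkno, HASH_SHARE);
		_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
		oldblkno = blkno;
		retry = true;
	}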

Do we need the retry flag (this applies in two places)? If oldblkno is
still InvalidBlockNumber, it can't equal blkno, so the oldblkno test
alone should suffice. A separate flag variable might be clearer than the
magic value, but since we already rely on the magic value anyway, do we
really want both a flag variable and a magic value?

+		if (retry)
+		{
+			if (oldblkno == blkno)
+				break;
+			_hash_droplock(rel, oldblkno, HASH_SHARE);
+		}
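
With just the magic value, something along these lines seems like it
would work (untested sketch; BlockNumberIsValid is the existing macro
from block.h):

	if (oldblkno == blkno)
		break;				/* can't be true on the first pass */
	if (BlockNumberIsValid(oldblkno))
		_hash_droplock(rel, oldblkno, HASH_SHARE);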

In the README, the pseudocode probably needs to mention re-locking the
meta page in the loop.
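
Perhaps something like this for the reader/insert loop (my wording, not
taken from the patch):

	pin meta page and take buffer content lock in shared mode
	loop:
		compute bucket number for target hash key
		release meta page buffer content lock
		if (correct bucket page is already heavyweight-locked)
			break
		release any existing bucket page heavyweight lock
		take heavyweight bucket lock
		retake meta page buffer content lock in shared mode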

Also, "page" is used to mean either the disk page or the shared buffer
currently holding that page, depending on context. This is confusing.
Maybe we should clarify "Lock the meta page buffer". Of course this
gripe precedes your patch and applies to other parts of the code as
well, but since we mingle LW locks (on buffers) and heavy locks (on
names of disk pages) it might make sense to be more meticulous here.

"exclusive-lock page 0 (assert the right to begin a split)" is no
longer true, nor is "release X-lock on page 0"

Also in the README, the section "To prevent deadlock we enforce these
coding rules:" will need updating, since the patch changes those rules.
But should we be changing them at all?

In _hash_expandtable, the claim "But since we are only trylocking here
it should be OK" doesn't completely satisfy me. Even a conditional
heavyweight lock acquisition still takes a transient, non-conditional
LWLock on the lock manager partition. Unless there is a global rule
that no one may take a buffer content lock while holding a lock manager
partition lock, this seems dangerous. Could this be redone to follow
the pattern used elsewhere in the patch: take the heavyweight lock
while holding no content lock, then reacquire the buffer content lock
and verify that we locked the right thing? I don't know whether it
would be better to loop, or just give up, if the meta page changed
underneath us.
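
Roughly like this, I mean (hypothetical sketch; old_maxbucket stands
for whatever metapage state needs to be rechecked, and error handling
is omitted):

	/* Give up the metapage content lock before taking the heavy lock. */
	_hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

	/* Unconditionally heavyweight-lock the bucket to be split. */
	_hash_getlock(rel, start_oblkno, HASH_EXCLUSIVE);

	/* Reacquire the metapage content lock and recheck the mapping. */
	_hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
	if (metap->hashm_maxbucket != old_maxbucket)
	{
		/* Somebody else split first; drop our lock and give up (or loop). */
		_hash_droplock(rel, start_oblkno, HASH_EXCLUSIVE);
	}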

Cheers,

Jeff
