Re: Shared row locking

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Shared row locking
Date: 2004-12-17 03:37:18
Message-ID: 200412170337.iBH3bI526678@candle.pha.pa.us
Lists: pgsql-hackers

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > Using a B-tree
>
> > At transaction end, nothing special happens (tuples are not unlocked
> > explicitly).
>
> I don't think that works, because there is no guarantee that an entry
> will get cleaned out before the XID counter wraps around. Worst case,
> you might think that a tuple is locked when the XID is left over from
> the previous cycle. (Possibly this could be avoided by cleaning out old
> XIDs in this table whenever we truncate pg_clog, but that seems a tad
> messy.) I'm also a bit concerned about how we avoid table bloat if
> there's no proactive cleanup at transaction end.
>
> I think I like the pg_clog-modeled structure a bit better. However it
> could be objected that that puts a hard limit of 4G share-locked tuples
> at any one time.
>
> In the clog-modeled idea, it wasn't real clear how you decide whether to
> assign a new counter value to a previously locked row, or reuse its
> previous counter. You must *not* assign a new value when the existing
> entry still has bits set, but you probably do want to be aggressive
> about assigning new values when you can; else it gets tough to be sure
> that the log can be truncated in a reasonable time.

I assume you check the existing entry: if all its bits are zero, you
don't reuse it but get a new counter instead. In fact you shouldn't
reuse it, in case the log is being truncated while you are looking. :-)
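A minimal sketch of the counter-reuse rule above, assuming a pg_clog-like array in which each entry is a bitmap indexed by BackendId. The class and method names (`ShareLockLog`, `lock_row`, `release_backend`) and the `row_counter` mapping from CTID to counter are hypothetical illustrations; a real implementation would live in shared memory, would store the counter in the tuple header, and would also have to handle truncation, which this toy model omits.

```python
class ShareLockLog:
    """Toy model of a pg_clog-like array of per-counter backend bitmaps (hypothetical)."""

    def __init__(self):
        self.entries = []       # entries[counter] = bitmap of BackendIds holding the share lock
        self.row_counter = {}   # hypothetical: CTID -> counter last assigned to that row

    def _new_counter(self):
        self.entries.append(0)
        return len(self.entries) - 1

    def lock_row(self, ctid, backend_id):
        counter = self.row_counter.get(ctid)
        # Reuse the row's counter only while some backend still has its bit set.
        # An all-zero entry is a candidate for truncation, so take a fresh counter.
        if counter is None or self.entries[counter] == 0:
            counter = self._new_counter()
            self.row_counter[ctid] = counter
        self.entries[counter] |= 1 << backend_id
        return counter

    def release_backend(self, backend_id):
        # At transaction end, clear this backend's bit everywhere; no local
        # record of which rows it locked is needed.
        mask = ~(1 << backend_id)
        for i in range(len(self.entries)):
            self.entries[i] &= mask

    def is_locked(self, ctid):
        counter = self.row_counter.get(ctid)
        return counter is not None and self.entries[counter] != 0
```

The full scan in `release_backend` is what the bitmap buys: a backend can clear its own bits without remembering which rows it locked, at the cost of a pass over the whole table.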

> ISTM that your description is conflating several orthogonal issues:
> how do we identify entries in this data structure (by CTID, or a shared
> counter that increments each time a new lock is acquired); how do we
> index the data structure (btree or linear array); and what is stored in
> each entry (array of XIDs, or bitmap indexed by BackendId). Not all of
> the eight combinations work, but we do have more alternatives than the
> two offered, even without coming up with any new ideas ;-)

True. The only advantage of a bitmap over a simple count of locking
backends is that you can clear your own backend's bits from the table
without having to remember them in local memory. However, because
recording your own counters in local memory doesn't require fixed-size
shared memory, we might be better off recording the shared-lock indexes
in local backend memory and using an int4 counter in the pg_clog-like
file, which we decrement at backend commit. I am unsure, though, that
we can guarantee an exiting backend will do that. The file is certainly
cleared at server start.
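A minimal sketch of the counter-based alternative, assuming each entry in the pg_clog-like file is an int4-style count of holders and each backend keeps the indexes it incremented in its own local memory. All names here (`CountedShareLockLog`, `acquire`, `release_all`, `reset`) are hypothetical.

```python
class CountedShareLockLog:
    """Toy model: shared per-entry holder counts plus backend-local bookkeeping (hypothetical)."""

    def __init__(self):
        self.counts = []  # shared: counts[index] = number of backends holding that share lock

    def new_entry(self):
        self.counts.append(0)
        return len(self.counts) - 1

    def acquire(self, index, local_held):
        # local_held is a set living in backend-local memory; avoid counting
        # the same backend twice for one entry.
        if index not in local_held:
            self.counts[index] += 1
            local_held.add(index)

    def release_all(self, local_held):
        # At commit (or abort), decrement every entry this backend incremented.
        # If the backend dies before this runs, its counts stay elevated.
        for index in local_held:
            self.counts[index] -= 1
        local_held.clear()

    def reset(self):
        # Nobody holds share locks after a restart, so the file starts empty.
        self.counts = []
```

Unlike the bitmap, a count only says how many backends hold the lock, not which ones, so the cleanup worry above (a backend exiting without running `release_all`) can only be resolved by the reset at server start.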

> > Note that to check whether a transaction is running we need to lock
> > SInvalLock. To minimize the time we hold it, we save the BackendId so
> > we don't have to scan the whole shmInvalBuffer->procState array, only
> > the item that we need to look at. Another possibility would be to use
> > stock TransactionIdIsInProgress and save the extra 4 bytes of storage.
>
> I'm a bit worried about deadlocks and race conditions associated with
> the conflict between locking a page of this data structure and locking
> SInvalLock.
>
> > At server restart, the btree is created empty (or just deleted). There
> > is one btree per database.
>
> One per cluster you meant, right? (Else we can't do locking of rows in
> shared tables.)

I think he did mean one per database. I suppose we would need another
one for global (shared) tables, or else disallow shared locking of them.
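A minimal sketch of the "one per database plus one for global tables" layout, assuming the caller knows whether the relation is shared (as `pg_class.relisshared` records). `LockTableDirectory`, `table_for`, and the `"shared"` key are hypothetical names used only for illustration.

```python
class LockTableDirectory:
    """Toy model: one lock table per database plus one cluster-wide table (hypothetical)."""

    SHARED = "shared"  # hypothetical key for the table covering shared relations

    def __init__(self):
        self.tables = {}  # starts empty, mirroring "created empty at server restart"

    def table_for(self, rel_is_shared, database_oid):
        # Shared relations are visible from every database, so their row locks
        # must go to a single cluster-wide table rather than a per-database one.
        key = self.SHARED if rel_is_shared else database_oid
        return self.tables.setdefault(key, {})
```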

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
