Re: Shared row locking

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Shared row locking
Date: 2004-12-17 02:52:33
Message-ID: 13707.1103251953@sss.pgh.pa.us
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> Using a B-tree

> At transaction end, nothing special happens (tuples are not unlocked
> explicitly).

I don't think that works, because there is no guarantee that an entry
will get cleaned out before the XID counter wraps around. Worst case,
you might think that a tuple is locked when the XID is left over from
the previous cycle. (Possibly this could be avoided by cleaning out old
XIDs in this table whenever we truncate pg_clog, but that seems a tad
messy.) I'm also a bit concerned about how we avoid table bloat if
there's no proactive cleanup at transaction end.
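
A minimal sketch of the wraparound hazard, assuming 32-bit XIDs compared modulo 2^32 (the usual circular comparison). The names Xid, xid_precedes and entry_looks_locked are hypothetical, special reserved XIDs are ignored, and the "running" list stands in for whatever liveness check the real structure would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;   /* stand-in for a 32-bit transaction ID */

/* Circular comparison: true if a is logically older than b.
 * Only meaningful while the two values are less than 2^31 apart. */
static bool xid_precedes(Xid a, Xid b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical check: an entry looks locked if its stored XID
 * matches any XID in the list of running transactions. */
static bool entry_looks_locked(Xid stored, const Xid *running, size_t n)
{
    assert(running != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (running[i] == stored)
            return true;
    return false;
}
```

If an entry holding XID 1000 is never cleaned out and the counter later wraps so that a new transaction is assigned XID 1000, entry_looks_locked() reports the tuple as locked even though the original locker is long gone. Likewise, once a stored XID falls more than 2^31 behind the current one, xid_precedes() no longer orders it as older.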

I think I like the pg_clog-modeled structure a bit better. However it
could be objected that that puts a hard limit of 4G share-locked tuples
at any one time.

In the clog-modeled idea, it wasn't real clear how you decide whether to
assign a new counter value to a previously locked row, or reuse its
previous counter. You must *not* assign a new value when the existing
entry still has bits set, but you probably do want to be aggressive
about assigning new values when you can; else it gets tough to be sure
that the log can be truncated in a reasonable time.
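
The rule in the paragraph above can be sketched as follows, assuming each log entry is a bitmap with one bit per backend. ShareLockLog, lock_row, unlock_row, NO_COUNTER and the fixed-size array are all hypothetical stand-ins, not anything from the proposal itself.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_SLOTS   8            /* tiny in-memory stand-in for the log */
#define NO_COUNTER  UINT32_MAX   /* row has never been share-locked */

typedef struct
{
    uint32_t holders[LOG_SLOTS]; /* per-counter bitmap, one bit per backend */
    uint32_t next_counter;       /* next unassigned counter value */
} ShareLockLog;

/* Decide which counter a row carries when backend_id share-locks it.
 * Reuse the old counter while its entry still has bits set; otherwise
 * assign a fresh one so that old log space can eventually be truncated. */
static uint32_t lock_row(ShareLockLog *log, uint32_t row_counter,
                         unsigned backend_id)
{
    assert(backend_id < 32);
    uint32_t c = row_counter;

    if (c == NO_COUNTER || log->holders[c] == 0)
    {
        assert(log->next_counter < LOG_SLOTS);
        c = log->next_counter++;
    }
    log->holders[c] |= UINT32_C(1) << backend_id;
    return c;
}

/* Clear this backend's bit; the entry is dead once all bits are zero. */
static void unlock_row(ShareLockLog *log, uint32_t counter,
                       unsigned backend_id)
{
    log->holders[counter] &= ~(UINT32_C(1) << backend_id);
}
```

A second locker arriving while the first still holds the row gets the same counter back; once every holder has released, the next locker is moved to a new counter and the old entry is never touched again.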

ISTM that your description is conflating several orthogonal issues:
how do we identify entries in this data structure (by CTID, or a shared
counter that increments each time a new lock is acquired); how do we
index the data structure (btree or linear array); and what is stored in
each entry (array of XIDs, or bitmap indexed by BackendId). Not all of
the eight combinations work, but we do have more alternatives than the
two offered, even without coming up with any new ideas ;-)
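
To make the three independent choices concrete, here is a small sketch that just enumerates the design space. The enum and struct names are hypothetical, and the code deliberately does not say which combinations are workable, since the message does not either.

```c
#include <assert.h>
#include <stddef.h>

/* How an entry is identified */
typedef enum { ID_BY_CTID, ID_BY_COUNTER } EntryIdent;
/* How the structure is indexed */
typedef enum { INDEX_BTREE, INDEX_LINEAR_ARRAY } IndexKind;
/* What each entry stores */
typedef enum { PAYLOAD_XID_ARRAY, PAYLOAD_BACKEND_BITMAP } PayloadKind;

typedef struct
{
    EntryIdent  ident;
    IndexKind   index;
    PayloadKind payload;
} Design;

/* Fill out[] with every combination of the three choices; returns the count. */
static size_t enumerate_designs(Design *out, size_t cap)
{
    size_t n = 0;

    for (int i = 0; i < 2; i++)
        for (int x = 0; x < 2; x++)
            for (int p = 0; p < 2; p++)
            {
                assert(n < cap);
                out[n].ident = (EntryIdent) i;
                out[n].index = (IndexKind) x;
                out[n].payload = (PayloadKind) p;
                n++;
            }
    return n;
}
```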

> Note that to check whether a transaction is running we need to lock
> SInvalLock. To minimize the time we hold it, we save the BackendId so
> we don't have to scan the whole shmInvalBuffer->procState array, only
> the item that we need to look at. Another possibility would be to use
> stock TransactionIdIsInProgress and save the extra 4 bytes of storage.

I'm a bit worried about deadlocks and race conditions associated with
the conflict between locking a page of this data structure and locking
SInvalLock.

> At server restart, the btree is created empty (or just deleted). There
> is one btree per database.

One per cluster you meant, right? (Else we can't do locking of rows in
shared tables.)

regards, tom lane
