Re: [WIP PATCH] Lazily assign xids for toplevel Transactions

From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgresql-Hackers <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: [WIP PATCH] Lazily assign xids for toplevel Transactions
Date: 2007-08-27 14:11:44
Message-ID: 46D2DBA0.1030808@phlo.org
Lists: pgsql-hackers

Tom Lane wrote:
> I wrote:
>> "Florian G. Pflug" <fgp(at)phlo(dot)org> writes:
>>> Yeah - I do not really like that dual-locking thing either. But it makes
>>> prepared transaction handling much easier - if we were to only lock the
>>> RID, we'd have to store the rid<->xid mapping for prepared transactions
>
>> Hmmm .... that's a good point. Not sure how to include prepared xacts
>> in the scheme.
>
> After further thought I think we should avoid depending on PIDs as part
> of a lock tag --- their behavior is too system-dependent, in particular
> we have no right to assume that when a backend exits the same PID won't
> be reassigned shortly after, possibly leading to confusion.
>
> Instead, I suggest that we keep a session counter in shared memory and
> have each backend assign itself a session ID at startup using that.
> A 32-bit session ID in combination with a 32-bit locally assigned
> transaction number should be sufficiently unique to identify a
> transaction (prepared or otherwise) for the purposes of locking.
> These "transient XIDs" only need to be unique as long as the transaction
> exists plus shortly thereafter (to avoid race conditions when someone
> waits for a transaction that actually terminated a moment before).
> So wraparound of the counters isn't a problem, although we probably
> want to reserve zero as an invalid value.
>
> I think we only need a transient XID of this sort for top-level
> transactions, not subtransactions.
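The session-counter proposal above can be illustrated with a minimal sketch. Everything here is hypothetical: the struct and function names are invented for illustration, and the "shared" counter is a plain static variable standing in for a counter in shared memory, which in a real server would have to be updated under a spinlock or atomically. It shows a 32-bit session ID assigned once per backend, a 32-bit transaction number assigned locally, wraparound of both counters, and zero reserved as the invalid value.

```c
#include <stdint.h>

/* Hypothetical "transient XID": session ID assigned at backend startup
 * plus a transaction number assigned locally by that backend. */
typedef struct TransientXid
{
    uint32_t session_id;
    uint32_t local_xact;
} TransientXid;

#define INVALID_ID 0u

/* Stand-in for the session counter in shared memory (assumption: a real
 * implementation would protect it with a lock or use an atomic add). */
static uint32_t shared_session_counter = 0;

/* Per-backend state. */
static uint32_t my_session_id = INVALID_ID;
static uint32_t my_local_counter = 0;

/* Advance a 32-bit counter; unsigned overflow wraps to 0, and 0 is
 * skipped because it is reserved as the invalid value. */
static uint32_t next_nonzero(uint32_t *counter)
{
    *counter += 1;
    if (*counter == INVALID_ID)
        *counter += 1;
    return *counter;
}

/* Called once when a backend starts. */
static void assign_session_id(void)
{
    my_session_id = next_nonzero(&shared_session_counter);
}

/* Called at the start of each top-level transaction. */
static TransientXid assign_transient_xid(void)
{
    TransientXid txid;

    txid.session_id = my_session_id;
    txid.local_xact = next_nonzero(&my_local_counter);
    return txid;
}

/* Pack both halves into one 64-bit value, e.g. for use in a lock tag. */
static uint64_t transient_xid_key(TransientXid txid)
{
    return ((uint64_t) txid.session_id << 32) | txid.local_xact;
}
```

Wraparound is harmless here for the reason given above: the pair only has to be unique while the transaction exists, plus a short grace period.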

Sounds good, if we decide to go with the transient XID idea. See below
for an alternative that I just came up with.

> To make CREATE INDEX CONCURRENTLY work, we'd need two things:
>
> * GetLockConflicts would need to report the transient XIDs of the
> conflicting xacts, not regular XIDs, since they might not have regular
> XIDs. Then we'd wait on those locks instead of regular-XID locks.

Yes. This is exactly what my patch does today.

> * The second phase where we wait out transactions that can still see
> old tuples doesn't work because such transactions won't necessarily
> be listed in the snapshot. Instead, what we have to do is look
> through the ProcArray for transactions whose advertised xmin is
> less than the xmax of our reference snapshot. When we find one,
> wait for it using its transient XID.
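The ProcArray scan described above could look roughly like the following sketch. The `ProcEntry` struct is a hypothetical, heavily simplified stand-in for a ProcArray slot, and `xid_precedes` mimics the modular comparison PostgreSQL uses for normal XIDs (it is not the server's actual function). A real scan would also skip the calling backend's own entry and hold the appropriate ProcArray lock.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Wraparound-aware comparison for normal XIDs: a precedes b if the
 * difference (a - b), interpreted as signed 32-bit, is negative. */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical, simplified ProcArray entry. */
typedef struct ProcEntry
{
    TransactionId xmin;          /* advertised snapshot xmin, 0 if none */
    uint64_t      transient_xid; /* packed session ID / local xact number */
} ProcEntry;

/* Collect the transient XIDs of all backends whose advertised xmin
 * precedes ref_xmax (the xmax of the reference snapshot); these are the
 * transactions that may still see old tuples and must be waited out.
 * Returns the number of entries written to out. */
static size_t collect_old_snapshot_holders(const ProcEntry *procs, size_t nprocs,
                                           TransactionId ref_xmax,
                                           uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < nprocs && n < out_cap; i++)
    {
        if (procs[i].xmin == InvalidTransactionId)
            continue;            /* no snapshot advertised */
        if (xid_precedes(procs[i].xmin, ref_xmax))
            out[n++] = procs[i].transient_xid;
    }
    return n;
}
```

The caller would then wait on the transient-XID lock of each collected entry in turn.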

> AFAICT, C.I.C. is currently the only place in the system where we
> really need transient XIDs at all. Everyplace else that we need to
> wait for a transaction, it's because we found its regular XID in
> a tuple we want to lock or modify. So the whole thing is a bit
> annoying. Maybe we could get rid of the extra overhead with some
> shenanigans inside the lock manager, like not bothering to create
> a data structure representing the holding of a transient-XID lock
> until such time as C.I.C. actually tries to wait for it. But
> again, that seems like a second-pass optimization.

I've given some thought to that. There are two distinct things we
need to be able to wait for:

1) Until all current holders of a lock whose grantmask conflicts with
a given lock level have dropped their locks.

2) Until all currently in-use snapshots have an xmin larger than
some given value (the xmax of the reference snapshot).

(1) could be solved directly in the lock manager. We'd need some
mechanism to wake up a process whenever someone releases a
certain lock.
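As a rough illustration of (1), here is a sketch using a POSIX mutex and condition variable in place of the lock manager's own machinery. `SimpleLock` and its functions are hypothetical; lock modes are bits in a mask, and `conflict_mask` plays the role of the lock method's conflict-table entry for the level we care about. The waiter snapshots which backends hold a conflicting mode at the moment it starts waiting, and every release broadcasts so waiters can re-check. Holders that arrive later are ignored, so the wait cannot be starved by new lockers. One simplification: if a backend drops and re-acquires the mode between two wakeups, it is treated as still holding. A real implementation would track holder identity per transaction.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BACKENDS 8

/* Hypothetical lock object: for each backend, the bitmask of lock modes
 * it currently holds on this lock. */
typedef struct SimpleLock
{
    pthread_mutex_t mutex;
    pthread_cond_t  released;            /* broadcast on every release */
    uint32_t        held[MAX_BACKENDS];
} SimpleLock;

static void lock_init(SimpleLock *lock)
{
    pthread_mutex_init(&lock->mutex, NULL);
    pthread_cond_init(&lock->released, NULL);
    for (int i = 0; i < MAX_BACKENDS; i++)
        lock->held[i] = 0;
}

static void lock_grant(SimpleLock *lock, int backend, uint32_t modemask)
{
    pthread_mutex_lock(&lock->mutex);
    lock->held[backend] |= modemask;
    pthread_mutex_unlock(&lock->mutex);
}

/* Releasing wakes all waiters so each can re-check its own condition. */
static void lock_release(SimpleLock *lock, int backend, uint32_t modemask)
{
    pthread_mutex_lock(&lock->mutex);
    lock->held[backend] &= ~modemask;
    pthread_cond_broadcast(&lock->released);
    pthread_mutex_unlock(&lock->mutex);
}

/* Wait until every backend holding a mode in conflict_mask at call time
 * has dropped it; later acquirers are not waited for. */
static void wait_for_current_holders(SimpleLock *lock, uint32_t conflict_mask)
{
    bool waiting_on[MAX_BACKENDS];

    pthread_mutex_lock(&lock->mutex);
    for (int i = 0; i < MAX_BACKENDS; i++)
        waiting_on[i] = (lock->held[i] & conflict_mask) != 0;

    for (;;)
    {
        bool any = false;

        for (int i = 0; i < MAX_BACKENDS; i++)
        {
            if (waiting_on[i] && (lock->held[i] & conflict_mask) == 0)
                waiting_on[i] = false;   /* this holder has let go */
            any = any || waiting_on[i];
        }
        if (!any)
            break;
        pthread_cond_wait(&lock->released, &lock->mutex);
    }
    pthread_mutex_unlock(&lock->mutex);
}
```

Broadcasting on every release is simple but wakes unrelated waiters. A lock-manager version would more likely keep a per-lock list of such waiters.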

(2) could be done by acquiring a ShareLock (with a new locktype
LOCKTYPE_XMIN) on the xmin of a transaction's serializable
snapshot when it's created.
The second waiting phase of concurrent index builds would then be:
a) Find the oldest xmin in the ProcArray.
b) If that xmin is equal to or greater than the xmax of our
reference snapshot, we're done.
c) Wait until the ExclusiveLock (of LOCKTYPE_TRANSACTION)
on that xmin is released. After that point, new transactions
will compute an xmin greater than the oldest one we found
in the ProcArray, because the limiting transaction has
exited, and because ReadNewTransactionId returns a value
greater than that xmin too (otherwise, we'd have exited in (b)).
d) Wait for all current holders of LOCKTYPE_XMIN to release
their locks (using the machinery needed for (1)). No
new holders can show up, because new snapshots will compute
a larger xmin.
e) Go to a).
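The loop in steps a) to e) can be sketched as follows. The server interactions are hidden behind hypothetical hooks (`WaitHooks` and its members are invented names, not existing PostgreSQL functions), and LOCKTYPE_XMIN is only the proposed locktype. The sketch assumes that the LOCKTYPE_XMIN lock in step d) is the one taken on that same xmin value. A toy in-memory simulation follows the loop purely so the control flow can be exercised.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Wraparound-aware "a < b" for normal XIDs. */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical hooks into the rest of the server. */
typedef struct WaitHooks
{
    /* a) oldest xmin advertised in the ProcArray; *found = false if none */
    TransactionId (*oldest_xmin)(void *ctx, bool *found);
    /* c) block until the transaction lock on xid is released */
    void (*wait_for_xact)(void *ctx, TransactionId xid);
    /* d) block until current LOCKTYPE_XMIN holders for xmin release */
    void (*wait_for_xmin_holders)(void *ctx, TransactionId xmin);
    void *ctx;
} WaitHooks;

static void wait_out_old_snapshots(const WaitHooks *h, TransactionId ref_xmax)
{
    for (;;)
    {
        bool found;
        TransactionId xmin = h->oldest_xmin(h->ctx, &found);   /* a) */

        if (!found || !xid_precedes(xmin, ref_xmax))            /* b) */
            return;
        h->wait_for_xact(h->ctx, xmin);                         /* c) */
        h->wait_for_xmin_holders(h->ctx, xmin);                 /* d) */
    }                                                           /* e) */
}

/* ---- Toy simulation, for illustration only ---- */
typedef struct FakeServer
{
    TransactionId snap_xmin[4];   /* advertised xmins, 0 = none */
    int           rounds;         /* how many times step c) ran */
} FakeServer;

static TransactionId fake_oldest_xmin(void *ctx, bool *found)
{
    FakeServer *s = ctx;
    TransactionId oldest = 0;

    *found = false;
    for (int i = 0; i < 4; i++)
    {
        TransactionId x = s->snap_xmin[i];

        if (x != 0 && (!*found || xid_precedes(x, oldest)))
        {
            oldest = x;
            *found = true;
        }
    }
    return oldest;
}

static void fake_wait_for_xact(void *ctx, TransactionId xid)
{
    FakeServer *s = ctx;

    (void) xid;                   /* pretend that transaction has ended */
    s->rounds++;
}

static void fake_wait_for_xmin_holders(void *ctx, TransactionId xmin)
{
    FakeServer *s = ctx;

    /* pretend every snapshot with this xmin has been released */
    for (int i = 0; i < 4; i++)
        if (s->snap_xmin[i] == xmin)
            s->snap_xmin[i] = 0;
}
```

With advertised xmins 120, 150 and 300 and a reference xmax of 200, the loop waits twice (for 120, then 150) and stops once the oldest remaining xmin, 300, is no longer older than the reference xmax.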

I could code (2), but I'd need help with (1). The details of the locking
subsystem are still somewhat of a mystery to me.

greetings, Florian Pflug
