Re: SSI patch version 14

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-25 09:07:39
Message-ID: 1295946459.11513.338.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2011-01-24 at 21:30 -0600, Kevin Grittner wrote:
> Attached.
>
> Dan and I have spent many, many hours desk-check and testing,
> including pounding for many hours in DBT-2 at high concurrency
> factors on a 16 processor box. In doing so, we found and fixed a few
> issues. Neither of us is aware of any bugs or deficiencies
> remaining, even after a solid day of pounding in DBT-2, other than
> the failure to extend any new functionality to hot standby.
>
> At Heikki's suggestion I have included a test to throw an error on an
> attempt to switch to serializable mode during recovery. Anything
> more to address that issue can be a small follow-up patch -- probably
> mainly a few notes in the docs.

Here is my first crack at a review:

First of all, I am very impressed with the patch. I was pleased to see
that both READ ONLY DEFERRABLE and 2PC work! Also, I found the patch
very usable and did not encounter any bugs or big surprises.

Second, I do not understand this patch entirely, so the following
statements can be considered questions as much as answers.

At a high level, there is a nice conceptual simplicity. Let me try to
summarize it as I understand it:
* RW dependencies are detected using predicate locking.
* RW dependencies are tracked from the reading transaction (as an
"out") conflict; and from the writing transaction (as an "in"
conflict).
* Before committing a transaction, then by looking only at the RW
dependencies (and predicate locks) for current and past
transactions, you can determine if committing this transaction will
result in a cycle (and therefore a serialization anomaly); and if
so, abort it.

That's where the simplicity ends, however ;)

For starters, the above structures require unlimited memory, while we
have fixed amounts of memory. The predicate locks are referenced by a
global shared hash table as well as per-process SHMQueues. It can adapt
memory usage downward in three ways:
* increase lock granularity -- e.g. change N page locks into a
table lock
* "summarization" -- fold multiple locks on the same object across
many old committed transactions into a single lock
* use the SLRU

Tracking of RW conflicts of current and past transactions is more
complex. Obviously, it doesn't keep _all_ past transactions, but only
ones that overlap with a currently-running transaction. It does all of
this management using SHMQueue. There isn't much of an attempt to
gracefully handle OOM here as far as I can tell, it just throws an error
if there's not enough room to track a new transaction (which is
reasonable, considering that it should be quite rare and can be
mitigated by increasing max_connections).

The heavy use of SHMQueue is quite reasonable, but for some reason I
find the API very difficult to read. I think it would help (at least me)
quite a lot to annotate (i.e. with a comment in the struct) the various
lists and links with the types of data being held.

The actual checking of conflicts isn't quite so simple, either, because
we want to detect problems before the victim transaction has done too
much work. So, checking is done along the way in several places, and
it's a little difficult to follow exactly how those smaller checks add
up to the overall serialization-anomaly check (the third point in my
simple model).

There are also optimizations around transactions declared READ ONLY.
Some of these are a little difficult to follow as well, and I haven't
followed them all.

There is READ ONLY DEFERRABLE, which is a nice feature that waits for a
"safe" snapshot, so that the transaction will never be aborted.

Now, on to my code comments (non-exhaustive):

* I see that you use a union as well as using "outLinks" to also be a
free list. I suppose you did this to conserve shared memory space,
right?

* In RegisterSerializableTransactionInt(), for a RO xact, it considers
any concurrent RW xact a possible conflict. It seems like it would be
possible to know whether a RW transaction may have overlapped with any
committed RW transaction (in finishedLink queue), and only add those as
potential conflicts. Would that work? If so, that would make more
snapshots safe.

* When a transaction finishes, then PID should probably be set to zero.
You only use it for waking up a deferrable RO xact waiting for a
snapshot, right?

* Still some compiler warnings:
twophase.c: In function ‘FinishPreparedTransaction’:
twophase.c:1360: warning: implicit declaration of function
‘PredicateLockTwoPhaseFinish’

* You have a function called CreatePredTran. We are already using
"Transaction" and "Xact" -- we probably don't need "Tran" as well.

* HS error has a typo:
"ERROR: cannot set transaction isolationn level to serializable during
recovery"

* I'm a little unclear on summarization and writing to the SLRU. I don't
see where it's determining that the predicate locks can be safely
released. Couldn't the oldest transaction still have relevant predicate
locks?

* In RegisterSerializableTransactionInt, if it doesn't get an sxact, it
goes into summarization. But summarization assumes that it has at least
one finished xact. Is that always true? If you have enough memory to
hold a transaction for each connection, plus max_prepared_xacts, plus
one, I think that's true. But maybe that could be made more clear?

I'll keep working on this patch. I hope I can be of some help getting
this committed, because I'm looking forward to this feature. And I
certainly think that you and Dan have applied the amount of planning,
documentation, and careful implementation necessary for a feature like
this.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2011-01-25 09:32:11 Bug in parse_basebackup_options Re: [COMMITTERS] pgsql: Make walsender options order-independent
Previous Message rsmogura 2011-01-25 09:06:10 Re: [HACKERS] Fwd: Weird issues when reading UDT from stored function