Re: Serializable snapshot isolation patch

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serializable snapshot isolation patch
Date: 2010-10-18 20:18:41
Message-ID: 1287433121.15261.64.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2010-10-18 at 13:26 -0500, Kevin Grittner wrote:
> > 2. I think there's a GiST bug (illustrating with PERIOD type):
>
> My assumptions for GiST were that:
>
> (1) A search for a matching value could bail out at any level in the
> tree; there is no requirement for the search to proceed to the leaf
> level to get a negative index probe.
>
> (2) An index insertion which would cause a row to be found if an
> earlier search was repeated must modify some page which was read by
> the earlier search.
>
> (3) Because of the above, it is necessary and sufficient to acquire
> an SIRead lock all pages visited in a search of a GiST index, and
> flag a conflict on insertion into a locked page at any level of the
> index.
>
> That logic still seems sound to me, so if someone else sees a flaw
> in it, please point it out.

Looks sound to me, as well.

> > Also, it appears to be non-deterministic, to a degree at least, so
> > you may not observe the problem in the exact way that I do.
>
> Yeah, I have tested this area without seeing the failure , so that
> self-contained example would be a big help.

I assume here that you mean that you _did_ see the failure
(serialization error) and therefore did not see the problem? Also, are
you sure it was using the GiST index for the searches and didn't just
get a full table lock or full index lock?

I'll try to narrow it down. It could be a problem with PERIOD, or maybe
it wasn't using the GiST index for a search when I thought it was (I
didn't run EXPLAIN at every point, so I'll double-check). Clearly, a
problem exists somewhere though.

> > 3. Limited shared memory space to hold information about committed
> > transactions that are still "interesting". Relevant thread:
> >
> > http://archives.postgresql.org/pgsql-hackers/2010-09/msg01735.php
> >
> > It's a challenging problem, however, and the current solution is
> > less than ideal.
>
> I'd go further than that and say that it clearly needs to be fixed.

OK, this will remain an open issue then.

> Canceling an old transaction to allow new transactions to begin
> *might* be better than the current situation, but I think we can and
> should do better. The best I have been able to come up with is in
> this post:
>
> http://archives.postgresql.org/pgsql-hackers/2010-09/msg01724.php

At first I didn't like that approach much, but after re-reading it, I am
more optimistic. What I like most about it is that a transaction won't
be canceled if it doesn't interfere at all (e.g. operating on different
tables). There are still edge cases where it can be canceled, like when
the memory is so exhausted that it can't even track a couple table-level
locks, but that doesn't sound as bad (closer to the current restriction
on the total number of locks).

Ultimately I think this will be one of those problems with no 100% clean
solution. However, I think it's important that we try to minimize these
degenerate cases. Eventually, we may need to keep statistics about the
number of conflicts happening, and start to rate-limit new serializable
transactions (effectively reducing parallelism) to ensure that
reasonable progress can be made (hopefully faster than serial
execution).

> There's a fair amount of work there, so I was hoping for some
> confirmation that this combination of steps was both sane and had
> some chance of being considered sufficient and acceptable before
> diving in to code it. I also *really* hope to add the "SERIALIZABLE
> READ ONLY DEFERRABLE" mode so that pg_dump and other read-only
> transactions don't push things into a state where the rollback rate
> spikes:
>
> http://archives.postgresql.org/pgsql-hackers/2010-09/msg01771.php

One potential issue there is starvation. I don't see how you would
guarantee that there will ever be a point to grab a safe snapshot, even
if all of the other transactions are completing.

> > 4. A few final details:
> >
> > a. We should probably have a compatibility GUC that makes
> > SERIALIZABLE equal to REPEATABLE READ. My opinion is that this
> > should be only for compatibility, and should default to "off"
> > (i.e. SSI code enabled) either in 9.1 or soon after.
>
> I'm inclined to agree with you, with the strong preference for a 9.1
> setting of off. This was previously discussed, and there were
> people who felt that we should avoid a "behavior-changing" GUC like
> that, so I didn't add it. It wouldn't be hard to put it in, if
> that's the community consensus.

I think that was me, but I'm OK with behavior-changing GUCs as long as
they are there for compatibility and we intend to push people toward the
new behavior in the long run (like standard_conforming_strings, but
hopefully not as long-lived).

Maybe it's not necessary at all, because your patch offers strictly
better guarantees (at the cost of more serialization failures), and if
they want the old behavior then REPEATABLE READ is already there.

> > 1. For TargetTagIsCoveredBy(), why is it impossible for the
> > covering tag to have an offset?
>
> Because a "covering" lock is a lock at a coarser granularity than
> the "covered" lock, which includes what the finer-grained lock
> covers. Since the tuple level is as fine as we take it, and we only
> use the offset for tuple locks, a covering lock would not have that
> set. (We also check for duplicate locks.) I'll expand that comment
> a bit, since it obviously wasn't clear enough.

I see. So a lock can't ever cover itself?

> > 2. The explanation for SerializablePredicateLockListLock is a
> > little confusing. It makes it sound like other processes can't
> > walk the list, but they can modify it?
>
> That's a bit unconventional, I admit, but it was a big part of
> bringing the benchmarks for a fully cached, read-only transaction
> load down to running just 1.8% longer than REPEATABLE READ (which is
> the same as current SERIALIZABLE). I'm open to other locking
> strategies, or an explanation of where this one doesn't work, but
> I'd want to benchmark carefully. Since you took it to mean the
> right thing, but found that surprising enough to doubt that it was
> accurate, do you have any particular suggestions about how to
> clarify it? (Maybe just, "Yeah, that's unconventional, but it works
> and is faster than the alternatives we've tried."?)

When using locks in an unconventional way, it would be helpful to
describe the invalid schedules that you're preventing. Perhaps an
example if you think it would be reasonably simple? Also some indication
of how another process is intended to modify the list without walking
it.

> It certainly is for our shop, and I've heard from others who are
> very interested in it. I think, that it's generally of more
> benefit to "big shops" than situations where you have just a few
> programmers coding against a schema with just a few dozen tables;
> although it's also somewhat a matter of style: I would rather deal
> with *one* issue (serialization failures) in one place than scatter
> defensive code all over my application codebase, even on a moderate
> scale.

It's a complexity-reducing and correctness-increasing feature, which is
generally good.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-10-18 20:27:59 Re: Creation of temporary tables on read-only standby servers
Previous Message Tom Lane 2010-10-18 20:02:14 Re: knngist - 0.8