Re: We're leaking predicate locks in HEAD

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: We're leaking predicate locks in HEAD
Date: 2019-05-08 04:50:02
Message-ID: CA+hUKGJ=yLV+bCYZ6QNG4vS2kCk7WrzLhBiNu-RzR3WePDxqFw@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 8, 2019 at 3:53 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> > Reproduced here. Once the system reaches a state where it's leaking
> > (which happens only occasionally for me during installcheck-parallel),
> > it keeps leaking for future SSI transactions. The cause is
> > SxactGlobalXmin getting stuck. The attached fixes it for me. I can't
> > remember why on earth I made that change, but it is quite clearly
> > wrong: you have to check every transaction, or you might never advance
> > SxactGlobalXmin.
>
> Hm. So I don't have any opinion about whether this is a correct fix for
> the leak, but I am quite distressed that the system failed to notice that
> it was leaking predicate locks. Shouldn't there be the same sort of
> leak-detection infrastructure that we have for most types of resources?

Well, it is hooked up to the usual release machinery, because it's in
ReleasePredicateLocks(), which is wired into the
RESOURCE_RELEASE_LOCKS phase of resowner.c. The thing is that the
locks' lifetime is linked to the last transaction with the oldest known
xmin, not to the transaction that created them.

More analysis: Lock clean-up is deferred until "... the last
serializable transaction with the oldest xmin among serializable
transactions completes", but I broke that by excluding read-only
transactions from the check, so SxactGlobalXminCount gets out of
sync. There's a read-only SSI transaction in
src/test/regress/sql/transactions.sql, but I think the problem
manifests only intermittently with installcheck-parallel for two
reasons: sometimes the read-only optimisation kicks in (effectively
dropping us to plain old SI because there's no concurrent serializable
activity) and the transaction doesn't take any locks at all, and
sometimes the read-only transaction doesn't have the oldest known xmin
among serializable transactions. However, if a read-write SSI
transaction has already taken a snapshot and has the oldest xmin, and
then the read-only one starts with the same xmin, we get into trouble.
When the read-only one releases, we fail to decrement
SxactGlobalXminCount, so it never reaches zero and we never call
ClearOldPredicateLocks().
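
To make the counting problem concrete, here is a toy model of that
interleaving. It is only a sketch, not the code in predicate.c:
ToySxact, ToyState, toy_begin(), toy_release() and toy_run_scenario()
are made-up names, the xmin values are arbitrary, and the model
assumes the oldest xmin only needs recomputing when no transactions
remain. The skip_read_only flag stands in for the broken check.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; not the real predicate.c structures. */
typedef struct
{
    unsigned xmin;       /* snapshot xmin of this serializable transaction */
    bool     read_only;
} ToySxact;

typedef struct
{
    unsigned global_xmin;        /* models SxactGlobalXmin */
    int      global_xmin_count;  /* models SxactGlobalXminCount */
    int      cleanups;           /* how often the clean-up step ran */
} ToyState;

/* Register a new transaction; assumes xmin never moves backwards. */
static void
toy_begin(ToyState *st, ToySxact *x, unsigned xmin, bool read_only)
{
    x->xmin = xmin;
    x->read_only = read_only;
    if (st->global_xmin_count == 0)
    {
        st->global_xmin = xmin;
        st->global_xmin_count = 1;
    }
    else if (xmin == st->global_xmin)
        st->global_xmin_count++;
}

/*
 * Release a transaction.  With skip_read_only = true, read-only
 * transactions bypass the check, which models the broken behaviour.
 */
static void
toy_release(ToyState *st, const ToySxact *x, bool skip_read_only)
{
    if (skip_read_only && x->read_only)
        return;                 /* count is not decremented: the leak */
    if (x->xmin == st->global_xmin)
    {
        assert(st->global_xmin_count > 0);
        if (--st->global_xmin_count == 0)
            st->cleanups++;     /* stands in for ClearOldPredicateLocks() */
    }
}

/* Replay the problematic interleaving and return how many clean-ups ran. */
static int
toy_run_scenario(bool skip_read_only)
{
    ToyState st = {0, 0, 0};
    ToySxact rw, ro, later;

    toy_begin(&st, &rw, 100, false);   /* read-write, oldest xmin */
    toy_begin(&st, &ro, 100, true);    /* read-only, same xmin */
    toy_release(&st, &ro, skip_read_only);
    toy_release(&st, &rw, skip_read_only);

    toy_begin(&st, &later, 200, false);    /* any future SSI transaction */
    toy_release(&st, &later, skip_read_only);
    return st.cleanups;
}
```

Under these assumptions, toy_run_scenario(false) counts two clean-ups,
while toy_run_scenario(true) counts none: the count stays stuck at 1
against the stale xmin of 100, so later transactions never match it.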

--
Thomas Munro
https://enterprisedb.com
