Re: [DOC] Document concurrent index builds waiting on each other

From: James Coleman <jtc331(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [DOC] Document concurrent index builds waiting on each other
Date: 2020-04-17 01:04:41
Message-ID: CAAaqYe_onbxJ6x-eM5tnzDTcsO2490govyi-Tv1zf4o5oA_aUQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 16, 2020 at 6:12 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2020-04-15 21:44:48 -0400, James Coleman wrote:
> > On Wed, Apr 15, 2020 at 6:31 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > > If it's about the xmin horizon for vacuum: I think we could probably
> > > avoid that using the same flag. As vacuum cannot be run against a table
> > > that has a CIC running (although it'd theoretically be possible to allow
> > > that), it should be safe to ignore PROC_IN_CIC backends in vacuum's
> > > GetOldestXmin() call. That might not be true for system relations, but
> > > we don't allow CIC on those.
> >
> > Yeah, I mean that if I have a CIC running on table X then vacuum can't
> > remove dead tuples (from after the CIC's snapshot) on table Y.
>
> For me "blocking" evokes waiting for a lock, which is why I thought
> you'd not mean that issue.

It was sloppy choice of language on my part; for better or worse at
work we've taken to talking about "blocking vacuum" when that's really
shorthand for "blocking [or you'd prefer preventing] vacuuming dead
tuples".

> > That's a pretty significant danger, given the combination of:
> > 1. Index builds on very large tables can take many days, and
>
> We at least don't hold a single snapshot over the multiple phases...

For sure. And text sorting improvements have made this better also,
still, as you often point out re: xid size, databases are only getting
larger (and more TPS).

> > 2. The well understood problems of high update tables with dead tuples
> > and poor plans.
>
> Which specific problem are you referring to? The planner probing the end
> of the index for values outside of the histogram? I'd hope
> 3ca930fc39ccf987c1c22fd04a1e7463b5dd0dfd improved the situation there a
> bit?

Yes, and other commits too, IIRC from the time we spent debugging
exactly the scenario mentioned in that commit.

But by "poor plans" I don't mean specifically "poor planning time" but
that we can still end up choosing the "wrong" plan, right? And dead
tuples can make an index scan be significantly worse than it would
otherwise be. Same for a seq scan: you can end up looking at millions
of dead tuples in a nominally 500 row table.

> > > [description why we could ignore CIC for vacuum horizon on other tables ]
>
> > I've previously discussed this with other hackers and the reasoning
> > they'd understood way that we couldn't always safely ignore
> > PROC_IN_CIC backends in the vacuum's oldest xmin call because of
> > function indexes, and the fact that (despite clear recommendations to
> > the contrary), there's nothing actually preventing someone from adding
> > a function index on table X that queries table Y.
>
> Well, even if we consider this an actual problem, we could still use
> PROC_IN_CIC for non-expression non-partial indexes (index operator
> themselves better ensure this isn't a problem, or they're ridiculously
> broken already - they can get called during vacuum).

Agreed. It'd be unfortunate to have to limit it though.

> Even when expressions are involved, I don't think that necessarily would
> have to mean that we need to use the same snapshot to run expressions in
> for the hole scan. So we could occasionally take a new snapshot for the
> purpose of computing expressions.
>
> The hard part presumably would be that we'd need to advertise one xmin
> for the expression snapshot to protect tuples potentially accessed from
> being removed, but at the same time we also need to advertise the xmin
> of the snapshot used by CIC, to avoid HOT pruning in other session from
> removing tuple versions from the table the index is being created
> on.
>
> There's not really infrastructure for doing so. I think we'd basically
> have to start publicizing multiple xmin values (as long as PGXACT->xmin
> is <= new xmin for expressions, only GetOldestXmin() would need to care,
> and it's not that performance critical). Not pretty.

In other words, pretty invasive.

> > I'm not sure I buy that we should care about people doing something
> > clearly so dangerous, but...I grant that it'd be nice not to cause new
> > crashes.
>
> I don't think it's just dangerous expressions that would be
> affected. Normal expression indexes need to be able to do syscache
> lookups etc, and they can't safely do so if tuple versions can be
> removed in the middle of a scan. We could avoid that by not ignoring
> PROC_IN_CIC backend in GetOldestXmin() calls for catalog tables (yuck).

At first glance this sounds a lot less invasive, but I also agree it's gross.

James

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andy Fan 2020-04-17 01:16:59 Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Previous Message James Coleman 2020-04-17 01:02:39 Re: sqlsmith crash incremental sort