Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, amul sul <sulamul(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key
Date: 2018-03-28 18:12:11
Message-ID: 20180328181211.yilvi44y5rv6i3ud@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2018-03-28 13:52:24 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Given, as explained nearby, we already do store transient data in the
> > ctid for speculative insertions (i.e. ON CONFLICT), and it hasn't caused
> > even a whiff of trouble, I'm currently not inclined to see a huge issue
> > here. It'd be great if you could expand on your concerns here a bit, we
> > gotta figure out a way forward.
>
> Just what I said. There's a lot of code that knows how to follow tuple
> update chains, probably not all of it in core, and this will break it.
> But only in seldom-exercised corner cases, which is the worst of all
> possible worlds from a reliability standpoint.

How will it break it? They'll see an invalid ctid and conclude that the
tuple is dead? Without any changes that's already something that can
happen if a later tuple in the chain has been pruned away. Sure, that
code won't realize it should error out because the tuple is now in a
different partition, but neither would a infomask bit.

I think my big problem is that I just don't see what the worst that can
happen is. We'd potentially see a broken ctid chain, something that very
commonly happens, and consider the tuple to be invisible. That seems
pretty sane behaviour for unadapted code, and not any worse than other
potential solutions.

> (I don't think ON CONFLICT is a counterexample because, IIUC, it's not
> a persistent state.)

Hm, it can be persistent if we error out in the right moment. But it's
nots super common to encounter that over a long time, I grant you
that. Not that this'd be super persistent either, vacuum/pruning would
normally remove the tuple as well, it's dead after all.

> >> I would've been happier about expending an infomask bit towards this
> >> purpose. Just eyeing what we've got, I can't help noticing that
> >> HEAP_MOVED_OFF/HEAP_MOVED_IN couldn't possibly be set in any tuple
> >> in a partitioned table. Perhaps making these tests depend on
> >> partitioned-ness would be unworkably messy, but it's worth thinking
> >> about.
>
> > They previously couldn't be set together IIRC, so we could just (mask &
> > (HEAP_MOVED_OFF |HEAP_MOVED_IN)) == (HEAP_MOVED_OFF |HEAP_MOVED_IN) but
> > that'd be permanently eating two infomask bits.
>
> Hmm. That objection only matters if we have realistic intentions of
> reclaiming those bits in future, which I've not heard anyone making
> serious effort towards.

I plan to submit a patch early v12 that keeps track of the last time a
table has been fully scanned (and when it was created). With part of the
goal being debuggability and part being able to reclaim things like
these bits.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Steele 2018-03-28 18:18:42 Re: Rangejoin rebased
Previous Message Tom Lane 2018-03-28 17:52:24 Re: [HACKERS] Restrict concurrent update/delete with UPDATE of partition key