Re: [WIP] [B-Tree] Retail IndexTuple deletion

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>, Юрий Соколов <funny(dot)falcon(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] [B-Tree] Retail IndexTuple deletion
Date: 2018-07-19 11:29:51
Message-ID: CAD21AoApVGFf3q7WV5FuFzHDRtew9+fHHdCyOkk1uG+XG_6OKw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 13, 2018 at 4:00 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Tue, Jul 3, 2018 at 5:17 AM, Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> Done.
>> Attachment contains an update for use v.2 of the 'Ensure nbtree leaf tuple
>> keys are always unique' patch.
>
> My v3 is still pending, but is now a lot better than v2. There were
> bugs in v2 that were fixed.
>
> One area that might be worth investigating is retail index tuple
> deletion performed within the executor in the event of non-HOT
> updates. Maybe LP_REDIRECT could be repurposed to mean "ghost record",
> at least in unique index tuples with no NULL values. The idea is that
> MVCC index scans can skip over those if they've already found a
> visible tuple with the same value.

I think that's a good idea. The overhead of marking an item as a ghost
seems small, and it would speed up index scans. If an MVCC index scan has
already found a visible tuple with the same value, could it not only
skip the ghost items but also kill them? If so, we could kill index
tuples without checking the heap.
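
To make the skipping part concrete, here is a minimal sketch. It is a toy model in Python, not nbtree code: IndexTuple, scan_unique_key and the heap_visible callback are hypothetical names, and the "ghost" flag stands in for the repurposed LP_REDIRECT bit. It only shows the skip; whether such items could also be killed safely is the open question above and is not modeled.

```python
from dataclasses import dataclass


@dataclass
class IndexTuple:
    key: int
    tid: int             # hypothetical heap tuple identifier
    ghost: bool = False  # hint: this version was probably superseded
    dead: bool = False   # analogue of LP_DEAD: known dead to everyone


def scan_unique_key(tuples, key, heap_visible):
    """Look up `key` in a toy unique index.

    heap_visible(tid) is an assumed callback that performs the heap
    visibility check. Returns (visible tid or None, number of heap checks).
    """
    found = None
    heap_checks = 0
    for it in tuples:
        if it.key != key or it.dead:
            continue
        if found is not None and it.ghost:
            # A visible tuple with this value is already known, so the
            # ghost item is skipped without visiting the heap.
            continue
        heap_checks += 1
        if heap_visible(it.tid) and found is None:
            found = it.tid
    return found, heap_checks


versions = [IndexTuple(1, 10), IndexTuple(1, 11, ghost=True),
            IndexTuple(1, 12, ghost=True)]
result = scan_unique_key(versions, 1, lambda tid: tid == 10)
```

Note that a ghost item seen before any visible tuple still costs a heap check in this sketch, since the bit is only a hint.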

> Also, when there was about to be a
> page split, they could be treated a little bit like LP_DEAD items. Of
> course, the ghost bit would have to be treated as a hint that could be
> "wrong" (e.g. because the transaction hasn't committed yet), so you'd
> have to go to the heap in the context of a page split, to double
> check. Also, you'd need heuristics that let you give up on this
> strategy when it didn't help.
>
> I think that this could work well enough for OLTP workloads, and might
> be more future-proof than doing it in VACUUM. Though, of course, it's
> still very complicated.

Agreed.
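
For the page-split case described above, a rough sketch of the "treat ghosts like LP_DEAD, but double check in the heap" step might look like the following. Again this is a toy Python model with hypothetical names (Item, try_avoid_split, heap_says_dead), not a proposal for the actual nbtree code, and the give-up heuristics are left out.

```python
from dataclasses import dataclass


@dataclass
class Item:
    tid: int             # hypothetical heap tuple identifier
    dead: bool = False   # analogue of LP_DEAD: safe to remove as is
    ghost: bool = False  # hint only: may be wrong, must be verified


def try_avoid_split(items, heap_says_dead, needed_slots=1):
    """Reclaim space on a full toy leaf page before splitting it.

    heap_says_dead(tid) is an assumed callback that visits the heap and
    reports whether the tuple is dead to all transactions.
    Returns (items kept on the page, True if the split can be avoided).
    """
    kept = []
    freed = 0
    for it in items:
        if it.dead:
            freed += 1  # already known dead, no heap visit needed
        elif it.ghost and heap_says_dead(it.tid):
            freed += 1  # hint confirmed by the heap
        else:
            # Live item, or a ghost whose hint turned out to be wrong
            # (e.g. the updating transaction has not committed yet).
            kept.append(it)
    return kept, freed >= needed_slots


page = [Item(1), Item(2, dead=True), Item(3, ghost=True), Item(4, ghost=True)]
kept, avoided = try_avoid_split(page, lambda tid: tid == 3)
```

Only the ghost items cost a heap visit here, which is where a heuristic for giving up would have to be applied when the hints are mostly wrong.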

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
