Re: [WIP] [B-Tree] Retail IndexTuple deletion

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [WIP] [B-Tree] Retail IndexTuple deletion
Date: 2018-06-22 19:43:10
Message-ID: CAH2-WzmpjdeU77aJf1gtte3QhWrnwMffx3pPJQpEDHA1zbx3gw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> According to your feedback, I've developed a second version of the patch.
> In this version:
> 1. The high-level functions index_beginscan() and index_rescan() are no
> longer used. Tree descent is done by _bt_search(), and _bt_binsrch() is
> used for positioning within the page.
> 2. A TID list was introduced into the amtargetdelete() interface. Now only
> one tree descent is needed to delete all TIDs from the list that share an
> equal scan key value -- the logical-duplicates deletion problem.
> 3. Only one WAL record is emitted for index tuple deletion per leaf page
> per amtargetdelete() call.

Cool.

What is this "race" code about?

> +	buffer = ReadBufferExtended(rel, MAIN_FORKNUM,
> +								ItemPointerGetBlockNumber(tid),
> +								RBM_NORMAL, NULL);
> +	LockBuffer(buffer, BUFFER_LOCK_SHARE);
> +
> +	page = (Page) BufferGetPage(buffer);
> +	offnum = ItemPointerGetOffsetNumber(tid);
> +	lp = PageGetItemId(page, offnum);
> +
> +	/*
> +	 * VACUUM races: someone already removed the tuple from the heap.
> +	 * Ignore it.
> +	 */
> +	if (!ItemIdIsUsed(lp))
> +		return NULL;

It looks wrong -- why should another process have set the item as
unused? And if we assume that that's possible at all, what's to stop a
third process from actually reusing the item pointer before we arrive
(at get_tuple_by_tid()), leaving us to find a tuple that is totally
unrelated to the original tuple to be deleted?

(Also, you're not releasing the buffer lock before you return.)
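
At a minimum, that early-return path needs to release the lock and the
pin. A sketch of the corrected exit, reusing the quoted fragment's
variable names (untested):

	if (!ItemIdIsUsed(lp))
	{
		/* drop the buffer lock and pin before bailing out */
		UnlockReleaseBuffer(buffer);
		return NULL;
	}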

> 4. VACUUM can sort the TID list beforehand to make the duplicate search quicker.

This version of the patch prevents my own "force unique keys" patch
from working, since you're not using my proposed new
_bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
them a heap TID). It is essential that your patch be able to quickly
reach any tuple that it needs to kill. Otherwise, the worst case
performance is totally unacceptable; it will never be okay to go
through 10%+ of the index to kill a single tuple hundreds or even
thousands of times per VACUUM. It seems to me that the
tid_list_search() binary search is pointless -- you should instead be
relying on logical duplicates being ordered by their heap TID (the
tie-breaker). Rather than doing a binary search within
tid_list_search(), match the presorted heap TIDs at the leaf level
against the sorted in-memory TID list. You know, a bit like a merge
join.
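
Concretely, I'm thinking of something like the sketch below. This is
not code from either patch -- "deltids", "ndeltids", and "deletable"
are hypothetical names for the presorted kill list and the output
array; everything else is the existing backend API. It assumes leaf
tuples are ordered by heap TID among duplicates, which is what my
patch provides:

	static int
	match_kill_list(Page page, ItemPointer deltids, int ndeltids,
					OffsetNumber *deletable)
	{
		BTPageOpaque opaque = (BTPageOpaque) PageGetSpecialPointer(page);
		OffsetNumber offnum = P_FIRSTDATAKEY(opaque);
		OffsetNumber maxoff = PageGetMaxOffsetNumber(page);
		int			i = 0;
		int			ndeletable = 0;

		/* both inputs are sorted on heap TID, so one linear pass suffices */
		while (offnum <= maxoff && i < ndeltids)
		{
			ItemId		lp = PageGetItemId(page, offnum);
			IndexTuple	itup = (IndexTuple) PageGetItem(page, lp);
			int			cmp = ItemPointerCompare(&itup->t_tid, &deltids[i]);

			if (cmp < 0)
				offnum = OffsetNumberNext(offnum);	/* leaf tuple sorts first */
			else if (cmp > 0)
				i++;			/* kill TID isn't on this page; try the next */
			else
			{
				deletable[ndeletable++] = offnum;	/* match: mark for deletion */
				offnum = OffsetNumberNext(offnum);
				i++;
			}
		}

		return ndeletable;
	}

The point is that each leaf page is walked exactly once, no matter how
many TIDs from the list happen to land on it.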

I suggest that you go even further than this: I think that you should
just start distributing my patch as part of your patch series. You can
change my code if you need to. I also suggest using "git format-patch"
with simple, short commit messages to produce patches. This makes it a
lot easier to track the version of the patch, changes over time, etc.

I understand why you'd hesitate to take ownership of my code (it has
big problems!), but the reality is that all the problems that my patch
has are also problems for your patch. One patch cannot get committed
without the other, so they are already the same project. As a bonus,
my patch will probably improve the best case performance for your
patch, since multi-deletions will now have much better locality of
access.

--
Peter Geoghegan
