Re: [WIP] [B-Tree] Retail IndexTuple deletion

From: "Andrey V(dot) Lepikhov" <a(dot)lepikhov(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: [WIP] [B-Tree] Retail IndexTuple deletion
Date: 2018-06-27 06:40:50
Message-ID: e4cec53a-ab12-4858-d9d8-5fac9948a4c7@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 23.06.2018 00:43, Peter Geoghegan wrote:
> On Fri, Jun 22, 2018 at 4:24 AM, Andrey V. Lepikhov
> <a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
>> According to your feedback, i develop second version of the patch.
>> In this version:
>> 1. High-level functions index_beginscan(), index_rescan() not used. Tree
>> descent made by _bt_search(). _bt_binsrch() used for positioning on the
>> page.
>> 2. TID list introduced in amtargetdelete() interface. Now only one tree
>> descent needed for deletion all tid's from the list with equal scan key
>> value - logical duplicates deletion problem.
>> 3. Only one WAL record for index tuple deletion per leaf page per
>> amtargetdelete() call.
>
> Cool.
>
> What is this "race" code about?
>
>> + buffer = ReadBufferExtended(rel, MAIN_FORKNUM, ItemPointerGetBlockNumber(tid), RBM_NORMAL, NULL);
>> + LockBuffer(buffer, BUFFER_LOCK_SHARE);
>> +
>> + page = (Page) BufferGetPage(buffer);
>> + offnum = ItemPointerGetOffsetNumber(tid);
>> + lp = PageGetItemId(page, offnum);
>> +
>> + /*
>> + * VACUUM Races: someone already remove the tuple from HEAP. Ignore it.
>> + */
>> + if (!ItemIdIsUsed(lp))
>> + return NULL;
>
> It looks wrong -- why should another process have set the item as
> unused? And if we assume that that's possible at all, what's to stop a
> third process from actually reusing the item pointer before we arrive
> (at get_tuple_by_tid()), leaving us to find a tuple that is totally
> unrelated to the original tuple to be deleted?
>
> (Also, you're not releasing the buffer lock before you return.)
>
>> 4. VACUUM can sort TID list preliminary for more quick search of duplicates.
>
> This version of the patch prevents my own "force unique keys" patch
> from working, since you're not using my proposed new
> _bt_search()/_bt_binsrch()/_bt_compare() interface (you're not passing
> them a heap TID). It is essential that your patch be able to quickly
> reach any tuple that it needs to kill. Otherwise, the worst case
> performance is totally unacceptable; it will never be okay to go
> through 10%+ of the index to kill a single tuple hundreds or even
> thousands of times per VACUUM. It seems to me that doing this
> tid_list_search() binary search is pointless -- you should instead be
> relying on logical duplicates using their heap TID as a tie-breaker.
> Rather than doing a binary search within tid_list_search(), you should
> instead match the presorted heap TIDs at the leaf level against the
> sorted in-memory TID list. You know, a bit like a merge join.

I agree with you. Binary search was developed in awaiting your patch.
>
> I suggest that you go even further than this: I think that you should
> just start distributing my patch as part of your patch series. You can
> change my code if you need to.

Good. I am ready to start distributing your patch. At the beginning of
the work I planned to make patch for physical TID ordering in the btree
index. Your patch will make it much easier.

I also suggest using "git format patch"
> with simple, short commit messages to produce patches. This makes it a
> lot easier to track the version of the patch, changes over time, etc.
>
Ok

> I understand why you'd hesitate to take ownership of my code (it has
> big problems!), but the reality is that all the problems that my patch
> has are also problems for your patch. One patch cannot get committed
> without the other, so they are already the same project. As a bonus,
> my patch will probably improve the best case performance for your
> patch, since multi-deletions will now have much better locality of
> access.
>
I still believe that the patch for physical TID ordering in btree:
1) has its own value, not only for target deletion,
2) will require only a few local changes in my code,
and this patches can be developed independently.

I prepare third version of the patches. Summary:
1. Mask DEAD tuples at a page during consistency checking (See comments
for the mask_dead_tuples() function).
2. Still not using physical TID ordering.
3. Index cleanup() after each quick_vacuum_index() call was excluded.

--
Andrey Lepikhov
Postgres Professional:
https://postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Retail-IndexTuple-Deletion-patch-v.3.patch text/x-patch 12.6 KB
0002-Quick-Vacuum-Strategy-patch-v.3.patch text/x-patch 11.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2018-06-27 06:44:15 Re: Constraint documentation
Previous Message Pavel Stehule 2018-06-27 06:33:34 Re: postgresql_fdw doesn't handle defaults correctly