Quick Links

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date:	2019-08-13 15:45:15
Message-ID:	54b629d2-0d54-1f7e-8d65-ae90ae39fcee@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

06.08.2019 4:28, Peter Geoghegan wrote:
> Attached is v5, which is based on your v4. The three main differences
> between this and v4 are:
>
> * Removed BT_COMPRESS_THRESHOLD stuff, for the reasons explained in my
> July 24 e-mail. We can always add something like this back during
> performance validation of the patch. Right now, having no
> BT_COMPRESS_THRESHOLD limit definitely improves space utilization for
> certain important cases, which seems more important than the
> uncertain/speculative downside.
Fair enough.
I think we can measure performance and make a decision, when patch will
stabilize.

> * We now have experimental support for unique indexes. This is broken
> out into its own patch.
>
> * We now handle LP_DEAD items in a special way within
> _bt_insertonpg_in_posting().
>
> As you pointed out already, we do need to think about LP_DEAD items
> directly, rather than assuming that they cannot be on the page that
> _bt_insertonpg_in_posting() must process. More on that later.
>
>> If sizeof(t_info) + sizeof(key) < sizeof(t_tid), resulting posting tuple
>> can be
>> larger. It may happen if keysize <= 4 byte.
>> In this situation original tuples must have been aligned to size 16
>> bytes each,
>> and resulting tuple is at most 24 bytes (6+2+4+6+6). So this case is
>> also safe.
> I still need to think about the exact details of alignment within
> _bt_insertonpg_in_posting(). I'm worried about boundary cases there. I
> could be wrong.
Could you explain more about these cases?
Now I don't understand the problem.

>> The main reason why I decided to avoid applying compression to unique
>> indexes
>> is the performance of microvacuum. It is not applied to items inside a
>> posting
>> tuple. And I expect it to be important for unique indexes, which ideally
>> contain only a few live values.
> I found that the performance of my experimental patch with unique
> index was significantly worse. It looks like this is a bad idea, as
> you predicted, though we may still want to do
> deduplication/compression with NULL values in unique indexes. I did
> learn a few things from implementing unique index support, though.
>
> BTW, there is a subtle bug in how my unique index patch does
> WAL-logging -- see my comments within
> index_compute_xid_horizon_for_tuples(). The bug shouldn't matter if
> replication isn't used. I don't think that we're going to use this
> experimental patch at all, so I didn't bother fixing the bug.
Thank you for the patch.
Still, I'd suggest to leave it as a possible future improvement, so that
it doesn't
distract us from the original feature.
>> if (ItemIdIsDead(itemId))
>> continue;
>>
>> In the previous review Rafia asked about "some reason".
>> Trying to figure out if this situation possible, I changed this line to
>> Assert(!ItemIdIsDead(itemId)) in our test version. And it failed in a
>> performance
>> test. Unfortunately, I was not able to reproduce it.
> I found it easy enough to see LP_DEAD items within
> _bt_insertonpg_in_posting() when running pgbench with the extra unique
> index patch. To give you a simple example of how this can happen,
> consider the comments about BTP_HAS_GARBAGE within
> _bt_delitems_vacuum(). That probably isn't the only way it can happen,
> either. ISTM that we need to be prepared for LP_DEAD items during
> deduplication, rather than trying to prevent deduplication from ever
> having to see an LP_DEAD item.

I added to v6 another related fix for _bt_compress_one_page().
Previous code was implicitly deleted DEAD items without
calling index_compute_xid_horizon_for_tuples().
New code has a check whether DEAD items on the page exist and remove
them if any.
Another possible solution is to copy dead items as is from old page to
the new one,
but I think it's good to remove dead tuples as fast as possible.

> v5 makes _bt_insertonpg_in_posting() prepared to overwrite an
> existing item if it's an LP_DEAD item that falls in the same TID range
> (that's _bt_compare()-wise "equal" to an existing tuple, which may or
> may not be a posting list tuple already). I haven't made this code do
> something like call index_compute_xid_horizon_for_tuples(), even
> though that's needed for correctness (i.e. this new code is currently
> broken in the same way that I mentioned unique index support is
> broken).
Is it possible that DEAD tuple to delete was smaller than itup?
> I also added a nearby FIXME comment to
> _bt_insertonpg_in_posting() -- I don't think think that the code for
> splitting a posting list in two is currently crash-safe.
>
Good catch. It seems, that I need to rearrange the code.
I'll send updated patch this week.

> How do you feel about officially calling this deduplication, not
> compression? I think that it's a more accurate name for the technique.
I agree.
Should I rename all related names of functions and variables in the patch?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment	Content-Type	Size
v6-0001-Compression-deduplication-in-nbtree.patch	text/x-patch	83.9 KB

In response to

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. at 2019-08-06 01:28:18 from Peter Geoghegan

Responses

Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. at 2019-08-16 15:56:12 from Anastasia Lubennikova
Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index. at 2019-08-20 00:52:05 from Peter Geoghegan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Fabien COELHO	2019-08-13 15:54:31	pgbench - rework variable management
Previous Message	Alvaro Herrera	2019-08-13 15:25:17	Re: Problem with default partition pruning