Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-08-13 15:45:15
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

06.08.2019 4:28, Peter Geoghegan wrote:
> Attached is v5, which is based on your v4. The three main differences
> between this and v4 are:
> * Removed BT_COMPRESS_THRESHOLD stuff, for the reasons explained in my
> July 24 e-mail. We can always add something like this back during
> performance validation of the patch. Right now, having no
> BT_COMPRESS_THRESHOLD limit definitely improves space utilization for
> certain important cases, which seems more important than the
> uncertain/speculative downside.
Fair enough.
I think we can measure performance and make a decision, when patch will

> * We now have experimental support for unique indexes. This is broken
> out into its own patch.
> * We now handle LP_DEAD items in a special way within
> _bt_insertonpg_in_posting().
> As you pointed out already, we do need to think about LP_DEAD items
> directly, rather than assuming that they cannot be on the page that
> _bt_insertonpg_in_posting() must process. More on that later.
>> If sizeof(t_info) + sizeof(key) < sizeof(t_tid), resulting posting tuple
>> can be
>> larger. It may happen if keysize <= 4 byte.
>> In this situation original tuples must have been aligned to size 16
>> bytes each,
>> and resulting tuple is at most 24 bytes (6+2+4+6+6). So this case is
>> also safe.
> I still need to think about the exact details of alignment within
> _bt_insertonpg_in_posting(). I'm worried about boundary cases there. I
> could be wrong.
Could you explain more about these cases?
Now I don't understand the problem.

>> The main reason why I decided to avoid applying compression to unique
>> indexes
>> is the performance of microvacuum. It is not applied to items inside a
>> posting
>> tuple. And I expect it to be important for unique indexes, which ideally
>> contain only a few live values.
> I found that the performance of my experimental patch with unique
> index was significantly worse. It looks like this is a bad idea, as
> you predicted, though we may still want to do
> deduplication/compression with NULL values in unique indexes. I did
> learn a few things from implementing unique index support, though.
> BTW, there is a subtle bug in how my unique index patch does
> WAL-logging -- see my comments within
> index_compute_xid_horizon_for_tuples(). The bug shouldn't matter if
> replication isn't used. I don't think that we're going to use this
> experimental patch at all, so I didn't bother fixing the bug.
Thank you for the patch.
Still, I'd suggest to leave it as a possible future improvement, so that
it doesn't
distract us from the original feature.
>> if (ItemIdIsDead(itemId))
>> continue;
>> In the previous review Rafia asked about "some reason".
>> Trying to figure out if this situation possible, I changed this line to
>> Assert(!ItemIdIsDead(itemId)) in our test version. And it failed in a
>> performance
>> test. Unfortunately, I was not able to reproduce it.
> I found it easy enough to see LP_DEAD items within
> _bt_insertonpg_in_posting() when running pgbench with the extra unique
> index patch. To give you a simple example of how this can happen,
> consider the comments about BTP_HAS_GARBAGE within
> _bt_delitems_vacuum(). That probably isn't the only way it can happen,
> either. ISTM that we need to be prepared for LP_DEAD items during
> deduplication, rather than trying to prevent deduplication from ever
> having to see an LP_DEAD item.

I added to v6 another related fix for _bt_compress_one_page().
Previous code was implicitly deleted DEAD items without
calling index_compute_xid_horizon_for_tuples().
New code has a check whether DEAD items on the page exist and remove
them if any.
Another possible solution is to copy dead items as is from old page to
the new one,
but I think it's good to remove dead tuples as fast as possible.

> v5 makes _bt_insertonpg_in_posting() prepared to overwrite an
> existing item if it's an LP_DEAD item that falls in the same TID range
> (that's _bt_compare()-wise "equal" to an existing tuple, which may or
> may not be a posting list tuple already). I haven't made this code do
> something like call index_compute_xid_horizon_for_tuples(), even
> though that's needed for correctness (i.e. this new code is currently
> broken in the same way that I mentioned unique index support is
> broken).
Is it possible that DEAD tuple to delete was smaller than itup?
> I also added a nearby FIXME comment to
> _bt_insertonpg_in_posting() -- I don't think think that the code for
> splitting a posting list in two is currently crash-safe.
Good catch. It seems, that I need to rearrange the code.
I'll send updated patch this week.

> How do you feel about officially calling this deduplication, not
> compression? I think that it's a more accurate name for the technique.
I agree.
Should I rename all related names of functions and variables in the patch?

Anastasia Lubennikova
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
v6-0001-Compression-deduplication-in-nbtree.patch text/x-patch 83.9 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Fabien COELHO 2019-08-13 15:54:31 pgbench - rework variable management
Previous Message Alvaro Herrera 2019-08-13 15:25:17 Re: Problem with default partition pruning