Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-07-17 16:36:27
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

11.07.2019 21:19, Peter Geoghegan wrote:
> On Thu, Jul 11, 2019 at 8:34 AM Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com> wrote:
Peter, Rafia, thanks for the review. New version is attached.

>> + elog(DEBUG4, "insert_itupprev_to_page. compressState->ntuples %d
>> IndexTupleSize %zu free %zu",
>> + compressState->ntuples, IndexTupleSize(to_insert), PageGetFreeSpace(page));
>> +
>> and other such DEBUG4 statements are meant to be removed, right...?
> I hope so too.
Yes, these messages are only for debugging.
I haven't delete them since this is still work in progress
and it's handy to be able to print inner details.
Maybe I should also write a patch for pageinspect.

>> /*
>> * If we have only 10 uncompressed items on the full page, it probably
>> * won't worth to compress them.
>> */
>> if (maxoff - n_posting_on_page < 10)
>> return;
>> Is this a magic number...?
> I think that this should be a constant or something.
Fixed. Now this is a constant in nbtree.h. I'm not 100% sure about the
When the code will stabilize we can benchmark it and find optimal value.

>> /*
>> * We do not expect to meet any DEAD items, since this function is
>> * called right after _bt_vacuum_one_page(). If for some reason we
>> * found dead item, don't compress it, to allow upcoming microvacuum
>> * or vacuum clean it up.
>> */
>> if (ItemIdIsDead(itemId))
>> continue;
>> This makes me wonder about those 'some' reasons.
> I think that this is just defensive. Note that _bt_vacuum_one_page()
> is prepared to find no dead items, even when the BTP_HAS_GARBAGE flag
> is set for the page.
You are right, now it is impossible to meet dead items in this function.
Though it can change in the future if, for example, _bt_vacuum_one_page
will behave lazily.
So this is just a sanity check. Maybe it's worth to move it to Assert.

>> Caller is responsible for checking BTreeTupleIsPosting to ensure that
>> + * he will get what he expects
>> This can be re-framed to make the caller more gender neutral.
> Agreed. I also don't like anthropomorphizing code like this.


>> Other than that, I am curious about the plans for its backward compatibility.
> Me too. There is something about a new version 5 in comments in
> nbtree.h, but the version number isn't changed. I think that we may be
> able to get away with not increasing the B-Tree version from 4 to 5,
> actually. Deduplication is performed lazily when it looks like we
> might have to split the page, so there isn't any expectation that
> tuples will either be compressed or uncompressed in any context.
Current implementation is backward compatible.
To distinguish posting tuples, it only adds one new flag combination.
This combination was never possible before. Comment about version 5 is

I also added a patch for amcheck.

There is one major issue left - preserving TID order in posting lists.
For a start, I added a sort into BTreeFormPostingTuple function.
It turned out to be not very helpful, because we cannot check this
invariant lazily.

Now I work on patching _bt_binsrch_insert() and _bt_insertonpg() to
insertion into the middle of the posting list. I'll send a new version
this week.

Anastasia Lubennikova
Postgres Professional:
The Russian Postgres Company

Attachment Content-Type Size
0001-btree_compression_pg12_v2.patch text/x-patch 57.2 KB

In response to


Browse pgsql-hackers by date

  From Date Subject
Next Message Shawn Debnath 2019-07-17 16:50:11 Re: Adding SMGR discriminator to buffer tags
Previous Message Tom Lane 2019-07-17 16:32:39 sepgsql seems rather thoroughly broken on Fedora 30