Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.
Date: 2019-08-30 00:07:53
Message-ID: CAH2-Wzmjs=FemKapmzridvMKk=SpvRpwuT5Y0-XHrna6aa5Eyg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 29, 2019 at 5:13 AM Anastasia Lubennikova
<a(dot)lubennikova(at)postgrespro(dot)ru> wrote:
> Your explanation helped me to understand that this approach can be
> extended to the case of insertion into a posting list that doesn't
> trigger a posting split, and that nbtsplitloc indeed doesn't need to
> know about posting tuple specifics.
> The code is much cleaner now.

Fantastic!

> Some individual indexes are larger, some are smaller compared to the
> expected output.

I agree that v9 might be ever so slightly more space efficient than v5
was, on balance. In any case v9 completely fixes the regression that I
saw in the last version. I have pushed the changes to the test output
for the serial tests that I privately maintain and that I gave you
access to. The MGD test output also looks perfect.

We may find that deduplication is a little too effective, in the sense
that it packs so many tuples onto leaf pages that *concurrent*
inserters will tend to get excessive page splits. We may find that it
makes sense to aim for posting lists that are maybe 96% of
BTMaxItemSize() -- note that BTREE_SINGLEVAL_FILLFACTOR is 96 for this
reason. Concurrent inserters will tend to have heap TIDs that are
slightly out of order, so we want to at least have enough space
remaining on the left half of a "single value mode" split. We may end
up with a design where deduplication anticipates what will be useful
for nbtsplitloc.c.
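
Concretely, the kind of cap I have in mind looks something like this
(just a sketch -- the function name is made up, and the real patch
might apply the limit somewhere else entirely):

#include "postgres.h"
#include "access/nbtree.h"

/*
 * Sketch only -- the helper name is invented for illustration.  Cap the
 * size of a new posting list tuple at 96% of BTMaxItemSize(), mirroring
 * BTREE_SINGLEVAL_FILLFACTOR, so that a later "single value mode" split
 * still leaves some free space on the left half for concurrent inserters
 * whose heap TIDs arrive slightly out of order.
 */
static inline Size
_bt_dedup_max_posting_size(Page page)
{
    return (BTMaxItemSize(page) * BTREE_SINGLEVAL_FILLFACTOR) / 100;
}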

I still think that it's too early to start worrying about problems
like this one -- I feel it will be useful to continue to focus on the
code and the space utilization of the serial test cases for now. We
can look at it at the same time that we think about adding back
something like BT_COMPRESS_THRESHOLD. I am mentioning it now because
it's probably a good time for you to start thinking about it, if you
haven't already (actually, maybe I'm just describing what
BT_COMPRESS_THRESHOLD was supposed to do in the first place). We'll
need to have a good benchmark to assess these questions, and it's not
obvious what that will be. Two possible candidates are TPC-H and
TPC-E. (Of course, I mean running them for real -- not just using their
indexes to make sure that the nbtsplitloc.c stuff works well in
isolation.)

Any thoughts on a conventional benchmark that allows us to understand
the patch's impact on both throughput and latency?

BTW, I notice that we often have indexes that are quite a lot smaller
when they were created with retail insertions rather than with CREATE
INDEX/REINDEX. This is not new, but the difference is much larger than
it typically is without the patch. For example, the TPC-E index on
trade.t_ca_id (which is named "i_t_ca_id" or "i_t_ca_id2" in my test)
is 162 MB with CREATE INDEX/REINDEX, and 121 MB with retail insertions
(assuming the insertions use the actual order from the test). I'm not
sure what to do about this, if anything. I mean, the reason that the
retail insertions do better is that they get the nbtsplitloc.c stuff,
and that we don't split the page until it's 100% full and until
deduplication stops helping -- we could apply several rounds of
deduplication before we actually have to split the page. So the
difference that we see here is both logical and surprising.
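
To illustrate the shape of that retail insertion behavior (sketch only
-- the function name is made up and the deduplication step itself is
elided, so this is not the patch's actual code):

#include "postgres.h"
#include "storage/bufmgr.h"
#include "storage/bufpage.h"

/*
 * Sketch only: when a leaf page looks full, retail insertion first tries
 * a deduplication pass, and only when that still doesn't free up enough
 * room for the incoming tuple does the caller fall back to a page split.
 */
static bool
_bt_fits_after_dedup(Buffer buf, Size newitemsz)
{
    Page        page = BufferGetPage(buf);

    if (PageGetFreeSpace(page) >= newitemsz)
        return true;            /* plenty of room already */

    /*
     * ... deduplication pass goes here: merge duplicate tuples on the
     * page into posting list tuples (elided -- see the patch) ...
     */

    /* only if this is still false does the caller split the page */
    return PageGetFreeSpace(page) >= newitemsz;
}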

How do you feel about this CREATE INDEX index-size-is-larger business?

--
Peter Geoghegan
