Split interval (used by nbtree suffix truncation) and posting list tuples

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Split interval (used by nbtree suffix truncation) and posting list tuples
Date: 2020-04-17 19:11:52
Message-ID: CAH2-WznJt5aT2uUB2Bs+JBLdwe0XTX67+xeLFcaNvCKxO=QBVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We choose a split point in nbtsplitloc.c primarily based on evenly
dividing space among left and right halves of the split, while giving
secondary consideration to suffix truncation (there is other logic
that kicks in when there are many duplicates, which isn't related to
what I want to talk about). See my commit fab25024 from Postgres 12
for background information.

The larger the split interval, the more unbalanced page splits are
allowed to be, which is a cost that we may be willing to pay to
truncate more suffix attributes in the new high key. Split interval
represents a trade-off between two competing considerations (a
potential cost versus a potential benefit). Unfortunately, commit
fab25024 defined "split interval" based on logic that makes the
assumption that tuples on the same page are more or less of a uniform
size. That assumption was questionable when suffix truncation went in,
but now that deduplication exists the assumption seems quite risky. I
am concerned that suffix truncation will be far more aggressive than
appropriate when the delta-optimal split point is near large posting
list tuples, that are size outliers on the page. The logic in
nbtsplitloc.c might accept a much more unbalanced page split than is
truly reasonable because the average width of tuples on the page isn't
so large, even though the tuples around where we want to split the
page are very large.

Fortunately it's pretty easy to nail this down. We can determine the
default strategy split interval based on the cost that we actually
care about (not a fuzzy proxy of that cost): a leftfree and rightfree
space tolerance from the space optimal candidate split point,
expressed in bytes (actually, expressed as a proportion of the total
space that is used for data items on the page, which is converted into
bytes to form a tolerance). That's the way it's done in the attached
patch.

I plan to commit this patch next week, barring any objections.

--
Peter Geoghegan

Attachment Content-Type Size
0001-Redefine-split-interval-to-be-space-wise.patch application/octet-stream 5.6 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-04-17 19:16:46 Re: Poll: are people okay with function/operator table redesign?
Previous Message Andrew Dunstan 2020-04-17 19:00:15 Re: Build errors in VS