[patch] _bt_binsrch* improvements - equal-prefix-skip binary search

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [patch] _bt_binsrch* improvements - equal-prefix-skip binary search
Date: 2020-09-11 14:57:36
Message-ID: CAEze2WjnrkzT9tfqFhSCPeH6LV-UUj5_oO5vX038dvfK7fDuyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've not yet been involved in postgresql's development process, so here's a
first. Please find attached a patch for improving the BT binary search
speeds, with accompanying performance data.

v1-0001 adapts _bt_compare and _bt_binsrch* to make use of static
properties of L&Y-style nbtree indexes to speed up finding an initial
position in the nbtee.

I alter the logic from _bt_compare to accept a 'start-comparing-at-column'
argument, and to also return which column the comparison result came from.
This allows us to keep track of which columns we've already checked for
equality, and when getting deeper into the index this allows us to skip
checking the already equal key columns.

Specifically, when binary-searching through a page, we keep track of which
column was checked for the high-key, and which for the low-key. The first
column of these that is not yet equal will be used for the next comparison.
Any columns before this column must be equal, as both the high- and the
low-keys in that page have already been validated as having an equal
prefix. The column number is then passed on through the insertion key,
allowing the optimization to be used in the climbing of the nbtree index.

v1-0001 will be especially performant in nbtree indexes with a relatively
low initial cardinality and high compare cost. My own performance testing
was done on a laptop (4c/8t), after first getting it warm with other
benchmarks & compilations, so the results are a bit unstable.

While testing this I also noticed that there were a lot of pipeline stalls
around the arguments and results of _bt_compare in the hot loops of
_bt_binsrch* where the new functionality would be used, so I've moved the
core logic of _bt_compare to a static inline function _bt_compare_inline,
which helped the code to go from a 2% TPS decrease for single-column
indexes to ~ 8% TPS increase for an insert-only workload, and for 3-column
text all-collated indexes a TPS increase of 20% on my system. This also
allowed me to keep the signature of _bt_compare the same as it was,
limiting the scope of the changes to only the named functions.

Finally, this could be a great start on prefix truncation for btree
indexes, though that is _not_ the goal of this patch. This patch skips, but
does not truncate, the common prefix.

Kind regards,

Matthias van de Meent

P.S. One more thing I noticed in benchmarking is that a significant part of
the costs of non-default collations is in looking up the collation twice in
the collation cache. My guess from flame graphs is that there could be a
large throughput increase for short strings if the collation lookup from
lc_collate_is_c() in varstr_cmp could be reused in the subsequent call to
pg_newlocale_from_collation()

Attachment Content-Type Size
performance-tests.txt text/plain 32.5 KB
v1-0001-Update-_bt_binsrch-to-account-for-prefix-column-equa.patch text/x-patch 15.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message 曾文旌 2020-09-11 15:00:12 Re: [Proposal] Global temporary tables
Previous Message Tom Lane 2020-09-11 14:52:23 Re: Logical Replication - detail message with names of missing columns