From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-08-03 06:53:11
Message-ID: CAM3SWZQTYv3KP+CakZJZV3RwB1OJjaHwPCZ9cOYJXPkhbtcBVg@mail.gmail.com
Lists: pgsql-hackers

Consider the "cities" table I've played around with throughout the
development of this patch:

postgres=# select tablename, attname, n_distinct, correlation from
pg_stats where attname in ('country', 'province', 'city');
 tablename | attname  | n_distinct | correlation
-----------+----------+------------+-------------
 cities    | city     |  -0.666628 |  -0.0169657
 cities    | province |       1958 |   0.0776529
 cities    | country  |        206 |           1
(3 rows)

Consider this query:

select * from (select * from cities order by country, province, city
offset 1000000) i;

With master, this consistently takes about 6.4 seconds on my laptop.
With the patch I posted most recently, it takes about 3.5 seconds. But
if I change this code:

+ if (ssup->position == sortKeyTiebreak && len1 == len2)
+ {

To this:

+ if (len1 == len2)
+ {

Then my patch takes about 1.3 seconds to execute the query, since now
we're always optimistic about the prospect of getting away with a
cheap memcmp() when the lengths match, avoiding copying and strcoll()
overhead entirely. "province" has a relatively low number of distinct
values - about 1,958, in a table of 317,102 cities - so clearly that
optimism is justified in this instance.

This does seem to suggest that there is something to be said for being
optimistic about finding equality, and not just when performing an
abbreviated-key tie-breaker on the leading attribute. Maybe the
executor could pass a per-attribute n_distinct hint to tuplesort,
which would pass that on to our sortsupport routine. Of course, if
there were a low-cardinality attribute whose text strings were all of
equal length, and/or no correlation between "country" and "province",
a lot of the time those opportunistic memcmp()s could go to waste. But
that might be just fine, especially if we only did this for moderately
short strings (say, less than 64 bytes). I don't have a good sense of the
costs yet, but the hashing that HyperLogLog requires in my patch
appears to be almost free, since it occurs at a time when we're
already bottlenecked on memory bandwidth, and is performed on memory
that needed to be manipulated at that juncture anyway. It wouldn't
surprise me if this general optimism about a simple memcmp() working
out had a very acceptable worst case. It might even be practically
free to be totally wrong about equality being likely, and if we have
nothing to lose and everything to gain, clearly we should proceed. I'm
aware of cases where it makes probabilistic sense to waste compute
bandwidth to gain memory bandwidth. It's something I've seen crop up a
number of times in various papers.

--
Peter Geoghegan
