From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-08-03 06:53:11
Message-ID: CAM3SWZQTYv3KP+CakZJZV3RwB1OJjaHwPCZ9cOYJXPkhbtcBVg@mail.gmail.com
Lists: pgsql-hackers

Consider the "cities" table I've played around with throughout the
development of this patch:

postgres=# select tablename, attname, n_distinct, correlation from
pg_stats where attname in ('country', 'province', 'city');
 tablename | attname  | n_distinct | correlation
-----------+----------+------------+-------------
 cities    | city     |  -0.666628 |  -0.0169657
 cities    | province |       1958 |   0.0776529
 cities    | country  |        206 |           1
(3 rows)

Consider this query:

select * from (select * from cities order by country, province, city
offset 1000000) i;

With master, this consistently takes about 6.4 seconds on my laptop.
With the patch I posted most recently, it takes about 3.5 seconds. But
if I change this code:

+ if (ssup->position == sortKeyTiebreak && len1 == len2)
+ {

To this:

+ if (len1 == len2)
+ {

Then my patch takes about 1.3 seconds to execute the query, since now
we're always optimistic about the prospect of getting away with a
cheap memcmp() when the lengths match, avoiding copying and strcoll()
overhead entirely. "province" has a relatively low number of distinct
values - about 1,958, in a table of 317,102 cities - so clearly that
optimism is justified in this instance.

This does seem to suggest that there is something to be said for being
optimistic about finding equality, and not just when performing an
abbreviated-key tie-breaker on the leading attribute. Maybe the
executor could pass a per-attribute n_distinct hint to tuplesort,
which would pass that on to our sortsupport routine. Of course, if
there were a low-cardinality attribute whose text strings were all of
equal length, and/or no correlation between "country" and "province",
a lot of the time those opportunistic memcmp()s could go to waste. But
that might be just fine, especially if we only did this for moderately
short strings (say, less than 64 bytes). I don't have a good sense of the
costs yet, but the hashing that HyperLogLog requires in my patch
appears to be almost free, since it occurs at a time when we're
already bottlenecked on memory bandwidth, and is performed on memory
that needed to be manipulated at that juncture anyway. It wouldn't
surprise me if this general optimism about a simple memcmp() working
out had a very acceptable worst case. It might even be practically
free to be totally wrong about equality being likely, and if we have
nothing to lose and everything to gain, clearly we should proceed. I'm
aware of cases where it makes probabilistic sense to waste compute
bandwidth to gain memory bandwidth. It's something I've seen crop up a
number of times in various papers.

--
Peter Geoghegan
