Re: B-Tree support function number 3 (strxfrm() optimization)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Greg Stark <stark(at)mit(dot)edu>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date: 2014-04-08 22:21:06
Message-ID: CAM3SWZTatHOuq5YLq_Gh3-pxA-dx4T2QeJxkFhCiOOZSSLkSyw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 8, 2014 at 2:48 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> I think the point here is that what matters is that the gain from the
> strxfrm part of the patch is large, regardless of what the baseline is
> (right?). If there's a small loss in an uncommon worst case, that's
> probably acceptable, as long as the worst case is uncommon and the loss
> is small. But if the loss is large, or the case is not uncommon, then a
> fix for the regression is going to be a necessity.

That all seems reasonable. I just don't understand why you'd want to
break out the fmgr-elision part, given that that was already discussed
at length two years ago.
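
For anyone skimming the thread, the strxfrm() idea reduces to something
like the following standalone sketch. To be clear, this is not the patch
itself (ABBREV_LEN, abbrev_key_t and make_abbrev_key() are names I'm
making up here, and the buffer handling is simplified); it only shows the
basic shape: a fixed-size prefix of the strxfrm() blob resolves most
comparisons with a cheap memcmp(), and only ties fall back to a full
strcoll().

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ABBREV_LEN 8            /* roughly one Datum's worth of bytes */

typedef struct
{
    char        abbrev[ABBREV_LEN]; /* strxfrm() prefix, zero padded */
    const char *orig;               /* full string, for tie-breaking */
} abbrev_key_t;

static void
make_abbrev_key(abbrev_key_t *key, const char *s)
{
    char        buf[1024];      /* assumed large enough for this sketch */
    size_t      n;

    /* strxfrm() output compares with memcmp() the way the collation would */
    strxfrm(buf, s, sizeof(buf));
    n = strlen(buf);
    if (n > ABBREV_LEN)
        n = ABBREV_LEN;
    memset(key->abbrev, 0, ABBREV_LEN);
    memcpy(key->abbrev, buf, n);
    key->orig = s;
}

static int
cmp_abbrev(const void *a, const void *b)
{
    const abbrev_key_t *ka = a;
    const abbrev_key_t *kb = b;
    int         cmp = memcmp(ka->abbrev, kb->abbrev, ABBREV_LEN);

    if (cmp != 0)
        return cmp;             /* resolved by the cheap prefix comparison */
    /* prefixes tied: do an authoritative full comparison */
    return strcoll(ka->orig, kb->orig);
}

int
main(void)
{
    const char *strs[] = {"banana", "apple", "applet", "cherry"};
    abbrev_key_t keys[4];
    int         i;

    setlocale(LC_COLLATE, "");
    for (i = 0; i < 4; i++)
        make_abbrev_key(&keys[i], strs[i]);
    qsort(keys, 4, sizeof(abbrev_key_t), cmp_abbrev);
    for (i = 0; i < 4; i++)
        printf("%s\n", keys[i].orig);
    return 0;
}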

> You seem to be assuming that a fix for whatever regression is found is
> going to be impossible to find.

I think that a fix that is actually worth it on balance will be
elusive. Heikki's worst case is extremely narrow. I think he'd
acknowledge that himself. I've already fixed some plausible
regressions. For example, the opportunistic early "len1 == len2 &&
memcmp() == 0?" test covers the common case where two leading keys are
equal. I think we're very much into chasing diminishing returns past
this point. I think that my figure of a 5% regression is much more
realistic, even though it is itself quite unlikely.
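
To be concrete about that early test, here is a minimal standalone sketch
of its shape. The function name and signature are invented for
illustration; in the real code the fallback is the usual locale-aware
text comparator rather than strcoll() on plain C strings.

#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only; assumes NUL-terminated inputs for the fallback path */
static int
text_cmp_sketch(const char *s1, size_t len1, const char *s2, size_t len2)
{
    /*
     * Opportunistic fast path: same length and identical bytes must mean
     * "equal" under any collation, and equal leading keys are common in
     * multi-key sorts, so this often avoids the expensive call entirely.
     */
    if (len1 == len2 && memcmp(s1, s2, len1) == 0)
        return 0;

    /* Otherwise, fall back to the full locale-aware comparison. */
    return strcoll(s1, s2);
}

int
main(void)
{
    setlocale(LC_COLLATE, "");
    printf("%d\n", text_cmp_sketch("abc", 3, "abc", 3));       /* fast path: 0 */
    printf("%d\n", text_cmp_sketch("abc", 3, "abd", 3) < 0);   /* fallback: 1 */
    return 0;
}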

I think that the greater point is that we don't want to take worrying
about worst case performance to extremes. Calling Heikki's 50%
regression the worst case is almost unfair, since it involves very
carefully crafted input. You could probably also carefully craft input
that made our quicksort implementation itself go quadratic, a behavior
not necessarily exhibited by an inferior implementation for the same
input. Yes, let's consider a pathological worst case, but let's put it
in the context of being one end of a spectrum of behaviors, on the
extreme fringes. In reality, only a tiny number of individual sort
operations will experience any kind of regression at all. In simple
terms, I'd be very surprised if anyone complained about a regression
at all. If anyone does, it's almost certainly not going to be a 50%
regression. There is a reason why many other systems have
representative workloads that they target (e.g., the various TPC
benchmarks). I think that a tyranny of the majority is a bad thing
myself, but I'm concerned that we sometimes take that too far.

I wonder, why did Heikki not add more padding to the end of the
strings in his example, in order to give strxfrm() more wasted work?
Didn't he want to make his worst case even worse? Or was it to control
for TOASTing noise?

--
Peter Geoghegan
