Re: Progress on fast path sorting, btree index creation time

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jay Levitt <jay(dot)levitt(at)gmail(dot)com>, "Jim \"Decibel!\" Nasby" <decibel(at)decibel(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-02-08 16:53:37
Message-ID: CAEYLb_W=j0_fXtcXiaDXAbALX8AwQCfP7aoKm3YjDqqducsf0Q@mail.gmail.com
Lists: pgsql-hackers

On 8 February 2012 15:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Feb 8, 2012 at 9:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> IMO this patch is already well past the point of diminishing returns in
>> value-per-byte-added.  I'd like to see it trimmed back to provide a fast
>> path for just single-column int4/int8/float4/float8 sorts.  The other
>> cases aren't going to offer enough of a win to justify the code space.
>
> I'm curious about how much we're gaining from the single-column
> specializations vs. the type-specific specializations.  I think I'm
> going to go try to characterize that.

I think it might make more sense to drop the type-specific
specialisations for the multi-key case and add a generic multi-key
specialisation than to lose all multi-key specialisations. I have not
considered that question at length, though, and I would think that
we'd still want to keep an int4 version in that case. Note that I
*did not* include a generic multi-key specialisation, though only
because I saw little point, having already covered by far the most
common cases.
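
To make the distinction concrete, here is a minimal sketch in C of the
two kinds of multi-key comparison being weighed. It is not code from
the patch: the Row and SortKey types and every function name are
hypothetical, and an insertion sort stands in for the real sort
routine. The generic comparator makes one indirect call per key, while
the int4 variant compares the leading key inline and only falls back
to the generic loop on ties.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuple with two int4 sort keys (not PostgreSQL's layout). */
typedef struct { int32_t a; int32_t b; } Row;

/* A sort key: where the value lives in the row and how to compare it. */
typedef int (*key_cmp_fn)(const void *l, const void *r);
typedef struct { size_t offset; key_cmp_fn cmp; } SortKey;

/* Comparator that the generic path reaches through a function pointer. */
static int cmp_int32(const void *l, const void *r)
{
    int32_t x = *(const int32_t *) l, y = *(const int32_t *) r;
    return (x > y) - (x < y);
}

/* Generic multi-key comparison: one indirect call per key examined. */
static int compare_generic(const Row *l, const Row *r,
                           const SortKey *keys, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
    {
        int c = keys[i].cmp((const char *) l + keys[i].offset,
                            (const char *) r + keys[i].offset);
        if (c != 0)
            return c;
    }
    return 0;
}

/*
 * int4 variant: the leading key is compared inline, and the remaining
 * keys go through the generic loop. Assumes keys[0] describes Row.a.
 */
static int compare_int4_leading(const Row *l, const Row *r,
                                const SortKey *keys, int nkeys)
{
    if (l->a != r->a)
        return (l->a > r->a) - (l->a < r->a);
    return compare_generic(l, r, keys + 1, nkeys - 1);
}

typedef int (*row_cmp_fn)(const Row *, const Row *, const SortKey *, int);

/*
 * Insertion sort standing in for the real sort routine. A true
 * specialisation would generate one sort routine per comparator so the
 * compiler can inline it; both variants share this one for brevity.
 */
static void sort_rows(Row *rows, size_t n, row_cmp_fn cmp,
                      const SortKey *keys, int nkeys)
{
    for (size_t i = 1; i < n; i++)
    {
        Row tmp = rows[i];
        size_t j = i;
        while (j > 0 && cmp(&rows[j - 1], &tmp, keys, nkeys) > 0)
        {
            rows[j] = rows[j - 1];
            j--;
        }
        rows[j] = tmp;
    }
}
```

Both comparators must produce the same ordering; the only intended
difference is how much per-comparison overhead the leading key incurs,
which is what a benchmark would have to measure.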

While you're at it, I'd like to suggest that you perform a benchmark
on a multi-key specialisation, so we can see just what we're throwing
away before we do so. Better to have those numbers come from you.

I continue to maintain that the most appropriate course of action is
to provisionally commit all specialisations. If it's hard to know what
effect this is going to have on real workloads, let's defer to beta
testers, who presumably try the new release out with their
application. It's a question you could put to them squarely, and
gradually rolling back from that initial position would not be much
of a problem.

The mysql-server package is 45 MB on Fedora 16. That 1% of Postgres
binary figure is for my earlier patch with btree specialisations,
right? I'm not asking you to look at that right now. I also don't
think that "where do we eventually draw the line with specialisations
like this in Postgres generally?" is a question that you should expect
me to answer, though I will say that we should look at each case on
its merits.

I have not "totally denied" binary bloat costs. I have attempted to
quantify them, while acknowledging that such a task is difficult, as
was evident from the fact that Robert "wasn't surprised" that I could
not demonstrate any regression. Granted, my test for the absence of a
regression is that there is very clearly no net loss in performance at
some reasonable granularity, which is a very practical definition. You can
quite easily contrive a case that HOT handles really badly. Some
people did, I believe, but HOT won out because it was clearly very
useful in the real world.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
