Re: Progress on fast path sorting, btree index creation time

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-01-27 03:32:14
Message-ID: CA+TgmobtSBxqi3qz8PmDg-S=Vk8d9eKD1jv9s-2Y6fppTKCygg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 26, 2012 at 5:10 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> Alright, so while I agree with everything you've asked for, the fact
> is that there is a controversy in relation to binary bloat, and that's
> the blocker here. How can we satisfactorily resolve that, or is that
> question adequately addressed by the benchmark that I produced?

I could go either way on that, honestly. I took a look today and
found out that on my machine, a postgres 9.0.x executable was about
282kB larger than an 8.4 postgres executable. 9.1 was about 386kB
larger than that, and 9.2 current is 239kB larger still. So it's not
as if your patch is the first one to enlarge the size of the binary -
it's obviously been growing steadily from release to release for
years. I found that your patch adds 55kB to the executable size,
which is clearly a lot more than what a typical patch adds, but still
under 1%. So I wouldn't be shocked and appalled if we decided to
commit this as-is.

But if we want to put it on a diet, the first thing I'd probably be
inclined to lose is the float4 specialization. Some members of the
audience will recall that I take dim view of floating point arithmetic
generally, but I'll concede that there are valid reasons for using
float8. I have a harder time coming up with a good reason to use
float4 - ever, for anything you care about. So I would be inclined to
think that if we want to trim this back a bit, maybe that's the one to
let go. If we want to be even more aggressive, the next thing I'd
probably lose is the optimization of multiple sortkey cases, on the
theory that single sort keys are probably by far the most common
practical case.

I'm not surprised that you weren't able to measure a performance
regression from the binary bloat. Any such regression is bound to be
very small and probably quite difficult to notice most of the time;
it's really the cumulative effect of many binary-size-increasing
changes we're worried about, not each individual one. Certainly,
trying to shrink the binary is micro-optimimzation at its finest, but
then so is inlining comparators. I don't think it can be
realistically argued that we can increasing the size of the binary
arbitrarily will never get us in trouble, much like (for a typical
American family) spending $30 to have dinner at a cheap resteraunt
will never break the budget. But if you do it every day, it gets
expensive (and fattening).

> What if third party modules could include the template_qsort_arg.h
> header, and instantiate their own specialisation?

Seems reasonable to me.

> The sort support
> function could either set an enum, or set a pointer to a qsort_arg
> specialisation, if there comparator was a little different, but still

The latter seems better.

> just a few instructions, as with float-based timestamps (I don't care
> enough about them to provide one in core, though). It would also
> essentially allow for user-defined sort functions, provided they
> fulfilled a basic interface. They may not even have to be
> comparison-based. I know that I expressed scepticism about the weird
> and wonderful ideas that some people have put forward in that area,
> but that's mostly because I don't think that GPU based sorting in a
> database is going to be practical.

A question for another day.

> Why not add this capability to the SortSupport API, given that it
> wouldn't be very hard? The user would set the enum too, so it would
> appear in explain analyze output as "custom sort" or something like
> that.

I'm unenthused about going to any trouble to change the EXPLAIN format.

> While a sorting specialisation of our qsort_arg is actually pretty
> close to optimal for scalar types and façades thereof, there could be
> a type that would benefit from using radix sort, for example, which
> isn't even comparison-based, and a user couldn't very well do that
> with the existing SortSupport API.
>
> I certainly don't care about this capability enough to defend it
> against any objections that anyone may have, especially at this late
> stage in the cycle. I just think that we might as well have it.

I don't see any reason not too, assuming it's not a lot of code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Abhijit Menon-Sen 2012-01-27 04:17:05 Re: JSON for PG 9.2
Previous Message Tom Lane 2012-01-27 02:11:08 Re: hiding variable-length fields from Form_pg_* structs