Re: Progress on fast path sorting, btree index creation time

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Progress on fast path sorting, btree index creation time
Date: 2012-02-06 21:19:07
Message-ID: 20120206211907.GG19450@momjian.us
Lists: pgsql-hackers

On Fri, Jan 27, 2012 at 09:37:37AM -0500, Robert Haas wrote:
> On Fri, Jan 27, 2012 at 9:27 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> > Well, I don't think it's all that subjective - it's more the case that
> > it is just difficult, or it gets that way as you consider more
> > specialisations.
>
> Sure it's subjective. Two well-meaning people could have different
> opinions without either of them being "wrong". If you do a lot of
> small, in-memory sorts, more of this stuff is going to seem worthwhile
> than if you don't.
>
> > As for what types/specialisations may not make the cut, I'm
> > increasingly convinced that floats (in the following order: float4,
> > float8) should be the first to go. Aside from the fact that we cannot
> > use their specialisations for anything like dates and timestamps,
> > floats are just way less useful than integers in the context of
> > database applications, or at least those that I've been involved with.
> > As important as floats are in the broad context of computing, it's
> > usually only acceptable to store data in a database as floats within
> > scientific applications, and only then when their limitations are
> > well-understood and acceptable. I think we've all heard anecdotes at
> > one time or another, involving their limitations not being well
> > understood.
>
> While we're waiting for anyone else to weigh in with an opinion on the
> right place to draw the line here, do you want to post an updated
> patch with the changes previously discussed?

Well, I think we have to ask not only how many people are using
float4/8, but how many people are sorting or creating indexes on them.
I think that number would be small, so perhaps the float specializations
should be eliminated.

Peter Geoghegan obviously has done some serious work in improving
sorting, and worked well with the community process. He has done enough
analysis that I am hard-pressed to see how we would get similar
improvement using a different method, so I think it comes down to
whether we want a 28% speedup at the cost of adding 55k (1%) to the
binary.

I think Peter has shown us how to get that, and what it will cost --- we
just need to decide now whether it is worth it. What I am saying is
there probably isn't a cheaper way to get that speedup, either now or in
the next few years. (COPY might need similar help for speedups.)

I believe this is a big win and well worth the increased binary size,
because the speedup is significant and because it benefits a wide range
of queries. Either of these would be enough to justify the additional 1%
in size; together they make it an easy decision for me.

FYI, I believe COPY needs similar optimizations; we have gotten repeated
complaints about its performance, and this method of optimization might
also be our only option.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
