Quick Links

Re: BUG #5294: Sorts on more than just the order-by clause

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Allen Johnson <akjohnson78(at)gmail(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: BUG #5294: Sorts on more than just the order-by clause
Date:	2010-01-23 03:55:58
Message-ID:	28616.1264218958@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-bugs

Allen Johnson <akjohnson78(at)gmail(dot)com> writes:
> Ok, I've generated a test database with:
> * 20,000 users
> * 250,000 contacts
> * 1,124,700 attachments
> The summary of the results is that the normal query takes about 32sec
> on my machine. The hack query takes about 13sec.

I poked at this for a bit. At least with the test data (dunno about
your real data), the first few grouping columns are pretty nearly unique
so the "extra" sort columns really aren't affecting the runtime anyway.
I believe that the reason the hacked query is cheaper is simply that the
sort is sorting fewer rows because it's applied after aggregation
instead of beforehand.

The planner is well aware of that effect, but the reason it fails to
choose hashed aggregation is that it doesn't think the aggregation will
reduce the number of rows --- so it estimates the sort for that case as
being much more expensive than it really is. Notice that the
post-aggregation and pre-aggregation rowcount estimates are just the
same in both these queries. If I force choose_hashed_grouping() to
make the other decision, I get the same plan out of the "normal"
query as the hacked query produces.

I have an idea for improving the accuracy of the post-aggregation
rowcount estimate, which I'll post on pgsql-hackers in a bit. But
it's not something I have enough confidence in to risk back-patching.
So for the moment your hack with forcing the sort to be done separately
is probably your best answer.

regards, tom lane

In response to

Re: BUG #5294: Sorts on more than just the order-by clause at 2010-01-22 21:07:39 from Allen Johnson

Browse pgsql-bugs by date

	From	Date	Subject
Next Message	Euler Taveira de Oliveira	2010-01-24 01:26:08	Re: BUG #5295: Function OUT parameters names and data types skewed with IN parameters
Previous Message	Allen Johnson	2010-01-22 21:07:39	Re: BUG #5294: Sorts on more than just the order-by clause