Re: Partitioned Tables and ORDER BY

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
Cc: Michal Szymanski <mich20061(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Partitioned Tables and ORDER BY
Date: 2009-10-19 19:10:56
Message-ID: 603c8f070910191210j41522833le6053718efff88e6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2009/10/19 Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>:
>
>
> 2009/10/19 Robert Haas <robertmhaas(at)gmail(dot)com>
>>
>> 2009/10/19 Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>:
>> >
>> >
>> > On Sun, Oct 11, 2009 at 3:30 PM, Michal Szymanski <mich20061(at)gmail(dot)com>
>> > wrote:
>> >>
>> >> We have similar problem and now we are try to find solution. When you
>> >> execute query on partion there is no sorting - DB use index to
>> >> retrieve data and if you need let say 50 rows it reads 50 rows using
>> >> index. But when you execute on parent table query optymizer do this:
>> >>
>> >>  ->  Sort  (cost=726844.88..748207.02 rows=8544855 width=37739)
>> >> (actual time=149864.868..149864.876 rows=50 loops=1)
>> >>
>> >> it means 8544855 rows should be sorted and it takes long minutes.
>> >
>> > The figures in first parenthesis are estimates, not the actual row
>> > count.
>> > If you think it is too low, increase statistic target for that column.
>>
>> It's true that the figures in parentheses are estimates, it's usually
>> bad when the estimated and actual row counts are different by 5 orders
>> of magnitude, and that large of a difference is not usually fixed by
>> increasing the statistics target.
>>
> I thought that this means, that either analyze was running quite a long time
> ago, or that the value didn't made it to histogram. In the later case,
> that's mostly case when your statistic target is low, or that the value is
> really 'rare'.

It's possible, but (1) most people are running autovacuum these days,
in which case this isn't likely to occur and (2) most people do not
manage to expand the size of a table by five orders of magnitude
without analyzing it. Generally these kinds of problems come from bad
selectivity estimates.

In this case, though, I think that the actual number is less than the
estimate because of the limit node immediately above. The problem is
just that a top-N heapsort requires scanning the entire set of rows,
and scanning 8 million rows is slow.

...Robert

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message phb07@apra.asso.fr 2009-10-19 19:32:18 maintain_cluster_order_v5.patch
Previous Message Tom Lane 2009-10-19 18:23:45 Re: Calculation of unused columns