Re: Strange explain on partitioned tables

From: Philippe Rimbault <primbault(at)edd(dot)fr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Strange explain on partitioned tables
Date: 2010-07-22 14:10:51
Message-ID: 4C48516B.90103@edd.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Greg,

First : thank you for you help.

On 22/07/2010 15:32, Greg Smith wrote:
> Philippe Rimbault wrote:
>> I have one thousand inherited tables like this one (with a different
>> check constraint on each) :
>
> The PostgreSQL partitioning system is aimed to support perhaps a
> hundred inherited tables. You can expect to get poor performance on
> queries if you create 1000 of them. That's not the cause of your
> current problem, just pointing out there's a larger design problem
> here you'll probably have to fix one day.

Right now, there is only 6 inherited tables, but for performance issue,
where are testing solutions on more partionned systeme (all work fine
except for query with order by).

>
>> EXPLAIN ANALYZE
>> select
>> documents.id,
>> documents.num,
>> sources.name,
>> l.name
>> from
>> documents,
>> locations l,
>> sources
>> where
>> documents.id_source = 113 and
>> documents.id_location=l.id and
>> documents.id_source=sources.id
>> order by
>> documents.id desc
>> limit 5;
>
> Please don't put your EXPLAIN plans surrounded in " marks; it makes it
> harder to copy out of your message to analyze them with tools. I put
> this bad one into http://explain.depesz.com/s/XD and it notes that the
> "public.documents.id_location = l.id" search is underestimating the
> number of rows by a factor of 8.7. You might get a better plan if you
> can get better table statistics on that column. Did you run ANALYZE
> since the partitioning was done? If not, that could be making this
> worse. You might increase the amount of table statistics on this
> specific column too, not sure what would help without knowing exactly
> what's in there.
>
> Another thing you can try is suggest the optimizer not use a hash join
> here and see if it does the right thing instead; be a useful bit of
> feedback to see what that plan turns out to be. Just put "set
> enable_hashjoin=off;" before the rest of the query, it will only
> impact that session.
>

Sorry for the output of the EXPLAIN ...

VACUUM ANALYZE have been done just before test of query.

I think that the optimizer overestimates "public.documents.id_location =
l.id" because it plan on the primary table and not the child ...
I've change statistics to 1000 for documents.id_location and result is
the same.

I've tested "set enable_hashjoin=off;" and the result is worst (sorry
i'm still using 9.0b3) :

Limit (cost=197755.49..197755.50 rows=5 width=23) (actual
time=4187.148..4187.150 rows=5 loops=1)
-> Sort (cost=197755.49..200390.32 rows=1053932 width=23) (actual
time=4187.146..4187.147 rows=5 loops=1)
Sort Key: public.documents.id
Sort Method: top-N heapsort Memory: 17kB
-> Nested Loop (cost=151258.55..180250.06 rows=1053932
width=23) (actual time=1862.214..3769.611 rows=1053929 loops=1)
-> Index Scan using pk_sources on sources
(cost=0.00..8.27 rows=1 width=8) (actual time=0.007..0.013 rows=1 loops=1)
Index Cond: (id = 113)
-> Merge Join (cost=151258.55..169702.47 rows=1053932
width=21) (actual time=1862.204..3360.555 rows=1053929 loops=1)
Merge Cond: (l.id = public.documents.id_location)
-> Sort (cost=1.75..1.81 rows=23 width=10)
(actual time=0.028..0.036 rows=21 loops=1)
Sort Key: l.id
Sort Method: quicksort Memory: 17kB
-> Seq Scan on locations l (cost=0.00..1.23
rows=23 width=10) (actual time=0.002..0.009 rows=23 loops=1)
-> Materialize (cost=151256.80..156526.46
rows=1053932 width=13) (actual time=1862.162..2841.302 rows=1053929 loops=1)
-> Sort (cost=151256.80..153891.63
rows=1053932 width=13) (actual time=1862.154..2290.881 rows=1053929 loops=1)
Sort Key: public.documents.id_location
Sort Method: external merge Disk: 24496kB
-> Append (cost=0.00..27810.36
rows=1053932 width=13) (actual time=0.003..838.644 rows=1053929 loops=1)
-> Seq Scan on documents
(cost=0.00..18.25 rows=3 width=39) (actual time=0.000..0.000 rows=0 loops=1)
Filter: (id_source = 113)
-> Seq Scan on documents_mond
documents (cost=0.00..27792.11 rows=1053929 width=13) (actual
time=0.002..502.345 rows=1053929 loops=1)
Filter: (id_source = 113)
Total runtime: 4198.703 ms

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Craig James 2010-07-22 16:57:06 Re: Using more tha one index per table
Previous Message Greg Smith 2010-07-22 13:32:36 Re: Strange explain on partitioned tables