Re: bad plan

From: Ants Aasma <ants(at)cybertec(dot)at>
To: Julien Cigar <jcigar(at)ulb(dot)ac(dot)be>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: bad plan
Date: 2012-04-05 19:47:25
Message-ID: CA+CSw_vkSc4VymdCiHNoTmUE3SDrufBg2r156qo_5fRopW_DVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar <jcigar(at)ulb(dot)ac(dot)be> wrote:
> - http://www.pastie.org/3731956 : with default config
> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off

It looks like the join selectivity of (context_to_context_links,
ancestors) is being overestimated by almost two orders of magnitude.
The optimizer thinks that there are 564 rows in the
context_to_context_links table for each taxon_id, while in fact for
this query the number is 9. To confirm that this, you can force the
selectivity estimate to be 200x lower by adding a geo_id = geod_id
where clause to the subquery.

If it does help, then the next question would be why is the estimate
so much off. It could be either because the stats for
context_to_context_links.taxon_id are wrong or because
ancestors.taxon_id(subphylum_id = 18830) is a special case. To help
figuring this is out, you could run the following to queries and post
the results:

SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
dist GROUP BY 1 ORDER BY 1;

SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
and taxon_id= ANY ( select taxon_id from rab.ancestors where
ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
1 ORDER BY 1;

If the second distribution has a significantly different shape then
cross column statistics are necessary to get good plans. As it happens
I'm working on adding this functionality to PostgreSQL and would love
to hear more details about your use-case to understand if it would be
solved by this work.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

In response to

  • bad plan at 2012-04-05 11:47:33 from Julien Cigar

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2012-04-05 20:06:49 Re: H800 + md1200 Performance problem
Previous Message Merlin Moncure 2012-04-05 18:43:55 Re: H800 + md1200 Performance problem