Re: bad plan

From: Julien Cigar <jcigar(at)ulb(dot)ac(dot)be>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: bad plan
Date: 2012-04-06 11:16:39
Message-ID: 4F7ED097.70807@ulb.ac.be
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 04/05/2012 21:47, Ants Aasma wrote:
> On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar<jcigar(at)ulb(dot)ac(dot)be> wrote:
>> - http://www.pastie.org/3731956 : with default config
>> - http://www.pastie.org/3731960 : this is with enable_seq_scan = off
> It looks like the join selectivity of (context_to_context_links,
> ancestors) is being overestimated by almost two orders of magnitude.
> The optimizer thinks that there are 564 rows in the
> context_to_context_links table for each taxon_id, while in fact for
> this query the number is 9. To confirm that this, you can force the
> selectivity estimate to be 200x lower by adding a geo_id = geod_id
> where clause to the subquery.

adding a geo_id = geo_id to the subquery helped a little bit with a
cpu_tuple_cost of 0.1: http://www.pastie.org/3738224 :

without:

Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..146.93 rows=341 width=8) (actual time=0.004..0.019 rows=9 loops=736)

with geo_id = geo_id:

Index Scan using ltlc_taxon_id_idxoncontext_to_context_links (cost=0.00..148.11 rows=2 width=8) (actual time=0.004..0.020 rows=9 loops=736)

> If it does help, then the next question would be why is the estimate
> so much off. It could be either because the stats for
> context_to_context_links.taxon_id are wrong or because
> ancestors.taxon_id(subphylum_id = 18830) is a special case. To help
> figuring this is out, you could run the following to queries and post
> the results:
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
> dist GROUP BY 1 ORDER BY 1;
>
> SELECT floor(log(num,2)) AS nmatch, COUNT(*) AS freq FROM (SELECT
> COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
> and taxon_id= ANY ( select taxon_id from rab.ancestors where
> ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
> 1 ORDER BY 1;

I'm sorry but I get an "ERROR: division by zero" for both of your queries..

> If the second distribution has a significantly different shape then
> cross column statistics are necessary to get good plans. As it happens
> I'm working on adding this functionality to PostgreSQL and would love
> to hear more details about your use-case to understand if it would be
> solved by this work.

Thank you for your help,
Julien

> Regards,
> Ants Aasma

--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.

Attachment Content-Type Size
jcigar.vcf text/x-vcard 292 bytes

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jeff Janes 2012-04-06 17:11:37 Re: Planner selects slow "Bitmap Heap Scan" when "Index Scan" is faster
Previous Message Tomas Vondra 2012-04-05 20:06:49 Re: H800 + md1200 Performance problem