Re: Query running a lot faster with enable_nestloop=false

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mohanaraj Gopala Krishnan <mohangk(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query running a lot faster with enable_nestloop=false
Date: 2011-11-03 15:35:00
Message-ID: CA+TgmoY1Qhi6-8iz_y6Qr_Wnjcs_BXufHHOZPuBqij_4jewVHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, Oct 25, 2011 at 5:09 AM, Mohanaraj Gopala Krishnan
<mohangk(at)gmail(dot)com> wrote:
> I have a query that runs a lot slower (~5 minutes) when I run it with
> the default enable_nestloop=true and enable_nestloop=false (~10 secs).
> The actual query is available here http://pastie.org/2754424 . It is a
> reporting query with many joins as the database is mainly used for
> transaction processing.
>
> Explain analyse result for both cases:
>
> Machine A nestloop=true - http://explain.depesz.com/s/nkj0 (~5 minutes)
> Machine A nestloop=false - http://explain.depesz.com/s/wBM (~10 secs)

A good start might be to refactor this:

Seq Scan on retailer_categories retailer_category_leaf_nodes
(cost=0.00..18.52 rows=1 width=16) (actual time=0.016..0.392 rows=194
loops=1)
Filter: ((tree_right - tree_left) = 1)

And this:

Seq Scan on product_categories product_category_leaf_nodes
(cost=0.00..148.22 rows=2 width=32) (actual time=0.031..1.109 rows=383
loops=1)
Filter: ((tree_right - tree_left) = 1)

The query planner really has no idea what selectivity to assign to
that condition, and so it's basically guessing, and it's way off. You
could probably improve the estimate a lot by adding a column that
stores the values of tree_right - tree_left and is updated manually or
by triggers as you insert and update data. Then you could just check
tree_left_right_difference = 1, which should get a much more accurate
estimate, and hopefully therefore a better plan.

You've also got a fairly large estimation error here:

Index Scan using invoices_create_date_idx on invoices (cost=0.00..8.28
rows=1 width=4) (actual time=0.055..0.305 rows=109 loops=1)
Index Cond: ((create_date >= '2011-09-15'::date) AND (create_date
<= '2011-09-15'::date))
Filter: (status = 128)

Apparently, status 128 is considerably more common among rows in that
date range than it is overall. Unfortunately, it's not so easy to fix
this kind of estimation error, unless you can refactor your schema to
avoid needing to filter on both create_date and status at the same
time.

It might be worth using temporary tables here - factor out sections of
the query that are referenced multiple times, like the join between
sales_order_items and invoices, and create a temporary table. ANALYZE
it, and then use it to run the main query.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Robert Haas 2011-11-03 15:40:30 Re: function slower than the same code in an sql file
Previous Message Rodrigo Gonzalez 2011-11-03 15:31:59 Re: function slower than the same code in an sql file