Re: How can I make PosgreSQL use an Index ?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rhaoni Chiu Pereira <rhaoni(at)sistemica(dot)info>
Cc: PostgreSQL ADMIN <pgsql-admin(at)postgresql(dot)org>
Subject: Re: How can I make PosgreSQL use an Index ?
Date: 2003-09-30 15:08:56
Message-ID: 19517.1064934536@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Rhaoni Chiu Pereira <rhaoni(at)sistemica(dot)info> writes:
> [ very large join plan ]

Sorry for not responding sooner --- somehow this got overlooked in my
inbox. It looks to me like the big problem is that you have all these
nested-loop joins:

> -> Nested Loop (cost=100025960.94..100027780.65 rows=1
> width=818) (actual time=3476.87..99606.77 rows=19923 loops=1)
> Join Filter: (("outer".emp = "inner".emp) AND ("outer".fil
> = "inner".fil) AND ("outer".codigo_supervisor = "inner".codigo_supervisor) AND
> ("outer".codigo_grupo_rep = "inner".codigo_grupo_rep))
> -> Nested Loop (cost=100025960.94..100027775.22 rows=1
> width=765) (actual time=3476.74..97802.69 rows=19923 loops=1)
> Join Filter: (("inner".ano_mes)::text = to_char
> ("outer".data_emissao, 'YYYYMM'::text))
> -> Nested Loop (cost=25960.94..27762.92 rows=1
> width=755) (actual time=3475.14..32090.12 rows=19923 loops=1)
> Join Filter: (("inner".emp = "outer".emp) AND
> ("outer".fil = "inner".fil) AND ("outer".codigo = "inner".cla_categoria) AND
> ("outer".codigo = "inner".cla_marca) AND ("outer".codigo = "inner".situacao))
> -> Nested Loop (cost=25960.94..27705.22
> rows=10 width=665) (actual time=3474.12..17734.21 rows=199230 loops=1)
> Join Filter: (("outer".emp
> = "inner".emp) AND ("inner".fil = "outer".fil))
> -> Nested Loop
> (cost=25960.94..27699.30 rows=1 width=638) (actual time=3474.02..6030.09
> rows=19923 loops=1)
> Join Filter: (("inner".emp
> = "outer".emp) AND ("inner".empfil = "outer".fil))

The reason the planner is choosing nestloop here is that it thinks there
are only a few rows involved (notice the estimated row counts are all "1"
or "10"). Nestloop is a fine join plan for small numbers of rows, but
it pretty well sucks for tens of thousands of rows which is what you've
actually got. I am not sure why the row-count estimates are so far off,
but it could be because the planner is unaware of cross-column
correlations in your data. Are the multiple join conditions actually
necessary, or are some of them redundant?

A quick and dirty thing you could try to see if the plan can be improved
is to set "enable_nestloop" off (do NOT set "enable_seqscan" off).

In the long run you want the planner to do better without such a
brute-force hack, though. A simple answer is to boost the statistics
target on the join columns and re-analyze, but that may not help much
if the real issue is cross-column correlations. A more invasive
solution is to reconsider your data design. It looks like you have a
lot of multi-column join keys --- can you find ways to combine those
into single columns? (As an example, I'd never build a table containing
separate date and time columns rather than a single timestamp column.)
I realize that this'd probably be a pain in the neck, but if you can do
it, it would simplify your queries as well as help the planner produce
better plans.

regards, tom lane

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Anna Kanevsky 2003-09-30 15:28:33 Delete accident
Previous Message lgama@sagitario.cic.ipn.mx 2003-09-29 19:59:27 remove my email account, please