Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso <el(dot)lotso(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
Date: 2007-09-14 05:51:41
Message-ID: 1189749101.22514.39.camel@neuromancer.home.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 2007-09-12 at 10:41 -0400, Tom Lane wrote:
> El-Lotso <el(dot)lotso(at)gmail(dot)com> writes:
> > I'm really at my wits end here.
>
> Try to merge the multiple join keys into one, somehow. I'm not sure why
> the planner is overestimating the selectivity of the combined join
> conditions, but that's basically where your problem is coming from.

I've tried merging them together.. what previously was

INNER JOIN TS
ON TS.ID = TRH.ID AND
TS.TTYPE = TRH.TTYPE AND
TS.START_TIMESTAMP = TRH.START_TIMESTAMP

has become
inner join TS
on ts.id_ttype_startstamp = trh.id_ttype_startstamp

where id_ttype_startstamp = (id || '-'||ttype || '-' || start_timestamp)

It's working somewhat better but everything is not as rosy as it should
as the planner is still over/under estimating the # of rows.

FROM org :
Nested Loop Left Join (cost=10612.48..24857.20 rows=1 width=61) (actual
time=1177.626..462856.007 rows=750 loops=1)

TO merge joined conditions :
Hash Join (cost=41823.94..45889.49 rows=6101 width=61) (actual
time=3019.609..3037.692 rows=750 loops=1)
Hash Cond: (trd.trd_join_key = ts.ts_join_key)

Merged Join using the Main table : 3 - 5 million rows
Hash Left Join (cost=80846.38..121112.36 rows=25 width=244) (actual
time=5088.437..5457.269 rows=750 loops=1)

Note that it still doesn't really help that much, the estimated rows is
still way off the actual number of rows. On one of the querys there the
hid field has a subset of 8 values, it's even worst. And it seems like
the merge condition doesn't help at all.

I'm still trying to merge more join conditions to see if it helps.

> A truly brute-force solution would be "set enable_nestloop = off"
> but this is likely to screw performance for other queries.

I've also tried this... It's not helping much actually.
As mentioned previously, this is a one to many relationship and because
of that, somehow PG just doesn't take it into account.

I'm still not having much luck here. (playing with a subset of the main
table's data _does_ show some promise, but when querying on the main
table w/ 3 million data, everything grinds to a halt)

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Ow Mun Heng 2007-09-14 06:30:27 Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
Previous Message Bruce Momjian 2007-09-14 04:23:00 Re: When/if to Reindex