Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running

From: El-Lotso <el(dot)lotso(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
Date: 2007-09-12 08:09:33
Message-ID: 1189584573.13796.12.camel@neuromancer.home.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, 2007-09-12 at 10:15 +0800, El-Lotso wrote:
> On Tue, 2007-09-11 at 14:23 -0400, Tom Lane wrote:
> > El-Lotso <el(dot)lotso(at)gmail(dot)com> writes:
> > > sorry.. I sent this as I was about to go to bed and the explain analyse
> > > of the query w/ 4 tables joined per subquery came out.
> >
> > It's those factor-of-1000 misestimates of the join sizes that are
> > killing you, eg this one:
> >
> > > -> Hash Join (cost=249.61..512.56 rows=1 width=87) (actual time=15.139..32.858 rows=969 loops=1)
> > > Hash Cond: (((test_db.ts.id)::text = (test_db.d.id)::text) AND (test_db.ts.start_timestamp = test_db.trd.start_timestamp) AND (test_db.ts.ttype = test_db.trd.ttype))
> > > -> Seq Scan on ts (cost=0.00..226.44 rows=3244 width=40) (actual time=0.135..6.916 rows=3244 loops=1)
> > > -> Hash (cost=235.00..235.00 rows=835 width=47) (actual time=14.933..14.933 rows=1016 loops=1)
> >
> > The single-row-result estimate persuades it to use a nestloop at the
> > next level up, and then when the output is actually 969 rows, that
> > means 969 executions of the other side of the upper join.
>
> Yep.. that's consistent with the larger results output. more rows = more
> loops

I'm on the verge of giving up... the schema seems simple and yet there's
so much issues with it. Perhaps it's the layout of the data, I don't
know. But based on the ordering/normalisation of the data and the one to
many relationship of some tables, this is giving the planner a headache
(and me a bulge on the head from knockin it against the wall)

I've tried multiple variations, subqueries, not use subqueries, not join
the table, (but to include it as a subquery - which gets re-written to a
join anyway) exists/not exists to no avail.

PG is fast, yes even w/ all the nested loops for up to 48K of results,
(within 4 minutes) but as soon as I put it into a inner join/left
join/multiple temporary(memory) tables it will choke.

select
a.a,b.b,c.c from
(select
x,y,z
from zz)a
inner join b
on a.a = b.a
left join (select
x,a,z
from xx)
then it will choke.

I'm really at my wits end here.

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nis Jørgensen 2007-09-12 13:14:25 Re: 500rows = 1min/2.5k rows=20min/6K rows 2 hours and still running
Previous Message ruben 2007-09-12 07:42:29 Re: [Again] Postgres performance problem