Re: Inner Join of the same table

From: Sebastián Baioni <sebaioni-postgresql(at)yahoo(dot)com(dot)ar>
To: Performance PostgreSQL <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Inner Join of the same table
Date: 2006-08-15 18:53:35
Message-ID: 20060815185335.14227.qmail@web36114.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello Jim, we can't use the Where cuiT='12345678901' in the subquery because we need max(cuiL)
independently of that cuiT:
cuiT cuiL PERI FAMI
1 a 200608 0
1 a 200601 2
1 b 200607 3
1 c 200605 4
2 a 200605 9
2 c 200604 4
2 b 200608 1
We need:
where cuiT = '1'
cuiT cuiL PERI FAMI
1 a 200608 9
1 c 200605 4
If we place the Where cuiT = '1' in the subquery we couldn't get the max(FAMI) of cuiL a = 9 and
we couldn't know if that PERI is the max(PERI) of that cuiL independently of that cuiT.

Here is the explain analyze with PG 8.0 for Windows:
Explain Analyze
SELECT DISTINCT T.cuiT,T.cuiL, U.MAXPERI AS ULT_APORTE_O_DDJJ
FROM APORTES AS T
INNER JOIN
(
SELECT cuiL, MAX(PERI) AS MAXPERI
FROM APORTES
GROUP BY cuiL
) AS U ON T.cuiL=U.cuiL AND T.PERI=U.MAXPERI
WHERE T.cuiT='12345678901'
order by T.cuiT, T.cuiL, U.MAXPERI;

QUERY PLAN
1 Unique (cost=37478647.41..37478650.53 rows=312 width=62) (actual time=2677209.000..2677520.000
rows=1720 loops=1)
2 -> Sort (cost=37478647.41..37478648.19 rows=312 width=62) (actual
time=2677209.000..2677260.000 rows=3394 loops=1)
3 Sort Key: t.cuiT, t.cuiL, u.maxperi
4 -> Merge Join (cost=128944.78..37478634.48 rows=312 width=62) (actual
time=74978.000..2677009.000 rows=3394 loops=1)
5 Merge Cond: ("outer".cuiL = "inner".cuiL)
6 Join Filter: (("inner".peri)::text = "outer".maxperi)
7 -> Subquery Scan u (cost=0.00..37348434.56 rows=3951 width=47) (actual
time=130.000..2634923.000 rows=254576 loops=1)
8 -> GroupAggregate (cost=0.00..37348395.05 rows=3951 width=25) (actual
time=130.000..2629617.000 rows=254576 loops=1)
9 -> Index Scan using uesapt001 on APORTES (cost=0.00..37301678.64
rows=9339331 width=25) (actual time=110.000..2520690.000 rows=9335892 loops=1)
10 -> Sort (cost=128944.78..129100.44 rows=62263 width=40) (actual
time=30684.000..36838.000 rows=80471 loops=1)
11 Sort Key: t.cuiL
12 -> Index Scan using uesapt002 on APORTES t (cost=0.00..122643.90
rows=62263 width=40) (actual time=170.000..25566.000 rows=80471 loops=1)
13 Index Cond: (cuiT = '12345678901'::bpchar)
Total runtime: 2677640.000 ms

Thanks
Sebastián Baioni

--- "Jim C. Nasby" <jnasby(at)pervasive(dot)com> escribió:

> On Tue, Aug 15, 2006 at 03:43:29PM +0000, Sebasti?n Baioni wrote:
> > Hi Nark, thanks for your answer.
> >
> > It's expected to return 1,720 rows (of 80,471 that match with condition WHERE
> > T.cuiT='12345678901')
> >
> > We have indexes by :
> > uesapt000: cuiT, cuiL, PERI;
> > uesapt001: cuiL, PERI;
> > uesapt002: cuiT, PERI;
> >
> > We usually make a vacuum analyze and reindex of every table, and we are running 8.0 and 8.1
for windows and 7.4 for Linux.
>
> That's EXPLAIN, not EXPLAIN ANALYZE, which doesn't help us much. Best
> bet would be an EXPLAIN ANALYZE from 8.1.x. It would also be useful to
> know how MSSQL is executing this query.
>
> If it would serve your purposes, copying the WHERE clause into the
> subquery would really help things. I think it might also mean you could
> combine everything into one query.
>
> > Thanks
> > Sebasti?n Baioni
> >
> > --- Mark Lewis <mark(dot)lewis(at)mir3(dot)com> escribi?:
> >
> > > Can you provide an EXPLAIN ANALYZE of the query in PG? Have you
> > > analyzed the PG database? How many rows is this query expected to
> > > return? Which version of PG are you running? What indexes have you
> > > defined?
> > >
> > > -- Mark
> > >
> > > On Tue, 2006-08-15 at 14:38 +0000, Sebasti?n Baioni wrote:
> > > > Hello, I'm migrating from MS SQL Server to PostgreSQL 8.1 and I have a
> > > > serious problem:
> > > > Table: APORTES - Rows: 9,000,000 (9 million)
> > > > *cuiT (char 11)
> > > > *cuiL (char 11)
> > > > *PERI (char 6)
> > > > FAMI (numeric 6)
> > > >
> > > > I need all the cuiLs whose max(PERI) are from a cuiT, and the Max
> > > > (FAMI) of those cuiLs, so the sentence is:
> > > >
> > > > SELECT DISTINCT T.cuiT, T.cuiL. U.MAXPERI, U.MAXFAMI
> > > > FROM APORTES T
> > > > INNER JOIN
> > > > (SELECT cuiL, MAX(PERI) AS MAXPERI,
> > > > MAX(FAMI) AS MAXFAMI
> > > > FROM APORTES
> > > > GROUP BY cuiL) AS U
> > > > ON T.cuiL = U.cuiL AND T.PERI=U.MAXPERI
> > > > WHERE T.cuiT='12345678901'
> > > >
> > > > In MS SQL Server it lasts 1minute, in PostgreSQL for Windows it lasts
> > > > 40minutes and in PostgreSQL for Linux (FreeBSD) it lasts 20minuts.
> > > >
> > > > Do you know if there is any way to tune the server or optimize this
> > > > sentence?
> > > >
> > > > Thanks
> > > > Sebastián Baioni




__________________________________________________
Preguntá. Respondé. Descubrí.
Todo lo que querías saber, y lo que ni imaginabas,
está en Yahoo! Respuestas (Beta).
¡Probalo ya!
http://www.yahoo.com.ar/respuestas

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Stone 2006-08-15 19:02:56 Re: Postgresql Performance on an HP DL385 and
Previous Message Luke Lonergan 2006-08-15 18:50:02 Re: Dell PowerEdge 2950 performance