Re: Outer Join performance in PostgreSQL

From: Michael Alan Dorman <mdorman(at)tendentious(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Outer Join performance in PostgreSQL
Date: 2005-11-09 20:45:53
Message-ID: 87d5l9r1mm.fsf@hero.mallet-assembly.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Ashok Agrawal <Ashok(dot)Agrawal(at)Sun(dot)COM> writes:
> I noticed outer join is very very slow in postgresql as compared
> to Oracle.

I think the three things the people best able to help you are going to
ask for are 1) what version of PostgreSQL, 2) what are the tables, and
how many rows in each, and 3) output from 'explain analyze' rather
than just 'explain'.

That said, I'm willing to take an amateurish stab at it even without
that.

In fact, I don't think the outer joins are the issue at all. I see
that you're forcing a right join from ecms_certificate_types to
ecms_cert_headers. This seems to be causing postgresql to think it
must (unnecessarily) consider three quarters of a billion rows, which,
if I'm reading right, seems to be producing the majority of the
estimated cost:

> Hash Join (cost=1666049.74..18486619.37 rows=157735046 width=874)
> Hash Cond: ("outer".certificate_type_id = "inner".certificate_type_id)
> -> Merge Right Join (cost=1666048.13..11324159.05 rows=643816513 width=826)

In fact, looking at the fact that you're doing a COALESCE on a column
from b, it seems to me that doing a right join from ecms_cert_headers
to ecms_certificate_types is just wrong. It seems to me that that
should be a left join as well.

With that in mind, I would rewrite the whole FROM clause as:

FROM ecms_cert_headers a
LEFT OUTER JOIN ecms_certificate_types b
ON (a.certificate_type_id = b.certificate_type_id)
LEFT OUTER JOIN taxpack_user c
ON (a.created_by = c.emp_no)
LEFT OUTER JOIN taxpack_user d
ON (a.updated_by = d.emp_no)
LEFT OUTER JOIN taxpack_user e
ON (a.approved_by = e.emp_no)
WHERE a.dln_code = '17319'

It seems to me that this more reflects the intent of the data that is
being retrieved. I would also expect it to be a boatload faster.

Assuming I've understood the intent correctly, I would guess that the
difference is the result of the Oracle planner being able to eliminate
the right join or something.

Mike

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Ron Peacetree 2005-11-09 21:24:49 Re: Some help on buffers and other performance tricks
Previous Message Eric Lauzon 2005-11-09 20:43:18 (View and SQL) VS plpgsql