Re: Query performance

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: S Arvind <arvindwill(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query performance
Date: 2009-10-12 13:01:55
Message-ID: alpine.DEB.2.00.0910121350260.19472@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Mon, 12 Oct 2009, S Arvind wrote:
> I can understand left join, actually can any one tell me why sort operation is carried
> out and wat Materialize means...
> Can anyone explain me the mentioned plan with reason(s)?

> Merge Left Join  (cost=62451.86..67379.08 rows=286789 width=0)
>   Merge Cond: (a.id = b.id)
>   ->  Sort  (cost=18610.57..18923.27 rows=125077 width=8)
>         Sort Key: a.id
>         ->  Seq Scan on a  (cost=0.00..6309.77 rows=125077 width=8)
>   ->  Materialize  (cost=43841.28..47426.15 rows=286789 width=8)
>         ->  Sort  (cost=43841.28..44558.26 rows=286789 width=8)
>             Sort Key: b.id
>            ->  Seq Scan on b (cost=0.00..13920.89 rows=286789 width=8)

This is a merge join. A merge join joins together two streams of data,
where both streams are sorted, by placing the two streams side by side and
advancing through both streams finding matching rows. The algorithm can
use a pointer to a position in both of the streams, and advance the
pointer of the stream that has the earlier value according to the sort
order, and therefore get all the matches.

You are performing a query over the whole of both of the tables, so the
cheapest way to obtain a sorted stream of data is to do a full sequential
scan of the whole table, bring it into memory, and sort it. An alternative
would be to follow a B-tree index if one was available on the correct
column, but that is usually more expensive unless the table is clustered
on the index or only a small portion of the table is to be read. If you
had put a "LIMIT 10" clause on the end of the query and had such an index,
it would probably switch to that strategy instead.

The materialise step is effectively a buffer that allows one of the
streams to be rewound cheaply, which will be necessary if there are
multiple rows with the same value.

Does that answer your question?

Matthew

--
The only secure computer is one that's unplugged, locked in a safe,
and buried 20 feet under the ground in a secret location...and i'm not
even too sure about that one. --Dennis Huges, FBI

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message S Arvind 2009-10-12 13:23:22 Re: Query performance
Previous Message S Arvind 2009-10-12 12:39:39 Re: Query performance