Skip site navigation (1) Skip section navigation (2)

Re: Query performance

From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: S Arvind <arvindwill(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Query performance
Date: 2009-10-12 13:01:55
Message-ID: alpine.DEB.2.00.0910121350260.19472@aragorn.flymine.org (view raw or flat)
Thread:
Lists: pgsql-performance
On Mon, 12 Oct 2009, S Arvind wrote:
> I can understand left join, actually can any one tell me why sort operation is carried
> out and wat Materialize means...
> Can anyone explain me the mentioned plan with reason(s)?

>  Merge Left Join  (cost=62451.86..67379.08 rows=286789 width=0)
>      Merge Cond: (a.id = b.id)
>      ->  Sort  (cost=18610.57..18923.27 rows=125077 width=8)
>          Sort Key: a.id
>          ->  Seq Scan on a  (cost=0.00..6309.77 rows=125077 width=8)
>      ->  Materialize  (cost=43841.28..47426.15 rows=286789 width=8)
>          ->  Sort  (cost=43841.28..44558.26 rows=286789 width=8)
>              Sort Key: b.id
>              ->  Seq Scan on b (cost=0.00..13920.89 rows=286789 width=8)

This is a merge join. A merge join joins together two streams of data, 
where both streams are sorted, by placing the two streams side by side and 
advancing through both streams finding matching rows. The algorithm can 
use a pointer to a position in both of the streams, and advance the 
pointer of the stream that has the earlier value according to the sort 
order, and therefore get all the matches.

You are performing a query over the whole of both of the tables, so the 
cheapest way to obtain a sorted stream of data is to do a full sequential 
scan of the whole table, bring it into memory, and sort it. An alternative 
would be to follow a B-tree index if one was available on the correct 
column, but that is usually more expensive unless the table is clustered 
on the index or only a small portion of the table is to be read. If you 
had put a "LIMIT 10" clause on the end of the query and had such an index, 
it would probably switch to that strategy instead.

The materialise step is effectively a buffer that allows one of the 
streams to be rewound cheaply, which will be necessary if there are 
multiple rows with the same value.

Does that answer your question?

Matthew

-- 
 The only secure computer is one that's unplugged, locked in a safe,
 and buried 20 feet under the ground in a secret location...and i'm not
 even too sure about that one.                         --Dennis Huges, FBI

In response to

Responses

pgsql-performance by date

Next:From: S ArvindDate: 2009-10-12 13:23:22
Subject: Re: Query performance
Previous:From: S ArvindDate: 2009-10-12 12:39:39
Subject: Re: Query performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group