Re: Query performance

From: Matthew Wakeling S Arvind pgsql-performance(at)postgresql(dot)org Re: Query performance 2009-10-12 13:01:55 alpine.DEB.2.00.0910121350260.19472@aragorn.flymine.org (view raw, whole thread or download thread mbox) 2009-10-12 11:21:27 from S Arvind  2009-10-12 12:30:12 from Grzegorz Jaśkiewicz   2009-10-12 12:39:39 from S Arvind    2009-10-12 13:01:55 from Matthew Wakeling     2009-10-12 13:23:22 from S Arvind      2009-10-12 13:29:13 from Grzegorz Jaśkiewicz       2009-10-12 13:36:59 from Matthew Wakeling        2009-10-12 13:40:14 from Grzegorz Jaśkiewicz        2009-10-12 14:45:47 from S Arvind       2009-10-12 14:52:53 from S Arvind        2009-10-12 15:10:51 from Grzegorz Jaśkiewicz pgsql-performance
```On Mon, 12 Oct 2009, S Arvind wrote:
> I can understand left join, actually can any one tell me why sort operation is carried
> out and wat Materialize means...
> Can anyone explain me the mentioned plan with reason(s)?

>  Merge Left Join  (cost=62451.86..67379.08 rows=286789 width=0)
>      Merge Cond: (a.id = b.id)
>      ->  Sort  (cost=18610.57..18923.27 rows=125077 width=8)
>          Sort Key: a.id
>          ->  Seq Scan on a  (cost=0.00..6309.77 rows=125077 width=8)
>      ->  Materialize  (cost=43841.28..47426.15 rows=286789 width=8)
>          ->  Sort  (cost=43841.28..44558.26 rows=286789 width=8)
>              Sort Key: b.id
>              ->  Seq Scan on b (cost=0.00..13920.89 rows=286789 width=8)

This is a merge join. A merge join joins together two streams of data,
where both streams are sorted, by placing the two streams side by side and
advancing through both streams finding matching rows. The algorithm can
use a pointer to a position in both of the streams, and advance the
pointer of the stream that has the earlier value according to the sort
order, and therefore get all the matches.

You are performing a query over the whole of both of the tables, so the
cheapest way to obtain a sorted stream of data is to do a full sequential
scan of the whole table, bring it into memory, and sort it. An alternative
would be to follow a B-tree index if one was available on the correct
column, but that is usually more expensive unless the table is clustered
on the index or only a small portion of the table is to be read. If you
had put a "LIMIT 10" clause on the end of the query and had such an index,
it would probably switch to that strategy instead.

The materialise step is effectively a buffer that allows one of the
streams to be rewound cheaply, which will be necessary if there are
multiple rows with the same value.

Matthew

--
The only secure computer is one that's unplugged, locked in a safe,
and buried 20 feet under the ground in a secret location...and i'm not
even too sure about that one.                         --Dennis Huges, FBI
```

pgsql-performance by date

 Next: From: S Arvind Date: 2009-10-12 13:23:22 Subject: Re: Query performance Previous: From: S Arvind Date: 2009-10-12 12:39:39 Subject: Re: Query performance