Re: PoC: Partial sort

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PoC: Partial sort
Date: 2013-12-28 08:28:04
Message-ID: CAPpHfdvHBXX5CVsbt8a8ao3RomHe+WZ3Ph06_gHSr93pQo1aXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 24, 2013 at 6:02 AM, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:

> On 12/22/2013 04:38 PM, Alexander Korotkov wrote:
>
>> postgres=# explain analyze select * from test order by v1, id limit 10;
>> QUERY
>> PLAN
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>> -----------------------
>> Limit (cost=11441.77..11442.18 rows=10 width=12) (actual
>> time=79.980..79.982 rows=10 loops=1)
>> -> Partial sort (cost=11441.77..53140.44 rows=1000000 width=12)
>> (actual time=79.978..79.978 rows=10 loops=1)
>> Sort Key: v1, id
>> Presorted Key: v1
>> Sort Method: top-N heapsort Memory: 25kB
>> -> Index Scan using test_v1_idx on test (cost=0.42..47038.83
>> rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
>> Total runtime: 81.786 ms
>> (7 rows)
>>
>
> Have you thought about how do you plan to print which sort method and how
> much memory was used? Several different sort methods may have been use in
> the query. Should the largest amount of memory/disk be printed?

Apparently, now amount of memory for sorted last group is printed. Your
proposal makes sense: largest amount of memory/disk should be printed.

> However, work with joins needs more improvements.
>>
>
> That would be really nice to have, but the patch seems useful even without
> the improvements to joins.

Attached revision of patch implements partial sort usage in merge joins.

create table test1 as (
select id,
(random()*100)::int as v1,
(random()*10000)::int as v2
from generate_series(1,1000000) id);

create table test2 as (
select id,
(random()*100)::int as v1,
(random()*10000)::int as v2
from generate_series(1,1000000) id);
create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);

create index test1_v1_idx on test1 (v1);
create index test2_v1_idx on test2 (v1);

# explain select * from test1 t1 join test2 t2 on t1.v1 = t2.v1 and t1.v2 =
t2.v2;
QUERY PLAN

----------------------------------------------------------------------------------------------------------
Merge Join (cost=2257.67..255273.39 rows=983360 width=24)
Merge Cond: ((t1.v1 = t2.v1) AND (t1.v2 = t2.v2))
-> Partial sort (cost=1128.84..116470.79 rows=1000000 width=12)
Sort Key: t1.v1, t1.v2
Presorted Key: t1.v1
-> Index Scan using test1_v1_idx on test1 t1
(cost=0.42..47604.01 rows=1000000 width=12)
-> Materialize (cost=1128.83..118969.00 rows=1000000 width=12)
-> Partial sort (cost=1128.83..116469.00 rows=1000000 width=12)
Sort Key: t2.v1, t2.v2
Presorted Key: t2.v1
-> Index Scan using test2_v1_idx on test2 t2
(cost=0.42..47602.22 rows=1000000 width=12)

I believe now patch covers desired functionality. I'm going to focus on
nailing down details, refactoring and documenting.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
partial-sort-3.patch application/octet-stream 65.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2013-12-28 09:04:52 Re: PoC: Partial sort
Previous Message MauMau 2013-12-28 07:37:44 Re: [bug fix] connection service file doesn't take effect with ECPG apps