Re: PoC: Partial sort

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Andreas Karlsson <andreas(at)proxel(dot)se>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PoC: Partial sort
Date: 2013-12-22 15:38:05
Message-ID: CAPpHfdtqfqm+WR7cQWK-f0+qD_F4VNuvPuyvjmR7xCxQODAXrw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Next revision. It expected to do better work with optimizer. It introduces
presorted_keys argument of cost_sort function which represent number of
keys already sorted in Path. Then this function uses estimate_num_groups to
estimate number of groups with different values of presorted keys and
assumes that dataset is uniformly divided by
groups. get_cheapest_fractional_path_for_pathkeys tries to select the path
matching most part of path keys.
You can see it's working pretty good on single table queries.

create table test as (select id, (random()*5)::int as v1,
(random()*1000)::int as v2 from generate_series(1,1000000) id);
create index test_v1_idx on test (v1);
create index test_v1_v2_idx on test (v1, v2);
create index test_v2_idx on test (v2);
vacuum analyze;

postgres=# explain analyze select * from test order by v1, id;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Sort (cost=149244.84..151744.84 rows=1000000 width=12) (actual
time=2111.476..2586.493 rows=1000000 loops=1)
Sort Key: v1, id
Sort Method: external merge Disk: 21512kB
-> Seq Scan on test (cost=0.00..15406.00 rows=1000000 width=12)
(actual time=0.012..113.815 rows=1000000 loops=1)
Total runtime: 2683.011 ms
(5 rows)

postgres=# explain analyze select * from test order by v1, id limit 10;
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=11441.77..11442.18 rows=10 width=12) (actual
time=79.980..79.982 rows=10 loops=1)
-> Partial sort (cost=11441.77..53140.44 rows=1000000 width=12)
(actual time=79.978..79.978 rows=10 loops=1)
Sort Key: v1, id
Presorted Key: v1
Sort Method: top-N heapsort Memory: 25kB
-> Index Scan using test_v1_idx on test (cost=0.42..47038.83
rows=1000000 width=12) (actual time=0.031..38.275 rows=100213 loops=1)
Total runtime: 81.786 ms
(7 rows)

postgres=# explain analyze select * from test order by v1, v2 limit 10;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.42..0.90 rows=10 width=12) (actual time=0.031..0.047
rows=10 loops=1)
-> Index Scan using test_v1_v2_idx on test (cost=0.42..47286.28
rows=1000000 width=12) (actual time=0.029..0.043 rows=10 loops=1)
Total runtime: 0.083 ms
(3 rows)

postgres=# explain analyze select * from test order by v2, id;
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
Partial sort (cost=97.75..99925.50 rows=1000000 width=12) (actual
time=1.069..1299.481 rows=1000000 loops=1)
Sort Key: v2, id
Presorted Key: v2
Sort Method: quicksort Memory: 52kB
-> Index Scan using test_v2_idx on test (cost=0.42..47603.79
rows=1000000 width=12) (actual time=0.030..812.083 rows=1000000 loops=1)
Total runtime: 1393.850 ms
(6 rows)

However, work with joins needs more improvements.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
partial-sort-2.patch application/octet-stream 50.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2013-12-22 16:12:56 Re: PoC: Partial sort
Previous Message Amit Kapila 2013-12-22 15:02:58 Re: ALTER SYSTEM SET command to change postgresql.conf parameters