Re: PoC: Partial sort

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PoC: Partial sort
Date: 2014-01-14 15:49:34
Message-ID: CAPpHfdsjKJUDJnwgMTTQek6pZfL_opwoE1DtpgukNVNXSJeOSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Tue, Jan 14, 2014 at 12:54 AM, Marti Raudsepp <marti(at)juffo(dot)org> wrote:

> First, thanks a lot for working on this feature. This PostgreSQL
> shortcoming crops up in all the time in web applications that implement
> paging by multiple sorted columns.
>

Thanks!

I've been trying it out in a few situations. I implemented a new
> enable_partialsort GUC to make it easier to turn on/off, this way it's a
> lot easier to test. The attached patch applies on top of
> partial-sort-5.patch
>

I though about such option. Generally not because of testing convenience,
but because of overhead of planning. This way you implement it is quite
naive :) For instance, merge join rely on partial sort which will be
replaced with simple sort.

> I will spend more time reviewing the patch, but some of this planner code
> is over my head. If there's any way I can help to make sure this lands in
> the next version, let me know.
>
> ----
>
> The patch performs just as well as I would expect it to:
>
> marti=# select ac.name, r.name from artist_credit ac join release r on (
> ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
> Time: 9.830 ms
> marti=# set enable_partialsort = off;
> marti=# select ac.name, r.name from artist_credit ac join release r on (
> ac.id=r.artist_credit) order by ac.name, r.name limit 1000;
> Time: 1442.815 ms
>
> A difference of almost 150x!
>
> There's a missed opportunity in that the code doesn't consider pushing new
> Sort steps into subplans. For example, if there's no index on
> language(name) then this query cannot take advantage partial sorts:
>
> marti=# explain select l.name, r.name from language l join release r on (
> l.id=r.language) order by l.name, r.name limit 1000;
> Limit (cost=123203.20..123205.70 rows=1000 width=32)
> -> Sort (cost=123203.20..126154.27 rows=1180430 width=32)
> Sort Key: l.name, r.name
> -> Hash Join (cost=229.47..58481.49 rows=1180430 width=32)
> Hash Cond: (r.language = l.id)
> -> Seq Scan on release r (cost=0.00..31040.10
> rows=1232610 width=26)
> -> Hash (cost=131.43..131.43 rows=7843 width=14)
> -> Seq Scan on language l (cost=0.00..131.43
> rows=7843 width=14)
>
> But because there are only so few languages, it would be a lot faster to
> sort languages in advance and then do partial sort:
> Limit (rows=1000 width=31)
> -> Partial sort (rows=1180881 width=31)
> Sort Key: l.name, r.name
> Presorted Key: l.name
> -> Nested Loop (rows=1180881 width=31)
> -> Sort (rows=7843 width=10)
> Sort Key: name
> -> Seq Scan on language (rows=7843 width=14)
> -> Index Scan using release_language_idx on release r
> (rows=11246 width=25)
> Index Cond: (language = l.id)
>
> Even an explicit sorted CTE cannot take advantage of partial sorts:
> marti=# explain with sorted_lang as (select id, name from language order
> by name)
> marti-# select l.name, r.name from sorted_lang l join release r on (l.id=r.language)
> order by l.name, r.name limit 1000;
> Limit (cost=3324368.83..3324371.33 rows=1000 width=240)
> CTE sorted_lang
> -> Sort (cost=638.76..658.37 rows=7843 width=14)
> Sort Key: language.name
> -> Seq Scan on language (cost=0.00..131.43 rows=7843 width=14)
> -> Sort (cost=3323710.46..3439436.82 rows=46290543 width=240)
> Sort Key: l.name, r.name
> -> Merge Join (cost=664.62..785649.92 rows=46290543 width=240)
> Merge Cond: (r.language = l.id)
> -> Index Scan using release_language_idx on release r
> (cost=0.43..87546.06 rows=1232610 width=26)
> -> Sort (cost=664.19..683.80 rows=7843 width=222)
> Sort Key: l.id
> -> CTE Scan on sorted_lang l (cost=0.00..156.86
> rows=7843 width=222)
>
> But even with these limitations, this will easily be the killer feature of
> the next release, for me at least.
>

I see. But I don't think it can be achieved by small changes in planner.
Moreover, I didn't check but I think if you remove ordering by r.name you
will still not get sorting languages in the inner node. So, this problem is
not directly related to partial sort.

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2014-01-14 16:04:36 Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Previous Message Trond Myklebust 2014-01-14 15:42:05 Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance