Re: [HACKERS] [PATCH] Incremental sort

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, Antonin Houska <ah(at)cybertec(dot)at>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [PATCH] Incremental sort
Date: 2018-04-05 23:43:18
Message-ID: CAPpHfdte2tr+_W9TmRZ5rrfqhOCteYrBkgKV07QCbNoycV=U5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

On Tue, Apr 3, 2018 at 2:10 PM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> I think solving this may be fairly straight-forward. Essentially, until
> now we only had one way to do the sort, so it was OK to make the sort
> implicit by checking if the path is sorted
>
> if (input not sorted)
> {
> ... add a Sort node ...
> }
>
> But now we have multiple possible ways to do the sort, with different
> startup/total costs. So the places that create the sorts need to
> actually generate the Sort paths for each sort alternative, and store
> the information in the Sort node (instead of relying on pathkeys).
>
> Ultimately, this should simplify the createplan.c places making all the
> make_sort calls unnecessary (i.e. the input should be already sorted
> when needed). Otherwise it'd mean the decision needs to be done locally,
> but I don't think that should be needed.
>
> But it's surely a fairly invasive change to the patch ...
>

Right, there are situation when incremental sort has lower startup cost,
but higher total cost. In order to find lower cost, we ideally should
generate
paths for both full sort and incremental sort. However, that would increase
number of total pathes, and could slowdown planning time. Another issue
that we don't always generate pathes for sort. And yes, it would be rather
invasive. So, that doesn't look feasible to get into 11.

Intead, I decided to cut usage of incremental sort. Now, incremental sort
is generated only in create_sort_path(). Cheaper path selected between
incremental sort and full sort with taking limit_tuples into account.
That limits usage of incremental sort, however risk of regression by this
patch is also minimal. In fact, incremental sort will be used only when
sort is explicitly specified and simultaneously LIMIT is specified or
dataset to be sorted is large and incremental sort saves disk IO.

Attached patch also incorporates following commits made by Alexander
Kuzmenkov:
* Rename fields of IncrementalSortState to snake_case for the sake of
consistency.
* Rename group test function to isCurrentGroup.
* Comments from Tomas Vondra about nodeIncrementalSort.c
* Add a test for incremental sort.
* Add a separate function to calculate costs of incremental sort.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
incremental-sort-22.patch application/octet-stream 106.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2018-04-06 00:02:11 Re: [HACKERS] path toward faster partition pruning
Previous Message Andrew Gierth 2018-04-05 23:37:42 Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS