Re: [PATCH] Incremental sort (was: PoC: Partial sort)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: James Coleman <jtc331(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Rafia Sabih <rafia(dot)pghackers(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Shaun Thomas <shaun(dot)thomas(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Date: 2020-03-12 23:40:15
Message-ID: 20200312234014.GI29065@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for working on this. I have some minor comments.

In 0005:

+ /* Restore the input path (we might have addes Sort on top). */

=> added? There's at least two more of the same typo.

+ /* also ignore already sorted paths */

=> You say that in a couple places, but I don't think "also" makes sense since
there's nothing preceding it ?

In 0004:

+ * end up resorting the entire data set. So, unless we can push

=> re-sorting

+ * Unlike generate_gather_paths, this does not look just as pathkeys of the

=> look just AT ?

+ /* now we know is_sorted == false */

=> I would just spell that "Assert", as I think you already do elsewhere.

+ /* continue */

=> Please consider saying "fall through", since "continue" means exactly the
opposite.

+generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
...
+ /* finally, consider incremental sort */
...
+ /* Also consider incremental sort. */

=> I think it's more confusing than useful with two comments - one is adequate.

In 0002:

+ * If it's EXPLAIN ANALYZE, show tuplesort stats for a incremental sort node
...
+ * make_incrementalsort --- basic routine to build a IncrementalSort plan node

=> AN incremental

+ * Initial size of memtuples array. We're trying to select this size so that
+ * array don't exceed ALLOCSET_SEPARATE_THRESHOLD and overhead of allocation
+ * be possible less. However, we don't cosider array sizes less than 1024

Four typos (?)
that array DOESN'T
and THE overhead
CONSIDER
I'm not sure, but "be possible less" should maybe say "possibly be less" ?

+ bool maxSpaceOnDisk; /* true when maxSpace is value for on-disk

I suggest to call it IsMaxSpaceDisk

+ MemoryContext maincontext; /* memory context for tuple sort metadata
+ that persist across multiple batches */

persists

+ * a new sort. It allows evade recreation of tuple sort (and save resources)
+ * when sorting multiple small batches.

allows to avoid? Or allows avoiding?

+ * When performing sorting by multiple keys input dataset could be already
+ * presorted by some prefix of these keys. We call them "presorted keys".

"already presorted" sounds redundant

+ int64 fullsort_group_count; /* number of groups with equal presorted keys */
+ int64 prefixsort_group_count; /* number of groups with equal presorted keys */

I guess these should have different comments

--
Justin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-03-13 00:18:37 Re: [Patch] pg_rewind: options to use restore_command from recovery.conf or command line
Previous Message Justin Pryzby 2020-03-12 23:11:16 Re: Additional size of hash table is alway zero for hash aggregates