Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-09-06 19:57:32
Message-ID: CAGTBQpa6y=z3E4nDys6NyB9SBirQOZ63wfToG38r-TD752f_PQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Aug 15, 2016 at 9:33 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> The patch is intended to be applied on top of parallel B-Tree patches
> 0001-* and 0002-* [1]. I happened to test it with parallelism, but
> these are all independently useful, and will be entered as a separate
> CF entry (perhaps better to commit the earlier two patches first, to
> avoid merge conflicts). I'm optimistic that we can get those 3 patches
> in the series out of the way early, without blocking on discussing
> parallel sort.

Applied patches 1 and 2, builds fine, regression tests run fine. It
was a prerequisite to reviewing patch 3 (which I'm going to do below),
so I thought I might as well report on that tidbit of info, fwiw.

> The patch makes tuplesort merging shift down and displace the root
> tuple with the tape's next preread tuple, rather than compacting and
> then inserting into the heap anew. This approach to maintaining the
> heap as tuples are returned to caller will always produce fewer
> comparisons overall. The new approach is also simpler. We were already
> shifting down to compact the heap within the misleadingly named [2]
> function tuplesort_heap_siftup() -- why not instead just use the
> caller tuple (the tuple that we currently go on to insert) when
> initially shifting down (not the heap's preexisting last tuple, which
> is guaranteed to go straight to the leaf level again)? That way, we
> don't need to enlarge the heap at all through insertion, shifting up,
> etc. We're done, and are *guaranteed* to have performed less work
> (fewer comparisons and swaps) than with the existing approach (this is
> the reason for my optimism about getting this stuff out of the way
> early).

Patch 3 applies fine to git master as of
25794e841e5b86a0f90fac7f7f851e5d950e51e2 (on top of patches 1 and 2).

Builds fine and without warnings on gcc 4.8.5 AFAICT, regression test
suite runs without issues as well.

Patch lacks any new tests, but the changed code paths seem covered
sufficiently by existing tests. A little bit of fuzzing on the patch
itself, like reverting some key changes, or flipping some key
comparisons, induces test failures as it should, mostly in cluster.

The logic in tuplesort_heap_root_displace seems sound, except:

+ */
+ memtuples[i] = memtuples[imin];
+ i = imin;
+ }
+
+ Assert(state->memtupcount > 1 || imin == 0);
+ memtuples[imin] = *newtup;
+}

Why that assert? Wouldn't it make more sense to Assert(imin < n) ?

In the meanwhile, I'll go and do some perf testing.

Assuming the speedup is realized during testing, LGTM.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2016-09-06 20:08:27 Re: [PATCH] Alter or rename enum value
Previous Message Peter Geoghegan 2016-09-06 19:52:06 Re: Bug in 9.6 tuplesort batch memory growth logic