Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2017-02-16 17:24:20
Message-ID: CAH2-Wz=t+t1ChRowkq1ZU3Af-jMsNAYU-DyUYGbXeyhsJeDR2Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 15, 2017 at 6:05 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> Here are some results with your latest patch, using the same test as
> before but this time with SCALE=100 (= 100,000,000 rows).

Cool.

> To reduce the number of combinations I did only unique data and built
> only non-unique indexes with only 'wide' tuples (= key plus a text
> column that holds a 151-character wide string, rather than just the
> key), and also didn't bother with the 1MB memory size as suggested.
> Here are the results up to 4 workers (a results table going up to 8
> workers is attached, since it wouldn't format nicely if I pasted it
> here).

I think that you are still I/O bound in a way that is addressable by
adding more disks. The exception is the text cases, where the patch
does best. (I don't place too much emphasis on that because I know
that in the long term, we'll have abbreviated keys, which will take
some of the sheen off of that.)

> Again, the w = 0 time is seconds, the rest show relative
> speed-up.

I think it's worth pointing out that while there are cases where we
see no benefit from going from 4 to 8 workers, the excess tends to
hardly hurt at all, or hardly help at all. It's almost irrelevant that
the number of workers used is excessive, at least up until the point
when all cores have their own worker. That's a nice quality for this
to have -- the only danger is that we use parallelism when we
shouldn't have at all, because the serial case could have managed an
internal sort, and the sort was small enough for that to be a notable
factor.

> "textwide" "asc" is nearly an order of magnitude faster than other
> initial orders without parallelism, but then parallelism doesn't seem
> to help it much. Also, using more than 64MB doesn't ever seem to help
> very much; in the "desc" case it hinders.

Maybe it's CPU cache efficiency? There are edge cases where multiple
passes are faster than one pass. That's the only explanation I can
think of.

> I was curious to understand how performance changes if we become just
> a bit less correlated (rather than completely uncorrelated or
> perfectly inversely correlated), so I tried out a 'banana skin' case:
> I took the contents of the textwide asc table and copied it to a new
> table, and then moved the 900 words matching 'banana%' to the physical
> end of the heap by deleting and reinserting them in one transaction.

A likely problem with that is that most runs will actually not have
their own banana skin, so to speak. You only see a big drop in
performance when every quicksort operation has presorted input with
one or more out-of-order tuples at the end. To see a really
unfortunate case with parallel CREATE INDEX, you'd probably need
enough memory that workers don't need to do their own merge (so that
each worker's work consists almost entirely of one big quicksort
operation), plus enough "banana skin" heap pages that the parallel
heap scan is pretty much guaranteed to hand out-of-order "banana
skin" tuples to every worker, making all of them "have a slip"
(throw away a huge amount of work, because the presorted optimization
is defeated right at the end of its sequential read through the
input).
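To illustrate the failure mode, here is a minimal Python sketch of a
presorted-input pre-pass of the kind described above (the real code is
C inside PostgreSQL's sort routines, and this function name is made up
for illustration): a single out-of-order tuple near the end wastes the
entire sequential pre-scan, and a full sort still has to run.

```python
# Sketch only -- not PostgreSQL's actual implementation. The idea: scan
# once to check for presortedness, and fall back to a real sort as soon
# as one out-of-order pair appears.
def sort_with_presorted_check(tuples):
    # Pre-pass: bail out early if the input is already fully sorted.
    for i in range(len(tuples) - 1):
        if tuples[i] > tuples[i + 1]:
            break  # the "banana skin": everything scanned so far is wasted work
    else:
        return tuples  # fully presorted: no sort needed at all
    return sorted(tuples)  # fall back to a full sort

# Presorted input except for one trailing out-of-order element: the
# pre-pass scans almost the whole input before giving up.
data = list(range(1_000)) + [0]
result = sort_with_presorted_check(data)
```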

A better approach would be to have several small localized areas
across the input where tuples are a little out of order. That would
probably show performance pretty much in line with the random cases.
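A test-data generator along those lines might look like this (a
hypothetical sketch; the function name and parameters are mine, not
from any test harness used in this thread):

```python
import random

# Start from fully sorted keys and lightly shuffle a few small,
# localized windows, leaving everything else in order -- "several small
# localized areas across input where input tuples are a little out of
# order", rather than one banana skin at the very end.
def locally_perturbed(n, windows=5, window_size=100, seed=42):
    rng = random.Random(seed)  # deterministic for repeatable benchmarks
    keys = list(range(n))
    for _ in range(windows):
        start = rng.randrange(0, n - window_size)
        window = keys[start:start + window_size]
        rng.shuffle(window)  # disorder confined to this window
        keys[start:start + window_size] = window
    return keys

data = locally_perturbed(100_000)
```

Each worker's chunk of a parallel heap scan is then likely to contain
some disorder, instead of all of the disorder landing at the tail.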

> It's hard to speculate about this, but I guess that a significant
> number of indexes in real world databases might be uncorrelated to
> insert order.

That would certainly be true with text, where we see a risk of (small)
regressions.

--
Peter Geoghegan
