Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-12-21 18:21:16
Message-ID: CAM3SWZRckVHLH2xCZVxDvzy7s0g3Vh2tax1_oxRn-cR=NVXvVA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 21, 2016 at 6:00 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> 3. Just live with the waste of space.

I am loath to create a special case for the parallel interface too,
but I think it's possible that *no* caller will ever actually need to
live with this restriction at any time in the future. I am strongly
convinced that adopting tuplesort.c for parallelism should involve
partitioning [1]. With that approach, even randomAccess callers will
not want to read at random for one big materialized tape, since that's
at odds with the whole point of partitioning, which is to remove any
dependencies between workers quickly and early, so that as much work
as possible is pushed down into workers. If a merge join were
performed in a world where we have this kind of partitioning, we
definitely wouldn't require one big materialized tape that is
accessible within each worker.

What are the chances of any real user actually having to live with the
waste of space at some point in the future?

> Another tangentially-related problem I just realized is that we need
> to somehow handle the issues that tqueue.c does when transferring
> tuples between backends -- most of the time there's no problem, but if
> anonymous record types are involved then tuples require "remapping".
> It's probably harder to provoke a failure in the tuplesort case than
> with parallel query per se, but it's probably not impossible.

Thanks for pointing that out. I'll look into it.

BTW, I discovered a bug: when very little memory is available within
each worker, tuplesort.c immediately throws an error in the workers.
It's just a matter of making sure that each worker has at least 64KB
of workMem, which is a straightforward fix. Obviously it makes no
sense to use so little memory in the first place; this is strictly a
corner case.

[1] https://www.postgresql.org/message-id/CAM3SWZR+ATYAzyMT+hm-Bo=1L1smtJbNDtibwBTKtYqS0dYZVg@mail.gmail.com
--
Peter Geoghegan
