Re: Parallel tuplesort, partitioning, merging, and the future

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Subject: Re: Parallel tuplesort, partitioning, merging, and the future
Date: 2016-08-10 21:13:59
Message-ID: CAM3SWZQ0F7skDggoUG0KMGLb6G9yBC=0x0PP3t01yy4G+JqN6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 10, 2016 at 12:08 PM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> I think it's a great design, but for that, per-worker final tapes have
> to always be random-access.

Thanks. I don't think I need to live with the randomAccess
restriction, because I can be clever about reading only the first
tuple on each logtape.c block initially. Much later, when the binary
search gets down to seeking within a single block, everything in the
block can be read at once into memory, and we can take the binary
search to that other representation. This latter part only needs to
happen once or twice per partition boundary per worker.

> I'm not hugely familiar with the code, but IIUC there's some penalty
> to making them random-access right?

Yeah, there is. For one thing, you have to store the length of the
tuple twice, to support incremental seeking in both directions. For
another, you cannot perform the final merge on-the-fly; you must
produce a serialized tape as output, which is used subsequently to
support random seeks. There is no penalty when you manage to do the
sort in memory, though (not that that has anything to do with parallel
sort).

--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-08-10 21:18:22 Re: Set log_line_prefix and application name in test drivers
Previous Message Tom Lane 2016-08-10 21:04:36 Re: new pgindent run before branch?