Re: Tuplesort merge pre-reading

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Peter Geoghegan <pg(at)bowt(dot)ie>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)heroku(dot)com>
Subject: Re: Tuplesort merge pre-reading
Date: 2017-04-18 08:05:40
Message-ID: 5db9d579-efdb-857e-82a8-7763313e7931@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04/14/2017 08:19 AM, Peter Geoghegan wrote:
> BTW, I'm skeptical of the idea of Heikki's around killing polyphase
> merge itself at this point. I think that keeping most tapes active per
> pass is useful now that our memory accounting involves handing over an
> even share to each maybe-active tape for every merge pass, something
> established by Heikki's work on external sorting.

The pre-read buffers are only needed for input tapes; the total number
of tapes doesn't matter.

For comparison, imagine that you want to perform a merge, such that you
always merge 7 runs into one. With polyphase merge, you would need 8
tapes, so that you always read from 7 of them, and write onto one. With
balanced merge, you would need 14 tapes: you always read from 7 tapes,
and you would need up to 7 output tapes, of which one would be active at
any given time.

Those extra idle output tapes are practically free in our
implementation. The "pre-read buffers" are only needed for input tapes,
the number of output tapes doesn't matter. Likewise, maintaining the
heap is cheaper if you only merge a small number of tapes at a time, but
that's also dependent on the number of *input* tapes, not the total
number of tapes.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Michálek 2017-04-18 08:13:57 Re: Other formats in pset like markdown, rst, mediawiki
Previous Message Kang Yuzhe 2017-04-18 07:54:23 Re: On How To Shorten the Steep Learning Curve Towards PG Hacking...