Quick Links

Re: Parallel tuplesort (for parallel B-Tree index creation)

From:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Parallel tuplesort (for parallel B-Tree index creation)
Date:	2016-10-21 10:57:55
Message-ID:	CAA4eK1JJBuzWH_vh8hZk4pGXOYNoiKNGgxhCs+n-xN6FOC3yAg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Oct 21, 2016 at 4:25 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Tue, Oct 18, 2016 at 3:48 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> On Mon, Oct 17, 2016 at 5:36 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> I read the following paragraph from the Volcano paper just now:
>>
>> """
>> During implementation and benchmarking of parallel sorting, we added
>> two more features to exchange. First, we wanted to implement a merge
>> network in which some processors produce sorted streams merge
>> concurrently by other processors. Volcano’s sort iterator can be used
>> to generate a sorted stream. A merge iterator was easily derived from
>> the sort module. It uses a single level merge, instead of the cascaded
>> merge of runs used in sort. The input of a merge iterator is an
>> exchange. Differently from other operators, the merge iterator
>> requires to distinguish the input records by their producer. As an
>> example, for a join operation it does not matter where the input
>> records were created, and all inputs can be accumulated in a single
>> input stream. For a merge operation, it is crucial to distinguish the
>> input records by their producer in order to merge multiple sorted
>> streams correctly.
>> """
>>
>> I don't really understand this paragraph, but thought I'd ask: why the
>> need to "distinguish the input records by their producer in order to
>> merge multiple sorted streams correctly"? Isn't that talking about
>> partitioning, where each workers *ownership* of a range matters?
>>
>
> I think so, but it seems from above text that is mainly required for
> merge iterator which probably will be used in merge join.
>
>> My
>> patch doesn't care which values belong to which workers. And, it
>> focuses quite a lot on dealing well with the memory bandwidth bound,
>> I/O bound part of the sort where we write out the index itself, just
>> by piggy-backing on tuplesort.c. I don't think that that's useful for
>> a general-purpose executor node -- tuple-at-a-time processing when
>> fetching from workers would kill performance.
>>
>
> Right, but what is written in text quoted by you seems to be do-able
> with tuple-at-a-time processing.
>

To be clear, by saying above, I don't mean that we should try that
approach instead of what you are proposing, but it is worth some
discussion to see if that has any significant merits.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Re: Parallel tuplesort (for parallel B-Tree index creation) at 2016-10-21 10:55:04 from Amit Kapila

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Kapila	2016-10-21 11:29:15	Re: Speed up Clog Access by increasing CLOG buffers
Previous Message	Pavel Stehule	2016-10-21 10:57:15	Re: [RFC] Transaction management overhaul is necessary?