Re: Parallel tuplesort (for parallel B-Tree index creation)

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Claudio Freire <klaussfreire(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: Parallel tuplesort (for parallel B-Tree index creation)
Date: 2016-09-08 17:13:21
Message-ID: CAM3SWZTQHvmPTL-mZ-2gOfPQu3f=MLT0c4HXQPC5=+ESS42hjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 8, 2016 at 8:53 AM, Claudio Freire <klaussfreire(at)gmail(dot)com> wrote:
> setup:
>
> create table lotsofitext(i text, j text, w text, z integer, z2 bigint);
> insert into lotsofitext select cast(random() * 1000000000.0 as text)
> || 'blablablawiiiiblabla', cast(random() * 1000000000.0 as text) ||
> 'blablablawjjjblabla', cast(random() * 1000000000.0 as text) ||
> 'blablabl
> awwwabla', random() * 1000000000.0, random() * 1000000000000.0 from
> generate_series(1, 10000000);
>
> timed:
>
> select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t;
>
> Unpatched Time: 100351.251 ms
> Patched Time: 75180.787 ms
>
> That's like a 25% speedup on random input. As we say over here, rather
> badly translated, not a turkey's boogers (meaning "nice!")

Cool! What work_mem setting were you using here?

>> More than using "n" or "memtupcount" what I'm saying is to assert that
>> memtuples[imin] is inside the heap, which would catch the same errors
>> the original assert would, and more.
>>
>> Assert(imin < state->memtupcount)
>>
>> If you prefer.
>>
>> The original asserts allows any value of imin for memtupcount>1, and
>> that's my main concern. It shouldn't.
>
> So, for the assertions to properly avoid clobbering/reading out of
> bounds memory, you need both the above assert:
>
> + */
> + memtuples[i] = memtuples[imin];
> + i = imin;
> + }
> +
>>+ Assert(imin < state->memtupcount);
> + memtuples[imin] = *newtup;
> +}
>
> And another one at the beginning, asserting:
>
> + SortTuple *memtuples = state->memtuples;
> + int n,
> + imin,
> + i;
> +
>>+ Assert(state->memtupcount > 0 && memtuples[0].tupindex == newtup->tupindex);
> +
> + CHECK_FOR_INTERRUPTS();
>
> It's worth making that change, IMHO, unless I'm missing something.

You're supposed to just not call it with an empty heap, so the
assertions trust that much. I'll look into that.

Currently, producing a new revision of this entire patchset. Improving
the cost model (used when the parallel_workers storage parameter is
not specified within CREATE INDEX) is taking a bit of time, but hope
to have it out in the next couple of days.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-08 17:17:05 Re: pg_dump with tables created in schemas created by extensions
Previous Message Ashutosh Bapat 2016-09-08 17:11:01 Re: Aggregate Push Down - Performing aggregation on foreign server