Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Date: 2018-01-13 12:32:42
Message-ID: CAA4eK1KGnxUQsW=jTJWM7xf0pV7w6jLH9ewjcL0k6k0VvzYHJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 13, 2018 at 1:25 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Fri, Jan 12, 2018 at 6:14 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Fri, Jan 12, 2018 at 8:19 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> 1.
>>> + if (!IsBootstrapProcessingMode() && !indexInfo->ii_Concurrent)
>>> {
>>> - snapshot = RegisterSnapshot(GetTransactionSnapshot());
>>> - OldestXmin = InvalidTransactionId; /* not used */
>>> + OldestXmin = GetOldestXmin(heapRelation, true);
>>>
>>> I think leader and workers should have the same idea of oldestXmin for
>>> the purpose of deciding the visibility of tuples. I think this is
>>> ensured in all form of parallel query as we do share the snapshot,
>>> however, same doesn't seem to be true for Parallel Index builds.
>>
>> Hmm. Does it break anything if they use different snapshots? In the
>> case of a query that would be disastrous because then you might get
>> inconsistent results, but if the snapshot is only being used to
>> determine what is and is not dead then I'm not sure it makes much
>> difference ... unless the different snapshots will create confusion of
>> some other sort.
>
> I think that this is fine. GetOldestXmin() is only used when we have a
> ShareLock on the heap relation, and the snapshot is SnapshotAny. We're
> only talking about the difference between HEAPTUPLE_DEAD and
> HEAPTUPLE_RECENTLY_DEAD here. Indexing a heap tuple when that wasn't
> strictly necessary by the time you got to it is normal.
>

Yeah, but this would mean that now with parallel create index, it is
possible that some tuples from the transaction would end up in index
and others won't. In general, this makes me slightly nervous mainly
because such a case won't be possible without the parallel option for
create index, but if you and Robert are okay with it as there is no
fundamental problem, then we might as well leave it as it is or maybe
add a comment saying so.

Another point is that the information about broken hot chains
indexInfo->ii_BrokenHotChain is getting lost. I think you need to
coordinate this information among backends that participate in
parallel create index. Test to reproduce the problem is as below:

create table tbrokenchain(c1 int, c2 varchar);
insert into tbrokenchain values(3, 'aaa');

begin;
set force_parallel_mode=on;
update tbrokenchain set c2 = 'bbb' where c1=3;
create index idx_tbrokenchain on tbrokenchain(c1);
commit;

Now, check the value of indcheckxmin in pg_index, it should be true,
but with patch it is false. You can try with patch by not changing
the value of force_parallel_mode;

The patch uses both parallel_leader_participation and
force_parallel_mode, but it seems the definition is different from
what we have in Gather. Basically, even with force_parallel_mode, the
leader is participating in parallel build. I see there is some
discussion above about both these parameters and still, there is not
complete agreement on the best way forward. I think we should have
parallel_leader_participation as that can help in testing if nothing
else.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-01-13 13:34:09 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Previous Message John Naylor 2018-01-13 11:47:29 Re: WIP: a way forward on bootstrap data