Re: Parallel Index Scans

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed(dot)90(at)gmail(dot)com>
Subject: Re: Parallel Index Scans
Date: 2017-01-20 14:29:35
Message-ID: CAA4eK1+vdxZQJMSW0LwQYnV3oHmB9BHbXs_EQh5+Nk7BkzO3yg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 20, 2017 at 3:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jan 18, 2017 at 8:03 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> Why do we need separate AM support for index_parallelrescan()? Why
>>> can't index_rescan() cover that case?
>>
>> The reason is that sometime index_rescan is called when we have to
>> just update runtime scankeys in index and we don't want to reset
>> parallel scan for that.
>
> Why not? I see the code, but the comments don't seem to offer any
> justification for that. And it seems wrong to me. If the scan keys
> change, surely that only makes sense if we restart the scan. You
> can't just blindly continue the same scan if the keys have changed,
> can you?
>

Sure, if scan keys have changed then we can't continue, but this is
the case where runtime keys are first time initialized.

if (node->iss_NumRuntimeKeys != 0 && !node->iss_RuntimeKeysReady)

In the above check, the second part of the check
(!node->iss_RuntimeKeysReady) ensures that it is for the first time.
Now, let me give you an example to explain what bad can happen if we
allow resetting parallel scan in this case. Consider a query like
select * from t1 where c1 < parallel_index(10);, in this if we allow
resetting parallel scan descriptor during first time initialization of
runtime keys, it can easily corrupt the parallel scan state. Suppose
leader has taken the lead and is scanning some page and worker reaches
to initialize its keys in ExecReScanIndexScan(), if worker resets the
parallel scan, then it will corrupt the state of the parallel scan
state.

If you agree with the above explanation, then I will expand the
comments in next update.

Just in case you want to debug the above query, below is the schema
and necessary steps.

create or replace function parallel_index(a integer) returns integer
as $$
begin
return a + 1;
end;
$$ language plpgsql STABLE PARALLEL SAFE;

create table t1(c1 int, c2 char(20));
insert into t1 values(generate_series(1,300000),'aaa');
create index idx_t1 on t1 (c1);

set max_parallel_workers_per_gather=1;
set parallel_setup_cost=0;
set parallel_tuple_cost=0;
set min_parallel_relation_size=0;
set enable_bitmapscan=off;
set enable_seqscan=off;

select * from t1 where c1 < parallel_index(1000);

> I think the reason that this isn't breaking for you is that it's
> difficult or impossible to get a parallel index scan someplace where
> the keys would change at runtime. Normally, the parallel scan is on
> the driving relation, and so there are no runtime keys. We currently
> have no way for a parallel scan to occur on the inner side of a nested
> loop unless there's an intervening Gather node - and in that case the
> keys can't change without relaunching the workers. It's hard to see
> how it could work otherwise. For example, suppose you had something
> like this:
>
> Gather
> -> Nested Loop
> -> Parallel Seq Scan on a
> -> Hash Join
> -> Seq Scan on b
> -> Parallel Shared Hash
> -> Parallel Index Scan on c
> Index Cond: c.x = a.x
>
> Well, the problem here is that there's nothing to ensure that various
> workers who are cooperating to build the hash table all have the same
> value for a.x, nor is there any thing to ensure that they'll all get
> done with the shared hash table at the same time. So this is just
> chaos. I think we have to disallow this kind of plan as nonsensical.
> Given that, I'm not sure a reset of the runtime keys can ever really
> happen. Have you investigated this?
>

Having parallelism on the right side under gather can only be possible
after Parallel hash patch of Robert, so maybe some investigation is
needed when we review that patch. Let me know if you want me to
investigate something more after my explanation above.

> I extracted the generic portions of this infrastructure (i.e. not the
> btree-specific stuff) and spent some time working on it today. The
> big thing I fixed was the documentation, which you added in a fairly
> illogical part of the file.
>

Hmm, it is not illogical. All the functions are described in the same
order as they are declared in IndexAmRoutine structure and I have
followed the same. I think both amestimateparallelscan and
aminitparallelscan should be added one para down which says (The
purpose of an index .. The scan-related functions that an index access
method must or may provide are:).

> You had all of the new functions for
> supporting parallel scans in a section that explicitly says it's for
> mandatory functions, and which precedes the section on regular
> non-parallel scans.
>

I think that section should say "must or may provide" instead of "must
provide" as the functions amcanreturn and amproperty are optional and
are described in that section.

> I moved the documentation for all three new
> methods to the same place, added some explanation of parallel scans in
> general, and rewrote the descriptions for the individual functions to
> be more clear.
>

I think the place where you have added these new functions breaks the
existing order which is to describe them in the order they are
declared in IndexAmRoutine. Apart from that extracted patch looks
good to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Joseph Krogh 2017-01-20 14:36:04 Error building docs
Previous Message Peter Eisentraut 2017-01-20 14:08:21 Re: Logical Replication WIP