|From:||Andres Freund <andres(at)anarazel(dot)de>|
|To:||Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>|
|Cc:||PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>|
|Subject:||Re: Parallel Inserts in CREATE TABLE AS|
|Views:||Raw Message | Whole Thread | Download mbox | Resend email|
On 2020-09-23 17:20:20 +0530, Bharath Rupireddy wrote:
> The idea of this patch is to allow the leader and each worker insert the
> tuples in parallel if the SELECT part of the CTAS is parallelizable.
> The design:
I think it'd be good if you could explain a bit more why you think this
safe to do in the way you have done it.
E.g. from a quick scroll through the patch, there's not even a comment
explaining that the only reason there doesn't need to be code dealing
with xid assignment because we already did the catalog changes to create
the table. But how does that work for SELECT INTO? Are you prohibiting
> Pass the into clause, object id, command id from the leader to
> workers, so that each worker can create its own CTAS dest
> receiver. Leader inserts it's share of tuples if instructed to do, and
> so are workers. Each worker writes atomically it's number of inserted
> tuples into a shared memory variable, the leader combines this with
> it's own number of inserted tuples and shares to the client.
> Below things are still pending. Thoughts are most welcome:
> 1. How better we can lift the "cannot insert tuples in a parallel worker"
> from heap_prepare_insert() for only CTAS cases or for that matter parallel
> copy? How about having a variable in any of the worker global contexts and
> use that? Of course, we can remove this restriction entirely in case we
> fully allow parallelism for INSERT INTO SELECT, CTAS, and COPY.
I have mentioned before that I think it'd be good if we changed the
insert APIs to have a more 'scan' like structure. I am thinking of
table_tuple_insert(TableInsertScan *is, other, args);
table_multi_insert(TableInsertScan *is, other, args);
that'd then replace the BulkInsertStateData logic we have right now. But
more importantly it'd allow an AM to optimize operations across multiple
inserts, which is important for column stores.
And for the purpose of your question, we could then have a
or an additional arg to table_begin_insert().
> 3. Need to restrict parallel inserts, if CTAS tries to create temp/global
> tables as the workers will not have access to those tables. Need to analyze
> whether to allow parallelism if CTAS has prepared statements or with no
In which case does CTAS not create a table? You definitely need to
ensure that the table is created before your workers are started, and
there needs to be in a different CommandId.
|Next Message||Michael Paquier||2020-09-24 02:53:14||scram-sha-256 broken with FIPS and OpenSSL 1.0.2|
|Previous Message||Amit Kapila||2020-09-24 02:38:18||Re: Parallel INSERT (INTO ... SELECT ...)|