From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Luc Vlaming <luc(at)swarm64(dot)com> |
Cc: | Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2021-01-05 12:57:57 |
Message-ID: | CALj2ACUiPr8pW-c+w639_NSQn-Jy1xZ5u5P7hq02zJ9YTuEg0w@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 5, 2021 at 1:00 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
> Reviewing further v20-0001:
>
> I would still opt for moving the code for the parallel worker into a
> separate function, and then setting rStartup of the dest receiver to
> that function in ExecParallelGetInsReceiver, as it's completely
> independent code. Just a matter of style I guess.
If we were to add an intorel_startup_worker and assign it to
self->pub.rStartup, I see three ways to do it:
1) Do it in CreateIntoRelDestReceiver. We would have to pass a
parameter to CreateIntoRelDestReceiver to indicate a parallel worker,
which requires code changes wherever CreateIntoRelDestReceiver is used.
2) Assign intorel_startup_worker after the CreateIntoRelDestReceiver
call in ExecParallelGetInsReceiver, but that doesn't look good to me.
3) Duplicate CreateIntoRelDestReceiver as
CreateIntoRelParallelDestReceiver, with the only change being
self->pub.rStartup = intorel_startup_worker;
IMHO, the way it is currently looks good. Anyway, I'm open to
changing it if we agree on one of the three ways above.
If we were to do any of the above, we might have to do the same
thing for the other commands that we can parallelize, such as
REFRESH MATERIALIZED VIEW or COPY TO.
Thoughts?
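To make the three options easier to compare, here is a minimal standalone sketch. It is not code from the patch: DestReceiver and DR_intorel are cut down to the bare minimum (the real rStartup takes more arguments), and the constructor names create_intorel_receiver, get_ins_receiver_with_override and create_intorel_parallel_receiver, as well as the started_as field, are hypothetical stand-ins used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins, NOT the real PostgreSQL definitions. */
typedef struct DestReceiver DestReceiver;
struct DestReceiver
{
    /* The real rStartup also takes an operation and a tuple descriptor. */
    void        (*rStartup) (DestReceiver *self);
};

typedef enum { NOT_STARTED, STARTED_LEADER, STARTED_WORKER } StartupKind;

typedef struct
{
    DestReceiver pub;           /* must be first, so the casts below work */
    StartupKind started_as;     /* hypothetical: records which startup ran */
} DR_intorel;

/* Leader path: in the real code this would create the target table. */
static void
intorel_startup(DestReceiver *self)
{
    ((DR_intorel *) self)->started_as = STARTED_LEADER;
}

/* Worker path: would open the table that the leader already created. */
static void
intorel_startup_worker(DestReceiver *self)
{
    ((DR_intorel *) self)->started_as = STARTED_WORKER;
}

/* Option 1: one constructor with an extra flag; every caller must pass it. */
DestReceiver *
create_intorel_receiver(bool is_parallel_worker)
{
    DR_intorel *self = calloc(1, sizeof(DR_intorel));

    assert(self != NULL);
    self->pub.rStartup = is_parallel_worker ? intorel_startup_worker
                                            : intorel_startup;
    return &self->pub;
}

/* Option 2: build the normal receiver, then override rStartup afterwards. */
DestReceiver *
get_ins_receiver_with_override(void)
{
    DestReceiver *dest = create_intorel_receiver(false);

    dest->rStartup = intorel_startup_worker;
    return dest;
}

/* Option 3: a separate constructor that differs only in rStartup. */
DestReceiver *
create_intorel_parallel_receiver(void)
{
    DR_intorel *self = calloc(1, sizeof(DR_intorel));

    assert(self != NULL);
    self->pub.rStartup = intorel_startup_worker;
    return &self->pub;
}
```

All three end up with a worker receiver whose rStartup points at intorel_startup_worker; they differ only in where the choice is made and in how many existing call sites have to change.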
> Maybe I'm not completely following why but afaics we want parallel
> inserts in various scenarios, not just CTAS? I'm asking because code like
> + if (fpes->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> + pg_atomic_add_fetch_u64(&fpes->processed,
> queryDesc->estate->es_processed);
> seems very specific to CTAS. For now that seems fine but I suppose that
> would be generalized soon after? Basically I would have expected the if
> to compare against PARALLEL_INSERT_CMD_UNDEF.
Once this patch is reviewed and committed, the next thing I plan to
do is allow parallel inserts in REFRESH MATERIALIZED VIEW, and the
processed variable can be used for that as well. I think it can also
be used for parallel inserts in INSERT INTO SELECT [1].
For now I'm keeping the check specific to CTAS; it can be
generalized later (after this is committed).
Thoughts?
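For illustration, the difference between the current CTAS-only check and the generalized check you suggest could look roughly like the standalone sketch below. It uses C11 atomics in place of pg_atomic_add_fetch_u64, and the struct SharedInsertState, the enum type name ParallelInsertCmdKind, the value PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW and the function names are all assumptions made for the example, not part of the patch.

```c
#include <stdatomic.h>
#include <stdint.h>

typedef enum
{
    PARALLEL_INSERT_CMD_UNDEF = 0,
    PARALLEL_INSERT_CMD_CREATE_TABLE_AS,
    PARALLEL_INSERT_CMD_REFRESH_MAT_VIEW    /* hypothetical future value */
} ParallelInsertCmdKind;

/* Hypothetical stand-in for the shared state that fpes points to. */
typedef struct
{
    ParallelInsertCmdKind ins_cmd_type;
    _Atomic uint64_t processed;     /* tuples inserted by all workers */
} SharedInsertState;

void
init_shared_state(SharedInsertState *st, ParallelInsertCmdKind kind)
{
    st->ins_cmd_type = kind;
    atomic_init(&st->processed, 0);
}

/* Current shape: only CTAS workers add to the shared counter. */
void
report_processed_ctas_only(SharedInsertState *st, uint64_t n)
{
    if (st->ins_cmd_type == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
        atomic_fetch_add(&st->processed, n);
}

/* Generalized shape: any defined parallel insert command adds to it. */
void
report_processed_any_insert(SharedInsertState *st, uint64_t n)
{
    if (st->ins_cmd_type != PARALLEL_INSERT_CMD_UNDEF)
        atomic_fetch_add(&st->processed, n);
}
```

With the generalized check, a new command only needs a new enum value to have its row count accumulated, while the CTAS-only check has to be touched for each command that is added.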
> Apart from these small things v20-0001 looks (very) good to me.
> v20-0003 and v20-0004:
> looks good to me.
Thanks.
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com