From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Parallel copy |
Date: | 2020-11-03 12:35:32 |
Message-ID: | 60fb6859-60f4-4a5b-1866-73c3cd2f0d16@iki.fi |
Lists: | pgsql-hackers |
On 03/11/2020 10:59, Amit Kapila wrote:
> On Mon, Nov 2, 2020 at 12:40 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> However, the point of parallel copy is to maximize bandwidth.
>
> Okay, but this first-phase (finding the line boundaries) can anyway
> be not done in parallel and we have seen in some of the initial
> benchmarking that this initial phase is a small part of work
> especially when the table has indexes, constraints, etc. So, I think
> it won't matter much if this splitting is done in a single process
> or multiple processes.
Right, it won't matter performance-wise. That's not my point. The
difference is in the complexity. If you don't store the line boundaries
in shared memory, you get away with much simpler shared memory structures.
> I think something close to that is discussed as you have noticed in
> your next email but IIRC, because many people (Andres, Ants, myself
> and author) favoured the current approach (single reader and multiple
> consumers) we decided to go with that. I feel this patch is very much
> in the POC stage due to which the code doesn't look good and as we
> move forward we need to see what is the better way to improve it,
> maybe one of the ways is to split it as you are suggesting so that it
> can be easier to review.
Sure. I think the roadmap here is:
1. Split copy.c [1]. Not strictly necessary, but I think it'd make this
nicer to review and work with.
2. Refactor CopyReadLine(), so that finding the line-endings and the
rest of the line-parsing are split into separate functions.
3. Implement parallel copy.
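To illustrate the kind of separation step 2 is after, here is a minimal sketch in C. It is not the copy.c code and not taken from any patch: find_line_end(), split_line() and FieldRef are hypothetical names, and the sketch assumes plain text format with a single-byte delimiter, no CSV quoting and no escape handling. The point is only that locating the line ending needs no knowledge of the fields, so it can live in its own function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (offset, length) reference to one field within a line. */
typedef struct FieldRef
{
    size_t off;
    size_t len;
} FieldRef;

/*
 * Phase 1: find where the line starting at 'start' ends.
 * Returns the offset just past the '\n', or 'len' if the buffer holds
 * no further newline. Assumption: no quoting or escapes to consider.
 */
static size_t
find_line_end(const char *buf, size_t len, size_t start)
{
    const char *nl;

    if (start >= len)
        return len;
    nl = memchr(buf + start, '\n', len - start);
    return nl ? (size_t) (nl - buf) + 1 : len;
}

/*
 * Phase 2: split one already-delimited line into fields.
 * Fills at most 'max_fields' entries and returns the number of fields
 * found in the line, which may be larger than 'max_fields'.
 */
static int
split_line(const char *line, size_t len, char delim,
           FieldRef *fields, int max_fields)
{
    int    nfields = 0;
    size_t field_start = 0;
    size_t i;

    /* Ignore the trailing newline, if any. */
    if (len > 0 && line[len - 1] == '\n')
        len--;

    for (i = 0; i <= len; i++)
    {
        if (i == len || line[i] == delim)
        {
            if (nfields < max_fields)
            {
                fields[nfields].off = field_start;
                fields[nfields].len = i - field_start;
            }
            nfields++;
            field_start = i + 1;
        }
    }
    return nfields;
}
```

With a split like this, whichever process owns a raw chunk of input could call find_line_end() repeatedly and hand each line to split_line(), whether that happens in one process or in several.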
> I think the other important thing which this
> patch has not addressed properly is the parallel-safety checks as
> pointed by me earlier. There are two things to solve there (a) the
> lower-level code (like heap_* APIs, CommandCounterIncrement, xact.c
> APIs, etc.) have checks which doesn't allow any writes, we need to see
> which of those we can open now (or do some additional work to prevent
> from those checks) after some of the work done for parallel-writes in
> PG-13[1][2], and (b) in which all cases we can parallel-writes
> (parallel copy) is allowed, for example need to identify whether table
> or one of its partitions has any constraint/expression which is
> parallel-unsafe.
Agreed, that needs to be solved. I haven't given it any thought myself.
- Heikki
[1] https://www.postgresql.org/message-id/8e15b560-f387-7acc-ac90-763986617bfb%40iki.fi