Re: Parallel copy

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-02-20 13:36:02
Message-ID: 20200220133602.4lb5mlswfblitsx5@development
Lists: pgsql-hackers

On Thu, Feb 20, 2020 at 04:11:39PM +0530, Amit Kapila wrote:
>On Thu, Feb 20, 2020 at 5:12 AM David Fetter <david(at)fetter(dot)org> wrote:
>>
>> On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote:
>> > This work is to parallelize the copy command and in particular "Copy
>> > <table_name> from 'filename' Where <condition>;" command.
>>
>> Apropos of the initial parsing issue generally, there's an interesting
>> approach taken here: https://github.com/robertdavidgraham/wc2
>>
>
>Thanks for sharing. I might be missing something, but I can't figure
>out how this can help here. Does this in some way help to allow
>multiple workers to read and tokenize the chunks?
>

I think wc2 shows that, rather than parallelizing the parsing, we
might try a different tokenizer/parser and make the implementation
more efficient, instead of just throwing more CPUs at it.

I don't know if our code is similar to what wc does; maybe parsing
CSV is more complicated than what wc does.
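
To make the comparison a bit more concrete, here is a rough sketch of
the kind of table-driven state machine I have in mind. It is not taken
from wc2 or from our COPY code; all names (count_words, count_csv_rows,
the state and class enums) are made up for illustration. The CSV part
assumes a fixed '"' quote character, quotes escaped by doubling, and
'\n' as the only row terminator, which is simpler than what COPY
actually has to support.

```c
#include <stddef.h>

/* Hypothetical sketch: names and tables are illustrative only. */

/* ---- wc-style word counting: two states, two character classes ---- */
enum { W_OTHER, W_SPACE };          /* character classes */
enum { WS_SPACE, WS_WORD };         /* states */

/* word_next[state][class] gives the next state */
static const unsigned char word_next[2][2] = {
    /* WS_SPACE */ { WS_WORD, WS_SPACE },
    /* WS_WORD  */ { WS_WORD, WS_SPACE },
};

static int word_class(unsigned char c)
{
    return (c == ' ' || c == '\t' || c == '\n' || c == '\r')
        ? W_SPACE : W_OTHER;
}

size_t count_words(const char *buf, size_t len)
{
    size_t words = 0;
    int state = WS_SPACE;

    for (size_t i = 0; i < len; i++)
    {
        int next = word_next[state][word_class((unsigned char) buf[i])];

        /* a word starts on every space -> word transition */
        words += (state == WS_SPACE && next == WS_WORD);
        state = next;
    }
    return words;
}

/* ---- CSV row counting: needs a third state for quote handling ---- */
enum { C_OTHER, C_QUOTE, C_NEWLINE };       /* character classes */
enum { CS_OUT, CS_IN, CS_IN_QUOTE };        /* states */

/*
 * CS_OUT:      outside a quoted field
 * CS_IN:       inside a quoted field
 * CS_IN_QUOTE: saw '"' inside a quoted field; it is either the closing
 *              quote or the first half of an escaped "" pair
 */
static const unsigned char csv_next[3][3] = {
    /* CS_OUT      */ { CS_OUT, CS_IN,       CS_OUT },
    /* CS_IN       */ { CS_IN,  CS_IN_QUOTE, CS_IN  },
    /* CS_IN_QUOTE */ { CS_OUT, CS_IN,       CS_OUT },
};

static int csv_class(unsigned char c)
{
    if (c == '"')
        return C_QUOTE;
    if (c == '\n')
        return C_NEWLINE;
    return C_OTHER;
}

size_t count_csv_rows(const char *buf, size_t len)
{
    size_t rows = 0;
    int state = CS_OUT;

    for (size_t i = 0; i < len; i++)
    {
        int cls = csv_class((unsigned char) buf[i]);

        /* a newline ends a row only when we are not inside quotes */
        rows += (cls == C_NEWLINE && state != CS_IN);
        state = csv_next[state][cls];
    }
    return rows;
}
```

Even in this simplified form the CSV case needs an extra state, and
whether a newline ends a row depends on everything read before it,
which is part of why I'm not sure the wc comparison carries over
directly.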

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
