Re: Parallel copy

From: David Fetter <david(at)fetter(dot)org>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-02-20 16:43:26
Message-ID: 20200220164326.GW24870@fetter.org
Lists: pgsql-hackers

On Thu, Feb 20, 2020 at 02:36:02PM +0100, Tomas Vondra wrote:
> On Thu, Feb 20, 2020 at 04:11:39PM +0530, Amit Kapila wrote:
> > On Thu, Feb 20, 2020 at 5:12 AM David Fetter <david(at)fetter(dot)org> wrote:
> > >
> > > On Fri, Feb 14, 2020 at 01:41:54PM +0530, Amit Kapila wrote:
> > > > This work is to parallelize the copy command and in particular "Copy
> > > > <table_name> from 'filename' Where <condition>;" command.
> > >
> > > Apropos of the initial parsing issue generally, there's an interesting
> > > approach taken here: https://github.com/robertdavidgraham/wc2
> > >
> >
> > Thanks for sharing. I might be missing something, but I can't figure
> > out how this can help here. Does this in some way help to allow
> > multiple workers to read and tokenize the chunks?
>
> I think wc2 shows that, rather than parallelizing the parsing, we
> might try a different tokenizer/parser and make the implementation
> more efficient instead of just throwing more CPUs at it.

That was what I had in mind.

> I don't know if our code is similar to what wc does; maybe parsing
> CSV is more complicated than what wc does.

CSV parsing differs from wc in that there are more states in the state
machine, but I don't see anything fundamentally different.
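
To make that concrete, here is a rough sketch of the idea, not code taken
from wc2 or from PostgreSQL. The same table-driven loop runs both a
two-state word counter and a three-state CSV record counter. All names
(run, WC_TABLE, CSV_TABLE, count_words, count_records) are hypothetical,
and the CSV table assumes a double quote as both quote and escape
character and "\n" as the only line terminator, which is a simplification
of what COPY actually has to handle.

```python
# Hypothetical sketch: one generic DFA loop, two transition tables.
# Each table entry maps (state, character class) -> (next state, emit),
# where emit is 1 when the transition should bump the counter.

def run(table, classify, start, data):
    """Drive the state machine over data and return the emit count."""
    state, count = start, 0
    for ch in data:
        state, emit = table[state][classify(ch)]
        count += emit
    return count

# --- wc-style word counting: 2 states, 2 character classes ---
SPACE, WORD = 0, 1
WC_TABLE = {
    SPACE: {SPACE: (SPACE, 0), WORD: (WORD, 1)},  # entering a word counts it
    WORD:  {SPACE: (SPACE, 0), WORD: (WORD, 0)},
}

def wc_class(ch):
    return SPACE if ch.isspace() else WORD

def count_words(text):
    return run(WC_TABLE, wc_class, SPACE, text)

# --- CSV record counting: 3 states, 3 character classes ---
# OUT: outside quotes; IN: inside a quoted field;
# ESC: just saw a quote while inside a quoted field
#      (either the closing quote or the first half of a doubled quote).
OUT, IN, ESC = 0, 1, 2
OTHER, QUOTE, NL = 0, 1, 2
CSV_TABLE = {
    OUT: {OTHER: (OUT, 0), QUOTE: (IN, 0),  NL: (OUT, 1)},  # newline ends a record
    IN:  {OTHER: (IN, 0),  QUOTE: (ESC, 0), NL: (IN, 0)},   # newline is field data
    ESC: {OTHER: (OUT, 0), QUOTE: (IN, 0),  NL: (OUT, 1)},  # "" is a literal quote
}

def csv_class(ch):
    if ch == '"':
        return QUOTE
    if ch == "\n":
        return NL
    return OTHER

def count_records(text):
    return run(CSV_TABLE, csv_class, OUT, text)
```

The only difference between the two counters is the size of the table:
the loop in run() is identical. A real implementation would need more
character classes (delimiter, carriage return, a configurable escape
character), but those add rows and columns to the table rather than
changing the shape of the loop.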

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
