Re: An idea for parallelizing COPY within one backend

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Subject: Re: An idea for parallelizing COPY within one backend
Date: 2008-02-27 10:47:29
Message-ID: 1204109249.4252.477.camel@ebony.site
Lists: pgsql-hackers

On Wed, 2008-02-27 at 09:09 +0100, Dimitri Fontaine wrote:
> Hi,
>
> On Wednesday 27 February 2008, Florian G. Pflug wrote:
> > Upon reception of a COPY INTO command, a backend would
> > .) Fork off a "dealer" and N "worker" processes that take over the
> > client connection. The "dealer" distributes lines received from the
> > client to the N workers, while the original backend receives them
> > as tuples back from the workers.
>
> This looks so much like what pgloader does now (version 2.3.0~dev2, release
> candidate) at the client side, when configured for it, that I can't help
> answering the mail :)
> http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
> section_threads = N
> split_file_reading = False
>
> Of course, the backends still have to parse the input given by pgloader, which
> only pre-processes data. I'm not sure having the client prepare the data some
> more (binary format or whatever) is a wise idea, as you mentioned and wrt
> Tom's follow-up. But maybe I'm all wrong, so I'm all ears!

ISTM the external parallelization approach is more likely to help us
avoid bottlenecks, so I support Dimitri's approach.
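
For illustration only, the dealer/worker split quoted above can be sketched
on the client side roughly as follows. This is a minimal sketch, not
pgloader's actual code: parse_line() and parallel_parse() are hypothetical
names, the "parsing" is a plain split on tabs, and Python threads only show
the structure (they do not give real CPU parallelism for this kind of work).

```python
import queue
import threading


def parse_line(line, delimiter="\t"):
    # Hypothetical stand-in for COPY text-format parsing: split on the delimiter.
    return tuple(line.rstrip("\n").split(delimiter))


def parallel_parse(lines, n_workers=4):
    """Deal lines round-robin to n_workers; return parsed tuples in input order."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()

    def worker(inbox):
        while True:
            item = inbox.get()
            if item is None:  # sentinel: the dealer has no more input
                break
            lineno, line = item
            results.put((lineno, parse_line(line)))

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()

    # The "dealer": hand each input line to one worker, round-robin.
    count = 0
    for lineno, line in enumerate(lines):
        inboxes[lineno % n_workers].put((lineno, line))
        count += 1
    for q in inboxes:
        q.put(None)
    for t in threads:
        t.join()

    # The "original backend" role: collect tuples and restore input order.
    collected = sorted(results.get() for _ in range(count))
    return [row for _, row in collected]
```

Calling parallel_parse(open("data.tsv"), n_workers=4) would return the rows
as tuples of strings; in a real loader each worker would instead feed its
share of the lines to its own COPY on a separate connection.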

We also need error handling, which pgloader already has.

Writing error handling and parallelization into COPY isn't going to be
easy, and not very justifiable either if we already have both.

There might be a reason to rewrite it in C one day, but that will be a
fairly easy task if we ever need to do it.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
