Re: An idea for parallelizing COPY within one backend

From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Brian Hurt <bhurt(at)janestcapital(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgresql-Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: An idea for parallelizing COPY within one backend
Date: 2008-02-27 17:03:34
Message-ID: 47C597E6.5060609@phlo.org
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Florian G. Pflug wrote:
>>> Would it be possible to determine when the copy is starting that this
>>> case holds, and not use the parallel parsing idea in those cases?
>>
>> In theory, yes. In practice, I don't want to be the one who has to
>> answer to an angry user who just suffered a major drop in COPY
>> performance after adding an ENUM column to his table.
>>
> I am yet to be convinced that this is even theoretically a good path to
> follow. Any sufficiently large table could probably be partitioned and
> then we could use the parallelism that is being discussed for pg_restore
> without any modification to the backend at all. Similar tricks could be
> played by an external bulk loader for third party data sources.

That assumes that a specific bulk loader such as pg_restore, pgloader,
or similar is used to perform the load. Plain libpq users would either
need to duplicate the logic these loaders contain or would be unable
to take advantage of fast loads.

Plus, I'd see this as a kind of testbed for gently introducing
parallelism into postgres backends (especially thinking about sorting
here). CPUs gain more and more cores, so in the long run I fear that we
will have to find ways to utilize more than one of those to execute a
single query.

But of course the architectural details need to be sorted out before any
credible judgement about the feasibility of this idea can be made...

regards, Florian Pflug
