Re: Parallel copy

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, vignesh C <vignesh21(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel copy
Date: 2020-04-09 19:29:09
Message-ID: CA+TgmoZw+F3y+oaxEsHEZBxdL1x1KAJ7pRMNgCqX0WjmjGNLrA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 9, 2020 at 2:55 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I'm fairly certain that we do *not* want to distribute input data between processes on a single tuple basis. Probably not even below a few hundred kb. If there's any sort of natural clustering in the loaded data - extremely common, think timestamps - splitting on a granular basis will make indexing much more expensive. And have a lot more contention.

That's a fair point. I think the solution ought to be that once any
process starts finding line endings, it continues until it's grabbed
at least a certain amount of data for itself. Then it stops and lets
some other process grab a chunk of data.
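
As a rough illustration of that idea (a minimal sketch, not the actual patch; the function name, the `min_chunk` parameter, and the single-buffer framing are my own assumptions), a worker claiming input would keep scanning until it has at least a minimum number of bytes, then cut its chunk at the next line ending so another worker can take over cleanly:

```python
def split_at_line_boundaries(data: bytes, min_chunk: int) -> list[bytes]:
    """Split `data` into chunks of at least `min_chunk` bytes, each ending
    on a newline (except possibly the last). Models one process scanning
    for line endings until it has claimed enough data for itself, then
    handing off to the next process."""
    chunks = []
    start = 0
    while start < len(data):
        # Claim at least min_chunk bytes...
        end = start + min_chunk
        if end >= len(data):
            chunks.append(data[start:])
            break
        # ...then extend to the next newline so the cut falls on a
        # tuple boundary.
        nl = data.find(b"\n", end)
        if nl == -1:
            chunks.append(data[start:])
            break
        chunks.append(data[start:nl + 1])
        start = nl + 1
    return chunks
```

With `min_chunk` set to something in the hundreds-of-kilobytes range, as suggested upthread, each worker would get a contiguous run of tuples, preserving any natural clustering in the input.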

Or are you arguing that there should be only one process that's
allowed to find line endings for the entire duration of the load?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
