RE: [POC] Fast COPY FROM command for the table with foreign partitions

From: "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
To: "'Andrey V(dot) Lepikhov'" <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: "tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>, Michael Paquier <michael(at)paquier(dot)xyz>, Ashutosh Bapat <ashutosh(dot)bapat(at)2ndquadrant(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>, PostgreSQL-Dev <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)cn(dot)fujitsu(dot)com" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>
Subject: RE: [POC] Fast COPY FROM command for the table with foreign partitions
Date: 2021-02-09 07:47:05
Message-ID: TYAPR01MB2990D01994E41A47D69F3F0EFE8E9@TYAPR01MB2990.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: Andrey V. Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
> On 2/9/21 9:35 AM, tsunakawa(dot)takay(at)fujitsu(dot)com wrote:
> > * Why is a separate FDW connection established for each COPY? To avoid
> using the same FDW connection for multiple foreign table partitions in a single
> COPY run?
> With separate connection you can init a 'COPY FROM' session for each
> foreign partition just one time on partition initialization.
> >
> > * In what kind of test did you get 2-4x performance gain? COPY into many
> foreign table partitions where the input rows are ordered randomly enough that
> many rows don't accumulate in the COPY buffer?
> I used 'INSERT INTO .. SELECT * FROM generate_series(1, N)' to generate
> test data and HASH partitioning to avoid skews.

I guess you used many hash partitions. Sadly, The current COPY implementation only accumulates either 1,000 rows or 64 KB of input data (very small!) before flushing all CopyMultiInsertBuffers. One CopyMultiInsertBuffer corresponds to one partition. Flushing a CopyMultiInsertBuffer calls ExecForeignCopy() once, which connects to a remote database, runs COPY FROM STDIN, and disconnects. Here, the flushing trigger (1,000 rows or 64 KB input data, whichever comes first) is so small that if there are many target partitions, the amount of data for each partition is small.

Looking at the triggering threshold values, the description (of MAX_BUFFERED_TUPLES at least) seems to indicate that they take effect per CopyMultiInsertBuffer:

/*
* No more than this many tuples per CopyMultiInsertBuffer
*
* Caution: Don't make this too big, as we could end up with this many
* CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's
* multiInsertBuffers list. Increasing this can cause quadratic growth in
* memory requirements during copies into partitioned tables with a large
* number of partitions.
*/
#define MAX_BUFFERED_TUPLES 1000

/*
* Flush buffers if there are >= this many bytes, as counted by the input
* size, of tuples stored.
*/
#define MAX_BUFFERED_BYTES 65535

But these threshold take effect across all CopyMultiInsertBuffers:

/*
* Returns true if the buffers are full
*/
static inline bool
CopyMultiInsertInfoIsFull(CopyMultiInsertInfo *miinfo)
{
if (miinfo->bufferedTuples >= MAX_BUFFERED_TUPLES ||
miinfo->bufferedBytes >= MAX_BUFFERED_BYTES)
return true;
return false;
}

So, I think the direction to take is to allow more data to accumulate before flushing. I'm not very excited about the way 0003 and 0004 establishes a new connection for each partition; it adds flags to many places, and postgresfdw_xact_callback() has to be aware of COPY-specific processing. Plus, we have to take care of the message difference you found in the regression test.

Why don't we focus on committing the basic part and addressing the extended part (0003 and 0004) separately later? As Tang-san and you showed, the basic part already demonstrated impressive improvement. If there's no objection, I'd like to make this ready for committer in a few days.

Regards
Takayuki Tsunakawa

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2021-02-09 08:07:34 Re: Single transaction in the tablesync worker?
Previous Message kuroda.hayato@fujitsu.com 2021-02-09 07:38:17 RE: parse mistake in ecpg connect string