Re: Parallel copy

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallel copy
Date:	2020-10-20 09:55:14
Message-ID:	CALj2ACWrQz-=PWc0e5QOwetVNoBOaOTKvTWyz4=2y0=NVOcOcg@mail.gmail.com
Lists:	pgsql-hackers

On Fri, Oct 9, 2020 at 2:52 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Sep 29, 2020 at 6:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > 2. Do we have tests for toast tables? I think if you implement the
> > previous point some existing tests might cover it but I feel we should
> > have at least one or two tests for the same.
> >
> Toast table use case 1: 10000 tuples, 9.6GB data, 3 indexes 2 on integer
> columns, 1 on text column(not the toast column), csv file, each row is > 1320KB:
> (222.767, 0, 1X), (134.171, 1, 1.66X), (93.749, 2, 2.38X), (93.672, 4, 2.38X),
> (94.827, 8, 2.35X), (93.766, 16, 2.37X), (98.153, 20, 2.27X), (122.721, 30, 1.81X)
>
> Toast table use case 2: 100000 tuples, 96GB data, 3 indexes 2 on integer
> columns, 1 on text column(not the toast column), csv file, each row is > 1320KB:
> (2255.032, 0, 1X), (1358.628, 1, 1.66X), (901.170, 2, 2.5X), (912.743, 4, 2.47X),
> (988.718, 8, 2.28X), (938.000, 16, 2.4X), (997.556, 20, 2.26X), (1000.586, 30, 2.25X)
>
> Toast table use case 3: 10000 tuples, 9.6GB, no indexes, binary file, each
> row is > 1320KB:
> (136.983, 0, 1X), (136.418, 1, 1X), (81.896, 2, 1.66X), (62.929, 4, 2.16X),
> (52.311, 8, 2.6X), (40.032, 16, 3.49X), (44.097, 20, 3.09X), (62.310, 30, 2.18X)
>
> In the case of a Toast table, we could achieve upto 2.5X for csv files,
> and 3.5X for binary files. We are analyzing this point and will post an
> update on our findings soon.
>

I analyzed the above point, i.e. why we get only up to a 2.5X performance
improvement for csv files with a toast table that has 3 indexes - 2 on
integer columns and 1 on a text column (not the toast column). The reason is
that the workers are fast enough to finish their work and then wait for the
leader to fill in the data blocks; in this case the leader is already serving
the workers at its maximum possible speed. Hence, most of the time the
workers are waiting, not doing any beneficial work.
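
The bottleneck described above can be illustrated with a deliberately
simplified pipeline model. This is only a sketch, not code from the patch: it
assumes the load time is bounded by the slower of two concurrent stages (the
leader filling data blocks, and the workers processing them in parallel). The
function name and all numbers are hypothetical.

```python
def estimated_load_time(leader_seconds, worker_seconds, n_workers):
    """Toy model: the leader and the worker pool run concurrently,
    so the slower of the two stages determines the total time."""
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return max(leader_seconds, worker_seconds / n_workers)


# Hypothetical numbers, chosen only to show the saturation effect.
LEADER_SECONDS = 90.0    # time the leader needs to read and hand out all data
WORKER_SECONDS = 130.0   # total per-row work if a single worker did all of it

for n in (1, 2, 4, 8):
    print(n, estimated_load_time(LEADER_SECONDS, WORKER_SECONDS, n))
```

In this model, once worker_seconds / n_workers drops below leader_seconds,
additional workers only wait. Increasing the work done per row (a larger
worker_seconds) moves that saturation point to a higher worker count.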

Having observed this, I tried to make the workers perform more work to
reduce their waiting time. For this, I added a gist index on the toasted text
column. The use case and results are as follows.

Toast table use case 4: 10000 tuples, 9.6GB, 4 indexes - 2 on integer
columns, 1 on a non-toasted text column and 1 gist index on the toasted text
column, csv file, each row is ~ 12.2KB:

(1322.839, 0, 1X), (1261.176, 1, 1.05X), (632.296, 2, 2.09X), (321.941, 4,
4.11X), (181.796, 8, 7.27X), *(105.750, 16, 12.51X)*, (107.099, 20,
12.35X), (123.262, 30, 10.73X)
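
For anyone who wants to re-derive the gains, here is a small sketch. It
assumes each tuple above reads as (execution time in seconds, number of
workers, speedup over the 0-worker run), which is what the values suggest;
the helper name is hypothetical.

```python
# (seconds, workers) pairs copied from the use case 4 results above;
# the 0-worker row is the non-parallel baseline.
timings = [
    (1322.839, 0), (1261.176, 1), (632.296, 2), (321.941, 4),
    (181.796, 8), (105.750, 16), (107.099, 20), (123.262, 30),
]


def speedups(rows):
    """Return {workers: baseline_seconds / seconds}, with the
    0-worker row taken as the baseline."""
    baseline = next(sec for sec, workers in rows if workers == 0)
    return {workers: baseline / sec for sec, workers in rows}


gains = speedups(timings)
best = max(gains, key=gains.get)
print(f"best worker count: {best}, speedup: {gains[best]:.2f}X")
```

Recomputed ratios may differ from the posted figures in the last digit,
depending on whether the values are rounded or truncated.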

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
