Re: Make COPY format extendable: Extract COPY TO format implementations

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Sutou Kouhei <kou(at)clear-code(dot)com>
Cc: sawada(dot)mshk(at)gmail(dot)com, zhjwpku(at)gmail(dot)com, andrew(at)dunslane(dot)net, nathandbossart(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Date: 2024-02-01 01:57:58
Message-ID: Zbr6piWuVHDtFFOl@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 31, 2024 at 02:39:54PM +0900, Michael Paquier wrote:
> Thanks, I'm looking into that now.

I have much to say about the patch, but for now I have begun running
some performance tests using the patches, because this thread won't
get far until we are sure that the callbacks do not impact performance
in some kind of worst-case scenario. First, here is what I used to
setup a set of tables used for COPY FROM and COPY TO (requires [1] to
feed COPY FROM's data to the void, and note that default values is to
have a strict control on the size of the StringInfos used in the copy
paths):
CREATE EXTENSION blackhole_am;
CREATE OR REPLACE FUNCTION create_table_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
query text;
BEGIN
query := 'CREATE UNLOGGED TABLE ' || tabname || ' (';
FOR i IN 1..num_cols LOOP
query := query || 'a_' || i::text || ' int default 1';
IF i != num_cols THEN
query := query || ', ';
END IF;
END LOOP;
query := query || ')';
EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;
-- Tables used for COPY TO
SELECT create_table_cols ('to_tab_1', 1);
SELECT create_table_cols ('to_tab_10', 10);
INSERT INTO to_tab_1 SELECT FROM generate_series(1, 10000000);
INSERT INTO to_tab_10 SELECT FROM generate_series(1, 10000000);
-- Data for COPY FROM
COPY to_tab_1 TO '/tmp/to_tab_1.bin' WITH (format binary);
COPY to_tab_10 TO '/tmp/to_tab_10.bin' WITH (format binary);
COPY to_tab_1 TO '/tmp/to_tab_1.txt' WITH (format text);
COPY to_tab_10 TO '/tmp/to_tab_10.txt' WITH (format text);
-- Tables used for COPY FROM
SELECT create_table_cols ('from_tab_1', 1);
SELECT create_table_cols ('from_tab_10', 10);
ALTER TABLE from_tab_1 SET ACCESS METHOD blackhole_am;
ALTER TABLE from_tab_10 SET ACCESS METHOD blackhole_am;

Then I have run a set of tests using HEAD, v7 and v10 with queries
like that (adapt them depending on the format and table):
COPY to_tab_1 TO '/dev/null' WITH (FORMAT text) \watch count=5
SET client_min_messages TO error; -- for blackhole_am
COPY from_tab_1 FROM '/tmp/to_tab_1.txt' with (FORMAT 'text') \watch count=5
COPY from_tab_1 FROM '/tmp/to_tab_1.bin' with (FORMAT 'binary') \watch count=5

All the patches have been compiled with -O2, without assertions, etc.
Postgres is run in tmpfs mode, on scissors, without fsync. Unlogged
tables help a bit in focusing on the execution paths as we don't care
about WAL, of course. I have also included v7 in the test of tests,
as this version uses more simple per-row callbacks.

And here are the results I get for text and binary (ms, average of 15
queries after discarding the three highest and three lowest values):
test | master | v7 | v10
-----------------+--------+------+------
from_bin_1col | 1575 | 1546 | 1575
from_bin_10col | 5364 | 5208 | 5230
from_text_1col | 1690 | 1715 | 1684
from_text_10col | 4875 | 4793 | 4757
to_bin_1col | 1717 | 1730 | 1731
to_bin_10col | 7728 | 7707 | 7513
to_text_1col | 1710 | 1730 | 1698
to_text_10col | 5998 | 5960 | 5987

I am getting an interesting trend here in terms of a speedup between
HEAD and the patches with a table that has 10 attributes filled with
integers, especially for binary and text with COPY FROM. COPY TO
binary also gets nice numbers, while text looks rather stable. Hmm.

These were on my buildfarm animal, but we need to be more confident
about all this. Could more people run these tests? I am going to do
a second session on a local machine I have at hand and see what
happens. Will publish the numbers here, the method will be the same.

[1]: https://github.com/michaelpq/pg_plugins/tree/main/blackhole_am
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message James Coleman 2024-02-01 02:04:27 set_cheapest without checking pathlist
Previous Message Kyotaro Horiguchi 2024-02-01 01:53:57 Re: More new SQL/JSON item methods