[PATCH] Simple progress reporting for COPY command

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-01 01:25:31
Message-ID: CAFp7Qwr6_FmRM6pCO0x_a0mymOfX_Gg+FEKet4XaTGSW=LitKQ@mail.gmail.com
Lists: pgsql-hackers

Hello,

I finally had some time to revisit the patch and all the comments from
https://www.postgresql.org/message-id/CAFp7QwqMGEi4OyyaLEK9DR0%2BE%2BoK3UtA4bEjDVCa4bNkwUY2PQ%40mail.gmail.com
and I have prepared a simple version of COPY command progress reporting.

To keep the patch as small as possible, I have introduced only a minimal
set of columns. It could be extended later if needed.

The columns are inspired by the CREATE INDEX progress reporting system view.

pid - integer - PID of the backend
datid - oid - OID of the related database
datname - name - name of the related database (this seems redundant, since the OID should be enough, but CREATE INDEX progress reporting does the same)
relid - oid - OID of the table related to the COPY command; 0 when not known (for example, when copying to a file)
bytes_processed - bigint - number of bytes processed
bytes_total - bigint - file size in bytes for COPY FROM a file (0 if not COPY FROM a file)
lines_processed - bigint - number of tuples processed
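
As a rough illustration of how these columns might be consumed, the query below resolves relid to a table name with a regclass cast. It is only a sketch: it assumes the patch is applied and that the view is named pg_stat_progress_copy as proposed here. The output alias "relation" is made up for the example.

```sql
-- Sketch only: assumes the proposed pg_stat_progress_copy view exists.
-- Casting the oid to regclass shows the table name instead of a bare OID;
-- this is meaningful when run in the same database as the COPY.
SELECT pid,
       datname,
       relid::regclass AS relation,   -- "relation" is an illustrative alias
       bytes_processed,
       lines_processed
FROM pg_stat_progress_copy;
```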

Example progress output for a common use case (import from a CSV file):

first console:
yr=# COPY test FROM '/home/retro/test.csv' (FORMAT CSV);

second console:
yr=# SELECT * FROM pg_stat_progress_copy;
  pid   | datid | datname | relid | bytes_processed | bytes_total | lines_processed
--------+-------+---------+-------+-----------------+-------------+-----------------
 803148 | 16384 | yr      | 16394 |       998965248 |  1777777796 |        56730126
(1 row)

It is simple to get the progress as a percentage, for example:

yr=# SELECT (bytes_processed/bytes_total::decimal)*100 FROM pg_stat_progress_copy WHERE pid = 803148;
?column?
-------------------------
50.04287948706048525800

^ ~50% of file processed already
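
Since bytes_total is 0 whenever the source is not a file, the division above would fail for those rows. A possible guard, sketched here under the assumption that the view and column names stay as proposed, is to turn a zero total into NULL so the percentage comes out as NULL instead of raising a division-by-zero error. The alias percent_done is made up for the example.

```sql
-- Sketch only: assumes the proposed pg_stat_progress_copy view exists.
-- NULLIF(bytes_total, 0) yields NULL when the total is unknown (0),
-- so percent_done is NULL for anything other than COPY FROM a file.
SELECT pid,
       round(100.0 * bytes_processed / NULLIF(bytes_total, 0), 2) AS percent_done
FROM pg_stat_progress_copy;
```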

I did some dead simple benchmarking as well. The difference is not
huge. Each command works with 100 million tuples. Times are in
seconds; the difference is the time with progress reporting minus the
time on master.

test                      with progress   master (32d6287)   difference
-----------------------   -------------   ----------------   ----------
COPY table TO                    46.102             47.499       -1.397
COPY query TO                    52.168             49.822        2.346
COPY table TO PROGRAM            52.345             51.882        0.463
COPY query TO PROGRAM            54.141             52.763        1.378
COPY table FROM                  88.970             85.161        3.809
COPY table FROM PROGRAM          94.393             90.346        4.047
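
To put the absolute differences in proportion, the timings can be turned into a relative overhead against master. The query below is just a convenience sketch: the numbers are copied from the table above, and the names bench, difference_s, and overhead_pct are made up for the example.

```sql
-- Timings in seconds, copied from the benchmark table above.
WITH bench(test, with_progress, master) AS (
    VALUES
        ('COPY table TO',           46.102, 47.499),
        ('COPY query TO',           52.168, 49.822),
        ('COPY table TO PROGRAM',   52.345, 51.882),
        ('COPY query TO PROGRAM',   54.141, 52.763),
        ('COPY table FROM',         88.970, 85.161),
        ('COPY table FROM PROGRAM', 94.393, 90.346)
)
SELECT test,
       with_progress - master                            AS difference_s,
       round(100 * (with_progress - master) / master, 2) AS overhead_pct
FROM bench;
```

Single runs like these are noisy, so the percentages are best read as a rough indication only.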

A properly formatted table (since I'm not sure everyone here will be
able to see the table formatted well) and the benchmark source are
available at https://github.com/simi/postgres/pull/6. I have also
included an example output there.

I'll add this to the current commitfest as well.

Attachment               Content-Type  Size
001-copy-progress.patch  text/x-patch  14.0 KB
