[PATCH] Initial progress reporting for COPY command

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: [PATCH] Initial progress reporting for COPY command
Date: 2020-06-14 12:32:33
Message-ID: CAFp7QwqMGEi4OyyaLEK9DR0+E+oK3UtA4bEjDVCa4bNkwUY2PQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello, as proposed by Pavel Stěhule and discussed on local czech PostgreSQL
maillist (
https://groups.google.com/d/msgid/postgresql-cz/CAFj8pRCZ42CBCa1bPHr7htffSV%2BNAcgcHHG0dVqOog4bsu2LFw%40mail.gmail.com?utm_medium=email&utm_source=footer),
I have prepared an initial patch for COPY command progress reporting.

Few examples first:

"COPY (SELECT * FROM test) TO '/tmp/ids';"

yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 0 | TO | t | f |
3529943 | 24906226
(1 row)

"COPY test FROM '/tmp/ids';

yr=# SELECT * from pg_stat_progress_copy;
pid | datid | datname | relid | direction | file | program |
lines_processed | file_bytes_processed
---------+-------+---------+-------+-----------+------+---------+-----------------+----------------------
3347126 | 16384 | yr | 16385 | FROM | t | f |
121591999 | 957218816
(1 row)

Columns are inspired by CREATE INDEX progress report system view.

pid - integer - PID of backend
datid - oid - OID of related database
datname - name - name of related database (this seems redundant, since oid
should be enough, but it is the same in CREATE INDEX)
relid - oid - oid of table related to COPY command, when not known (for
example when copying to file, it is 0)
direction - text - one of "FROM" or "TO" depends on command used
file - bool - is file is used?
program - bool - is program used?
lines_processed - bigint - amount of processed lines, works for both
directions (FROM/TO)
file_bytes_processed - amount of bytes processed when file is used
(otherwise 0), works for both direction (
FROM/TO) when file is used (file = t)

Patch is attached and can be found also at
https://github.com/simi/postgres/pull/5.

Diff version: https://github.com/simi/postgres/pull/5.diff
Patch version: https://github.com/simi/postgres/pull/5.patch

I havefew initial notes and questions.

I'm using ftell to get current position in file to populate
file_bytes_processed without error handling (ftell can return -1L and also
populate errno on problems).

1. Is that a good way to get progress of file processing?
2. Is it safe in given context to not care about errors? If not, what to do
on error?

Some columns are not populated on certain COPY commands. For example when a
file is not used, file_bytes_processed is set to 0. Would it be better to
use NULL instead when the column is not related to the current command?
Same problem is for relid column.

I have not found any tests for progress reporting. Are there any? It would
need two backends running (one running COPY, one checking output of report
view). Is there any similar test I can inspire at? In theory, it should be
possible to use dblink_send_query to run async COPY command in the
background.

My initial (attached) patch also doesn't introduce documentation for this
system view. I can add that later once this patch is finalized (if that
happens).

Attachment Content-Type Size
copy-progress-v1.patch application/octet-stream 9.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-06-14 13:48:12 Re: problem with RETURNING and update row movement
Previous Message Fabien COELHO 2020-06-14 10:39:07 Re: Internal key management system