Re: [PATCH] Simple progress reporting for COPY command

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-08 13:30:22
Message-ID: CAEze2Wj62YGOK_d67LvfGoL=ZobfmUhPn+WRGfEhMtGHBaM1Xg@mail.gmail.com
Lists: pgsql-hackers

On Thu, 7 Jan 2021 at 23:00, Josef Šimánek <josef(dot)simanek(at)gmail(dot)com> wrote:
>
> On Thu, 7 Jan 2021 at 22:37, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > I'm not particularly attached to the "lines" naming, it just seemed OK
> > to me. So if there's consensus to rename this somehow, I'm OK with it.
>
> The problem I do see here is it depends on the "way" of COPY. If
> you're copying from CSV file to table, those are actually lines (since
> 1 line = 1 tuple). But copying from DB to file is copying tuples (but
> 1 tuple = 1 file line). Line works better here for me personally.
>
> Once I fix the problem with triggers (and any other cases, if
> found), I think we can consider it lines. It will represent the number
> of lines processed from the file in COPY FROM and the number of lines
> written to the file in COPY TO (at least in CSV format). I'm not sure
> how BINARY format works, I'll check.

Here is a counterexample showing that 1 tuple need not be 1 line in
csv/binary format:

/*
* create a table with one tuple containing 1 text field, which consists of
* 10 newline characters.
* If you want windows-style lines, replace E'\x0A' (\n) with E'\x0D\x0A' (\r\n).
*/
# CREATE TABLE ttab (val) AS
SELECT * FROM (values (
repeat(convert_from(E'\x0A'::bytea, 'UTF8'), 10)::text
)) as v;

# -- indeed, one unix-style line, according to $ wc -l copy.txt
# COPY ttab TO 'copy.txt' (format text);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.txt' (format text);
COPY 1

# -- 11 lines
# COPY ttab TO 'copy.csv' (format csv);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.csv' (format csv);
COPY 1

# -- 13 lines
# COPY ttab TO 'copy.bin' (format binary);
COPY 1
# TRUNCATE ttab; COPY ttab FROM 'copy.bin' (format binary);
COPY 1

All of the above COPY statements would report only 'lines_processed = 1'
in the progress reporting, while the csv/binary line counts are definitely
inconsistent with that number, because progress reporting counts
tuples / table rows, not the number of lines in the external file.
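
For anyone who wants to check those line counts without a server, below is
a minimal Python sketch (not part of the patch; all helper names are made up
for illustration). It rebuilds the bytes each format would write for this
one-row table and counts newline bytes, which is what wc -l does. It assumes
default COPY options, and the binary layout follows my reading of the file
format described in the COPY documentation: an 11-byte signature, flags,
header extension length, then per tuple a 16-bit field count and a 32-bit
field length.

```python
import csv
import io
import struct

VALUE = "\n" * 10  # the single text field: 10 newline characters


def text_format(value: str) -> bytes:
    # Text format escapes backslash and newline, then ends the row with "\n".
    escaped = value.replace("\\", "\\\\").replace("\n", "\\n")
    return (escaped + "\n").encode("utf-8")


def csv_format(value: str) -> bytes:
    # CSV keeps embedded newlines verbatim inside a quoted field.
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n").writerow([value])
    return buf.getvalue().encode("utf-8")


def binary_format(value: str) -> bytes:
    data = value.encode("utf-8")
    # Signature (contains two 0x0A bytes), flags field, header extension length.
    header = b"PGCOPY\n\xff\r\n\x00" + struct.pack(">ii", 0, 0)
    # Field count, then field length: a length of 10 is itself a 0x0A byte.
    row = struct.pack(">hi", 1, len(data)) + data
    trailer = struct.pack(">h", -1)
    return header + row + trailer


def newline_count(blob: bytes) -> int:
    # Same notion of "line" as wc -l: the number of 0x0A bytes.
    return blob.count(b"\n")


if __name__ == "__main__":
    for name, build in (("text", text_format),
                        ("csv", csv_format),
                        ("binary", binary_format)):
        print(name, newline_count(build(VALUE)))
```

In every case there is exactly one tuple, yet the newline count differs per
format. In the binary case the extra "lines" come from the file signature
and from the field length, which happens to be byte 0x0A.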
