Re: [PATCH] Simple progress reporting for COPY command

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-07 18:51:12
Message-ID: CAEze2Wgx4UJZRNwesHwXzDW0WSMvG_0=EYDhPCxSo9EwLgaLjg@mail.gmail.com
Lists: pgsql-hackers

On Wed, 6 Jan 2021 at 22:45, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > I'm attaching the whole patch since commitfest failed to ingest the
> > last incremental on CI.
> >
>
> Yeah, the whole patch needs to be attached for the commitfest tester to
> work correctly - it can't apply pieces from multiple messages, etc.
>
> Anyway, I pushed this last version of patch, after a couple more tweaks,
> mainly to the docs - one place used pg_stat_copy_progress, the section
> was not indexed properly, and so on.

Thank you all. I'd love to use this in the future to keep track of
(e.g.) my backup/restore progress.

For my previously mentioned extension to keep track of tuples that are
filtered out by the WHERE clause, please find attached patch 0001. It
adds that counter, in line with the current 'lines' column name style.

If so desired, I'll split this off into a different thread & CF entry.
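
To make the idea behind 0001 concrete, here is a rough sketch of the two
counters in plain Python. It is not the patch: copy_with_filter and both
counter names are made up for illustration, and whether excluded rows
should also count toward the processed total is a choice for the actual
patch to make. The sketch simply keeps the two counters disjoint.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str]


def copy_with_filter(
    rows: Iterable[Row], where: Callable[[Row], bool]
) -> Tuple[List[Row], int, int]:
    """Toy stand-in for COPY ... FROM ... WHERE (hypothetical helper)."""
    processed = 0  # rows that passed the WHERE predicate and were kept
    excluded = 0   # rows that the WHERE predicate filtered out
    kept: List[Row] = []
    for row in rows:
        if where(row):
            kept.append(row)
            processed += 1
        else:
            excluded += 1
    return kept, processed, excluded


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
kept, processed, excluded = copy_with_filter(rows, lambda r: r[0] % 2 == 0)
print(processed, excluded)
```

Without the second counter, a progress view would show only the kept rows,
so a heavily filtered load would look slower than it really is.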

> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> message after pushing, but I probably wouldn't make that change anyway.
> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> If not, we can change that.

The CSV docs, sure. But COPY doesn't only process CSV; it also has
text (which does guarantee # lines = # tuples / rows) and
binary (in which the 'line' vocabulary doesn't make sense, and in
which the 'tuples' vocabulary is used). Additionally, most mentions of
postgres' logical rows/tuples in the COPY documentation use the 'rows'
terminology ('tuples' for FORMAT BINARY), and use 'line' only for
the newline-delimited strings of the external format's textual
representation (which I believe is not exactly what we're counting).
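
As a small illustration of why newline-delimited 'lines' and rows can
differ, consider a CSV input with a quoted field that contains a newline.
The sketch below uses Python's csv module as a stand-in; it is not
PostgreSQL's CSV parser, and the sample data is invented.

```python
import csv
import io

# Invented sample: a header plus two data rows, where one quoted
# field contains an embedded newline.
data = 'id,note\n1,"first line\nsecond line"\n2,plain\n'

# Physical lines: every newline-terminated string in the input.
physical_lines = data.count("\n")

# Logical records: what a CSV parser hands back as rows.
records = list(csv.reader(io.StringIO(data, newline="")))

print(physical_lines, len(records))
```

Here the input has four newline-terminated lines but only three records
(header included), so a counter named 'lines' would be ambiguous for this
kind of input.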

One common user of COPY is the pg_dump tool, and that uses binary
format by default (iirc).

Additionally, all comments surrounding the *LINES_PROCESSED updates
only mention 'tuples', so I strongly suggest applying (a variant of)
the attached patch 0002, which keeps the vocabulary consistent by
using 'tuples' instead of 'lines'.

With regards,

Matthias van de Meent

Attachment                                                        Content-Type   Size
v1-0001-Add-progress-reporting-for-filtered-rows.patch            text/x-patch   3.2 KB
v1-0002-Rename-lines-to-tuples-in-COPY-progress-reporting.patch   text/x-patch   4.9 KB
