Re: [PATCH] Simple progress reporting for COPY command

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-07 18:56:12
Message-ID: CAFp7QwpXNKo+toMWgTrOeaTa1f4LOtb1OTP8KeVH1eRa_jngtA@mail.gmail.com
Lists: pgsql-hackers

On Thu, 7 Jan 2021 at 19:51, Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
>
> On Wed, 6 Jan 2021 at 22:45, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> >
> > On 1/5/21 11:02 AM, Josef Šimánek wrote:
> > > I'm attaching the whole patch since commitfest failed to ingest the
> > > last incremental on CI.
> > >
> >
> > Yeah, the whole patch needs to be attached for the commitfest tester to
> > work correctly - it can't apply pieces from multiple messages, etc.
> >
> > Anyway, I pushed this last version of patch, after a couple more tweaks,
> > mainly to the docs - one place used pg_stat_copy_progress, the section
> > was not indexed properly, and so on.
>
> Thank you all, I'd love to use this in the future to keep track of
> (e.g.) my backup/restore progress.
>
> For my previously mentioned extension to keep track of filtered tuples
> that are excluded by the WHERE-clause, PFA patch 0001 that keeps track
> of that, in line with the current column name style of lines.

If I understand correctly, this column could also be used on COPY FROM to
track lines skipped by a BEFORE trigger, right? I can include this in
my follow-up patch, which keeps lines_processed incremented for
skipped lines as well.
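
To make the counting idea concrete, here is a minimal sketch of a row-loading loop that counts every input row as processed and separately counts the rows that were skipped. It is an illustration only, not the patch: the names CopyProgress, lines_excluded, copy_from, where and before_trigger are hypothetical, and the order of the two checks is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional

Row = dict


@dataclass
class CopyProgress:
    # Hypothetical counters, named after the "lines" column style.
    lines_processed: int = 0  # every input row that was examined
    lines_excluded: int = 0   # rows skipped by the filter or the trigger


def copy_from(
    rows: Iterable[Row],
    where: Callable[[Row], bool],
    before_trigger: Callable[[Row], Optional[Row]],
    progress: CopyProgress,
) -> List[Row]:
    """Load rows, counting skipped rows as processed too (assumed behaviour)."""
    loaded = []
    for row in rows:
        progress.lines_processed += 1
        if not where(row):
            # Row filtered out by the WHERE-style predicate.
            progress.lines_excluded += 1
            continue
        result = before_trigger(row)
        if result is None:
            # The trigger-style callback asked to skip this row.
            progress.lines_excluded += 1
            continue
        loaded.append(result)
    return loaded


progress = CopyProgress()
loaded = copy_from(
    rows=[{"id": i} for i in range(1, 7)],
    where=lambda r: r["id"] % 2 == 0,                      # keep even ids
    before_trigger=lambda r: None if r["id"] == 4 else r,  # skip id 4
    progress=progress,
)
```

With six input rows, the sketch reports six processed rows and four excluded ones, so the number of rows actually loaded is the difference between the two counters.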

> If so desired, I'll split this off into a different thread & CF entry.
>
> > I see Matthias proposed to change "lines" to "tuples" - I only saw the
> > message after pushing, but I probably wouldn't make that change anyway.
> > The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> > If not, we can change that.
>
> The CSV docs, sure. But copy doesn't only process CSVs; it also has
> text (which does have a # lines = # tuples / rows guarantee) and
> binary (in which the 'line' vocabulary doesn't make sense, and in
> which the 'tuples' vocabulary is used). Additionally, most mentions of
> postgres' logical rows/tuples in the COPY documentation use the 'rows'
> terminology ('tuples' for FORMAT BINARY), and use 'line' only for
> external format's textual representation's strings delimited by
> newlines (which I believe is not exactly what we're counting).
>
> One common user of COPY is the pg_dump tool, and that uses binary
> format by default (iirc).
>
> Additionally, all comments surrounding the *LINES_PROCESSED updates
> only mention 'tuples', so I'd like to strongly suggest (a variant of)
> attached patch 0002 to keep the vocabulary consistent by using
> 'tuples' instead of 'lines'.
>
>
> With regards,
>
> Matthias van de Meent
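
The lines-versus-rows distinction raised above can be illustrated with a small sketch. It uses Python's csv module rather than COPY itself, and the sample data is made up for the example: a quoted CSV field may contain a newline, so the number of physical lines in the input need not match the number of logical rows.

```python
import csv
import io

# A header plus two data rows; the first data row has a quoted field
# that contains an embedded newline (sample data invented for this sketch).
data = 'id,note\n1,"first line\nsecond line"\n2,plain\n'

# Physical lines: count newline terminators in the raw text.
physical_lines = data.count("\n")

# Logical rows: let the CSV parser reassemble quoted fields.
rows = list(csv.reader(io.StringIO(data, newline="")))

print(physical_lines, len(rows))
```

Here the raw text has more newline-terminated lines than the parser yields rows, which is the kind of mismatch that makes "lines" an imprecise name for a per-row counter.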
