Re: [PATCH] Simple progress reporting for COPY command

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-07 16:00:18
Message-ID: CAFp7QwoZ+nxp7Q1ycWtEe-3uw7Hgzb2ZY6ywTbfxNkz9hqM1eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

čt 7. 1. 2021 v 16:54 odesílatel Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> napsal:
>
>
>
> On 1/7/21 12:06 PM, Josef Šimánek wrote:
> > st 6. 1. 2021 v 22:44 odesílatel Tomas Vondra
> > <tomas(dot)vondra(at)enterprisedb(dot)com> napsal:
> >>
> >> On 1/5/21 11:02 AM, Josef Šimánek wrote:
> >>> I'm attaching the whole patch since commitfest failed to ingest the
> >>> last incremental on CI.
> >>>
> >>
> >> Yeah, the whole patch needs to be attached for the commitfest tester to
> >> work correctly - it can't apply pieces from multiple messages, etc.
> >>
> >> Anyway, I pushed this last version of patch, after a couple more tweaks,
> >> mainly to the docs - one place used pg_stat_copy_progress, the section
> >> was not indexed properly, and so on.
> >>
> >> I see Matthias proposed to change "lines" to "tuples" - I only saw the
> >> message after pushing, but I probably wouldn't make that change anyway.
> >> The CSV docs seem to talk about lines, newlines etc. so it seems fine.
> >> If not, we can change that.
> >>
> >> One more question, though - I now realize the lines_processed ignores
> >> rows skipped because of BEFORE INSERT triggers. I wonder if that's the
> >> right thing to do? Imagine you know the number of lines in a file. You
> >> can't really use (lines_processed / total_lines) to measure progress,
> >> because that may ignore many "skipped" rows. So maybe this should be
> >> changed to count all rows. OTOH we still have bytes_processed.
> >
> > I think that should be fixed. It is called "lines_processed" not
> > "lines_inserted". I'll take a look.
> >
>
> So we may either rename the column to "lines_inserted", or tweak the
> code to count all processed lines. Or track both and have two columns.

First I'll ensure lines_processed represents the actual amount of
processed lines. If reading from file and some lines are skipped due
to before insert trigger, I consider that one processed as well, even
if it is not inserted. If welcomed, I can add lines_inserted later as
well. But I'm not sure about the use case.

Also thanks to mentioning triggers, I think those could be used to
test the COPY progress (at least some variants). I'll check if I would
be able to add some test cases as well.

> regarss
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Eckhardt 2021-01-07 16:20:48 Re: Zedstore - compressed in-core columnar storage
Previous Message Tomas Vondra 2021-01-07 15:56:26 Re: list of extended statistics on psql