Re: [PATCH] Simple progress reporting for COPY command

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-05 10:02:25
Message-ID: CAFp7Qwop_j94NaZ4OoOSNpCiLY3m+eq6WO3RmCWCw7Y0tOL_Fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm attaching the whole patch since commitfest failed to ingest the
last incremental on CI.

út 5. 1. 2021 v 2:32 odesílatel Josef Šimánek <josef(dot)simanek(at)gmail(dot)com> napsal:
>
> út 5. 1. 2021 v 0:46 odesílatel Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> napsal:
> >
> > Hi,
> >
> > I did take a quick look today, and I have a couple minor comments:
>
> Hi! Thanks for your time.
>
> > 1) The catalog sgml docs seem to mention bytes_processed twice (one of
> > that should be bytes_total), and line_processed (should be "lines_").
>
> Fixed in attached patch.
>
> >
> > 2) I'm not quite sure about not including any info about the command.
> > For example pg_stat_progress_create_index includes info about the
> > command, although it's not very detailed. Not sure how useful would it
> > be to show just COPY FROM / COPY TO, without more details.
> >
> > It's probably possible to extract this from pg_stat_activity, but that
> > may be rather cumbersome (parsing arbitrary SQL and all that). Also,
> > what if the COPY is called from a function etc.?
>
> Any idea where to discuss this? My usecase is really simple. I would
> like to be able to see progress of COPY command by pid. There is a lot
> of COPY info inside CopyToStateData and CopyFromStateData structs. The
> only limitation I see is support of only int64 values for progress
> reporting columns. I'm not sure if it is easily possible to expose for
> example filename this way.
>
> >
> > 3) I wonder if we should have something like lines_estimated. For COPY
> > TO it's pretty simple to calculate it as
> >
> > bytes_total / (bytes_processed / lines_processed)
> >
> > but what about
> >
> > COPY (query) TO file
> >
> > In that case we don't know the total amount of bytes / rows, so we can't
> > calculate any estimate. So maybe we could peek into the query plan? But
> > I agree this is something we can add later.
>
> If I remember well one of the original ideas was to be able to pass
> custom bytes_total (or lines_total) by COPY options to be stored in
> copy progress. I can add this in some following patch if still
> welcomed.
>
> For estimates I would prefer to add an additional column to not mix
> those two together (or at least boolean estimated = true/false and
> reuse bytes_total column). If query estimates are welcomed, I can take
> a look at how to reach the query plan and expose those numbers when
> the query is used to estimated_lines and potentially estimated_bytes
> columns. It would be probably a little tricky to calculate
> estimated_bytes for some column types.
>
> Also currently only COPY FROM file supports bytes_total (it is 0 for
> all other scenarios). I think it should be possible for PostgreSQL to
> know the actual amount of lines query returns for some kind of
> queries, but I have no idea where to look at this. If that's possible
> to get, it would be one of the next steps to introduce additional
> column lines_total.
>
> >
> > 4) This comment is a bit confusing, as it mixes "total" and "processed".
> > I'd just say "number of bytes processed so far" instead.
> >
> Fixed in attached patch.
> >
> > Other than that, it seems fine. I'm sure we could add more features, but
> > it seems like a good start - I plan to get this committed once I get a
> > patch fixing the docs issues.
>
> Patch is attached, it should be applied to the top of the previous
> patch. Overall patch (having both patches merged together) could be
> found at https://patch-diff.githubusercontent.com/raw/simi/postgres/pull/6.patch.
>
> > regards
> >
> > --
> > Tomas Vondra
> > EnterpriseDB: http://www.enterprisedb.com
> > The Enterprise PostgreSQL Company

Attachment Content-Type Size
001-copy-progress-revivew-1.patch text/x-patch 16.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-01-05 10:06:24 Re: New Table Access Methods for Multi and Single Inserts
Previous Message Masahiko Sawada 2021-01-05 09:56:29 Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion