Re: [PATCH] Simple progress reporting for COPY command

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Simple progress reporting for COPY command
Date: 2021-01-05 01:32:55
Message-ID: CAFp7QwqKp6_iM0R7R+f8pjwWfD+xxgm2NbDLLzj3+2RhMhcJog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

út 5. 1. 2021 v 0:46 odesílatel Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> napsal:
>
> Hi,
>
> I did take a quick look today, and I have a couple minor comments:

Hi! Thanks for your time.

> 1) The catalog sgml docs seem to mention bytes_processed twice (one of
> that should be bytes_total), and line_processed (should be "lines_").

Fixed in attached patch.

>
> 2) I'm not quite sure about not including any info about the command.
> For example pg_stat_progress_create_index includes info about the
> command, although it's not very detailed. Not sure how useful would it
> be to show just COPY FROM / COPY TO, without more details.
>
> It's probably possible to extract this from pg_stat_activity, but that
> may be rather cumbersome (parsing arbitrary SQL and all that). Also,
> what if the COPY is called from a function etc.?

Any idea where to discuss this? My usecase is really simple. I would
like to be able to see progress of COPY command by pid. There is a lot
of COPY info inside CopyToStateData and CopyFromStateData structs. The
only limitation I see is support of only int64 values for progress
reporting columns. I'm not sure if it is easily possible to expose for
example filename this way.

>
> 3) I wonder if we should have something like lines_estimated. For COPY
> TO it's pretty simple to calculate it as
>
> bytes_total / (bytes_processed / lines_processed)
>
> but what about
>
> COPY (query) TO file
>
> In that case we don't know the total amount of bytes / rows, so we can't
> calculate any estimate. So maybe we could peek into the query plan? But
> I agree this is something we can add later.

If I remember well one of the original ideas was to be able to pass
custom bytes_total (or lines_total) by COPY options to be stored in
copy progress. I can add this in some following patch if still
welcomed.

For estimates I would prefer to add an additional column to not mix
those two together (or at least boolean estimated = true/false and
reuse bytes_total column). If query estimates are welcomed, I can take
a look at how to reach the query plan and expose those numbers when
the query is used to estimated_lines and potentially estimated_bytes
columns. It would be probably a little tricky to calculate
estimated_bytes for some column types.

Also currently only COPY FROM file supports bytes_total (it is 0 for
all other scenarios). I think it should be possible for PostgreSQL to
know the actual amount of lines query returns for some kind of
queries, but I have no idea where to look at this. If that's possible
to get, it would be one of the next steps to introduce additional
column lines_total.

>
> 4) This comment is a bit confusing, as it mixes "total" and "processed".
> I'd just say "number of bytes processed so far" instead.
>
Fixed in attached patch.
>
> Other than that, it seems fine. I'm sure we could add more features, but
> it seems like a good start - I plan to get this committed once I get a
> patch fixing the docs issues.

Patch is attached, it should be applied to the top of the previous
patch. Overall patch (having both patches merged together) could be
found at https://patch-diff.githubusercontent.com/raw/simi/postgres/pull/6.patch.

> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Attachment Content-Type Size
002-copy-progress.patch text/x-patch 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-01-05 01:35:48 Re: New IndexAM API controlling index vacuum strategies
Previous Message Kyotaro Horiguchi 2021-01-05 01:07:24 Re: standby recovery fails (tablespace related) (tentative patch and discussion)