Re: [PATCH] Initial progress reporting for COPY command

From: Josef Šimánek <josef(dot)simanek(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: [PATCH] Initial progress reporting for COPY command
Date: 2020-06-22 13:33:00
Message-ID: CAFp7QwqWSwhmEcCEoJqRJofURMQ2Sffu0+-Brt+LBUqU-ds-cw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 22, 2020 at 14:14, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On Sun, Jun 21, 2020 at 01:40:34PM +0200, Josef Šimánek wrote:
> >Thanks for all comments. I have updated code to support more options
> >(including STDIN/STDOUT) and added some documentation.
> >
> >Patch is attached and can be found also at
> >https://github.com/simi/postgres/pull/5.
> >
> >Diff version: https://github.com/simi/postgres/pull/5.diff
> >Patch version: https://github.com/simi/postgres/pull/5.patch
> >
> >I'm also attaching screenshot of HTML documentation and html documentation
> >file.
> >
> >I'll do my best to get this to commitfest now.
> >
>
> I see we're not showing the total number of bytes the COPY is expected
> to process, which makes it hard to estimate how far we actually are.
> Clearly there are cases when we really don't know that (exports, import
> from stdin/program), but why not to show file size for imports from a
> file? I'd expect that to be the most common case.
>

For COPY FROM a file, fstat is already called and the file size is available at
https://github.com/postgres/postgres/blob/fe186b4c200b76a5c0f03379fe8645ed1c70a844/src/backend/commands/copy.c#L1934.
It should be easy to store the file size in a progress parameter (param6, for
example) and expose it in the report view. When the size is not available, this
column can be NULL.

Would that be enough?

On the other hand, anyone can check the file size manually to get the expected
total and compare it with the reported bytes_processed. Alternatively, "wc -l"
gives the number of lines, which can be compared with the lines_processed
column. Should the patch also count the lines and populate another column with
the total number of lines (using the configured separator)? AFAIK that would
need a full file scan, which can be slow for huge files.
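
To illustrate the client-side check described above, here is a minimal
Python sketch. The function name expected_totals and the chunk size are only
illustrative and not part of the patch. It reads the size from file metadata
(cheap) and counts newlines with a full scan (slow for huge files), the same
way "wc -l" would. The newline count is only an approximation of the row
count; a header line or quoted newlines in CSV input would make it differ.

```python
import os


def expected_totals(path, chunk_size=1 << 20):
    """Return (size_in_bytes, newline_count) for the file at `path`.

    Hypothetical client-side helper, not part of the COPY patch.
    """
    # The size comes from file metadata, so this is cheap even for huge files.
    size_in_bytes = os.stat(path).st_size

    # Counting lines needs a full scan; read fixed-size binary chunks so that
    # memory use stays flat regardless of file size.
    newline_count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            newline_count += chunk.count(b"\n")

    return size_in_bytes, newline_count
```

The first value can be compared with bytes_processed and the second with
lines_processed.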

> I wonder if it made sense to show some estimates in the other cases. For
> example when exporting query result, maybe we could show the estimated
> number of rows and size? Of course, that's prone to estimation errors
> and it's more a wild idea for the future, I don't expect this patch to
> implement that.
>

My plan here was to expose numbers that are not currently available and let
clients get the rest of the information on their own.

For example:
- for "COPY (query) TO file" - an EXPLAIN or COUNT variant of the query could be
executed beforehand to get the number of expected rows
- for "COPY table FROM file" - the file size or the number of lines in the file
can be inspected first to get the number of expected rows or bytes to be processed

I see the system view in my patch (and all the other report views currently
available) more as a scaffold for building your own tools.

For example, CLI tools can use it to show some kind of progress.
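
As a rough sketch of what such a CLI tool might do with the reported numbers
(Python, with a hypothetical format_progress helper; how bytes_processed is
fetched from the view is left out, and the total is assumed to be obtained by
the client, or to be None when it is unknown, e.g. for stdin):

```python
def format_progress(bytes_processed, bytes_total=None, width=20):
    """Render one progress line for a running COPY.

    Hypothetical helper: `bytes_total` is None when the total is unknown.
    """
    if bytes_total is None or bytes_total <= 0:
        # No total available, so only the raw counter can be shown.
        return f"{bytes_processed} bytes processed (total unknown)"

    # Clamp to 100% in case the reported counter overshoots the total.
    fraction = min(bytes_processed / bytes_total, 1.0)
    filled = int(fraction * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {fraction:.0%} ({bytes_processed}/{bytes_total} bytes)"
```

A tool could call this in a polling loop and redraw the line each time the
reported value changes.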

> regards
>
> --
> Tomas Vondra http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
