Re: adding status for COPY progress report

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: adding status for COPY progress report
Date: 2022-05-24 22:02:13
Message-ID: CALNJ-vTLYGgRjMQm-XJajpm7ovN27AHNNx81m_S1R+LExhmFgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 24, 2022 at 2:12 PM Matthias van de Meent <
boekewurm+postgres(at)gmail(dot)com> wrote:

> On Tue, 24 May 2022 at 22:12, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> >
> > On Tue, May 24, 2022 at 12:37 PM Matthias van de Meent <
> boekewurm+postgres(at)gmail(dot)com> wrote:
> >>
> >> On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> >> >
> >> > Hi,
> >> > Please see attached for enhancement to COPY command progress.
> >> >
> >> > The added status column would allow users to get the status of the
> most recent COPY command.
> >>
> >> I fail to see the merit of retaining completed progress reporting
> >> commands in their views after completion, other than making the
> >> behaviour of the pg_stat_progress-views much more complicated and
> >> adding overhead in places where we want the system to have as little
> >> overhead as possible.
> >>
> >> Trying to get the status of a COPY command after it finished on a
> >> different connection seems weird, as that other connection is likely
> >> to have already disconnected / started another task. To be certain
> >> that a backend can see the return status of the COPY command, you'd
> >> have to be certain that the connection doesn't run any other
> >> _progress-able commands in the following seconds / minutes, which
> >> implies control over the connection, which means you already have
> >> access to the resulting status of your COPY command.
> >>
> >> Regarding the patch: I really do not like that this leaks entries into
> >> all _progress views: I get garbage data from e.g. the _create_index
> >> and _copy views when VACUUM is running, etc, because you removed the
> >> filter on cmdtype.
> >> Also, the added fields in CopyToStateData / CopyFromStateData seem
> >> useless when a pgstat_progress_update_param in the right place should
> >> suffice.
> >>
> >> Kind regards,
> >>
> >> Matthias van de Meent
> >
> > Hi,
> > For #2 above, can you let me know where the
> pgstat_progress_update_param() call(s) should be added ?
> > In my patch, pgstat_progress_update_param() is called from error
> callback and EndCopy().
>
> In the places that the patch currently sets cstate->status it could
> instead directly call pgstat_progress_update_param(..., STATUS_VALUE).
> I'm fairly certain that EndCopy is not called when the error callback
> is called, so status reporting should not be overwritten when
> unconditionally setting the status to OK in EndCopy.
>
> > For #1, if I use param18 (which is not used by other views), would that
> be better ?
>
> No:
>
> /*
> - * Report values for only those backends which are running the
> given
> - * command.
> + * Report values for only those backends which are running or
> have run.
> */
> - if (!beentry || beentry->st_progress_command != cmdtype)
> + if (!beentry || beentry->st_progress_command_target == InvalidOid)
> continue;
>
> This change removes the filter that ensures that we only return the
> backends which have a st_progress_command of the correct cmdtype (i.e.
> for _progress_copy only those that have st_progress_command ==
> PROGRESS_COMMAND_COPY. Without that filter, you'll return all backends
> that have (or have had) their progress fields set at any point. Which
> means that the expectation of "the backends returned by
> pg_stat_get_progress_info are those running the requested command"
> will be incorrect - you'll violate the contract / documented behaviour
> of the function: "Returns command progress information for the named
> command.".
>
> The numerical index of the column thus doesn't matter, what matters is
> that you want special behaviour for only the COPY progress reporting
> that doesn't fit with the rest of the progress-reporting
> infrastructure, and that the patch as-is breaks all progress reporting
> views.
>
> Sidenote: The change is also invalid because the rows that we expect
> to return for pg_stat_progress_basebackup always have
> st_progress_command_target == InvalidOid, so the backends running
> BASE_BACKUP would never be visible with the change as-is. COPY (SELECT
> stuff) also would not show up, because that too reports a
> command_target of InvalidOid.
>
> Either way, I don't think that this idea is worth pursuing: the
> progress views are explicitly there to show the progress of currently
> active backends, and not to show the last progress state of backends
> that at some point ran a progress-reporting-enabled command.
>
> Kind regards,
>
> Matthias van de Meent
>
Hi,
Thanks for the comment.
w.r.t. `Returns command progress information for the named command`,
how about introducing PROGRESS_COMMAND_COPY_DONE which signifies that
PROGRESS_COMMAND_COPY was the previous command ?

In pgstat_progress_end_command():

if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
else
beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
beentry->st_progress_command_target = InvalidOid;

In pg_stat_get_progress_info():

if (!beentry || (beentry->st_progress_command != cmdtype &&
(cmdtype == PROGRESS_COMMAND_COPY
&& beentry->st_progress_command !=
PROGRESS_COMMAND_COPY_DONE)))
continue;

Cheers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2022-05-24 22:16:43 Re: [RFC] Improving multi-column filter cardinality estimation using MCVs and HyperLogLog
Previous Message Nathan Bossart 2022-05-24 22:01:47 fix stats_fetch_consistency value in postgresql.conf.sample