Re: adding status for COPY progress report

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: adding status for COPY progress report
Date: 2022-05-25 08:20:36
Message-ID: CALNJ-vQLesCsSWig=bF8-S7=0ipLQKfQO+3ey6C7zq=x6X7B5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 24, 2022 at 3:02 PM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:

>
>
> On Tue, May 24, 2022 at 2:12 PM Matthias van de Meent <
> boekewurm+postgres(at)gmail(dot)com> wrote:
>
>> On Tue, 24 May 2022 at 22:12, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>> >
>> > On Tue, May 24, 2022 at 12:37 PM Matthias van de Meent <
>> boekewurm+postgres(at)gmail(dot)com> wrote:
>> >>
>> >> On Tue, 24 May 2022 at 19:13, Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>> >> >
>> >> > Hi,
>> >> > Please see attached for enhancement to COPY command progress.
>> >> >
>> >> > The added status column would allow users to get the status of the
>> most recent COPY command.
>> >>
>> >> I fail to see the merit of retaining completed progress reporting
>> >> commands in their views after completion, other than making the
>> >> behaviour of the pg_stat_progress-views much more complicated and
>> >> adding overhead in places where we want the system to have as little
>> >> overhead as possible.
>> >>
>> >> Trying to get the status of a COPY command after it finished on a
>> >> different connection seems weird, as that other connection is likely
>> >> to have already disconnected / started another task. To be certain
>> >> that a backend can see the return status of the COPY command, you'd
>> >> have to be certain that the connection doesn't run any other
>> >> _progress-able commands in the following seconds / minutes, which
>> >> implies control over the connection, which means you already have
>> >> access to the resulting status of your COPY command.
>> >>
>> >> Regarding the patch: I really do not like that this leaks entries into
>> >> all _progress views: I get garbage data from e.g. the _create_index
>> >> and _copy views when VACUUM is running, etc, because you removed the
>> >> filter on cmdtype.
>> >> Also, the added fields in CopyToStateData / CopyFromStateData seem
>> >> useless when a pgstat_progress_update_param in the right place should
>> >> suffice.
>> >>
>> >> Kind regards,
>> >>
>> >> Matthias van de Meent
>> >
>> > Hi,
>> > For #2 above, can you let me know where the
>> pgstat_progress_update_param() call(s) should be added ?
>> > In my patch, pgstat_progress_update_param() is called from error
>> callback and EndCopy().
>>
>> In the places that the patch currently sets cstate->status it could
>> instead directly call pgstat_progress_update_param(..., STATUS_VALUE).
>> I'm fairly certain that EndCopy is not called when the error callback
>> is called, so status reporting should not be overwritten when
>> unconditionally setting the status to OK in EndCopy.
>>
>> > For #1, if I use param18 (which is not used by other views), would that
>> be better ?
>>
>> No:
>>
>> /*
>> - * Report values for only those backends which are running the
>> given
>> - * command.
>> + * Report values for only those backends which are running or
>> have run.
>> */
>> - if (!beentry || beentry->st_progress_command != cmdtype)
>> + if (!beentry || beentry->st_progress_command_target ==
>> InvalidOid)
>> continue;
>>
>> This change removes the filter that ensures that we only return the
>> backends which have a st_progress_command of the correct cmdtype (i.e.
>> for _progress_copy only those that have st_progress_command ==
>> PROGRESS_COMMAND_COPY. Without that filter, you'll return all backends
>> that have (or have had) their progress fields set at any point. Which
>> means that the expectation of "the backends returned by
>> pg_stat_get_progress_info are those running the requested command"
>> will be incorrect - you'll violate the contract / documented behaviour
>> of the function: "Returns command progress information for the named
>> command.".
>>
>> The numerical index of the column thus doesn't matter, what matters is
>> that you want special behaviour for only the COPY progress reporting
>> that doesn't fit with the rest of the progress-reporting
>> infrastructure, and that the patch as-is breaks all progress reporting
>> views.
>>
>> Sidenote: The change is also invalid because the rows that we expect
>> to return for pg_stat_progress_basebackup always have
>> st_progress_command_target == InvalidOid, so the backends running
>> BASE_BACKUP would never be visible with the change as-is. COPY (SELECT
>> stuff) also would not show up, because that too reports a
>> command_target of InvalidOid.
>>
>> Either way, I don't think that this idea is worth pursuing: the
>> progress views are explicitly there to show the progress of currently
>> active backends, and not to show the last progress state of backends
>> that at some point ran a progress-reporting-enabled command.
>>
>> Kind regards,
>>
>> Matthias van de Meent
>>
> Hi,
> Thanks for the comment.
> w.r.t. `Returns command progress information for the named command`,
> how about introducing PROGRESS_COMMAND_COPY_DONE which signifies that
> PROGRESS_COMMAND_COPY was the previous command ?
>
> In pgstat_progress_end_command():
>
> if (beentry->st_progress_command == PROGRESS_COMMAND_COPY)
> beentry->st_progress_command = PROGRESS_COMMAND_COPY_DONE;
> else
> beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
> beentry->st_progress_command_target = InvalidOid;
>
> In pg_stat_get_progress_info():
>
> if (!beentry || (beentry->st_progress_command != cmdtype &&
> (cmdtype == PROGRESS_COMMAND_COPY
> && beentry->st_progress_command !=
> PROGRESS_COMMAND_COPY_DONE)))
> continue;
>
> Cheers
>
Hi,
Patch v3 follows advice from Matthias (status field has been dropped).

Thanks

Attachment Content-Type Size
v3-copy-progress-status.patch application/octet-stream 4.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2022-05-25 09:07:01 Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Previous Message Peter Smith 2022-05-25 08:11:38 Re: Multi-Master Logical Replication