RE: Failed transaction statistics to measure the logical replication progress

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>
Cc: "wangw(dot)fnst(at)fujitsu(dot)com" <wangw(dot)fnst(at)fujitsu(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "vignesh21(at)gmail(dot)com" <vignesh21(at)gmail(dot)com>, "sawada(dot)mshk(at)gmail(dot)com" <sawada(dot)mshk(at)gmail(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Failed transaction statistics to measure the logical replication progress
Date: 2022-02-24 23:01:57
Message-ID: TYCPR01MB8373A240DA933A9B055AF551ED3D9@TYCPR01MB8373.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wednesday, February 23, 2022 3:30 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> On Tue, Feb 22, 2022 at 6:45 AM tanghy(dot)fnst(at)fujitsu(dot)com
> <tanghy(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > I found a problem when using it. When a replication worker exits, the
> > transaction stats should be sent to the stats collector if they have not
> > been sent yet because PGSTAT_STAT_INTERVAL had not elapsed. But I saw
> > that the stats weren't updated as expected.
> >
> > I looked into it and found that the replication worker does try to send
> > the transaction stats (if any) before it exits, but it gets an invalid subid
> > in pgstat_send_subworker_xact_stats(), which leads to the following result:
> >
> > postgres=# select pg_stat_get_subscription_worker(0, null);
> > pg_stat_get_subscription_worker
> > ---------------------------------
> > (0,,2,0,0,,,,0,"",)
> > (1 row)
> >
> > I think that's because subid has already been cleared when the stats are
> > sent. I printed the contents of before_shmem_exit_list; the functions in
> > this list are called in shmem_exit() when the worker exits.
> > logicalrep_worker_onexit() cleans up the worker info (including subid),
> > and pgstat_shutdown_hook() sends the stats, if any. However,
> > logicalrep_worker_onexit() was called before pgstat_shutdown_hook().
> >
>
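The ordering problem described above can be reproduced with a small simulation. This is a hypothetical Python sketch, not PostgreSQL code: it assumes that exit callbacks run in reverse order of registration (as shmem_exit() does for before_shmem_exit callbacks) and that pgstat_shutdown_hook was registered before logicalrep_worker_onexit. The `Worker` class, `pending_xact_stats`, `sent`, and the subscription OID 16394 are illustrative stand-ins; only the two callback names come from the report.

```python
# Hypothetical simulation of the worker-exit ordering problem.
# Assumption: exit callbacks run newest-first (LIFO), like PostgreSQL's
# shmem_exit() for before_shmem_exit callbacks.

class Worker:
    def __init__(self, subid):
        self.subid = subid            # subscription OID (0 = invalid)
        self.pending_xact_stats = 2   # hypothetical count of unsent stats
        self.sent = []                # (subid, count) messages "sent"
        self._exit_callbacks = []

    def before_shmem_exit(self, fn):
        # Register an exit callback.
        self._exit_callbacks.append(fn)

    def shmem_exit(self):
        # Run callbacks in reverse order of registration.
        while self._exit_callbacks:
            self._exit_callbacks.pop()()

    def pgstat_shutdown_hook(self):
        # Flush stats that were held back because the interval
        # had not elapsed yet.
        if self.pending_xact_stats:
            self.sent.append((self.subid, self.pending_xact_stats))
            self.pending_xact_stats = 0

    def logicalrep_worker_onexit(self):
        # Clean up worker info, including subid.
        self.subid = 0


w = Worker(subid=16394)
w.before_shmem_exit(w.pgstat_shutdown_hook)      # registered first
w.before_shmem_exit(w.logicalrep_worker_onexit)  # registered later, runs first
w.shmem_exit()
print(w.sent)  # [(0, 2)]: the stats go out with an invalid subid
```

Because the cleanup hook runs first, the stats message carries subid 0, matching the `(0,,2,...)` row above. Sending the stats from the worker's own code path while subid is still valid avoids depending on this ordering.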
> Yeah, I think that is a problem. Maybe we can solve it by sending the stats
> via logicalrep_worker_onexit before subid is cleared, but I am not sure that
> is a good idea. I feel we need to go back to the idea of v21 for sending stats
> instead of using pgstat_report_stat.
> One thing we could improve is to avoid trying to send the stats each time
> before receiving a message via walrcv_receive, and instead try to send them
> only before we wait (WaitLatchOrSocket).
> Trying after each message doesn't seem to be required and could add some
> overhead as well. What do you think?
I agree. Fixed.
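The loop shape agreed on above (one send attempt just before waiting, rather than one per received message) can be sketched as follows. This is a hypothetical Python model, not the v24 patch: `apply_loop`, `apply`, `flush_stats`, and `wait` are illustrative stand-ins, each inner list models the messages walrcv_receive returns before no more data is available, and `wait` stands in for WaitLatchOrSocket. Any interval check (e.g. against PGSTAT_STAT_INTERVAL) is assumed to live inside `flush_stats` and is omitted.

```python
# Hypothetical sketch: flush stats once before blocking, not per message.

def apply_loop(batches, apply, flush_stats, wait):
    """batches: iterable of lists of messages; each inner list models
    everything received before the worker would have to wait."""
    for batch in batches:
        for msg in batch:
            apply(msg)      # no stats send attempt per message
        flush_stats()       # single attempt just before waiting
        wait()              # models WaitLatchOrSocket()


log = []
apply_loop(
    [["BEGIN", "INSERT", "COMMIT"], ["BEGIN", "ROLLBACK"]],
    apply=log.append,
    flush_stats=lambda: log.append("flush"),
    wait=lambda: log.append("wait"),
)
print(log.count("flush"))  # 2 attempts for 5 messages
```

With per-message attempts the same input would produce five send attempts; here the cost scales with the number of waits instead.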

Kindly have a look at v24 shared in [1].

[1] - https://www.postgresql.org/message-id/TYCPR01MB8373A3E1BE237BAF38185BF2ED3D9%40TYCPR01MB8373.jpnprd01.prod.outlook.com

Best Regards,
Takamichi Osumi
