Re: Adding a LogicalRepWorker type field

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding a LogicalRepWorker type field
Date: 2023-08-02 06:27:13
Message-ID: CAHut+PvqyR1y23xxxuuc8rPfGYB9ACA2BrwCon0H98XMwn6nZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 2, 2023 at 3:35 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Mon, Jul 31, 2023 at 10:47 AM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
> >
> > Hi hackers,
> >
> > BACKGROUND:
> >
> > The logical replication has different worker "types" (e.g. apply
> > leader, apply parallel, tablesync).
> >
> > They all use a common structure called LogicalRepWorker, but at times
> > it is necessary to know what "type" of worker a given LogicalRepWorker
> > represents.
> >
> > Once, there were just apply workers and tablesync workers - these were
> > easily distinguished because only tablesync workers had a valid
> > 'relid' field. Next, parallel-apply workers were introduced - these
> > are distinguishable from the apply leaders by the value of
> > 'leader_pid' field.
> >
> >
> > PROBLEM:
> >
> > IMO, deducing the worker's type by examining multiple different field
> > values seems a dubious way to do it. This maybe was reasonable enough
> > when there were only 2 types, but as more get added it becomes
> > increasingly complicated.
> >
> > SOLUTION:
> >
> > AFAIK none of the complications is necessary anyway; the worker type
> > is already known at launch time, and it never changes during the life
> > of the process.
> >
> > The attached Patch 0001 introduces a new enum 'type' field, which is
> > assigned during the launch.
> >
> > ~
> >
> > This change not only simplifies the code, but it also permits other
> > code optimizations, because now we can switch on the worker enum type,
> > instead of using cascading if/else. (see Patch 0002).
> >
> > Thoughts?
>
> I can see the problem you stated and it's true that the worker's type
> never changes during its lifetime. But I'm not sure we really need to
> add a new variable to LogicalRepWorker since we already have enough
> information there to distinguish the worker's type.
>
> Currently, we deduce the worker's type by checking some fields that
> never change such as relid and leader_piid. It's fine to me as long as
> we encapcel the deduction of worker's type by macros (or inline
> functions). Even with the proposed patch (0001 patch), we still need a
> similar logic to determine the worker's type.

Thanks for the feedback.

I agree that the calling code will not look very different
with/without this patch 0001, because everything is hidden by the
macros/functions. But those HEAD macros/functions are also hiding the
inefficiencies of the type deduction -- e.g. IMO it is quite strange
that we can only recognize the worker is an "apply worker" by
eliminating the other 2 possibilities.

As further background, the idea for this patch came from considering
another thread to "re-use" workers [1]. For example, when thinking
about re-using tablesync workers the HEAD am_tablesync_worker()
function begins to fall apart -- ie. it does not let you sensibly
disassociate a tablesync worker from its relid (which you might want
to do if tablesync workers were "pooled") because in the HEAD code any
tablesync worker without a relid is (by definition) NOT recognized as
a tablesync worker anymore!

Those above issues just disappear when a type field is added.

>
> If we want to change both process_syncing_tables() and
> should_apply_changes_for_rel() to use the switch statement instead of
> using cascading if/else, I think we can have a function, say
> LogicalRepWorkerType(), that returns LogicalRepWorkerType of the given
> worker.
>

The point of that "switch" in patch 0002 was to demonstrate that with
a worker-type enum we can write more *efficient* code in some places
than the current cascading if/else. I think making a "switch" just for
the sake of it, by adding more functions that hide yet more work, is
going in the opposite direction.

------
[1] reuse workers -
https://www.postgresql.org/message-id/flat/CAGPVpCTq%3DrUDd4JUdaRc1XUWf4BrH2gdSNf3rtOMUGj9rPpfzQ%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yuya Watari 2023-08-02 06:40:39 Re: [PoC] Reducing planning time when tables have many partitions
Previous Message Jeff Davis 2023-08-02 06:06:56 Re: Faster "SET search_path"