Re: some review comments on logical rep code

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sawada(dot)mshk(at)gmail(dot)com
Cc: masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: some review comments on logical rep code
Date: 2017-04-18 03:24:31
Message-ID: 20170418.122431.89192994.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Thank you for the revised version.

At Mon, 17 Apr 2017 23:29:28 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoA+yFJTbw0PCw-ttmh9TsTygM3=iavXF+5Bx4O9-UXdsQ(at)mail(dot)gmail(dot)com>
> On Mon, Apr 17, 2017 at 9:13 PM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > On Mon, Apr 17, 2017 at 7:39 PM, Kyotaro HORIGUCHI
> > <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> >> At Mon, 17 Apr 2017 18:02:57 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoB3L50jWwuyaGLxJzmrmXmYJAOBHzDPR=FKwjw29mxj0Q(at)mail(dot)gmail(dot)com>
> >>> On Fri, Apr 14, 2017 at 4:47 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> >>> 1.
> >>> >
> >>> > In ApplyWorkerMain(), DatumGetInt32() should be used to get integer
> >>> > value from the argument, instead of DatumGetObjectId().
> >>>
> >>> Attached 001 patch fixes it.
> >>
> >> Hmm. I looked at the launcher side and found that it assigns bare
> >> integer to bgw_main_arg. It also should use Int32GetDatum.
> >
> > Yeah, I agreed. Will fix it.

Thanks.

> >>> 2.
> >>> >
> >>> > No one resets on_commit_launcher_wakeup flag to false.
> >>>
> >>> Attached 002 patch makes on_commit_launcher_wakeup turn off after woke
> >>> up the launcher process.
> >>
> >> It is omitting the abort case. Maybe it should be
> >> AtEOXact_ApplyLauncher(bool commit)?
> >
> > Should we wake up the launcher process even when abort?

No, I meant that on_commit_launcher_wakeup should be just reset
without waking up launcher on abort.

> >>> 3.
> >>> >
> >>> > ApplyLauncherWakeup() should be static function.
> >>>
> >>> Attached 003 patch fixes it (and also fixes #5 review comment).
> >>>
> >>> 4.
> >>> >
> >>> > Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
> >>> > subcommands (e.g., ENABLED one) should wake the launcher up on commit.
> >>>
> >>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
> >>>
> >>> 5.
> >>> >
> >>> > Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
> >>>
> >>> I also guess it's necessary. This change is included in #3 patch.
> >>
> >> IsBackendPid() is not free, I suppose that just ignoring failure
> >> with ESRCH is enough.
> >
> > After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
> > check if launcher_pid != 0?

Yes. For example, code to send a signal to autovacuum launcher
looks as the follows. I don't think there's a reason to do
different thing here.

| avlauncher_needs_signal = false;
| if (AutoVacPID != 0)
| kill(AutoVacPID, SIGUSR2);

> >>> 6.
> >>> >
> >>> > Normal statements that the walsender for logical rep runs are logged
> >>> > by log_replication_commands. So if log_statement is also set,
> >>> > those commands are logged twice.
> >>>
> >>> Attached 006 patch fixes it by adding "-c log_statement = none" to
> >>> connection parameter of libpqrcv if logical = true.
> >>
> >> The control by log_replication_commands is performed on
> >> walsender, so this also shuld be done on the same place. Anyway
> >> log_statement is irrelevant to walsender.
> >
> > Patch 006 emits all logs executed by the apply worker as a log of
> > log_replication_command but perhaps you're right. It would be better
> > to leave the log of log_statement if the command executed by the apply
> > worker is a SQL command. Will fix.

Here, we can choose whether a SQL command is a part of
replication commands or not. The previous fix thought it is and
this doesn't. (They are emitted in different forms) Is this ok?

I'm not sure why you have moved the location of the code, but if
it should show all received commands, it might be better to be
placed before CHECK_FOR_INTERRUPTS..

The comment for the code seems should be more clearly.

- * compatibility. To prevent the command is logged doubly, we doesn't
- * log it when the command is a SQL command.
+ * compatibility. SQL command are logged later according
+ * to log_statement setting.

> >>> 7.
> >>> >
> >>> > LogicalRepCtx->launcher_pid isn't reset to 0 when the launcher emits
> >>> > an error. The callback function to reset it needs to be registered
> >>> > via on_shmem_exit().
> >>>
> >>> Attached 007 patch adds callback function to reset LogicalRepCtx->launcher_pid.
> >>>
> >>> 8.
> >>> >
> >>> > Typo: "an subscriber" should be "a subscriber" in some places.
> >>>
> >>> It seems that there is no longer these typo.
> >>>
> >>> 9.
> >>> > /* Get conninfo */
> >>> > datum = SysCacheGetAttr(SUBSCRIPTIONOID,
> >>> > tup,
> >>> > Anum_pg_subscription_subconninfo,
> >>> > &isnull);
> >>> > Assert(!isnull);
> >>> >
> >>> > This assertion makes me think that pg_subscription.subconnifo should
> >>> > have NOT NULL constraint. Also subslotname and subpublications
> >>> > should have NOT NULL constraint because of the same reason.
> >>>
> >>> Agreed. Attached 009 patch add NOT NULL constraint to all other
> >>> columns of pg_subscription. pg_subscription.subsynccommit should have
> >>> it as well.
> >>
> >> I'm not sure the policy here, but I suppose that the assertions
> >> there are still required irrelevantly from the nun-nullness of
> >> the attribute.
> >
> > You're right. Will fix it.
> >
> >>> 10.
> >>> >
> >>> > SpinLockAcquire(&MyLogicalRepWorker->relmutex);
> >>> > MyLogicalRepWorker->relstate =
> >>> > GetSubscriptionRelState(MyLogicalRepWorker->subid,
> >>> > MyLogicalRepWorker->relid,
> >>> > &MyLogicalRepWorker->relstate_lsn,
> >>> > false);
> >>> > SpinLockRelease(&MyLogicalRepWorker->relmutex);
> >>> >
> >>> > Non-"short-term" function like GetSubscriptionRelState() should not
> >>> > be called while spinlock is being held.
> >>> >
> >>>
> >>> One option is adding new LWLock for the relation state change but this
> >>> lock is used at many locations where the operation actually doesn't
> >>> take time. I think that the discussion would be needed to get
> >>> consensus, so patch for it is not attached.
> >>
> >> From the point of view of time, I agree that it doesn't seem to
> >> harm. Bt I thing it as more significant problem that
> >> potentially-throwable function is called within the mutex
> >> region. It potentially causes a kind of dead lock. (That said,
> >> theoretically dosn't occur and I'm not sure what happens by the
> >> dead lock..)
> >>
> >>
> >>> [1] https://www.postgresql.org/message-id/20170406.212441.177025736.horiguchi.kyotaro@lab.ntt.co.jp
> >>
> >> --
> >> Kyotaro Horiguchi
> >> NTT Open Source Software Center
> >>
> >
> >
>
> I've attached latest v2 three patches; 001, 006 and 009. The review
> comments I got so far are incorporated.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-04-18 03:40:04 Re: Different table schema in logical replication crashes
Previous Message Amos Bird 2017-04-18 03:13:29 Re: PATCH: psql show index with type info