Re: some review comments on logical rep code

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: some review comments on logical rep code
Date: 2017-04-18 16:35:52
Message-ID: 54949025-ce80-3d77-1e97-e4d6716907c5@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thank you for working on this!

On 18/04/17 10:16, Masahiko Sawada wrote:
> On Tue, Apr 18, 2017 at 12:24 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>>>>>> 3.
>>>>>>>
>>>>>>> ApplyLauncherWakeup() should be static function.
>>>>>>
>>>>>> Attached 003 patch fixes it (and also fixes #5 review comment).
>>>>>>
>>>>>> 4.
>>>>>>>
>>>>>>> Seems not only CREATE SUBSCRIPTION but also ALTER SUBSCRIPTION's
>>>>>>> subcommands (e.g., ENABLED one) should wake the launcher up on commit.
>>>>>>
>>>>>> This is also reported by Horiguchi-san on another thread[1], and patch exists.
>>>>>>
>>>>>> 5.
>>>>>>>
>>>>>>> Why is IsBackendPid() check necessary in ApplyLauncherWakeup()?
>>>>>>
>>>>>> I also guess it's necessary. This change is included in #3 patch.
>>>>>
>>>>> IsBackendPid() is not free, I suppose that just ignoring failure
>>>>> with ESRCH is enough.
>>>>
>>>> After thought, since LogicalRepCtx->launcher_pid could be 0 should, we
>>>> check if launcher_pid != 0?
>>
>> Yes. For example, code to send a signal to autovacuum launcher
>> looks as the follows. I don't think there's a reason to do
>> different thing here.
>>
>> | avlauncher_needs_signal = false;
>> | if (AutoVacPID != 0)
>> | kill(AutoVacPID, SIGUSR2);
>>
>
> Fixed.

Yes the IsBackendPid was needed only because we didn't reset
launcher_pid and it was causing issues otherwise obviously (and I didn't
realize we are not resetting it...)

>
>>
>>>>>> 6.
>>>>>>>
>>>>>>> Normal statements that the walsender for logical rep runs are logged
>>>>>>> by log_replication_commands. So if log_statement is also set,
>>>>>>> those commands are logged twice.
>>>>>>
>>>>>> Attached 006 patch fixes it by adding "-c log_statement = none" to
>>>>>> connection parameter of libpqrcv if logical = true.
>>>>>
>>>>> The control by log_replication_commands is performed on
>>>>> walsender, so this also shuld be done on the same place. Anyway
>>>>> log_statement is irrelevant to walsender.
>>>>
>>>> Patch 006 emits all logs executed by the apply worker as a log of
>>>> log_replication_command but perhaps you're right. It would be better
>>>> to leave the log of log_statement if the command executed by the apply
>>>> worker is a SQL command. Will fix.
>>
>> Here, we can choose whether a SQL command is a part of
>> replication commands or not. The previous fix thought it is and
>> this doesn't. (They are emitted in different forms) Is this ok?
>
> Yes, v2 patch logs other than T_SQLCmd as a replication command, and
> T_SQLCmd is logged later in exec_simple_query. The
> log_replication_command logs only replication commands[1], it doesn't
> mean to log commands executed on replication connection. I think such
> behavior is right.
>
>> I'm not sure why you have moved the location of the code, but if
>> it should show all received commands, it might be better to be
>> placed before CHECK_FOR_INTERRUPTS..
>
> Hmm, the reason why I've moved it is that we cannot know whether given
> cmd_string is a SQL command or a replication command before parsing.
>

Well the issue is that then the query is not logged if there was parsing
issue even when logging was specifically requested. I don't know what's
good solution here, maybe teaching exec_simple_query to not log the
query if it came from walsender.

BTW reading that function in walsender, there is :
> /*
> * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot until the next
> * command arrives. Clean up the old stuff if there's anything.
> */
> SnapBuildClearExportedSnapshot();

and then

> /*
> * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
> * called outside of transaction the snapshot should be cleared here.
> */
> if (!IsTransactionBlock())
> SnapBuildClearExportedSnapshot();

The first one should not be there, it looks like a result of some merge
conflict being solved wrongly. Maybe your patch could fix that too?

>>>>
>>>>>> 10.
>>>>>>>
>>>>>>> SpinLockAcquire(&MyLogicalRepWorker->relmutex);
>>>>>>> MyLogicalRepWorker->relstate =
>>>>>>> GetSubscriptionRelState(MyLogicalRepWorker->subid,
>>>>>>> MyLogicalRepWorker->relid,
>>>>>>> &MyLogicalRepWorker->relstate_lsn,
>>>>>>> false);
>>>>>>> SpinLockRelease(&MyLogicalRepWorker->relmutex);
>>>>>>>
>>>>>>> Non-"short-term" function like GetSubscriptionRelState() should not
>>>>>>> be called while spinlock is being held.
>>>>>>>
>>>>>>
>>>>>> One option is adding new LWLock for the relation state change but this
>>>>>> lock is used at many locations where the operation actually doesn't
>>>>>> take time. I think that the discussion would be needed to get
>>>>>> consensus, so patch for it is not attached.
>>>>>
>>>>> From the point of view of time, I agree that it doesn't seem to
>>>>> harm. Bt I thing it as more significant problem that
>>>>> potentially-throwable function is called within the mutex
>>>>> region. It potentially causes a kind of dead lock. (That said,
>>>>> theoretically dosn't occur and I'm not sure what happens by the
>>>>> dead lock..)
>>>>>

Hmm, I think doing what I attached should be fine here. We don't need to
hold spinlock for table read, just for changing the
MyLogicalRepWorker->relstate.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
010_dont-read-catalog-inside-spinlock.patch text/plain 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-04-18 16:37:35 Re: Why does logical replication launcher set application_name?
Previous Message Peter Eisentraut 2017-04-18 16:34:28 Re: Passing values to a dynamic background worker