Re: Interval for launching the table sync worker

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Interval for launching the table sync worker
Date: 2017-04-14 10:18:27
Message-ID: CAD21AoDcgDC2+K=V9R7UNAYgbVGKY17MwkTUiJ7CNNnMQ-1ECg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 14, 2017 at 7:09 AM, Petr Jelinek
<petr(dot)jelinek(at)2ndquadrant(dot)com> wrote:
> On 13/04/17 12:23, Masahiko Sawada wrote:
>> On Thu, Apr 13, 2017 at 11:53 AM, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>>> On Wed, Apr 12, 2017 at 11:46 PM, Peter Eisentraut
>>> <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
>>>> On 4/12/17 00:48, Masahiko Sawada wrote:
>>>>> On Wed, Apr 12, 2017 at 1:28 PM, Peter Eisentraut
>>>>>> Perhaps instead of a global last_start_time, we store a per relation
>>>>>> last_start_time in SubscriptionRelState?
>>>>>
>>>>> I was thinking the same. But a problem is that the list of
>>>>> SubscriptionRelState is refreshed whenever the syncing table state
>>>>> becomes invalid (table_state_valid = false). I guess we need to
>>>>> improve this logic, including GetSubscriptionNotReadyRelations().
>>>>
>>>> The table states are invalidated on a syscache callback from
>>>> pg_subscription_rel, which happens roughly speaking when a table
>>>> finishes the initial sync. So if we're worried about failing tablesync
>>>> workers relaunching too quickly, this would only be a problem if a
>>>> tablesync of another table finishes right in that restart window. That
>>>> doesn't seem a terrible issue to me.
>>>>
>>>
>>> I think the table states are invalidated whenever a table sync
>>> worker starts, because the table sync worker updates its status in
>>> pg_subscription_rel and commits it before starting the actual copy. So we
>>> cannot rely on that. I thought we could store last_start_time in
>>> pg_subscription_rel, but that might be overkill. I'm now thinking of
>>> changing GetSubscriptionNotReadyRelations so that last_start_time in
>>> SubscriptionRelState is carried over to the new list.
>>>
>>
>> Attached is the latest patch. It wasn't actually necessary to change
>> GetSubscriptionNotReadyRelations; I just changed the logic that refreshes
>> the sync table state list.
>> Please give me feedback.
>>
>
> Hmm, this might work. Although I was actually wondering if we could store
> the last start timestamp in the worker shared memory and do some magic
> with that (i.e., not clearing subid and relid, and then trying to do rate
> limiting based on subid+relid+timestamp stored in shmem). That would
> then work the same way for the main apply workers as well. It would have the
> disadvantage that if some tables were consistently failing, no other
> tables could get synchronized, as the number of workers is limited.
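For illustration, here is a minimal C sketch of the per-relation approach discussed in the quoted thread: each entry of the not-ready list keeps its own last_start_time, and when the list is rebuilt after an invalidation, the timestamps are copied over from the old list by relid so that a rebuild does not reset the throttling. The struct RelSyncState, the functions carry_over_start_times() and may_launch_sync_worker(), and the millisecond clock are hypothetical stand-ins, not PostgreSQL's actual SubscriptionRelState or tablesync code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for SubscriptionRelState. */
typedef struct RelSyncState
{
    uint32_t relid;            /* OID of the table being synchronized */
    char     state;            /* sync state code (illustrative only) */
    int64_t  last_start_time;  /* ms timestamp of last launch; 0 = never */
} RelSyncState;

/*
 * After the not-ready list has been rebuilt from the catalog, copy each
 * relation's last_start_time from the previous list (matched by relid),
 * so that invalidating and rebuilding the list does not reset throttling.
 */
static void
carry_over_start_times(const RelSyncState *old_list, size_t n_old,
                       RelSyncState *new_list, size_t n_new)
{
    assert(old_list != NULL || n_old == 0);

    for (size_t i = 0; i < n_new; i++)
    {
        new_list[i].last_start_time = 0;    /* new tables start unthrottled */
        for (size_t j = 0; j < n_old; j++)
        {
            if (old_list[j].relid == new_list[i].relid)
            {
                new_list[i].last_start_time = old_list[j].last_start_time;
                break;
            }
        }
    }
}

/* True if no worker was launched yet, or the retry interval has elapsed. */
static bool
may_launch_sync_worker(const RelSyncState *rel, int64_t now_ms,
                       int64_t retry_interval_ms)
{
    return rel->last_start_time == 0 ||
           now_ms - rel->last_start_time >= retry_interval_ms;
}
```

The nested loop is adequate for a short list of not-yet-ready tables; a hash table keyed by relid would scale better if that list could grow large.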

Hmm, I don't think it's a good design for a table sync worker or an
apply worker for a relation to hold a worker slot exclusively
until it succeeds. They would interfere with each other.
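To illustrate the slot concern, the sketch below keeps the throttling state outside the worker slots: a launcher pass skips any relation still inside its retry window without claiming a slot, so a table that keeps failing delays only itself. WorkerSlot, PendingRel, find_slot() and launch_pass() are hypothetical names, and writing a relid into a slot merely stands in for starting a worker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical worker slot; relid 0 marks a free slot. */
typedef struct WorkerSlot
{
    uint32_t relid;
} WorkerSlot;

/* Hypothetical not-yet-ready table with its own launch timestamp. */
typedef struct PendingRel
{
    uint32_t relid;
    int64_t  last_start_time;   /* ms; 0 = never launched */
} PendingRel;

/* Index of the slot holding relid (pass 0 to find a free slot), or -1. */
static int
find_slot(const WorkerSlot *slots, size_t nslots, uint32_t relid)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].relid == relid)
            return (int) i;
    }
    return -1;
}

/*
 * One launcher pass. A throttled relation never occupies a slot while it
 * waits, so free slots stay available for other tables. Returns the number
 * of workers launched.
 */
static int
launch_pass(PendingRel *rels, size_t nrels, WorkerSlot *slots, size_t nslots,
            int64_t now_ms, int64_t retry_interval_ms)
{
    int launched = 0;

    assert(retry_interval_ms >= 0);

    for (size_t i = 0; i < nrels; i++)
    {
        PendingRel *rel = &rels[i];
        int slot;

        /* A worker is already running for this table. */
        if (find_slot(slots, nslots, rel->relid) >= 0)
            continue;

        /* Throttled: leave the slots alone and retry on a later pass. */
        if (rel->last_start_time != 0 &&
            now_ms - rel->last_start_time < retry_interval_ms)
            continue;

        slot = find_slot(slots, nslots, 0);
        if (slot < 0)
            break;              /* every slot is busy */

        /* Stand-in for actually starting a table sync worker. */
        slots[slot].relid = rel->relid;
        rel->last_start_time = now_ms;
        launched++;
    }
    return launched;
}
```

In the shared-memory variant, the timestamp would live in the slot itself, which is what ties the slot to the failing relation; keeping it per relation leaves the slot free between attempts.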

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
