Re: snapbuild woes

From: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: snapbuild woes
Date: 2017-05-09 09:50:35
Message-ID: 4ff19c9f-96ed-40ee-14f4-6286a56b5ac8@2ndquadrant.com
Lists: pgsql-hackers

On 09/05/17 10:59, Erik Rijkers wrote:
> On 2017-05-09 10:50, Petr Jelinek wrote:
>> On 09/05/17 00:03, Erik Rijkers wrote:
>>> On 2017-05-05 02:00, Andres Freund wrote:
>>>>
>>>> Could you have a look?
>>>
>>> Running tests with these three patches:
>>>
>>>> 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch +
>>>> 0002-WIP-Possibly-more-robust-snapbuild-approach.patch +
>>>> fix-statistics-reporting-in-logical-replication-work.patch
>>> (on top of 44c528810)
>>>
>>> I test with 15-minute pgbench runs while there is a logical replication
>>> connection. Primary and replica are on the same machine.
>>>
>>> I have seen errors on 3 different machines (where "error" means that at
>>> least 1 of the 4 pgbench tables is not md5-equal). It seems that better,
>>> faster machines yield fewer errors.
>>>
>>> Normally I see in pg_stat_replication (on master) one process in state
>>> 'streaming'.
>>>
>>> pid   | wal         | replay_loc  | diff     | state     | app     | sync_state
>>> 16495 | 11/EDBC0000 | 11/EA3FEEE8 | 58462488 | streaming | derail2 | async
>>>
>>> Often there are two other processes in pg_stat_replication that remain
>>> in state 'startup'.
>>>
>>> In the failing sessions the 'streaming'-state process is missing; there
>>> are only the two processes, and they stay in 'startup'.
>>
>> Hmm, startup is the state where slot creation is happening. I wonder if
>> it's just taking a long time to create the snapshot because of the 5th
>> issue, which is not yet fixed (and the original patch will not apply on
>> top of this change). Alternatively, there is a bug in this patch.
>>
>> Did you see high CPU usage during the test when there were those
>> "startup" state walsenders?
>>
>
> I haven't noticed, but I didn't pay particular attention to that.
>
> I'll try to get some CPU-info logged...
>
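
The diff column in the pg_stat_replication row quoted above matches the byte distance between the wal and replay_loc LSNs, which is the value pg_wal_lsn_diff() would report. Below is a minimal Python sketch of that arithmetic. It assumes the usual textual LSN format, where the part before the slash holds the high 32 bits and the part after it holds the low 32 bits, both in hex. The helper names parse_lsn and lsn_diff are hypothetical and only serve as illustration.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a textual LSN such as '11/EDBC0000' to a 64-bit byte position.

    Assumption: high 32 bits before the slash, low 32 bits after it, both hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Hypothetical helper: bytes between two LSNs (a - b), like pg_wal_lsn_diff()."""
    return parse_lsn(a) - parse_lsn(b)


# wal and replay_loc values from the 'streaming' row quoted above.
print(lsn_diff("11/EDBC0000", "11/EA3FEEE8"))  # 58462488
```

The result equals the reported diff, so the streaming walsender was roughly 56 MB behind at that moment.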

I rebased the above-mentioned patch so that it applies on top of the
patches Andres sent. If you could add it on top of what you have and check
whether it still fails, that would be helpful.

Thanks!

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment: Skip-unnecessary-snapshot-builds.patch (binary/octet-stream, 6.3 KB)
