Re: snapbuild woes

From: Erik Rijkers <er(at)xs4all(dot)nl>
To: Petr Jelinek <petr(dot)jelinek(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: snapbuild woes
Date: 2017-05-09 08:59:22
Message-ID: 169b7d221f3236b7cb790c8dd323dbe0@xs4all.nl
Lists: pgsql-hackers

On 2017-05-09 10:50, Petr Jelinek wrote:
> On 09/05/17 00:03, Erik Rijkers wrote:
>> On 2017-05-05 02:00, Andres Freund wrote:
>>>
>>> Could you have a look?
>>
>> Running tests with these three patches:
>>
>>> 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch+
>>> 0002-WIP-Possibly-more-robust-snapbuild-approach.patch +
>>> fix-statistics-reporting-in-logical-replication-work.patch
>> (on top of 44c528810)
>>
>> I test by 15-minute pgbench runs while there is a logical replication
>> connection. Primary and replica are on the same machine.
>>
>> I have seen errors on 3 different machines (where error means: at least
>> 1 of the 4 pgbench tables is not md5-equal). It seems better, faster
>> machines yield fewer errors.
>>
>> Normally I see in pg_stat_replication (on master) one process in state
>> 'streaming'.
>>
>>   pid  |     wal     | replay_loc  |   diff   |   state   |   app   | sync_state
>>  16495 | 11/EDBC0000 | 11/EA3FEEE8 | 58462488 | streaming | derail2 | async
>>
>> Often there are another two processes in pg_stat_replication that
>> remain in state 'startup'.
>>
>> In the failing sessions the 'streaming'-state process is missing; there
>> are only the two processes that are, and remain, in 'startup'.
>
> Hmm, startup is the state where slot creation is happening. I wonder if
> it's just taking a long time to create the snapshot because of the 5th
> issue, which is not yet fixed (and the original patch will not apply on
> top of this change). Alternatively there is a bug in this patch.
>
> Did you see high CPU usage during the test when there were those
> "startup" state walsenders?
>

I haven't noticed any, but I wasn't paying particular attention to that.

I'll try to get some CPU-info logged...
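
For reference, the 'diff' column in the pg_stat_replication output quoted
above looks like the byte distance between the 'wal' and 'replay_loc'
positions. The sketch below assumes exactly that, and assumes the usual
LSN text form of two hexadecimal halves (high 32 bits / low 32 bits); the
function names are made up for illustration and are not taken from the
test script. On the server side, released PostgreSQL 10 and later can do
the same subtraction with pg_wal_lsn_diff().

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '11/EDBC0000' to an absolute byte position.

    Assumption: the part before '/' is the high 32 bits and the part
    after it is the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(current: str, replayed: str) -> int:
    """Return how many bytes 'replayed' is behind 'current'."""
    return lsn_to_int(current) - lsn_to_int(replayed)


if __name__ == "__main__":
    # Values from the quoted pg_stat_replication row.
    print(lsn_diff("11/EDBC0000", "11/EA3FEEE8"))  # 58462488
```

With the values from the quoted row this gives 58462488, the same number
shown in the 'diff' column, i.e. the replica was about 56 MB behind.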
