Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

From: Sergey Shchukin <shchukin(dot)s(dot)a(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Radovan Jablonovsky <radovan(dot)jablonovsky(at)replicon(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Date: 2015-02-27 11:11:14
Message-ID: 54F050D2.7030406@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support pgsql-admin pgsql-general pgsql-hackers pgsql-performance

27.02.2015 11:52, Jim Nasby пишет:
> On 2/26/15 12:25 AM, Sergey Shchukin wrote:
>> Hi Radovan !
>>
>> Thank you for the reply. The question is that this table is not a
>> subject for a massive updates/deletes.
>>
>> Is there any additional traces except from perf or pg_top to trace what
>> replica is doing at the particular moment when we are lagging in replay?
>> To see locks or spins or sleeps etc..
>
> Please don't top-post.
>
> What version is this? What is max_standby_streaming_delay set to?
>
>> Thank you!
>>
>> -
>>
>> Best regards,
>> Sergey Shchukin
>>
>> 24.02.2015 19:05, Radovan Jablonovsky пишет:
>>> This looks like more issue for pgsql-general mailing list.
>>>
>>> Possible solutions
>>> 1) Set specific autovacuum parameters on the big table. The autovacuum
>>> could vacuum table on multiple runs based on the thresholds and cost
>>> settings
>>> Example of setting specific values of autovacuum and analyze for
>>> table. It should be adjusted for your system, work load, table
>>> usage, etc:
>>> alter table "my_schema"."my_big_table" set (fillfactor = 80,
>>> autovacuum_enabled = true, autovacuum_vacuum_threshold = 200,
>>> autovacuum_analyze_threshold = 400, autovacuum_vacuum_scale_factor =
>>> 0.05, autovacuum_analyze_scale_factor = 0.005,
>>> autovacuum_vacuum_cost_delay = 10, autovacuum_vacuum_cost_limit =
>>> 5000);
>>>
>>> 2) Could be to partition the large table on master site and vacuum it
>>> partition by partition.
>>>
>>> On Tue, Feb 24, 2015 at 6:42 AM, Sergey Shchukin
>>> <shchukin(dot)s(dot)a(at)gmail(dot)com <mailto:shchukin(dot)s(dot)a(at)gmail(dot)com>> wrote:
>>>
>>> Hi all!
>>>
>>> May someone help me with the issue in the apply process on the
>>> replica. We have a stream replication and after vacuum stops
>>> working with a big table we get a "freeze" in applying data on the
>>> replica database. It looks like this:
>>>
>>> Tue Feb 24 15:04:51 MSK 2015 Stream:
>>> MASTER-masterdb:79607136410456 SLAVE:79607136410456
>>> Replay:79607136339456 :: REPLAY 69 KBytes (00:00:00.294485 seconds)
>>> Tue Feb 24 15:04:52 MSK 2015 Stream:
>>> MASTER-masterdb:79607137892672 SLAVE:79607137715392
>>> Replay:79607137715392 :: REPLAY 173 KBytes (00:00:00.142605
>>> seconds)
>>> Tue Feb 24 15:04:53 MSK 2015 Stream:
>>> MASTER-masterdb:79607139327776 SLAVE:79607139241816
>>> Replay:79607139241816 :: REPLAY 84 KBytes (00:00:00.05223 seconds)
>>> Tue Feb 24 15:04:54 MSK 2015 Stream:
>>> MASTER-masterdb:79607141134776 SLAVE:79607141073344
>>> Replay:79607141080032 :: REPLAY 54 KBytes (00:00:00.010603 seconds)
>>> Tue Feb 24 15:04:55 MSK 2015 Stream:
>>> MASTER-masterdb:79607143085176 SLAVE:79607143026440
>>> Replay:79607143038040 :: REPLAY 46 KBytes (00:00:00.009506 seconds)
>>> Tue Feb 24 15:04:56 MSK 2015 Stream:
>>> MASTER-masterdb:79607145111280 SLAVE:79607145021384
>>> Replay:79607145025664 :: REPLAY 83 KBytes (00:00:00.006795 seconds)
>>> Tue Feb 24 15:04:57 MSK 2015 Stream:
>>> MASTER-masterdb:79607146564424 SLAVE:79607146478336
>>> Replay:79607146501264 :: REPLAY 61 KBytes (00:00:00.00701 seconds)
>>> Tue Feb 24 15:04:58 MSK 2015 Stream:
>>> MASTER-masterdb:79607148160680 SLAVE:79607148108352
>>> Replay:79607147369320 :: REPLAY 773 KBytes (00:00:00.449702
>>> seconds)
>>> Tue Feb 24 15:04:59 MSK 2015 Stream:
>>> MASTER-masterdb:79607150220688 SLAVE:79607150159632
>>> Replay:79607150171312 :: REPLAY 48 KBytes (00:00:00.006594 seconds)
>>> Tue Feb 24 15:05:00 MSK 2015 Stream:
>>> MASTER-masterdb:79607152365360 SLAVE:79607152262696
>>> Replay:79607152285240 :: REPLAY 78 KBytes (00:00:00.007042 seconds)
>>> Tue Feb 24 15:05:02 MSK 2015 Stream:
>>> MASTER-masterdb:79607154049848 SLAVE:79607154012624
>>> Replay:79607153446800 :: REPLAY 589 KBytes (00:00:00.513637
>>> seconds)
>>> Tue Feb 24 15:05:03 MSK 2015 Stream:
>>> MASTER-masterdb:79607155229992 SLAVE:79607155187864
>>> Replay:79607155188312 :: REPLAY 41 KBytes (00:00:00.004773 seconds)
>>> Tue Feb 24 15:05:04 MSK 2015 Stream:
>>> MASTER-masterdb:79607156833968 SLAVE:79607156764128
>>> Replay:79607156785488 :: REPLAY 47 KBytes (00:00:00.006846 seconds)
>>> Tue Feb 24 15:05:05 MSK 2015 Stream:
>>> MASTER-masterdb:79607158419848 SLAVE:79607158344856
>>> Replay:79607158396352 :: REPLAY 23 KBytes (00:00:00.005228 seconds)
>>> Tue Feb 24 15:05:06 MSK 2015 Stream:
>>> MASTER-masterdb:79607160004776 SLAVE:79607159962400
>>> Replay:79607159988888 :: REPLAY 16 KBytes (00:00:00.003162 seconds)
>>> *--here apply process just stops*
>>>
>>> Tue Feb 24 15:05:07 MSK 2015 Stream:
>>> MASTER-masterdb:79607161592048 SLAVE:79607161550576
>>> Replay:79607160986064 :: REPLAY 592 KBytes (00:00:00.398376
>>> seconds)
>>> Tue Feb 24 15:05:08 MSK 2015 Stream:
>>> MASTER-masterdb:79607163272840 SLAVE:79607163231384
>>> Replay:79607160986064 :: REPLAY 2233 KBytes (00:00:01.446759
>>> seconds)
>>> Tue Feb 24 15:05:09 MSK 2015 Stream:
>>> MASTER-masterdb:79607164958632 SLAVE:79607164904448
>>> Replay:79607160986064 :: REPLAY 3879 KBytes (00:00:02.497181
>>> seconds)
>>> Tue Feb 24 15:05:10 MSK 2015 Stream:
>>> MASTER-masterdb:79607166819560 SLAVE:79607166777712
>>> Replay:79607160986064 :: REPLAY 5697 KBytes (00:00:03.543107
>>> seconds)
>>> Tue Feb 24 15:05:11 MSK 2015 Stream:
>>> MASTER-masterdb:79607168595280 SLAVE:79607168566536
>>> Replay:79607160986064 :: REPLAY 7431 KBytes (00:00:04.589736
>>> seconds)
>>> Tue Feb 24 15:05:12 MSK 2015 Stream:
>>> MASTER-masterdb:79607170372064 SLAVE:79607170252480
>>> Replay:79607160986064 :: REPLAY 9166 KBytes (00:00:05.635918
>>> seconds)
>>> Tue Feb 24 15:05:13 MSK 2015 Stream:
>>> MASTER-masterdb:79607171829480 SLAVE:79607171714144
>>> Replay:79607160986064 :: REPLAY 10589 KBytes (00:00:06.688115
>>> seconds)
>>> Tue Feb 24 15:05:14 MSK 2015 Stream:
>>> MASTER-masterdb:79607173152488 SLAVE:79607173152488
>>> Replay:79607160986064 :: REPLAY 11881 KBytes (00:00:07.736993
>>> seconds)
>>> Tue Feb 24 15:05:15 MSK 2015 Stream:
>>> MASTER-masterdb:79607174149968 SLAVE:79607174149968
>>> Replay:79607160986064 :: REPLAY 12855 KBytes (00:00:08.78538
>>> seconds)
>>> Tue Feb 24 15:05:16 MSK 2015 Stream:
>>> MASTER-masterdb:79607176448344 SLAVE:79607176252088
>>> Replay:79607160986064 :: REPLAY 15100 KBytes (00:00:09.835184
>>> seconds)
>>> Tue Feb 24 15:05:17 MSK 2015 Stream:
>>> MASTER-masterdb:79607177632216 SLAVE:79607177608224
>>> Replay:79607160986064 :: REPLAY 16256 KBytes (00:00:10.926493
>>> seconds)
>>> Tue Feb 24 15:05:18 MSK 2015 Stream:
>>> MASTER-masterdb:79607179432960 SLAVE:79607179378096
>>> Replay:79607160986064 :: REPLAY 18015 KBytes (00:00:11.97989
>>> seconds)
>>> Tue Feb 24 15:05:19 MSK 2015 Stream:
>>> MASTER-masterdb:79607180893384 SLAVE:79607180874256
>>> Replay:79607160986064 :: REPLAY 19441 KBytes (00:00:13.028921
>>> seconds)
>>> Tue Feb 24 15:05:20 MSK 2015 Stream:
>>> MASTER-masterdb:79607182596224 SLAVE:79607182552272
>>> Replay:79607160986064 :: REPLAY 21104 KBytes (00:00:14.079497
>>> seconds)
>>> Tue Feb 24 15:05:21 MSK 2015 Stream:
>>> MASTER-masterdb:79607183935312 SLAVE:79607183902592
>>> Replay:79607160986064 :: REPLAY 22411 KBytes (00:00:15.127679
>>> seconds)
>>> Tue Feb 24 15:05:23 MSK 2015 Stream:
>>> MASTER-masterdb:79607185165880 SLAVE:79607185094032
>>> Replay:79607160986064 :: REPLAY 23613 KBytes (00:00:16.175132
>>> seconds)
>>> Tue Feb 24 15:05:24 MSK 2015 Stream:
>>> MASTER-masterdb:79607187196920 SLAVE:79607187169368
>>> Replay:79607160986064 :: REPLAY 25596 KBytes (00:00:17.221981
>>> seconds)
>>> Tue Feb 24 15:05:25 MSK 2015 Stream:
>>> MASTER-masterdb:79607188943856 SLAVE:79607188885952
>>> Replay:79607160986064 :: REPLAY 27302 KBytes (00:00:18.274362
>>> seconds)
>>> Tue Feb 24 15:05:26 MSK 2015 Stream:
>>> MASTER-masterdb:79607190489400 SLAVE:79607190443160
>>> Replay:79607160986064 :: REPLAY 28812 KBytes (00:00:19.319987
>>> seconds)
>>> Tue Feb 24 15:05:27 MSK 2015 Stream:
>>> MASTER-masterdb:79607192089312 SLAVE:79607192054048
>>> Replay:79607160986064 :: REPLAY 30374 KBytes (00:00:20.372305
>>> seconds)
>>> Tue Feb 24 15:05:28 MSK 2015 Stream:
>>> MASTER-masterdb:79607193736800 SLAVE:79607193690056
>>> Replay:79607160986064 :: REPLAY 31983 KBytes (00:00:21.421359
>>> seconds)
>>> Tue Feb 24 15:05:29 MSK 2015 Stream:
>>> MASTER-masterdb:79607195968648 SLAVE:79607195901296
>>> Replay:79607160986064 :: REPLAY 34163 KBytes (00:00:22.471334
>>> seconds)
>>> Tue Feb 24 15:05:30 MSK 2015 Stream:
>>> MASTER-masterdb:79607197808840 SLAVE:79607197737720
>>> Replay:79607160986064 :: REPLAY 35960 KBytes (00:00:23.52269
>>> seconds)
>>> Tue Feb 24 15:05:31 MSK 2015 Stream:
>>> MASTER-masterdb:79607199571144 SLAVE:79607199495976
>>> Replay:79607160986064 :: REPLAY 37681 KBytes (00:00:24.577615
>>> seconds)
>>> Tue Feb 24 15:05:32 MSK 2015 Stream:
>>> MASTER-masterdb:79607201206104 SLAVE:79607201100392
>>> Replay:79607160986064 :: REPLAY 39277 KBytes (00:00:25.624604
>>> seconds)
>>> Tue Feb 24 15:05:33 MSK 2015 Stream:
>>> MASTER-masterdb:79607203174208 SLAVE:79607203111136
>>> Replay:79607160986064 :: REPLAY 41199 KBytes (00:00:26.67059
>>> seconds)
>>> Tue Feb 24 15:05:34 MSK 2015 Stream:
>>> MASTER-masterdb:79607204792888 SLAVE:79607204741600
>>> Replay:79607160986064 :: REPLAY 42780 KBytes (00:00:27.719088
>>> seconds)
>>> Tue Feb 24 15:05:35 MSK 2015 Stream:
>>> MASTER-masterdb:79607206453216 SLAVE:79607206409032
>>> Replay:79607160986064 :: REPLAY 44401 KBytes (00:00:28.766647
>>> seconds)
>>> Tue Feb 24 15:05:36 MSK 2015 Stream:
>>> MASTER-masterdb:79607208225344 SLAVE:79607208142176
>>> Replay:79607160986064 :: REPLAY 46132 KBytes (00:00:29.811434
>>> seconds)
>>>
>>>
>>> perf shows the following functions on the top
>>> + 22.50% postmaster [kernel.kallsyms] [k]
>>> copy_user_generic_string
>>> + 8.48% postmaster postgres [.]
>>> hash_search_with_hash_value
>>>
>>>
>>> after 10 minutes or so the apply process continue to work
>>>
>>> Tue Feb 24 15:13:25 MSK 2015 Stream:
>>> MASTER-masterdb:79608758742560 SLAVE:79608758718008
>>> Replay:79607160986064 :: REPLAY 1560309 KBytes (00:08:19.009653
>>> seconds)
>>> Tue Feb 24 15:13:26 MSK 2015 Stream:
>>> MASTER-masterdb:79608759203608 SLAVE:79608759189680
>>> Replay:79607160986064 :: REPLAY 1560759 KBytes (00:08:20.057877
>>> seconds)
>>> Tue Feb 24 15:13:27 MSK 2015 Stream:
>>> MASTER-masterdb:79608759639680 SLAVE:79608759633224
>>> Replay:79607160986064 :: REPLAY 1561185 KBytes (00:08:21.104723
>>> seconds)
>>> Tue Feb 24 15:13:28 MSK 2015 Stream:
>>> MASTER-masterdb:79608760271200 SLAVE:79608760264128
>>> Replay:79607160986064 :: REPLAY 1561802 KBytes (00:08:22.148546
>>> seconds)
>>> Tue Feb 24 15:13:30 MSK 2015 Stream:
>>> MASTER-masterdb:79608760622920 SLAVE:79608760616656
>>> Replay:79607160986064 :: REPLAY 1562145 KBytes (00:08:23.196645
>>> seconds)
>>> Tue Feb 24 15:13:31 MSK 2015 Stream:
>>> MASTER-masterdb:79608761122040 SLAVE:79608761084584
>>> Replay:79607160986064 :: REPLAY 1562633 KBytes (00:08:24.240653
>>> seconds)
>>> Tue Feb 24 15:13:32 MSK 2015 Stream:
>>> MASTER-masterdb:79608761434200 SLAVE:79608761426080
>>> Replay:79607160986064 :: REPLAY 1562938 KBytes (00:08:25.289429
>>> seconds)
>>> Tue Feb 24 15:13:33 MSK 2015 Stream:
>>> MASTER-masterdb:79608761931008 SLAVE:79608761904808
>>> Replay:79607160986064 :: REPLAY 1563423 KBytes (00:08:26.338498
>>> seconds)
>>> *--apply starts*
>>> Tue Feb 24 15:13:34 MSK 2015 Stream:
>>> MASTER-masterdb:79608762360568 SLAVE:79608762325712
>>> Replay:79607163554680 :: REPLAY 1561334 KBytes (00:08:25.702423
>>> seconds)
>>> Tue Feb 24 15:13:35 MSK 2015 Stream:
>>> MASTER-masterdb:79608762891224 SLAVE:79608762885928
>>> Replay:79607166466488 :: REPLAY 1559008 KBytes (00:08:25.011046
>>> seconds)
>>> Tue Feb 24 15:13:36 MSK 2015 Stream:
>>> MASTER-masterdb:79608763681920 SLAVE:79608763667256
>>> Replay:79607167054056 :: REPLAY 1559207 KBytes (00:08:25.827531
>>> seconds)
>>> Tue Feb 24 15:13:37 MSK 2015 Stream:
>>> MASTER-masterdb:79608764207088 SLAVE:79608764197744
>>> Replay:79607175610296 :: REPLAY 1551364 KBytes (00:08:21.182428
>>> seconds)
>>> Tue Feb 24 15:13:38 MSK 2015 Stream:
>>> MASTER-masterdb:79608764857920 SLAVE:79608764832432
>>> Replay:79607183599632 :: REPLAY 1544197 KBytes (00:08:16.742467
>>> seconds)
>>> Tue Feb 24 15:13:39 MSK 2015 Stream:
>>> MASTER-masterdb:79608765323360 SLAVE:79608765281408
>>> Replay:79607186862176 :: REPLAY 1541466 KBytes (00:08:15.569874
>>> seconds)
>>> Tue Feb 24 15:13:40 MSK 2015 Stream:
>>> MASTER-masterdb:79608765848240 SLAVE:79608765824520
>>> Replay:79607186862176 :: REPLAY 1541978 KBytes (00:08:16.620932
>>> seconds)
>>>
>>>
>>> All this is a result of completion of "vacuum verbose analyze
>>> master_table" on the master site
>>>
>>> Any help would be appreciated
>>>
>>> --
>>> Best regards,
>>> Sergey Shchukin
>>>
>>>
>>>
>>>
>>> --
>>>
>>> *Radovan Jablonovsky* | SaaS DBA | Phone 1-403-262-6519 (ext. 7256) |
>>> Fax 1-403-233-8046
>>>
>>>
>>>
>>
>
>
Hi Jim,

The version is _PostgreSQL 9.3.6_ on x86_64 RHEL 6.6

show max_standby_streaming_delay;
max_standby_streaming_delay
-----------------------------
30s

-
Best regards,
Sergey Shchukin

In response to

Responses

Browse pgadmin-support by date

  From Date Subject
Next Message Sergey Shchukin 2015-02-27 11:42:39 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Previous Message Jim Nasby 2015-02-27 08:52:34 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

Browse pgsql-admin by date

  From Date Subject
Next Message Sergey Shchukin 2015-02-27 11:42:39 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Previous Message Albe Laurenz 2015-02-27 10:33:44 Re: Security with V9.3.3 standby servers

Browse pgsql-general by date

  From Date Subject
Next Message Sergey Shchukin 2015-02-27 11:42:39 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Previous Message Jim Nasby 2015-02-27 08:59:04 Re: express composite type literal as text

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc Cousin 2015-02-27 11:29:44 star schema and the optimizer
Previous Message Andrew Gierth 2015-02-27 11:10:54 Re: Reduce pinning in btree indexes

Browse pgsql-performance by date

  From Date Subject
Next Message Sergey Shchukin 2015-02-27 11:42:39 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Previous Message Jim Nasby 2015-02-27 08:52:34 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary