Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

From: Sergey Shchukin <shchukin(dot)s(dot)a(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Radovan Jablonovsky <radovan(dot)jablonovsky(at)replicon(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary
Date: 2015-02-27 11:42:39
Message-ID: 54F0582F.7010402@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgadmin-support pgsql-admin pgsql-general pgsql-hackers pgsql-performance

27.02.2015 14:11, Sergey Shchukin пишет:
> 27.02.2015 11:52, Jim Nasby пишет:
>> On 2/26/15 12:25 AM, Sergey Shchukin wrote:
>>> Hi Radovan !
>>>
>>> Thank you for the reply. The question is that this table is not a
>>> subject for a massive updates/deletes.
>>>
>>> Is there any additional traces except from perf or pg_top to trace what
>>> replica is doing at the particular moment when we are lagging in
>>> replay?
>>> To see locks or spins or sleeps etc..
>>
>> Please don't top-post.
>>
>> What version is this? What is max_standby_streaming_delay set to?
>>
>>> Thank you!
>>>
>>> -
>>>
>>> Best regards,
>>> Sergey Shchukin
>>>
>>> 24.02.2015 19:05, Radovan Jablonovsky пишет:
>>>> This looks like more issue for pgsql-general mailing list.
>>>>
>>>> Possible solutions
>>>> 1) Set specific autovacuum parameters on the big table. The autovacuum
>>>> could vacuum table on multiple runs based on the thresholds and cost
>>>> settings
>>>> Example of setting specific values of autovacuum and analyze for
>>>> table. It should be adjusted for your system, work load, table
>>>> usage, etc:
>>>> alter table "my_schema"."my_big_table" set (fillfactor = 80,
>>>> autovacuum_enabled = true, autovacuum_vacuum_threshold = 200,
>>>> autovacuum_analyze_threshold = 400, autovacuum_vacuum_scale_factor =
>>>> 0.05, autovacuum_analyze_scale_factor = 0.005,
>>>> autovacuum_vacuum_cost_delay = 10, autovacuum_vacuum_cost_limit =
>>>> 5000);
>>>>
>>>> 2) Could be to partition the large table on master site and vacuum it
>>>> partition by partition.
>>>>
>>>> On Tue, Feb 24, 2015 at 6:42 AM, Sergey Shchukin
>>>> <shchukin(dot)s(dot)a(at)gmail(dot)com <mailto:shchukin(dot)s(dot)a(at)gmail(dot)com>> wrote:
>>>>
>>>> Hi all!
>>>>
>>>> May someone help me with the issue in the apply process on the
>>>> replica. We have a stream replication and after vacuum stops
>>>> working with a big table we get a "freeze" in applying data on the
>>>> replica database. It looks like this:
>>>>
>>>> Tue Feb 24 15:04:51 MSK 2015 Stream:
>>>> MASTER-masterdb:79607136410456 SLAVE:79607136410456
>>>> Replay:79607136339456 :: REPLAY 69 KBytes (00:00:00.294485
>>>> seconds)
>>>> Tue Feb 24 15:04:52 MSK 2015 Stream:
>>>> MASTER-masterdb:79607137892672 SLAVE:79607137715392
>>>> Replay:79607137715392 :: REPLAY 173 KBytes (00:00:00.142605
>>>> seconds)
>>>> Tue Feb 24 15:04:53 MSK 2015 Stream:
>>>> MASTER-masterdb:79607139327776 SLAVE:79607139241816
>>>> Replay:79607139241816 :: REPLAY 84 KBytes (00:00:00.05223 seconds)
>>>> Tue Feb 24 15:04:54 MSK 2015 Stream:
>>>> MASTER-masterdb:79607141134776 SLAVE:79607141073344
>>>> Replay:79607141080032 :: REPLAY 54 KBytes (00:00:00.010603
>>>> seconds)
>>>> Tue Feb 24 15:04:55 MSK 2015 Stream:
>>>> MASTER-masterdb:79607143085176 SLAVE:79607143026440
>>>> Replay:79607143038040 :: REPLAY 46 KBytes (00:00:00.009506
>>>> seconds)
>>>> Tue Feb 24 15:04:56 MSK 2015 Stream:
>>>> MASTER-masterdb:79607145111280 SLAVE:79607145021384
>>>> Replay:79607145025664 :: REPLAY 83 KBytes (00:00:00.006795
>>>> seconds)
>>>> Tue Feb 24 15:04:57 MSK 2015 Stream:
>>>> MASTER-masterdb:79607146564424 SLAVE:79607146478336
>>>> Replay:79607146501264 :: REPLAY 61 KBytes (00:00:00.00701 seconds)
>>>> Tue Feb 24 15:04:58 MSK 2015 Stream:
>>>> MASTER-masterdb:79607148160680 SLAVE:79607148108352
>>>> Replay:79607147369320 :: REPLAY 773 KBytes (00:00:00.449702
>>>> seconds)
>>>> Tue Feb 24 15:04:59 MSK 2015 Stream:
>>>> MASTER-masterdb:79607150220688 SLAVE:79607150159632
>>>> Replay:79607150171312 :: REPLAY 48 KBytes (00:00:00.006594
>>>> seconds)
>>>> Tue Feb 24 15:05:00 MSK 2015 Stream:
>>>> MASTER-masterdb:79607152365360 SLAVE:79607152262696
>>>> Replay:79607152285240 :: REPLAY 78 KBytes (00:00:00.007042
>>>> seconds)
>>>> Tue Feb 24 15:05:02 MSK 2015 Stream:
>>>> MASTER-masterdb:79607154049848 SLAVE:79607154012624
>>>> Replay:79607153446800 :: REPLAY 589 KBytes (00:00:00.513637
>>>> seconds)
>>>> Tue Feb 24 15:05:03 MSK 2015 Stream:
>>>> MASTER-masterdb:79607155229992 SLAVE:79607155187864
>>>> Replay:79607155188312 :: REPLAY 41 KBytes (00:00:00.004773
>>>> seconds)
>>>> Tue Feb 24 15:05:04 MSK 2015 Stream:
>>>> MASTER-masterdb:79607156833968 SLAVE:79607156764128
>>>> Replay:79607156785488 :: REPLAY 47 KBytes (00:00:00.006846
>>>> seconds)
>>>> Tue Feb 24 15:05:05 MSK 2015 Stream:
>>>> MASTER-masterdb:79607158419848 SLAVE:79607158344856
>>>> Replay:79607158396352 :: REPLAY 23 KBytes (00:00:00.005228
>>>> seconds)
>>>> Tue Feb 24 15:05:06 MSK 2015 Stream:
>>>> MASTER-masterdb:79607160004776 SLAVE:79607159962400
>>>> Replay:79607159988888 :: REPLAY 16 KBytes (00:00:00.003162
>>>> seconds)
>>>> *--here apply process just stops*
>>>>
>>>> Tue Feb 24 15:05:07 MSK 2015 Stream:
>>>> MASTER-masterdb:79607161592048 SLAVE:79607161550576
>>>> Replay:79607160986064 :: REPLAY 592 KBytes (00:00:00.398376
>>>> seconds)
>>>> Tue Feb 24 15:05:08 MSK 2015 Stream:
>>>> MASTER-masterdb:79607163272840 SLAVE:79607163231384
>>>> Replay:79607160986064 :: REPLAY 2233 KBytes (00:00:01.446759
>>>> seconds)
>>>> Tue Feb 24 15:05:09 MSK 2015 Stream:
>>>> MASTER-masterdb:79607164958632 SLAVE:79607164904448
>>>> Replay:79607160986064 :: REPLAY 3879 KBytes (00:00:02.497181
>>>> seconds)
>>>> Tue Feb 24 15:05:10 MSK 2015 Stream:
>>>> MASTER-masterdb:79607166819560 SLAVE:79607166777712
>>>> Replay:79607160986064 :: REPLAY 5697 KBytes (00:00:03.543107
>>>> seconds)
>>>> Tue Feb 24 15:05:11 MSK 2015 Stream:
>>>> MASTER-masterdb:79607168595280 SLAVE:79607168566536
>>>> Replay:79607160986064 :: REPLAY 7431 KBytes (00:00:04.589736
>>>> seconds)
>>>> Tue Feb 24 15:05:12 MSK 2015 Stream:
>>>> MASTER-masterdb:79607170372064 SLAVE:79607170252480
>>>> Replay:79607160986064 :: REPLAY 9166 KBytes (00:00:05.635918
>>>> seconds)
>>>> Tue Feb 24 15:05:13 MSK 2015 Stream:
>>>> MASTER-masterdb:79607171829480 SLAVE:79607171714144
>>>> Replay:79607160986064 :: REPLAY 10589 KBytes (00:00:06.688115
>>>> seconds)
>>>> Tue Feb 24 15:05:14 MSK 2015 Stream:
>>>> MASTER-masterdb:79607173152488 SLAVE:79607173152488
>>>> Replay:79607160986064 :: REPLAY 11881 KBytes (00:00:07.736993
>>>> seconds)
>>>> Tue Feb 24 15:05:15 MSK 2015 Stream:
>>>> MASTER-masterdb:79607174149968 SLAVE:79607174149968
>>>> Replay:79607160986064 :: REPLAY 12855 KBytes (00:00:08.78538
>>>> seconds)
>>>> Tue Feb 24 15:05:16 MSK 2015 Stream:
>>>> MASTER-masterdb:79607176448344 SLAVE:79607176252088
>>>> Replay:79607160986064 :: REPLAY 15100 KBytes (00:00:09.835184
>>>> seconds)
>>>> Tue Feb 24 15:05:17 MSK 2015 Stream:
>>>> MASTER-masterdb:79607177632216 SLAVE:79607177608224
>>>> Replay:79607160986064 :: REPLAY 16256 KBytes (00:00:10.926493
>>>> seconds)
>>>> Tue Feb 24 15:05:18 MSK 2015 Stream:
>>>> MASTER-masterdb:79607179432960 SLAVE:79607179378096
>>>> Replay:79607160986064 :: REPLAY 18015 KBytes (00:00:11.97989
>>>> seconds)
>>>> Tue Feb 24 15:05:19 MSK 2015 Stream:
>>>> MASTER-masterdb:79607180893384 SLAVE:79607180874256
>>>> Replay:79607160986064 :: REPLAY 19441 KBytes (00:00:13.028921
>>>> seconds)
>>>> Tue Feb 24 15:05:20 MSK 2015 Stream:
>>>> MASTER-masterdb:79607182596224 SLAVE:79607182552272
>>>> Replay:79607160986064 :: REPLAY 21104 KBytes (00:00:14.079497
>>>> seconds)
>>>> Tue Feb 24 15:05:21 MSK 2015 Stream:
>>>> MASTER-masterdb:79607183935312 SLAVE:79607183902592
>>>> Replay:79607160986064 :: REPLAY 22411 KBytes (00:00:15.127679
>>>> seconds)
>>>> Tue Feb 24 15:05:23 MSK 2015 Stream:
>>>> MASTER-masterdb:79607185165880 SLAVE:79607185094032
>>>> Replay:79607160986064 :: REPLAY 23613 KBytes (00:00:16.175132
>>>> seconds)
>>>> Tue Feb 24 15:05:24 MSK 2015 Stream:
>>>> MASTER-masterdb:79607187196920 SLAVE:79607187169368
>>>> Replay:79607160986064 :: REPLAY 25596 KBytes (00:00:17.221981
>>>> seconds)
>>>> Tue Feb 24 15:05:25 MSK 2015 Stream:
>>>> MASTER-masterdb:79607188943856 SLAVE:79607188885952
>>>> Replay:79607160986064 :: REPLAY 27302 KBytes (00:00:18.274362
>>>> seconds)
>>>> Tue Feb 24 15:05:26 MSK 2015 Stream:
>>>> MASTER-masterdb:79607190489400 SLAVE:79607190443160
>>>> Replay:79607160986064 :: REPLAY 28812 KBytes (00:00:19.319987
>>>> seconds)
>>>> Tue Feb 24 15:05:27 MSK 2015 Stream:
>>>> MASTER-masterdb:79607192089312 SLAVE:79607192054048
>>>> Replay:79607160986064 :: REPLAY 30374 KBytes (00:00:20.372305
>>>> seconds)
>>>> Tue Feb 24 15:05:28 MSK 2015 Stream:
>>>> MASTER-masterdb:79607193736800 SLAVE:79607193690056
>>>> Replay:79607160986064 :: REPLAY 31983 KBytes (00:00:21.421359
>>>> seconds)
>>>> Tue Feb 24 15:05:29 MSK 2015 Stream:
>>>> MASTER-masterdb:79607195968648 SLAVE:79607195901296
>>>> Replay:79607160986064 :: REPLAY 34163 KBytes (00:00:22.471334
>>>> seconds)
>>>> Tue Feb 24 15:05:30 MSK 2015 Stream:
>>>> MASTER-masterdb:79607197808840 SLAVE:79607197737720
>>>> Replay:79607160986064 :: REPLAY 35960 KBytes (00:00:23.52269
>>>> seconds)
>>>> Tue Feb 24 15:05:31 MSK 2015 Stream:
>>>> MASTER-masterdb:79607199571144 SLAVE:79607199495976
>>>> Replay:79607160986064 :: REPLAY 37681 KBytes (00:00:24.577615
>>>> seconds)
>>>> Tue Feb 24 15:05:32 MSK 2015 Stream:
>>>> MASTER-masterdb:79607201206104 SLAVE:79607201100392
>>>> Replay:79607160986064 :: REPLAY 39277 KBytes (00:00:25.624604
>>>> seconds)
>>>> Tue Feb 24 15:05:33 MSK 2015 Stream:
>>>> MASTER-masterdb:79607203174208 SLAVE:79607203111136
>>>> Replay:79607160986064 :: REPLAY 41199 KBytes (00:00:26.67059
>>>> seconds)
>>>> Tue Feb 24 15:05:34 MSK 2015 Stream:
>>>> MASTER-masterdb:79607204792888 SLAVE:79607204741600
>>>> Replay:79607160986064 :: REPLAY 42780 KBytes (00:00:27.719088
>>>> seconds)
>>>> Tue Feb 24 15:05:35 MSK 2015 Stream:
>>>> MASTER-masterdb:79607206453216 SLAVE:79607206409032
>>>> Replay:79607160986064 :: REPLAY 44401 KBytes (00:00:28.766647
>>>> seconds)
>>>> Tue Feb 24 15:05:36 MSK 2015 Stream:
>>>> MASTER-masterdb:79607208225344 SLAVE:79607208142176
>>>> Replay:79607160986064 :: REPLAY 46132 KBytes (00:00:29.811434
>>>> seconds)
>>>>
>>>>
>>>> perf shows the following functions on the top
>>>> + 22.50% postmaster [kernel.kallsyms] [k]
>>>> copy_user_generic_string
>>>> + 8.48% postmaster postgres [.]
>>>> hash_search_with_hash_value
>>>>
>>>>
>>>> after 10 minutes or so the apply process continue to work
>>>>
>>>> Tue Feb 24 15:13:25 MSK 2015 Stream:
>>>> MASTER-masterdb:79608758742560 SLAVE:79608758718008
>>>> Replay:79607160986064 :: REPLAY 1560309 KBytes (00:08:19.009653
>>>> seconds)
>>>> Tue Feb 24 15:13:26 MSK 2015 Stream:
>>>> MASTER-masterdb:79608759203608 SLAVE:79608759189680
>>>> Replay:79607160986064 :: REPLAY 1560759 KBytes (00:08:20.057877
>>>> seconds)
>>>> Tue Feb 24 15:13:27 MSK 2015 Stream:
>>>> MASTER-masterdb:79608759639680 SLAVE:79608759633224
>>>> Replay:79607160986064 :: REPLAY 1561185 KBytes (00:08:21.104723
>>>> seconds)
>>>> Tue Feb 24 15:13:28 MSK 2015 Stream:
>>>> MASTER-masterdb:79608760271200 SLAVE:79608760264128
>>>> Replay:79607160986064 :: REPLAY 1561802 KBytes (00:08:22.148546
>>>> seconds)
>>>> Tue Feb 24 15:13:30 MSK 2015 Stream:
>>>> MASTER-masterdb:79608760622920 SLAVE:79608760616656
>>>> Replay:79607160986064 :: REPLAY 1562145 KBytes (00:08:23.196645
>>>> seconds)
>>>> Tue Feb 24 15:13:31 MSK 2015 Stream:
>>>> MASTER-masterdb:79608761122040 SLAVE:79608761084584
>>>> Replay:79607160986064 :: REPLAY 1562633 KBytes (00:08:24.240653
>>>> seconds)
>>>> Tue Feb 24 15:13:32 MSK 2015 Stream:
>>>> MASTER-masterdb:79608761434200 SLAVE:79608761426080
>>>> Replay:79607160986064 :: REPLAY 1562938 KBytes (00:08:25.289429
>>>> seconds)
>>>> Tue Feb 24 15:13:33 MSK 2015 Stream:
>>>> MASTER-masterdb:79608761931008 SLAVE:79608761904808
>>>> Replay:79607160986064 :: REPLAY 1563423 KBytes (00:08:26.338498
>>>> seconds)
>>>> *--apply starts*
>>>> Tue Feb 24 15:13:34 MSK 2015 Stream:
>>>> MASTER-masterdb:79608762360568 SLAVE:79608762325712
>>>> Replay:79607163554680 :: REPLAY 1561334 KBytes (00:08:25.702423
>>>> seconds)
>>>> Tue Feb 24 15:13:35 MSK 2015 Stream:
>>>> MASTER-masterdb:79608762891224 SLAVE:79608762885928
>>>> Replay:79607166466488 :: REPLAY 1559008 KBytes (00:08:25.011046
>>>> seconds)
>>>> Tue Feb 24 15:13:36 MSK 2015 Stream:
>>>> MASTER-masterdb:79608763681920 SLAVE:79608763667256
>>>> Replay:79607167054056 :: REPLAY 1559207 KBytes (00:08:25.827531
>>>> seconds)
>>>> Tue Feb 24 15:13:37 MSK 2015 Stream:
>>>> MASTER-masterdb:79608764207088 SLAVE:79608764197744
>>>> Replay:79607175610296 :: REPLAY 1551364 KBytes (00:08:21.182428
>>>> seconds)
>>>> Tue Feb 24 15:13:38 MSK 2015 Stream:
>>>> MASTER-masterdb:79608764857920 SLAVE:79608764832432
>>>> Replay:79607183599632 :: REPLAY 1544197 KBytes (00:08:16.742467
>>>> seconds)
>>>> Tue Feb 24 15:13:39 MSK 2015 Stream:
>>>> MASTER-masterdb:79608765323360 SLAVE:79608765281408
>>>> Replay:79607186862176 :: REPLAY 1541466 KBytes (00:08:15.569874
>>>> seconds)
>>>> Tue Feb 24 15:13:40 MSK 2015 Stream:
>>>> MASTER-masterdb:79608765848240 SLAVE:79608765824520
>>>> Replay:79607186862176 :: REPLAY 1541978 KBytes (00:08:16.620932
>>>> seconds)
>>>>
>>>>
>>>> All this is a result of completion of "vacuum verbose analyze
>>>> master_table" on the master site
>>>>
>>>> Any help would be appreciated
>>>>
>>>> --
>>>> Best regards,
>>>> Sergey Shchukin
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> *Radovan Jablonovsky* | SaaS DBA | Phone 1-403-262-6519 (ext. 7256) |
>>>> Fax 1-403-233-8046
>>>>
>>>>
>>>>
>>>
>>
>>
> Hi Jim,
>
> The version is _PostgreSQL 9.3.6_ on x86_64 RHEL 6.6
>
> show max_standby_streaming_delay;
> max_standby_streaming_delay
> -----------------------------
> 30s
>

Again, after the vacuum finished on my table I got locks in apply
process on replica - *see lag_byte *

masterdb01d/masterdb M # vacuum verbose rtable.rtable_uidl;
INFO: 00000: vacuuming "rtable.rtable_uidl"
LOCATION: lazy_scan_heap, vacuumlazy.c:438
INFO: 00000: scanned index "pk_rtable_uidl" to remove 6 row versions
DETAIL: CPU 240.80s/183.19u sec elapsed 703.85 sec.
LOCATION: lazy_vacuum_index, vacuumlazy.c:1335
INFO: 00000: "rtable_uidl": removed 6 row versions in 6 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec.
LOCATION: lazy_vacuum_heap, vacuumlazy.c:1169
INFO: 00000: index "pk_rtable_uidl" now contains 3763411079 row
versions in 32755911 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
LOCATION: lazy_cleanup_index, vacuumlazy.c:1387
INFO: 00000: "rtable_uidl": found 6 removable, 1426488 nonremovable row
versions in 12734 out of 26047416 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 3 unused item pointers.
0 pages are entirely empty.
CPU 241.37s/184.04u sec elapsed 705.45 sec.
LOCATION: lazy_scan_heap, vacuumlazy.c:1101
VACUUM
Time: 705685.954 ms

masterdb01d/postgres M # \g
-[ RECORD 1 ]----+------------------------------
procpid | 21487
usesysid | 16413
usename | repl
application_name | walreceiver
client_addr |
client_hostname | masterdb01e
client_port | 35261
backend_start | 2015-02-27 13:02:27.203938+03
state | streaming
sent_location | 494B/CB30B530
write_location | 494B/CB30B530
flush_location | 494B/CB30B530
replay_location | *494B/A02B9070 <<< stopped here!1*
sync_priority | 0
sync_state | async
total_lag_byte | 721757376
total_lag_nice | 688 MB
lag_byte | 721757376
lag_byte_nice | 688 MB

--
Best regards,
Sergey Shchukin

In response to

Responses

Browse pgadmin-support by date

  From Date Subject
Next Message Clodoaldo Neto 2015-02-27 14:36:28 Pass a URI as a pgAdmin parameter
Previous Message Sergey Shchukin 2015-02-27 11:11:14 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

Browse pgsql-admin by date

  From Date Subject
Next Message Josef Springer 2015-02-27 12:02:07 Installing PostgreSQL fails
Previous Message Sergey Shchukin 2015-02-27 11:11:14 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

Browse pgsql-general by date

  From Date Subject
Next Message gmb 2015-02-27 14:07:27 Performance on DISABLE TRIGGER
Previous Message Sergey Shchukin 2015-02-27 11:11:14 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2015-02-27 12:04:06 Re: How about to have relnamespace and relrole?
Previous Message David Rowley 2015-02-27 11:30:00 Re: Performance improvement for joins where outer side is unique

Browse pgsql-performance by date

  From Date Subject
Next Message Josh Berkus 2015-02-28 01:28:06 Bad cost estimate with FALSE filter condition
Previous Message Sergey Shchukin 2015-02-27 11:11:14 Re: Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary