hot standby lagging vs warm that is up-to-date

From: MirrorX <mirrorx(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: hot standby lagging vs warm that is up-to-date
Date: 2012-08-29 09:16:19
Message-ID: 1346231779970-5721711.post@n5.nabble.com
Lists: pgsql-bugs

Hello!

I am facing a rather 'weird' issue, so if you have any ideas or thoughts,
please share them.

I have a setup with a master server and a hot standby. The database settings
on both are identical, and the server specs are the same except for the
disks: the disks on the standby are much slower than the master's.

What happens is that the standby regularly falls behind. When this happens,
streaming replication stops working and the server starts applying the WAL
archives (which are rsync-ed from the master) from the point where streaming
replication was interrupted. Once there are no more archives to apply,
streaming replication comes back online.

The problem is that while the archives are being applied, the process
sometimes gets 'stuck' for a long time on certain archives (more than 30
minutes for a single archive, and on some occasions even 2 hours). At that
point, running 'iostat' shows one of the disks (not always the same one) at
100% utilization. If I stop the standby server and bring it back online as a
warm standby (by using the pg_standby utility in the recovery.conf file),
then all the archives are applied very quickly (even those that were stuck
in the hot standby setup), and iostat never shows more than 10-20%
utilization on the disks where the data reside.
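To pin down which archive file replay is stuck on, the standby's replay
position (reported on 9.0 by pg_last_xlog_replay_location() as a string such
as '1A/2B3C4D5E') can be mapped to a WAL segment file name. This is a minimal
sketch, assuming the default 16MB segment size, pre-9.3 file naming, and a
caller-supplied timeline ID (1 if the cluster was never promoted); the
function names are hypothetical and the location value is only an example.

```python
# Sketch: map a pre-9.3 xlog location ("X/Y") to the name of the WAL segment
# file that contains that byte position.
# Assumptions: default 16MB segments; timeline ID supplied by the caller.
from typing import Tuple

XLOG_SEG_SIZE = 16 * 1024 * 1024  # default WAL segment size in bytes


def parse_xlog_location(loc: str) -> Tuple[int, int]:
    """Split 'X/Y' into (xlogid, xrecoff); both parts are hexadecimal."""
    hi, lo = loc.strip().split("/")
    return int(hi, 16), int(lo, 16)


def wal_file_name(loc: str, timeline: int = 1) -> str:
    """Return the 24-hex-digit WAL file name holding this location."""
    xlogid, xrecoff = parse_xlog_location(loc)
    segno = xrecoff // XLOG_SEG_SIZE  # segment number within this xlogid
    return "%08X%08X%08X" % (timeline, xlogid, segno)


if __name__ == "__main__":
    # Example value only; substitute the standby's actual replay location.
    print(wal_file_name("1A/2B3C4D5E"))  # prints 000000010000001A0000002B
```

Sampling the replay location a few minutes apart and mapping each sample this
way would show whether replay is stalled inside a single archive file or
advancing slowly through several.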

Has anyone seen anything similar?
Please let me know what extra information would be useful.
Some specs for the servers:
16 CPUs,
64GB RAM,
Red Hat 5.6

And here are the postgresql.conf settings from the master (those of the
standby are the same, except that the hot_standby option is switched to
'on'):
version | PostgreSQL 9.0.5 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
archive_command | cp %p /archives/%f
archive_mode | on
autovacuum_analyze_scale_factor | 0.1
autovacuum_max_workers | 5
autovacuum_vacuum_cost_delay | 10ms
autovacuum_vacuum_scale_factor | 0.2
bgwriter_delay | 400ms
bgwriter_lru_maxpages | 50
checkpoint_completion_target | 0.9
checkpoint_segments | 300
checkpoint_timeout | 8min
effective_cache_size | 50GB
hot_standby | on
lc_collate | en_US.UTF-8
lc_ctype | en_US.UTF-8
listen_addresses | *
log_checkpoints | on
log_destination | stderr
log_filename | postgresql-%a.log
log_line_prefix | %t [%p]: [%l-1] user=%u,db=%d,remote=%r
log_min_duration_statement | 1s
log_rotation_age | 1d
log_truncate_on_rotation | on
logging_collector | on
maintenance_work_mem | 2GB
max_connections | 1200
max_prepared_transactions | 1000
max_stack_depth | 6MB
max_wal_senders | 5
port | 5432
server_encoding | UTF8
shared_buffers | 10GB
synchronous_commit | off
temp_buffers | 12800
TimeZone | UTC
wal_buffers | 16MB
wal_keep_segments | 768
wal_level | hot_standby
work_mem | 30MB
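One piece of extra information that might help is how many bytes of WAL the
standby is behind. On 9.0 the master reports its position with
pg_current_xlog_location() and the standby reports what it has replayed with
pg_last_xlog_replay_location(), but there is no built-in function to subtract
the two (pg_xlog_location_diff() only arrived in 9.2). This is a minimal
sketch of that subtraction, assuming pre-9.3 addressing with default 16MB
segments, where each logical xlog file spans 255 usable segments
(0xFF000000 bytes); the function names and sample values are hypothetical.

```python
# Sketch: byte distance between two pre-9.3 xlog locations, e.g. the master's
# pg_current_xlog_location() and the standby's pg_last_xlog_replay_location().
# Assumption: default 16MB segments, so each logical xlog file (xlogid)
# spans 255 usable segments = 0xFF000000 bytes (segment FF is never used).
from typing import Tuple

XLOG_SEG_SIZE = 16 * 1024 * 1024
XLOG_FILE_SIZE = 255 * XLOG_SEG_SIZE  # 0xFF000000


def parse_xlog_location(loc: str) -> Tuple[int, int]:
    """Split 'X/Y' into (xlogid, xrecoff); both parts are hexadecimal."""
    hi, lo = loc.strip().split("/")
    return int(hi, 16), int(lo, 16)


def xlog_location_diff(ahead: str, behind: str) -> int:
    """Bytes of WAL between two locations (positive if `ahead` is ahead)."""
    a_id, a_off = parse_xlog_location(ahead)
    b_id, b_off = parse_xlog_location(behind)
    return (a_id * XLOG_FILE_SIZE + a_off) - (b_id * XLOG_FILE_SIZE + b_off)


if __name__ == "__main__":
    # Example values only; substitute the real outputs of the two functions.
    lag = xlog_location_diff("1B/10000000", "1A/FE000000")
    print(lag, "bytes, about", lag // XLOG_SEG_SIZE, "segments")
    # prints: 285212672 bytes, about 17 segments
```

With wal_keep_segments = 768 in the settings above, a lag approaching roughly
768 segments (about 12GB) would suggest the master may already have recycled
WAL that streaming still needs, which might explain the fallback to the
archives.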

Thank you in advance!

--
View this message in context: http://postgresql.1045698.n5.nabble.com/hot-standby-lagging-vs-warm-that-is-up-to-date-tp5721711.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
