Re: High replication lag - Stream Replication

From: Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br>
To: Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: High replication lag - Stream Replication
Date: 2017-11-08 16:04:02
Message-ID: CAN8toQNBr6h0JqWEpF+QHDqVh_Gwo7C+XvoG363ZgDwVzcQ8fQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Hi Shreeyansh,

Thank you for your help.

The lag is not occurring in the recording of the archive in production, nor
in sending to StandBy. The lag is occurring in the archive application in
StandBy. And this slowness increases with running queries in the StandBy
database.

The process of recovering is consuming little CPU, and little I / O.

postgres: startup process recovering 0000000100001B4300000006

This parameter *log_min_duration_statement *should be changed on the
StandBy server?

2017-11-08 12:09 GMT-02:00 Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>:

>
>
> On Wed, Nov 8, 2017 at 7:08 PM, Marcelo Kruger <
> marcelo(dot)kruger(at)neoway(dot)com(dot)br> wrote:
>
>> Good afternoon,
>>
>> I have two servers hosted on Azure with PostgreSQL 9.6 databases. One
>> database is production, the other is standby.
>> For replication I use stream replication between servers. However the
>> application of the archives in the standby database is slow, always
>> generating replication lag.
>>
>> The replication lag increases considerably when queries are performed on
>> the StandBy database. I wonder if there is any setting that can be made to
>> improve this performance.
>>
>> *Server Production*
>> SO: CentOS Linux release 7.4.1708 (Core) 64Bits
>> CPU: Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz - 15 Cores
>> Memory: 240GB
>> I/O: 17TB RAID0 (17 Disks (1TB for disk) SSD with read and write cache
>> enabled)
>>
>> *Server StandBy*
>> SO: CentOS Linux release 7.4.1708 (Core) 64Bits
>> CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz - 20 Cores
>> Memory: 140GB
>> I/O: 22TB RAID0 (22 Disks (1TB for disk) SSD with read and write cache
>> enabled)
>>
>> *Configuration PostgreSQL Production*
>>
>> postgres.conf
>>
>> listen_addresses = '*'
>> port = 5433
>> max_connections = 2000
>> superuser_reserved_connections = 3
>> shared_buffers = 56GB
>> work_mem = 29360kB
>> maintenance_work_mem = 4GB
>> autovacuum_work_mem = 8GB
>> max_stack_depth = 5MB
>> dynamic_shared_memory_type = posix
>> bgwriter_delay = 10ms
>> bgwriter_lru_maxpages = 700
>> bgwriter_lru_multiplier = 2.0
>> fsync = off
>> synchronous_commit = on
>> full_page_writes = off
>> wal_buffers = 1500MB
>> wal_writer_delay = 1ms
>> checkpoint_timeout = 10min
>> max_wal_size = 4GB
>> min_wal_size = 2GB
>> checkpoint_completion_target = 0.9
>> effective_cache_size = 168GB
>> default_statistics_target = 500
>> log_destination = 'csvlog'
>> logging_collector = on
>> log_directory = 'pg_log'
>> log_filename = 'postgresql-%w.log'
>> log_file_mode = 0640
>> log_truncate_on_rotation = on
>> log_rotation_age = 1d
>> log_rotation_size = 600MB
>> log_min_duration_statement = 0
>> log_checkpoints = off
>> log_connections = off
>> log_disconnections = off
>> log_duration = off
>> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
>> log_lock_waits = on
>> log_timezone = 'Brazil/East'
>> autovacuum = on
>> log_autovacuum_min_duration = -1
>> autovacuum_max_workers = 12
>> autovacuum_naptime = 30s
>> autovacuum_vacuum_threshold = 20
>> autovacuum_analyze_threshold = 20
>> autovacuum_vacuum_scale_factor = 0.2
>> autovacuum_analyze_scale_factor = 0.1
>> autovacuum_vacuum_cost_delay = 20
>> autovacuum_vacuum_cost_limit = 800
>> vacuum_cost_delay = 20
>> vacuum_cost_limit = 800
>> datestyle = 'iso, mdy'
>> timezone = 'Brazil/East'
>> lc_messages = 'en_US.UTF-8'
>> lc_monetary = 'en_US.UTF-8'
>> lc_numeric = 'en_US.UTF-8'
>> lc_time = 'en_US.UTF-8'
>> default_text_search_config = 'pg_catalog.english'
>> max_locks_per_transaction = 256
>> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
>> effective_io_concurrency = 10
>> wal_level = hot_standby
>> max_wal_senders = 10
>> max_replication_slots = 3
>> wal_keep_segments = 38400
>>
>> *Configuration PostgreSQL StandBy*
>>
>> postgres.conf
>>
>> listen_addresses = '*'
>> port = 5433
>> max_connections = 2000
>> superuser_reserved_connections = 3
>> shared_buffers = 10GB
>> work_mem = 183500kB
>> maintenance_work_mem = 4GB
>> autovacuum_work_mem = 8GB
>> max_stack_depth = 5MB
>> dynamic_shared_memory_type = posix
>> bgwriter_delay = 10ms
>> bgwriter_lru_maxpages = 700
>> bgwriter_lru_multiplier = 2.0
>> fsync = off
>> synchronous_commit = on
>> full_page_writes = off
>> wal_buffers = 16MB
>> wal_writer_delay = 1ms
>> checkpoint_timeout = 10min
>> max_wal_size = 4GB
>> min_wal_size = 2GB
>> checkpoint_completion_target = 0.9
>> effective_cache_size = 104GB
>> default_statistics_target = 500
>> log_destination = 'csvlog'
>> logging_collector = on
>> log_directory = 'pg_log'
>> log_filename = 'postgresql-%w.log'
>> log_file_mode = 0640
>> log_truncate_on_rotation = on
>> log_rotation_age = 1d
>> log_rotation_size = 600MB
>> log_min_duration_statement = 0
>> log_checkpoints = off
>> log_connections = off
>> log_disconnections = off
>> log_duration = off
>> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
>> log_lock_waits = on
>> log_timezone = 'Brazil/East'
>> autovacuum = on
>> log_autovacuum_min_duration = -1
>> autovacuum_max_workers = 3
>> autovacuum_naptime = 30s
>> autovacuum_vacuum_threshold = 20
>> autovacuum_analyze_threshold = 20
>> autovacuum_vacuum_scale_factor = 0.2
>> autovacuum_analyze_scale_factor = 0.1
>> autovacuum_vacuum_cost_delay = 20
>> autovacuum_vacuum_cost_limit = 800
>> vacuum_cost_delay = 20
>> vacuum_cost_limit = 800
>> datestyle = 'iso, mdy'
>> timezone = 'Brazil/East'
>> lc_messages = 'en_US.UTF-8'
>> lc_monetary = 'en_US.UTF-8'
>> lc_numeric = 'en_US.UTF-8'
>> lc_time = 'en_US.UTF-8'
>> default_text_search_config = 'pg_catalog.english'
>> max_locks_per_transaction = 256
>> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
>> effective_io_concurrency = 10
>> max_worker_processes = 18
>> hot_standby = on
>> hot_standby_feedback = on
>>
>> recovery.conf
>>
>> standby_mode = 'on'
>> primary_slot_name = 'replicacao'
>> primary_conninfo = 'host=bdreplica00 port=5433 user=replicator
>> password='
>> trigger_file = '/var/lib/pgsql/9.6/data/master_db.conf'
>>
>> --
>>
>>
>> Marcelo Krüger <https://br.linkedin.com/in/marcelo-kr%C3%BCger-40aab127>
>>
>> Data Integration
>>
>> Office: +55 48 3333-2030 <+55%2048%203333-2030> | Mobile: +55 48
>> 999679585 <+55%2048%2099967-9585>
>>
>> R. Patrício Farias, 131 - Itacorubi - Florianópolis - 3º andar |
>> 88034-132 | Santa Catarina, Brasil
>> <https://maps.google.com/?q=R.+Patr%C3%ADcio+Farias,+131+-+Itacorubi+-+Florian%C3%B3polis+-+3%C2%BA+andar+%7C+%E2%80%A888034-132+%7C+Santa+Catarina,+Brasil&entry=gmail&source=g>
>>
>> São Paulo | Florianópolis | New York
>>
>> <https://www.linkedin.com/company/1235825/>
>> <https://www.facebook.com/neoway.nw/>
>>
>> Conteúdo confidencial. Caso você não seja o real destinatário deste
>> e-mail por favor notifique o remetente e elimine esta mensagem. Privileged
>> and confidential. If you are not the intended addressee of this e-mail
>> please notify the sender and promptly delete this message.
>>
>>
> Hi Marcelo Kruger,
>
> From server configuration details provided, looks like a heavy trans DB &
> more logging enabled which might be resulting more write act on Master and
> Slave.
> At our first glance, we see *log_min_duration_statement* is set to 0
> which might be writing all the queries into the log resulting heavy writing
> activity that maybe cause for replication lag.
>
>
> Hope this helps you.
>
>
> --
>
>

--

Marcelo Krüger <https://br.linkedin.com/in/marcelo-kr%C3%BCger-40aab127>

Data Integration

Office: +55 48 3333-2030 | Mobile: +55 48 999679585

R. Patrício Farias, 131 - Itacorubi - Florianópolis - 3º andar | 88034-132
| Santa Catarina, Brasil

São Paulo | Florianópolis | New York

<https://www.linkedin.com/company/1235825/>
<https://www.facebook.com/neoway.nw/>

Conteúdo confidencial. Caso você não seja o real destinatário deste e-mail
por favor notifique o remetente e elimine esta mensagem. Privileged and
confidential. If you are not the intended addressee of this e-mail please
notify the sender and promptly delete this message.

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message rammohan ganapavarapu 2017-11-08 16:14:03 Re: How to measure replication lag in logical replication setup.
Previous Message Igor Neyman 2017-11-08 15:39:33 Re: How to measure replication lag in logical replication setup.