Re: [ADMIN] Replication slots and isolation levels

From: Vladimir Borodin <root(at)simply(dot)name>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [ADMIN] Replication slots and isolation levels
Date: 2015-10-29 08:42:57
Message-ID: BE95C564-0D49-462A-A57C-4C9DF6238F71@simply.name
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers


> 27 окт. 2015 г., в 19:45, Vladimir Borodin <root(at)simply(dot)name> написал(а):
>
> Hi all.
>
> I’m wondering why do I get conflicts with recovery on hot standby using replication slots and read commited isolation level? And if I start repeatable read transaction I don’t get any errors. Below is some diagnostics.

+hackers@

Could anybody explain, why this is happening?

>
> I’m using 9.4.4 (upgrade is planned) from yum.postgresql.org <http://yum.postgresql.org/> packages on both master and standby. Configs are the same on both master and standby:
>
> rpopdb01d/postgres M # SELECT name, setting FROM pg_settings
> WHERE category LIKE 'Replication%' or category LIKE 'Write-Ahead Log';
> name | setting
> ------------------------------+---------
> hot_standby | on
> hot_standby_feedback | on
> max_replication_slots | 1
> max_standby_archive_delay | 30000
> max_standby_streaming_delay | 30000
> max_wal_senders | 10
> synchronous_standby_names |
> vacuum_defer_cleanup_age | 200000
> wal_keep_segments | 64
> wal_receiver_status_interval | 1
> wal_receiver_timeout | 60000
> wal_sender_timeout | 3000
> (12 rows)
>
> Time: 1.583 ms
> rpopdb01d/postgres M #
>
> On the master I’ve created a physical replication slot and attached standby to it, I do see changing xmin and restart_lsn fields in pg_replication_slots view.
>
> rpopdb01d/postgres M # select * from pg_replication_slots ;
> slot_name | plugin | slot_type | datoid | database | active | xmin | catalog_xmin | restart_lsn
> ----------------------+--------+-----------+--------+----------+--------+------------+--------------+---------------
> rpopdb01e_domain_com | [null] | physical | [null] | [null] | t | 2127399287 | [null] | 960B/415C79C8
> (1 row)
>
> Time: 0.463 ms
> rpopdb01d/postgres M #
>
> When I start a read commited transaction on standby (or use autocommit mode, doesn’t matter) I still see that xmin in pg_replication_slots view on master increases. If I do run a heavy SELECT statement, at some point of time (presumably after vacuum_defer_cleanup_age expires) standby starts to lag replication apply and when it hits max_standby_streaming_delay I get 40001 sql code, either ERROR or FATAL:
>
> rpopdb01e/rpopdb R # SHOW transaction_isolation ;
> transaction_isolation
> -----------------------
> read committed
> (1 row)
>
> Time: 0.324 ms
> rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;
> ERROR: 40001: canceling statement due to conflict with recovery
> DETAIL: User query might have needed to see row versions that must be removed.
> LOCATION: ProcessInterrupts, postgres.c:2990
> Time: 199791.339 ms
> rpopdb01e/rpopdb R #
>
>
>
> rpopdb01e/rpopdb R # SHOW transaction_isolation ;
> transaction_isolation
> -----------------------
> read committed
> (1 row)
>
> Time: 0.258 ms
> rpopdb01e/rpopdb R # BEGIN;
> BEGIN
> Time: 0.067 ms
> rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;
> FATAL: 40001: terminating connection due to conflict with recovery
> DETAIL: User was holding a relation lock for too long.
> LOCATION: ProcessInterrupts, postgres.c:2857
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
> Time: 307864.830 ms
> rpopdb01e/rpopdb R #
>
> The behavior is the same as expected to be without using replication slots.
>
> But when I start repeatable read transaction xmin field in pg_replication_slots view on master freezes (while restart_lsn is still increasing) and I don’t get any replication lag and conflicts with recovery. When I end this transaction, xmin starts increasing again.
>
> rpopdb01e/rpopdb R # begin transaction isolation level repeatable read;
> BEGIN
> Time: 0.118 ms
> rpopdb01e/rpopdb R # SELECT count(*) FROM big_table;
> count
> ------------
> 3106222429
> (1 row)
>
> Time: 411944.889 ms
> rpopdb01e/rpopdb R # ROLLBACK;
> ROLLBACK
> Time: 0.269 ms
> rpopdb01e/rpopdb R #
>
> And that is what I expect. Am I missing something or is it expected behavior in read commited mode?
>
> Thanks in advance.
>
> --
> May the force be with you…
> https://simply.name <https://simply.name/>

--
May the force be with you…
https://simply.name

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Michael Paquier 2015-10-29 10:12:39 Re: [ADMIN] Replication slots and isolation levels
Previous Message Tom Lane 2015-10-27 20:55:45 Re: pg_tblspc and fsm files

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2015-10-29 10:05:03 Re: Cross-check recent documentation changes
Previous Message Tatsuo Ishii 2015-10-29 08:39:36 Re: Re: [BUGS] BUG #13611: test_postmaster_connection failed (Windows, listen_addresses = '0.0.0.0' or '::')