Synchronous replication + pgPool: not all transactions immediately visible on standby

From: Thomas Kellerer <spam_eater(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Subject: Synchronous replication + pgPool: not all transactions immediately visible on standby
Date: 2014-09-24 08:08:59
Message-ID: lvtu6s$ivu$1@ger.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello,

we have a setup with Postgres 9.3.4 running on Ubuntu (don't know the exact version) using streaming replication with a hot standby and pgPool 3.3.3 as a loadbalancer in front of the two Postgres servers.

While running automated tests we noticed that despite the fact that replication is set to synchronous not all committed transactions are immediately visible on the standby.

The tests are Java programs using JPA (EclipseLink) to access the database.

The basic pattern is something like this:

* tests opens a new JPA sessions
* updates a table
* commits the transaction
* closes the JPA session
* opens a new JPA session (this is where it gets transferred to the slave)
* selects the modified data to verify everything
* closes the JPA session

If the commit and the following select are issued too quickly, the select doesn't see the changes.

If we either connect directly to the master to run the tests, or artificially sleep inside the tests (e.g. 100ms) then we can see the results of a previous transaction without problems. All connections use the default isolation level (read committed).

We enabled statement logging on the master and the server, and these are the relevant parts

Log on the on the master:

2014-09-24 09:13:24.774 CEST LOG: Execute <unnamed>: SET extra_float_digits = 3
2014-09-24 09:13:24.797 CEST LOG: Execute <unnamed>: BEGIN
2014-09-24 09:13:24.869 CEST LOG: Execute <unnamed>: UPDATE xxxx SET STATUS = $1 WHERE some_col IN (.....)
2014-09-24 09:13:24.869 CEST DETAIL: Parameter: $1 = '2', ........ $2001 = '1999'
2014-09-24 09:13:25.060 CEST LOG: Execute S_1: COMMIT
2014-09-24 09:13:25.120 CEST LOG: Anweisung: DISCARD ALL

Log on the slave:

2014-09-24 09:13:25.125 CEST LOG: Execute <unnamed>: SET extra_float_digits = 3
2014-09-24 09:13:25.131 CEST LOG: Execute <unnamed>: BEGIN
2014-09-24 09:13:25.133 CEST LOG: Execute <unnamed>: SELECT ... FROM xxxx WHERE ... AND (STATUS = $3) LIMIT $4 OFFSET $5
2014-09-24 09:13:25.133 CEST DETAIL: Parameter: $1 = '1000426553', $2 = '2014-09-24 09:14:18.114', $3 = '2', $4 = '2100', $5 = '0'
2014-09-24 09:13:25.137 CEST LOG: Execute S_1: COMMIT
2014-09-24 09:13:25.138 CEST LOG: Anweisung: DISCARD ALL

So the new session on the slave was initiated 65ms after the commit on the master was done.
But still the SELECT didn't return anything (the where clause includes the new values for the STATUS column updated in the previous transaction).

The IN clause of the UPDATE statement contains 2000 values. If we reduce the number of updated rows (e.g. to 20) things are working fine.
Everything sounds as if the replication is configured to be asynchronous, although it isn't

Here are some of the configuration settings that I can imagine would be important:

postgresql.conf (from the master)

wal_level = hot_standby
checkpoint_segments = 16
checkpoint_completion_target = 0.9
max_wal_senders = 5
wal_keep_segments = 50
synchronous_standby_names = 'test_slave'

The following entries are unchanged from a default configuration (they are still commented out)

#fsync = on
#synchronous_commit = on
#wal_sync_method = fsync
#full_page_writes = on
#wal_buffers = -1
#wal_writer_delay = 200ms
#commit_delay = 0
#commit_siblings = 5

The slave has

hot_standby = on
hot_standby_feedback = on

pgpool.conf

num_init_children = 400
max_pool = 2
child_life_time = 300
child_max_connections = 0
connection_life_time = 0
client_idle_limit = 0
connection_cache = on
reset_query_list = 'ABORT; DISCARD ALL'

replication_mode = off
replicate_select = off
insert_lock = on

load_balance_mode = on
master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 0
parallel_mode = off
use_watchdog = off
memory_cache_enabled = off

Our assumption is that it has something to do with the hot standby and/or the pgPool configuration.
But we are unsure where exactly the problem is.

This is our first time using pgPool (this is a pre-configured system from our customer) and we are unsure on where to look further.

Regards
Thomas

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2014-09-24 12:15:02 Re: pg_dump: [archiver] -C and -c are incompatible options
Previous Message Ilya I. Ashchepkov 2014-09-24 07:44:25 Re: JSONB spaces in text presentation