BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing

From: ofir(dot)manor(at)gmail(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #13725: Logical Decoding - wrong results with large transactions and unfortunate timing
Date: 2015-10-25 19:14:36
Message-ID: 20151025191436.3017.36314@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 13725
Logged by: Ofir Manor
Email address: ofir(dot)manor(at)gmail(dot)com
PostgreSQL version: 9.4.5
Operating system: CentOS 7.1 64-bit
Description:

Nasty but easily reproducible.

Test case, script 1.
I clean up the previous test, create a new table and two replication slot.
Then, I run a script in the background (it will trigger the bug), insert
100K rows, try to get them with pg_logical_slot_get_changes from the first
slot.
This works correctly every time - I get back the 100,000 rows + one BEGIN +
one COMMIT.
psql --quiet << EOF
SELECT pg_drop_replication_slot('test1_slot');
SELECT pg_drop_replication_slot('test2_slot');
DROP TABLE test;
CREATE TABLE test (id int primary key, v varchar);
SELECT * FROM pg_create_logical_replication_slot('test1_slot',
'test_decoding');
SELECT * FROM pg_create_logical_replication_slot('test2_slot',
'test_decoding');
EOF
./get_loop.sh &
psql --quiet --tuples-only -o out1 << EOF
INSERT INTO test SELECT i, (i*3)::varchar FROM generate_series(1,100000)
i;
SELECT * FROM pg_logical_slot_get_changes('test1_slot', NULL, NULL);
EOF
cat --squeeze-blank out1 | wc -l

Script 2 (get_loop.sh) continuously polls for changes using a loop of
pg_logical_slot_get_changes, using the second replication slot. Since the
test pg database is idle, only one call returns.

for i in `seq 1 10000`; do
echo "SELECT * FROM pg_logical_slot_get_changes('test2_slot', NULL,
NULL);"
done | psql --quiet --tuples-only -o out2
cat --squeeze-blank out2 | wc -l

However, I get 116K or 148K rows in the output (after the 100K inserts).
I can clearly see the the output jumps back after 16K or 48K rows - starting
to stream the transaction from the beginning (see the first column - the
offsets):
...
1/68F6E1B8 | 450854 | table public.test: INSERT: id[integer]:49149
v[character varying]:'147447'
1/68F6E248 | 450854 | table public.test: INSERT: id[integer]:49150
v[character varying]:'147450'
1/68F6E2D8 | 450854 | table public.test: INSERT: id[integer]:49151
v[character varying]:'147453'
1/68F6E368 | 450854 | table public.test: INSERT: id[integer]:49152
v[character varying]:'147456'
1/68891010 | 450854 | table public.test: INSERT: id[integer]:1 v[character
varying]:'3'
1/688910D8 | 450854 | table public.test: INSERT: id[integer]:2 v[character
varying]:'6'
1/68891168 | 450854 | table public.test: INSERT: id[integer]:3 v[character
varying]:'9'
1/688911F8 | 450854 | table public.test: INSERT: id[integer]:4 v[character
varying]:'12'
...

When I have larger transaction (inserting 1M or 10M rows, it gets worse,
jumping back to the beginning multiple times.

BTW - I don't know if it is a bug in pg_logical_slot_get_changes or in the
test_decoding output plugin that I'm using to evaluate logical decoding.

Any fix / workaround?

Thanks in advance,
Ofir

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David G. Johnston 2015-10-25 21:37:38 Re: BUG #13723: "duplicate key" error occurs when update delete and insert the same row concurrently
Previous Message chenhj 2015-10-25 17:28:40 Re: BUG #13723: "duplicate key" error occurs when update delete and insert the same row concurrently