logical copy_replication_slot issues

From: Arseny Sher <a(dot)sher(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: logical copy_replication_slot issues
Date: 2020-02-09 16:28:59
Message-ID: 871rr3ohbo.fsf@ars-thinkpad
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While jumping around partically decoded xacts questions [1], I've read
through the copy replication slots code (9f06d79ef) and found a couple
of issues.

1) It seems quite reckless to me to dive into
DecodingContextFindStartpoint without actual WAL reservation (donors
slot restart_lsn is used, but it is not acquired). Why risking erroring
out with WAL removal error if the function advances new slot position to
updated donors one in the end anyway?

2) In the end, restart_lsn of new slot is set to updated donors
one. However, confirmed_flush field is not updated. This is just wrong
-- we could start decoding too early and stream partially decoded
transaction.

I'd probably avoid doing DecodingContextFindStartpoint at all. Its only
purpose is to assemble consistent snapshot (and establish corresponding
<restart_lsn, confirmed_flush_lsn> pair), but donor slot must have
already done that and we could use it as well. Was this considered?

[1] https://www.postgresql.org/message-id/flat/AB5978B2-1772-4FEE-A245-74C91704ECB0%40amazon.com

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jose Luis Tallon 2020-02-09 18:28:49 Just for fun: Postgres 20?
Previous Message Arseny Sher 2020-02-09 16:07:50 Re: ERROR: subtransaction logged without previous top-level txn record