| From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
|---|---|
| To: | Matheus Alcantara <matheusssilv97(at)gmail(dot)com>, Arseniy Mukhin <arseniy(dot)mukhin(dot)dev(at)gmail(dot)com> |
| Cc: | Joel Jacobson <joel(at)compiler(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Álvaro Herrera <alvherre(at)kurilemu(dot)de> |
| Subject: | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
| Date: | 2025-11-06 10:05:20 |
| Message-ID: | a8892a0e-7660-42d1-98bb-b677f6c15069@iki.fi |
| Lists: | pgsql-hackers |
On 05/11/2025 21:02, Matheus Alcantara wrote:
> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>>> In case of an error in TransactionIdDidCommit() I think that we will
>>> have the same behavior: since we advance the "current" position before
>>> the DidCommit call, the PG_FINALLY block will set the backend position
>>> past the failing notification entry.
>>
>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
>> notifications that we have buffered in the local buffer but haven't
>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
>> notification, the one where TransactionIdDidCommit() failed.
>>
> Yeah, that's true.
>
>>> How bad would it be to store the notification entries within a list and
>>> store the next position with the notification entry, and then wrap the
>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>>> saved "next position" on PG_CATCH? Something like this:
>>
>> [ ...]
>>
>> That addresses the failure on NotifyMyFrontEnd, but not on
>> TransactionIdDidCommit.
>>
>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
>> should not really fail, after fixing the bug we're discussing.
>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
>> an otherwise idle connection. Or it could fail if the client connection
>> is lost, but then the backend is about to die anyway. And arguably
>> closing the connection is better than losing even a single notification,
>> anyway.
>>
> My only concern with making these errors FATAL is that if a notification
> entry causes a different, recoverable error, all subsequent messages
> will be lost. This is because if the backend dies and the user opens a new
> connection and executes LISTEN on the channel, it will not see the
> notifications past the one that caused the error. I'm not sure we are
> completely safe from this kind of recoverable error; what do you
> think?
I did some more testing of the current behavior, using encoding
conversion to cause an error:
In backend A:
SET client_encoding='latin1';
LISTEN foo;
In backend B:
NOTIFY foo, 'ハ';
Observations:
- If the connection is idle when the notification is received, the ERROR
is turned into FATAL anyway:
postgres=# SET client_encoding='latin1';
SET
postgres=# LISTEN foo;
LISTEN
postgres=# select 1; -- do the NOTIFY in another connection before this
ERROR: character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
has no equivalent in encoding "LATIN1"
FATAL: terminating connection because protocol synchronization was lost
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
- If there are multiple notifications pending, we stop processing the
subsequent notifications on the first error. The subsequent
notifications will only be processed when another notify interrupt is
received, i.e. when a backend sends yet another notification.
I'm getting more and more convinced that escalating all ERRORs to FATALs
during notify processing is the right way to go. Attached is a new patch
set that does that.
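To illustrate the idea (just a rough sketch; ProcessPendingNotifies() is a
hypothetical wrapper name, and the attached patch may implement the
escalation differently), it can be as simple as setting ExitOnAnyError
while draining the queue, which makes errstart() promote any ERROR to
FATAL:

#include "postgres.h"
#include "miscadmin.h"		/* ExitOnAnyError */

/* sketched as if in commands/async.c, next to asyncQueueReadAllNotifications() */
static void
ProcessPendingNotifies(void)
{
	/*
	 * Any ERROR raised below -- e.g. from TransactionIdDidCommit() or
	 * NotifyMyFrontEnd() -- is escalated to FATAL, so we either deliver a
	 * notification or terminate the backend, instead of silently skipping
	 * past it.
	 */
	ExitOnAnyError = true;

	asyncQueueReadAllNotifications();

	/* only reached if no error was raised */
	ExitOnAnyError = false;
}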
- Heikki
| Attachment | Content-Type | Size |
|---|---|---|
| v3-0001-Escalate-ERRORs-during-async-notify-processing-to.patch | text/x-patch | 5.1 KB |
| v3-0002-Fix-bug-where-we-truncated-CLOG-that-was-still-ne.patch | text/x-patch | 14.6 KB |
| v3-0003-Fix-remaining-race-condition-with-CLOG-truncation.patch | text/x-patch | 6.9 KB |