| From: | Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> |
|---|---|
| To: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz> |
| Subject: | Re: Use-after-free issue in postgres_fdw |
| Date: | 2026-03-21 11:40:50 |
| Message-ID: | CAPmGK16-L3UhkrMGUhL1DO3tFRrFaZ_8QSHE62ToJWQU_Gu0UQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Chao,
On Fri, Mar 20, 2026 at 5:27 PM Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
> I can reproduce the server crash following your procedure, and I traced the problem.
>
> The issue is that, during select * from ft1, pgfdw_reject_incomplete_xact_state_change() calls disconnect_pg_server(), which destroys conn and sets ConnCacheEntry->conn = NULL, but does not update PgFdwScanState->conn. As a result, when "close c1" is executed later, PgFdwScanState->conn points to stale memory with random contents.
That is right.
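To illustrate the aliasing problem described above, here is a minimal sketch in C. It is not the postgres_fdw code: FakeConn, CacheEntry, ScanState, fake_disconnect() and scan_conn_is_dangling() are hypothetical stand-ins for PGconn, ConnCacheEntry, PgFdwScanState and disconnect_pg_server(). The sketch marks the connection as finished instead of freeing it, so that the stale pointer can be inspected safely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PGconn; the flag replaces an actual free. */
typedef struct FakeConn
{
    bool        finished;
} FakeConn;

/* Hypothetical stand-in for ConnCacheEntry. */
typedef struct CacheEntry
{
    FakeConn   *conn;
} CacheEntry;

/* Hypothetical stand-in for PgFdwScanState, which keeps its own copy. */
typedef struct ScanState
{
    FakeConn   *conn;
} ScanState;

/*
 * Models the disconnect path: the connection is torn down and the cache
 * entry is cleared, but no other holder of the pointer is updated.
 */
static void
fake_disconnect(CacheEntry *entry)
{
    assert(entry != NULL);
    if (entry->conn != NULL)
        entry->conn->finished = true;   /* real code would release it here */
    entry->conn = NULL;
}

/* True if the scan state still points at a torn-down connection. */
static bool
scan_conn_is_dangling(const ScanState *scan)
{
    return scan->conn != NULL && scan->conn->finished;
}
```

After fake_disconnect() runs, the cache entry holds NULL while the scan state still holds the old pointer. In the real code that pointer would refer to freed memory, which is what "close c1" later trips over.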
> I am not sure we should still allow further commands to run after select * from ft1, given that it has already raised: "ERROR: connection to server "loopback" was lost". Maybe we should not keep going as if the connection were still there.
The ERROR is thrown within a subtransaction, not within the main
transaction, so we *should* allow commands (that don't use the
connection) to run *after* rolling back to the subtransaction.
(A transaction like the one in the example, in which a subtransaction
failed abort cleanup due to a connection issue, has to abort in the
end, because it can no longer commit the remote transaction. Given
this limitation, the above behavior might be surprising to users. We
might need to do something about this, but I think that is a different
issue and thus should be discussed separately.)
> I am not very familiar with the FDW code, so I am not ready to suggest a concrete fix. But it seems wrong to let later paths keep using PgFdwScanState->conn after select * from ft1 has already failed with connection loss.
Why? While we can't use the connection any further, with the patch we
can still use the cached PGconn, as pointed out by Matheus downthread.
> My guess is that we either need to invalidate all dependent state when disconnect_pg_server() runs, or otherwise prevent later cleanup paths from touching the cached PGconn *.
I don't think that that is a good idea, because that 1) would require
invasive changes to the existing code, which isn't good especially for
back-patching, as also mentioned by Matheus downthread, and 2) would
result in something that is essentially the same as what I proposed;
in other words, that would be a lot of effort with little result.
Thanks for looking at this!
Best regards,
Etsuro Fujita