Re: BUG #15367: Crash in pg_fe_scram_free when using foreign tables

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Evans <code(at)jeremyevans(dot)net>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15367: Crash in pg_fe_scram_free when using foreign tables
Date: 2018-09-08 12:34:15
Message-ID: 18398.1536410055@sss.pgh.pa.us
Lists: pgsql-bugs

I wrote:
> We're still no closer to an explanation of Jeremy's failure, though
> I'm now pretty sure that pg_saslprep itself isn't the issue.

I had an idea about that --- it's probably all wet, but the code as
written seems bulletproof enough that I'm forced to postulate something
very strange is happening.

Observe that there are two copies of pg_saslprep() in play: there is one
in the backend, which is compiled to allocate its result with palloc,
and there is one in libpq, which is compiled to allocate its result
with malloc. Could it be that somehow, when libpq is loaded into the
backend address space as it is here, libpq winds up calling the backend's
copy of pg_saslprep rather than its own? That would work just fine,
until libpq tried to free the returned string using free(), and then
we'd get exactly the reported error.
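
A minimal sketch of the hypothesis above, as a toy model rather than PostgreSQL code. Every name in it (toy_alloc, toy_free, frontend_saslprep, backend_saslprep, client_roundtrip) is hypothetical. Real palloc and free() are replaced by a tagged header so that the mismatch can be observed safely instead of corrupting the heap, and the function pointer stands in for whichever copy the dynamic linker happens to resolve.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: none of these names exist in PostgreSQL or libpq. */
typedef enum { TOY_MALLOC = 1, TOY_PALLOC = 2 } toy_owner;

/* Each toy chunk records which allocator produced it. */
typedef union {
    toy_owner   owner;
    max_align_t align;      /* keeps the payload suitably aligned */
} toy_header;

static char *toy_alloc(toy_owner owner, size_t size)
{
    toy_header *h = malloc(sizeof(toy_header) + size);

    assert(h != NULL);
    h->owner = owner;
    return (char *) (h + 1);
}

/*
 * Stand-in for free(): returns 0 for a chunk it owns, and -1 where the
 * real free() would be handed a pointer that did not come from malloc.
 */
static int toy_free(char *p)
{
    toy_header *h = (toy_header *) p - 1;
    int         owned = (h->owner == TOY_MALLOC);

    free(h);                /* safe here: the toy header came from malloc */
    return owned ? 0 : -1;
}

static char *toy_copy(toy_owner owner, const char *s)
{
    char *out = toy_alloc(owner, strlen(s) + 1);

    strcpy(out, s);
    return out;
}

/* Two builds of the "same" function, differing only in the allocator. */
static char *frontend_saslprep(const char *s) { return toy_copy(TOY_MALLOC, s); }
static char *backend_saslprep(const char *s)  { return toy_copy(TOY_PALLOC, s); }

typedef char *(*saslprep_fn)(const char *);

/*
 * Models the client-side cleanup path: whichever copy the call resolved
 * to, the result is released with the malloc-family free.
 */
static int client_roundtrip(saslprep_fn resolved, const char *password)
{
    char *prepped = resolved(password);

    return toy_free(prepped);
}
```

In this model, client_roundtrip(frontend_saslprep, "secret") returns 0, while client_roundtrip(backend_saslprep, "secret") returns -1. The call itself succeeds in both cases, and only the later free notices the mismatch.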

The main weakness in this theory is that it suggests that Jeremy's
postgres_fdw connections ought to be falling over more easily than
they are. However, I think that postgres_fdw will never explicitly
close a PGconn unless it's forced to; during a normal backend session
exit, the process just dies without going through PQfinish, so that
the problem would be masked. The only way to get to the PQfinish
call shown in the backtrace is for pgfdw_inval_callback to mark the
connection invalid, which'd require either an update on the relevant
foreign server object, an update on the user mapping in use, or an
SI cache reset. That explains how heavy DDL activity in an apparently
unrelated database can trigger the problem: it eventually results in
an SI message queue overrun and ensuing cache reset. In the absence
of SI cache resets, maybe indeed the problem is rare even if the
pg_saslprep result is misallocated every time.
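
The masking argument above can be sketched with a toy connection cache. This is an illustrative model only: toy_conn, toy_invalidate, toy_get_connection and toy_session are hypothetical names, not postgres_fdw's actual code, and the model assumes (as the paragraph above suggests) that the explicit close runs only for an entry previously flagged as invalid.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these names are hypothetical, not postgres_fdw's. */
typedef struct {
    bool open;
    bool invalidated;
    int  closes;            /* how many times the explicit close path ran */
} toy_conn;

/* Stand-in for an invalidation callback: it only flags the entry. */
static void toy_invalidate(toy_conn *c)
{
    if (c->open)
        c->invalidated = true;
}

/* Stand-in for fetching the cached connection at the start of a query. */
static void toy_get_connection(toy_conn *c)
{
    if (c->open && c->invalidated) {
        /* The only place where the explicit close, and so the free, runs. */
        c->open = false;
        c->closes++;
    }
    if (!c->open) {
        c->open = true;
        c->invalidated = false;
    }
}

/*
 * Runs n queries in one session, invalidating just before query number
 * invalidate_at (0 means never). Returns the number of explicit closes.
 * Nothing is closed at session end, modelling a process that simply exits.
 */
static int toy_session(int n, int invalidate_at)
{
    toy_conn c = { false, false, 0 };

    for (int i = 1; i <= n; i++) {
        if (i == invalidate_at)
            toy_invalidate(&c);
        toy_get_connection(&c);
    }
    return c.closes;
}
```

Here toy_session(100, 0) returns 0 and toy_session(100, 50) returns 1: a misallocated string attached to the connection would go unnoticed for the whole session unless an invalidation arrives.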

Not sure about a good way to test this theory. Need more caffeine.

regards, tom lane
