Re: [BUGS] bug or simply not enough stack space?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Albe Laurenz <all(at)adv(dot)magwien(dot)gv(dot)at>, Frank van Vugt <ftm(dot)van(dot)vugt(at)foxi(dot)nl>, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: [BUGS] bug or simply not enough stack space?
Date: 2019-10-04 16:06:34
Message-ID: 14790.1570205194@sss.pgh.pa.us
Lists: pgsql-bugs

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> Reviving this ancient thread. I saw "did not find subXID" errors, in
> 9.6.12. Here is what happened.

> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: WARNING: did not find subXID 384134 in MyProc
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: CONTEXT: PL/pgSQL function
> loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: LOG: could not send data to client: Broken
> pipe
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: CONTEXT: PL/pgSQL function
> loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: STATEMENT: select
> LoadHistoryDataFromYSM_testing2();
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: ERROR: failed to re-find shared lock object
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: CONTEXT: PL/pgSQL function
> loadhistorydatafromysm_testing2() line 99 during exception cleanup
> 2019-10-03 19:58:37 CDT [10.22.236.83: rms(at)cds2]
> [10.22.236.83(54943)]: STATEMENT: select
> LoadHistoryDataFromYSM_testing2();

[ and then we get into recursive error-during-error-cleanup failures ]

Yeah, something has left stuff in a bad state here.

> *) "loadhistorydatafromysm_testing2()" is using pl/sh, which is a
> known source of weird (but rare) instability issues (I'm assuming this
> is underlying cause of issue)

Hm. Yeah, I'd be way more interested if this could be reproduced
without pl/sh.

> I can't help but wonder if we have some kind of obscure issue that is
> related to C extension problems; just throwing a data point on the
> table.

Well, there's nothing too obscure about the rule that error cleanup
needs to avoid doing anything that might cause another error, for fear
of causing infinite recursion. I suspect that the underlying issue is
that pl/sh is violating that rule somewhere. The other thread you point
to suggests that maybe oracle_fdw also used to do that, and fixed it.
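
To make that rule concrete, here is a minimal toy model in Python. It is not PostgreSQL code and not how pl/sh or oracle_fdw are written; every name in it (ToyError, handle_error, MAX_ERROR_DEPTH, the two cleanup functions) is hypothetical and chosen only for illustration. It assumes that a failure inside cleanup re-enters error handling, which is the recursion hazard described above, and it contrasts a cleanup routine that can raise with one that contains its own failures and reports them as warnings.

```python
MAX_ERROR_DEPTH = 5  # hypothetical guard so the toy model terminates


class ToyError(Exception):
    """Stand-in for an error raised inside the toy 'backend'."""


def handle_error(cleanup, log, depth=0):
    """Run cleanup after an error has occurred.

    If cleanup itself raises, error handling is re-entered, which runs
    cleanup again. Returns True if cleanup finished, False if the
    recursion guard had to stop it.
    """
    if depth >= MAX_ERROR_DEPTH:
        log.append("PANIC: error recursion limit reached")
        return False
    try:
        cleanup()
    except ToyError as exc:
        log.append(f"ERROR during cleanup: {exc}")
        return handle_error(cleanup, log, depth + 1)
    return True


def unsafe_cleanup():
    # Violates the rule: cleanup does something that can fail,
    # and lets the failure escape.
    raise ToyError("failed to re-find resource")


def make_safe_cleanup(log):
    def safe_cleanup():
        # Follows the rule: anything that might fail is contained here
        # and downgraded to a warning, so no new error is raised.
        try:
            raise ToyError("failed to re-find resource")
        except ToyError as exc:
            log.append(f"WARNING: {exc}")

    return safe_cleanup


if __name__ == "__main__":
    unsafe_log = []
    print(handle_error(unsafe_cleanup, unsafe_log), unsafe_log)

    safe_log = []
    print(handle_error(make_safe_cleanup(safe_log), safe_log), safe_log)
```

In this sketch the unsafe variant fails on every re-entry until the guard gives up, while the safe variant completes on the first pass and leaves only a warning behind.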

regards, tom lane

