Re: backend crash on DELETE, reproducible locally

From: Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: backend crash on DELETE, reproducible locally
Date: 2018-11-03 12:14:19
Message-ID: 20181103121419.GA3885@hermes.hilbert.loc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Fri, Nov 02, 2018 at 11:56:58PM -0400, Tom Lane wrote:

> > On Thu, Nov 01, 2018 at 11:43:56AM -0400, Tom Lane wrote:
> >> Yeah, apparently we've passed a null OLD tuple to an RI_FKey_cascade_del
> >> trigger, which surely shouldn't happen. It'd be interesting to look at
> >> the set of triggers on this table. I don't entirely trust psql's \d
> >> to show us reality if there's something screwed up about the triggers,
> >> so in addition to \d output, could we see
> >> select * from pg_trigger where tgrelid = 'ref.auto_hint'::regclass;
>
> > [ pretty normal-looking trigger entries ]
>
> I was feeling baffled about this, but it suddenly occurs to me that maybe
> the bug fixed in 040a1df61/372102b81 explains this. If the particular
> tuple you're trying to delete predates the last ALTER TABLE ADD COLUMN
> operation on the table, then this bug would result in t_self getting
> set to "invalid", which'd cause AfterTriggerSaveEvent to save "ate_ctid1"
> as "invalid", which'd lead to a null tuple getting passed when the trigger
> eventually gets invoked.

Now, that is a theory I can follow up on -- all the
bootstrapping SQL scripts are under version control so I can
pin down the exact sequence of events.

... goes off to do some checks ...

And sure enough there's an ALTER TABLE ADD COLUMN related to
that table:

alter table ref.auto_hint
add column recommendation_query text;

-- (audit log table needs to get the new column, too)
alter table audit.log_auto_hint
add column recommendation_query text;

before the DELETE of the pre-existing tuple.

When running a rigged upgrade that stops right before those
ALTER TABLEs and then doing the following:

begin;
-- instrument:
insert into ref.auto_hint (title, query, hint, source, lang) values (
'DELETE test',
'select 1;',
'insertion before ADD COLUMN',
'testing',
'en'
);
-- works:
DELETE from ref.auto_hint where title = 'DELETE test';
-- instrument, again:
insert into ref.auto_hint (title, query, hint, source, lang) values (
'DELETE test',
'select 1;',
'insertion before ADD COLUMN',
'testing',
'en'
);
alter table ref.auto_hint
add column recommendation_query text;
-- audit log table needs to get the message
alter table audit.log_auto_hint
add column recommendation_query text;
-- fails:
DELETE from ref.auto_hint where title = 'DELETE test';
rollback;

the expected segfault does indeed occur.

Conversely, moving the offending

DELETE FROM ref.auto_hint WHERE title = 'Kontraindikation: ACE/Sartan <-> Schwangerschaft';

to right before the ALTER TABLEs makes the full upgrade run
through without further problems.

Looking at 040a1df61/372102b81 feels like it fits the bill.

So, I guess I can work around the issue by the above
manoeuvre and report back once 040a1df61/372102b81 is
released.

Anything else you'd want me to look into ?

Many thanks,
Karsten
--
GPG 40BE 5B0E C98E 1713 AFA6 5BC0 3BEA AC80 7D4F C89B

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2018-11-03 15:39:49 Re: backend crash on DELETE, reproducible locally
Previous Message Peter Eisentraut 2018-11-03 08:48:09 Re: Problem with stored procedure and nested transactions

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Verite 2018-11-03 12:53:37 Re: COPY FROM WHEN condition
Previous Message Andrew Dunstan 2018-11-03 12:12:27 Re: partitioned indexes and tablespaces