Re: BUG #16233: Yet another "logical replication worker" was terminated by signal 11: Segmentation fault

From: Johann du Toit <johann(at)winkreports(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #16233: Yet another "logical replication worker" was terminated by signal 11: Segmentation fault
Date: 2020-01-27 00:57:39
Message-ID: CAO7Fzi5Fw93tU_Seg+TNKOu82RE3qvprZfkceMa1+=_iO+mz7g@mail.gmail.com
Lists: pgsql-bugs

On Mon, 27 Jan 2020 at 03:28, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > I browsed around and noticed a few people having similar issues with logical
> > replication.
> > To my eyes it looks similar to the issue here:
> > https://www.postgresql.org/message-id/flat/16129-a0c0f48e71741e5f%40postgresql.org
> > which was fixed here:
> > https://git.postgresql.org/gitweb/?p=postgresql.git&a=commit&h=a2aa224e
>
> The stack trace does look the same, but do you have the same triggering
> condition we identified (that is, fairly wide column values in the
> subscriber table that are not getting replaced during a
> logical-replication update? or possibly dropped columns in the
> subscriber table?)

Thanks Tom.

I don't really think it's the same triggering condition. How wide is
"fairly wide"? My widest column is 1023 chars. Here's the table
description:

silverlining-# \d federated_banktransactionitem
                                   Table "public.federated_banktransactionitem"
       Column        |           Type           | Collation | Nullable |                          Default
---------------------+--------------------------+-----------+----------+-----------------------------------------------------------
 id                  | integer                  |           | not null | nextval('federated_banktransactionitem_id_seq'::regclass)
 deleted             | boolean                  |           | not null |
 created_date        | timestamp with time zone |           |          |
 updated_date        | timestamp with time zone |           |          |
 description         | character varying(1023)  |           |          |
 quantity            | numeric(15,4)            |           | not null |
 line_amount         | numeric(15,2)            |           | not null |
 unit_amount         | numeric(15,2)            |           | not null |
 account_code        | character varying(32)    |           |          |
 item_code           | character varying(255)   |           |          |
 bank_transaction_id | integer                  |           |          |
 ledger_account_id   | integer                  |           |          |
 organisation_id     | integer                  |           |          |
 product_id          | integer                  |           |          |
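
To put a number on "fairly wide", the declared limit matters less than what is actually stored. A minimal sketch, assuming "wide" refers to values large enough to be compressed or moved out of line (that reading is my assumption, not something confirmed in this thread), is to measure the largest description values on the subscriber:

```sql
-- Largest "description" value actually stored in the table.
-- octet_length: size of the text in bytes
-- char_length:  size of the text in characters
-- pg_column_size: bytes used to store the value (possibly compressed)
SELECT max(octet_length(description))   AS max_bytes,
       max(char_length(description))    AS max_chars,
       max(pg_column_size(description)) AS max_stored_bytes
FROM public.federated_banktransactionitem;
```

With a varchar(1023) column the character count cannot exceed 1023, but the byte count can be higher if the data contains multibyte characters.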

I also saw in the previous thread that you were asking Tomas to show the
attribute list to check for dropped columns. This is not a long-term
replication setup - I'm just trying to do a zero-downtime upgrade (which
has worked fine using pglogical on older PostgreSQL versions). So my
schema from the publisher is exported and loaded into the subscriber
"fresh" and no schema changes are made at all.

I have to note I'm not even sure this is the table causing the issues.
I've tried this replication a few times already from scratch and it's
failed at different tables / points in the process each time. But from
the logs I posted it looks like federated_banktransactionitem was
involved in this one? Here are the attribute lists for this table:

On Publisher:
silverlining=# SELECT attnum, attname, atttypid FROM pg_attribute
WHERE attrelid = 'public.federated_banktransactionitem'::regclass;
 attnum |       attname       | atttypid
--------+---------------------+----------
     -7 | tableoid            |       26
     -6 | cmax                |       29
     -5 | xmax                |       28
     -4 | cmin                |       29
     -3 | xmin                |       28
     -1 | ctid                |       27
      1 | id                  |       23
      2 | deleted             |       16
      3 | created_date        |     1184
      4 | updated_date        |     1184
      5 | description         |     1043
      6 | quantity            |     1700
      7 | line_amount         |     1700
      8 | unit_amount         |     1700
      9 | account_code        |     1043
     10 | item_code           |     1043
     11 | bank_transaction_id |       23
     12 | ledger_account_id   |       23
     13 | organisation_id     |       23
     14 | product_id          |       23
(20 rows)

On Subscriber:
silverlining=# SELECT attnum, attname, atttypid FROM pg_attribute
WHERE attrelid = 'public.federated_banktransactionitem'::regclass;
 attnum |       attname       | atttypid
--------+---------------------+----------
     -7 | tableoid            |       26
     -6 | cmax                |       29
     -5 | xmax                |       28
     -4 | cmin                |       29
     -3 | xmin                |       28
     -1 | ctid                |       27
      1 | id                  |       23
      2 | deleted             |       16
      3 | created_date        |     1184
      4 | updated_date        |     1184
      5 | description         |     1043
      6 | quantity            |     1700
      7 | line_amount         |     1700
      8 | unit_amount         |     1700
      9 | account_code        |     1043
     10 | item_code           |     1043
     11 | bank_transaction_id |       23
     12 | ledger_account_id   |       23
     13 | organisation_id     |       23
     14 | product_id          |       23
(20 rows)
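
For anyone wanting to rule out dropped columns more explicitly, a variant of the same catalog query can select the attisdropped flag and skip the system columns. This is only a sketch of how I would check it, run on both publisher and subscriber; a dropped column would appear with attisdropped = t and a placeholder name:

```sql
-- User columns only (attnum > 0), with the dropped-column flag shown.
SELECT attnum, attname, atttypid, attisdropped
FROM pg_attribute
WHERE attrelid = 'public.federated_banktransactionitem'::regclass
  AND attnum > 0
ORDER BY attnum;
```

If the attnum values run without gaps and every row shows attisdropped = f, the table has no dropped columns.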

When the crash occurred I didn't notice anything strange in the
publisher-side logs.

> Couldn't say, but with most packaging systems it's relatively simple
> to rebuild a given package with a custom patch or two added. (If
> memory serves, with Debian's system you don't even have to modify
> any files, just add the patch into the relevant subdirectory.)
> That's a skill worth acquiring if you deal with open source a lot.

I was able to get a deb package built using a recent V12 stable source
snapshot. I'm trying that as I'm typing this - will report back if it
crashes as well.

Regards,
-J
