pg11.1: dsa_area could not attach to segment

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Subject: pg11.1: dsa_area could not attach to segment
Date: 2018-12-31 22:17:34
Message-ID: 20181231221734.GB25379@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

In our query logs I saw:

postgres=# SELECT log_time, session_id, session_line, left(message,99), left(query,99) FROM postgres_log WHERE error_severity='ERROR' AND message NOT LIKE 'cancel%';
-[ RECORD 1 ]+----------------------------------------------------------------------------------------------------
log_time | 2018-12-31 15:39:11.917-05
session_id | 5c2a7e6f.1fa4
session_line | 1
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array
-[ RECORD 2 ]+----------------------------------------------------------------------------------------------------
log_time | 2018-12-31 15:39:11.917-05
session_id | 5c2a7e6f.1fa3
session_line | 4
left | dsa_area could not attach to segment
left | SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array

The full query + plan is:

|ts=# explain SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types
|FROM queued_alters qa
|JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped
|JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped
|WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid
|GROUP BY 1,2
|ORDER BY
|parent LIKE 'unused%', -- Do them last
|regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\3\5') DESC,
| regexp_replace(colcld.child, '.*_', '') DESC
|LIMIT 1;

|QUERY PLAN
|Limit (cost=67128.06..67128.06 rows=1 width=307)
| -> Sort (cost=67128.06..67137.84 rows=3912 width=307)
| Sort Key: (((qa_1.parent)::text ~~ 'unused%'::text)), (regexp_replace((((pg_attribute.attrelid)::regclass)::text), '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$'::text, '\3\5'::text)) DESC, (regexp_replace((((pg_attribute.attrelid)::regclass)::text), '.*_'::text, ''::text)) DESC
| -> GroupAggregate (cost=66893.34..67108.50 rows=3912 width=307)
| Group Key: (((pg_attribute.attrelid)::regclass)::text), qa_1.parent
| -> Sort (cost=66893.34..66903.12 rows=3912 width=256)
| Sort Key: (((pg_attribute.attrelid)::regclass)::text), qa_1.parent
| -> Gather (cost=40582.28..66659.91 rows=3912 width=256)
| Workers Planned: 2
| -> Parallel Hash Join (cost=39582.28..65268.71 rows=1630 width=256)
| Hash Cond: (((to_regclass((qa_1.child)::text))::oid = pg_attribute.attrelid) AND (colpar.attname = pg_attribute.attname))
| Join Filter: (colpar.atttypid <> pg_attribute.atttypid)
| -> Nested Loop (cost=0.43..25614.89 rows=11873 width=366)
| -> Parallel Append (cost=0.00..12.00 rows=105 width=292)
| -> Parallel Seq Scan on queued_alters_child qa_1 (cost=0.00..11.47 rows=147 width=292)
| -> Parallel Seq Scan on queued_alters qa (cost=0.00..0.00 rows=1 width=292)
| -> Index Scan using pg_attribute_relid_attnum_index on pg_attribute colpar (cost=0.43..242.70 rows=114 width=78)
| Index Cond: ((attrelid = (to_regclass((qa_1.parent)::text))::oid) AND (attnum > 0))
| Filter: (NOT attisdropped)
| -> Parallel Hash (cost=33651.38..33651.38 rows=395365 width=72)
| -> Parallel Seq Scan on pg_attribute (cost=0.00..33651.38 rows=395365 width=72)
| Filter: ((NOT attisdropped) AND (attnum > 0))
|

queued_alters is usually empty, and looks like it would've last been nonempty on 2018-12-10.

ts=# \d queued_alters
Table "public.queued_alters"
Column | Type | Collation | Nullable | Default
--------+-----------------------+-----------+----------+---------
child | character varying(64) | | |
parent | character varying(64) | | |
Indexes:
"queued_alters_child_parent_key" UNIQUE CONSTRAINT, btree (child, parent)
Number of child tables: 1 (Use \d+ to list them.)

I found this other log at that time:
2018-12-31 15:39:11.918-05 | 30831 | 5bf38e71.786f | 5 | background worker "parallel worker" (PID 8100) exited with exit code 1

Which is the postmaster, or its PID in any case..

[pryzbyj(at)telsasoft-db ~]$ ps -wwwf 30831
UID PID PPID C STIME TTY STAT TIME CMD
postgres 30831 1 0 Nov19 ? S 62:44 /usr/pgsql-11/bin/postmaster -D /var/lib/pgsql/11/data

postgres=# SELECT log_time, pid, session_line, left(message,99) FROM postgres_log WHERE session_id='5bf38e71.786f' ;
log_time | pid | session_line | left
----------------------------+-------+--------------+-------------------------------------------------------------------------
2018-12-31 15:39:11.918-05 | 30831 | 5 | background worker "parallel worker" (PID 8100) exited with exit code 1
2018-12-31 15:39:11.935-05 | 30831 | 6 | background worker "parallel worker" (PID 8101) exited with exit code 1
2018-12-31 16:40:47.42-05 | 30831 | 7 | background worker "parallel worker" (PID 7239) exited with exit code 1
2018-12-31 16:40:47.42-05 | 30831 | 8 | background worker "parallel worker" (PID 7240) exited with exit code 1
2018-12-31 16:41:00.151-05 | 30831 | 9 | background worker "parallel worker" (PID 7371) exited with exit code 1
2018-12-31 16:41:00.151-05 | 30831 | 10 | background worker "parallel worker" (PID 7372) exited with exit code 1
2018-12-31 16:41:04.024-05 | 30831 | 11 | background worker "parallel worker" (PID 7451) exited with exit code 1
2018-12-31 16:41:04.024-05 | 30831 | 12 | background worker "parallel worker" (PID 7450) exited with exit code 1
2018-12-31 16:41:23.845-05 | 30831 | 13 | background worker "parallel worker" (PID 7658) exited with exit code 1
2018-12-31 16:41:23.845-05 | 30831 | 14 | background worker "parallel worker" (PID 7659) exited with exit code 1
2018-12-31 16:43:58.854-05 | 30831 | 15 | background worker "parallel worker" (PID 10825) exited with exit code 1
2018-12-31 16:43:58.854-05 | 30831 | 16 | background worker "parallel worker" (PID 10824) exited with exit code 1

I seem to be missing logs for session_lines 1-4 , which would've been rotated
to oblivion if older than 24h.

I found these:
https://www.postgresql.org/message-id/flat/CAPa4P2YzgRTBHHRx2KAPUO1_xkmqQgT2xP0tS_e%3DphWvNzWdkA%40mail.gmail.com#4a1a94bc71d4c99c01babc759fe5b6ec
https://www.postgresql.org/message-id/CAEepm=0Mv9BigJPpribGQhnHqVGYo2+kmzekGUVJJc9Y_ZVaYA@mail.gmail.com

Which appears to have been commited so I think this error is not unexpected.

|commit fd7c0fa732d97a4b4ebb58730e6244ea30d0a618
|Author: Robert Haas <rhaas(at)postgresql(dot)org>
|Date: Mon Dec 18 12:17:37 2017 -0500
|
|Fix crashes on plans with multiple Gather (Merge) nodes.

I will try to reproduce and provide bt...but all I can think to do is run a
tight loop around that query and hope something else comes by and tickles it a
few more times.

Happy Grecian newyear.

Justin

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Adam Brightwell 2018-12-31 22:23:23 Re: Changing SQL Inlining Behaviour (or...?)
Previous Message Andrew Dunstan 2018-12-31 21:52:33 Re: stats_temp_directory conflicts