Re: pg11.1: dsa_area could not attach to segment

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg11.1: dsa_area could not attach to segment
Date: 2019-02-07 01:47:19
Message-ID: 20190207014719.GJ29720@telsasoft.com
Lists: pgsql-hackers

FYI, I haven't been able to make this work yet.
(gdb) print *segment_map->header
Cannot access memory at address 0x7f347e554000

However I *did* reproduce the error in an isolated, non-production postgres
instance. It's a totally empty, untuned v11.1 initdb just for this, running ONLY
a few simultaneous loops around just one query. It looks like the simultaneous
loops sometimes (but not always) fail together. This has happened a couple of
times.
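
For anyone wanting to try the same thing, the setup amounts to roughly the
following. This is only a sketch: run_loops is a hypothetical helper name, and
$QUERY is a placeholder for the full queued_alters query. It assumes the
command exits non-zero when the query raises an ERROR, which is what ends each
loop.

```shell
#!/bin/sh
# Sketch only: run_loops is a hypothetical helper, not part of postgres.
# Usage: run_loops N command [args...]
# Starts N background loops; each re-runs the command until it first fails.
run_loops() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # Repeat the command until it exits non-zero, discarding its stdout.
        ( while "$@" > /dev/null; do :; done ) &
        i=$((i + 1))
    done
    # Return once every loop has stopped on a failure.
    wait
}

# Example invocation (QUERY is an assumed variable holding the query text):
# run_loops 3 env PGHOST=/tmp PGPORT=5678 psql postgres -c "$QUERY"
```

The loops only stop on a failure, so any error messages on stderr are what is
left to look at afterwards.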

It looks like one query failed due to "could not attach" in the leader, one
failed due to the same in a worker, and one failed with "not pinned", which I
hadn't seen before and which appears to be related to DSM, not DSA...

|ERROR: dsa_area could not attach to segment
|ERROR: cannot unpin a segment that is not pinned
|ERROR: dsa_area could not attach to segment
|CONTEXT: parallel worker
|
|[2] Done while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do
| :;
|done > /dev/null
|[5]- Done while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do
| :;
|done > /dev/null
|[6]+ Done while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do

I'm also trying to reproduce it on other production servers, but so far nothing
else has shown the bug, including the other server which hit our original
(other) DSA error with the queued_alters query. So I tentatively think there
really may be something specific to this server (not the hypervisor, so maybe
the OS, libraries, kernel, scheduler, ??).

Find the schema for that table here:
https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com

Note, for unrelated reasons, that query was also previously discussed here:
https://www.postgresql.org/message-id/20171110204043.GS8563%40telsasoft.com

Justin
