Re: pg11.1: dsa_area could not attach to segment

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg11.1: dsa_area could not attach to segment
Date: 2019-02-11 04:01:32
Message-ID: 20190211040132.GV31721@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Mon, Feb 11, 2019 at 11:11:32AM +1100, Thomas Munro wrote:
> I haven't ever managed to reproduce that one yet. It's great you have
> a reliable repro... Let's discuss it on the #15585 thread.

I realized that I gave bad information (at least to Thomas). On the server
where I've been reproducing this, it wasn't in an empty DB cluster, but one
where I'd restored our DB schema. I think that's totally irrelevant, except
that pg_attribute needs to be big enough to get parallel scan.

Here's confirmed steps to reproduce

initdb -D /var/lib/pgsql/test
pg_ctl -c start -D /var/lib/pgsql/test -o '-c operator_precedence_warning=on -c maintenance_work_mem=1GB -c max_wal_size=16GB -c full_page_writes=off -c autovacuum=off -c fsync=off -c port=5678 -c unix_socket_directories=/tmp'
PGPORT=5678 PGHOST=/tmp psql postgres -c 'CREATE TABLE queued_alters(child text,parent text); CREATE TABLE queued_alters_child()INHERITS(queued_alters); ANALYZE queued_alters, pg_attribute'

# Inflate pg_attribute to nontrivial size:
echo "CREATE TABLE t(`for c in $(seq 1 222); do echo "c$c int,"; done |xargs |sed 's/,$//'`)" |PGHOST=/tmp PGPORT=5678 psql postgres
for a in `seq 1 999`; do echo "CREATE TABLE t$a() INHERITS(t);"; done |PGHOST=/tmp PGPORT=5678 psql -q postgres

while PGOPTIONS='-cmin_parallel_table_scan_size=0' PGPORT=5678 PGHOST=/tmp psql postgres -c "explain analyze SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\3\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do :; done >/dev/null &

# Verify this is planning parallel workers, then repeat 10-20x.

Typically fails on this server in under 10min.

Sorry for the error.

Justin

On Wed, Feb 06, 2019 at 07:47:19PM -0600, Justin Pryzby wrote:
> FYI, I wasn't yet able to make this work yet.
> (gdb) print *segment_map->header
> Cannot access memory at address 0x7f347e554000
>
> However I *did* reproduce the error in an isolated, non-production postgres
> instance. It's a total empty, untuned v11.1 initdb just for this, running ONLY
> a few simultaneous loops around just one query It looks like the simultaneous
> loops sometimes (but not always) fail together. This has happened a couple
> times.
>
> It looks like one query failed due to "could not attach" in leader, one failed
> due to same in worker, and one failed with "not pinned", which I hadn't seen
> before and appears to be related to DSM, not DSA...
>
> |ERROR: dsa_area could not attach to segment
> |ERROR: cannot unpin a segment that is not pinned
> |ERROR: dsa_area could not attach to segment
> |CONTEXT: parallel worker
> |
> |[2] Done while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do
> | :;
> |done > /dev/null
> |[5]- Done while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do
> | :;
> |done > /dev/null
> |[6]+ Done while PGHOST=/tmp PGPORT=5678 psql postgres -c "SELECT colcld.child c, parent p, array_agg(colpar.attname::text ORDER BY colpar.attnum) cols, array_agg(format_type(colpar.atttypid, colpar.atttypmod) ORDER BY colpar.attnum) AS types FROM queued_alters qa JOIN pg_attribute colpar ON to_regclass(qa.parent)=colpar.attrelid AND colpar.attnum>0 AND NOT colpar.attisdropped JOIN (SELECT *, attrelid::regclass::text AS child FROM pg_attribute) colcld ON to_regclass(qa.child) =colcld.attrelid AND colcld.attnum>0 AND NOT colcld.attisdropped WHERE colcld.attname=colpar.attname AND colpar.atttypid!=colcld.atttypid GROUP BY 1,2 ORDER BY parent LIKE 'unused%', regexp_replace(colcld.child, '.*_((([0-9]{4}_[0-9]{2})_[0-9]{2})|(([0-9]{6})([0-9]{2})?))$', '\\3\\5') DESC, regexp_replace(colcld.child, '.*_', '') DESC LIMIT 1"; do
>
> I'm also trying to reproduce on other production servers. But so far nothing
> else has shown the bug, including the other server which hit our original
> (other) DSA error with the queued_alters query. So I tentatively think there
> really may be something specific to the server (not the hypervisor so maybe the
> OS, libraries, kernel, scheduler, ??).
>
> Find the schema for that table here:
> https://www.postgresql.org/message-id/20181231221734.GB25379%40telsasoft.com
>
> Note, for unrelated reasons, that query was also previously discussed here:
> https://www.postgresql.org/message-id/20171110204043.GS8563%40telsasoft.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-02-11 04:08:13 Re: anole's failed timeouts test
Previous Message Thomas Munro 2019-02-11 03:50:43 anole's failed timeouts test