Re: BUG #13490: Segmentation fault on pg_stat_activity

From: Michael Bommarito <michael(at)bommaritollc(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13490: Segmentation fault on pg_stat_activity
Date: 2015-07-18 16:09:55
Message-ID: CAN=rtBhSfzWC4H10pdFLevEWTnu8U8yn_2G14-MmewbjoVPMrg@mail.gmail.com
Lists: pgsql-bugs

I compiled 9.5alpha1 from source with --debug and -O0 -ggdb
-fno-omit-frame-pointer -mno-red-zone, and reset to the default
postgresql.conf. I was able to generate segfaults repeatedly by loading the
pghero dashboard. First, some errors from the pg logs:
========================================================================
2015-07-18 15:07:58 UTC [27112-1] postgres(at)database ERROR: attribute
number 2 exceeds number of columns 0

2015-07-18 15:07:58 UTC [27112-2] postgres(at)database STATEMENT: SELECT
application_name AS source, client_addr AS ip, COUNT(*) AS
total_connections FROM pg_stat_activity WHERE pid <> pg_backend_pid() GROUP
BY application_name, ip ORDER BY COUNT(*) DESC, application_name ASC,
client_addr ASC

2015-07-18 15:08:23 UTC [27112-3] postgres(at)database ERROR: invalid
varattno 66

2015-07-18 15:08:23 UTC [27112-4] postgres(at)database STATEMENT: SELECT
relname AS table, indexrelname AS index,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size, idx_scan as
index_scans FROM pg_stat_user_indexes ui INNER JOIN pg_index i ON
ui.indexrelid = i.indexrelid WHERE NOT indisunique AND idx_scan < 50 ORDER
BY pg_relation_size(i.indexrelid) DESC, relname ASC

2015-07-18 15:17:19 UTC [3605-1] postgres(at)database ERROR: tupdesc
reference 0x2bdd8a8 is not owned by resource owner Portal
========================================================================

Next, I attached gdb (run under sudo) to the backend PID and issued `cont`:
========================================================================
2015-07-18 15:48:38 UTC [10281-1] postgres(at)database ERROR: tupdesc
reference 0xf77248 is not owned by resource owner Portal
2015-07-18 15:48:54 UTC [8812-4] LOG: server process (PID 10538) was
terminated by signal 11: Segmentation fault
2015-07-18 15:48:54 UTC [8812-5] LOG: terminating any other active server
processes
2015-07-18 15:48:54 UTC [10523-1] postgres(at)database WARNING: terminating
connection because of crash of another server process
2015-07-18 15:48:54 UTC [10523-2] postgres(at)database DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2015-07-18 15:48:54 UTC [10523-3] postgres(at)database HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2015-07-18 15:48:54 UTC [10239-1] postgres(at)database WARNING: terminating
connection because of crash of another server process
2015-07-18 15:48:54 UTC [10239-2] postgres(at)database DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2015-07-18 15:48:54 UTC [10239-3] postgres(at)database HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2015-07-18 15:48:54 UTC [10522-1] postgres(at)database WARNING: terminating
connection because of crash of another server process
2015-07-18 15:48:54 UTC [10522-2] postgres(at)database DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2015-07-18 15:48:54 UTC [10522-3] postgres(at)database HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2015-07-18 15:48:54 UTC [10409-1] postgres(at)database WARNING: terminating
connection because of crash of another server process
2015-07-18 15:48:54 UTC [10409-2] postgres(at)database DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2015-07-18 15:48:54 UTC [10409-3] postgres(at)database HINT: In a moment you
should be able to reconnect to the database and repeat your command.
2015-07-18 15:48:54 UTC [10408-1] postgres(at)database WARNING: terminating
connection because of crash of another server process
2015-07-18 15:48:54 UTC [10408-2] postgres(at)database DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.
2015-07-18 15:48:54 UTC [10408-3] postgres(at)database HINT: In a moment you
should be able to reconnect to the database and repeat your command.

Program received signal SIGQUIT, Quit.
0x00007f84fe78c110 in __poll_nocancel () at
../sysdeps/unix/syscall-template.S:81
81 in ../sysdeps/unix/syscall-template.S
(gdb) bt full
Python Exception <class 'gdb.MemoryError'> Cannot access memory at address
0x7ffc65188bd8:
#0 0x00007f84fe78c110 in __poll_nocancel () at
../sysdeps/unix/syscall-template.S:81
No locals.
Cannot access memory at address 0x7ffc65188bd8
========================================================================

Thanks,
Michael J. Bommarito II, CEO
Bommarito Consulting, LLC
*Web:* http://www.bommaritollc.com
*Mobile:* +1 (646) 450-3387

On Tue, Jul 14, 2015 at 8:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Michael Bommarito <michael(at)bommaritollc(dot)com> writes:
> > If you can provide a patch that performs input validation in
> > get_tle_by_resno and logs the condition, I can compile and test with it.
>
> Wouldn't prove anything one way or another. Somehow, a corrupt query tree
> is being fed to the planner; what we need to understand is what conditions
> cause that to happen. I doubt that getting more details at the point
> where the code trips over the corruption will teach us that.
>
> regards, tom lane
>
