Re: SIGSEGV in BRIN autosummarize

From:	Justin Pryzby <pryzby(at)telsasoft(dot)com>
To:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject:	Re: SIGSEGV in BRIN autosummarize
Date:	2017-10-15 01:56:56
Message-ID:	20171015015656.GC22678@telsasoft.com
Lists:	pgsql-hackers

On Fri, Oct 13, 2017 at 10:57:32PM -0500, Justin Pryzby wrote:
> > Also notice the vacuum process was interrupted, same as yesterday (thank
> > goodness for full logs). Our INSERT script is using Python
> > multiprocessing.Pool() with "maxtasksperchild=1", which I think means we load
> > one file and then exit the subprocess, and Pool() creates a new subproc, which
> > starts a new PG session and transaction. Which explains why autovacuum starts
> > processing the table only to be immediately interrupted.
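
The reading of "maxtasksperchild=1" in the quoted text can be checked with a minimal sketch. It is not the loader script from this thread: the helper name worker_pids and the use of os.getpid as the task are assumptions made for illustration, and the "fork" start method is chosen only to keep the example self-contained on Linux.

```python
import multiprocessing
import os


def worker_pids(n_tasks, n_workers=2):
    """Return the PID of the worker process that ran each of n_tasks tasks."""
    # Assumption: the "fork" start method (POSIX only), which keeps this
    # sketch independent of how the surrounding script is launched.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=n_workers, maxtasksperchild=1) as pool:
        # Each apply_async call is one task; os.getpid runs in the worker.
        pending = [pool.apply_async(os.getpid) for _ in range(n_tasks)]
        return [p.get(timeout=30) for p in pending]


if __name__ == "__main__":
    pids = worker_pids(4)
    # With maxtasksperchild=1 a worker exits after a single task and the pool
    # starts a replacement, so each task should report a different PID.
    print(pids, len(set(pids)) == len(pids))
```

If each task opens its own database connection, one fresh process per task also means one fresh PG session per loaded file, which matches the description above. Note that with pool.map() a "task" is a chunk of items, so chunksize=1 would be needed to get the same one-item-per-process behavior.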

On Sun, Oct 15, 2017 at 01:57:14AM +0200, Tomas Vondra wrote:
> I don't follow. Why does it explain that autovacuum gets canceled? I
> mean, merely opening a new connection/session should not cancel
> autovacuum. That requires a command that requires table-level lock
> conflicting with autovacuum (so e.g. explicit LOCK command, DDL, ...).

I was thinking that INSERT would do it, but I gather you're right about
autovacuum. Let me get back to you about this.
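
Tomas's point about lock conflicts can be illustrated with a small sketch. The conflict set below is the SHARE UPDATE EXCLUSIVE row of the table-level lock conflict table in the PostgreSQL "Explicit Locking" documentation, which is the mode VACUUM takes on the table. The command-to-mode mapping covers only a few common commands, and the helper name is hypothetical. A conflict is only a precondition for cancellation; the sketch says nothing about timing or about which autovacuum runs are exempt.

```python
# Lock modes that conflict with SHARE UPDATE EXCLUSIVE, the table-level mode
# held by VACUUM (and therefore by an autovacuum worker on that table).
CONFLICTS_WITH_SHARE_UPDATE_EXCLUSIVE = {
    "SHARE UPDATE EXCLUSIVE",
    "SHARE",
    "SHARE ROW EXCLUSIVE",
    "EXCLUSIVE",
    "ACCESS EXCLUSIVE",
}

# Table-level lock mode acquired by a few common commands (not exhaustive).
LOCK_MODE_OF = {
    "SELECT": "ACCESS SHARE",
    "INSERT": "ROW EXCLUSIVE",
    "CREATE INDEX": "SHARE",
    "LOCK TABLE": "ACCESS EXCLUSIVE",  # default mode of LOCK TABLE
    "DROP TABLE": "ACCESS EXCLUSIVE",
}


def conflicts_with_autovacuum(command):
    """True if the command's table lock conflicts with VACUUM's lock."""
    return LOCK_MODE_OF[command] in CONFLICTS_WITH_SHARE_UPDATE_EXCLUSIVE


if __name__ == "__main__":
    for command in LOCK_MODE_OF:
        print(f"{command:12} conflicts: {conflicts_with_autovacuum(command)}")
```

INSERT takes ROW EXCLUSIVE, which is not in the conflict set, so a plain INSERT from a new session would not by itself be expected to cancel autovacuum.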

> > Due to a "behavioral deficiency" in the loader for those tables, the crashed
> > backend causes the loader to get stuck, so the tables should be untouched since
> > the crash, should it be desirable to inspect them.
> >
>
> It's a bit difficult to guess what went wrong from this backtrace. For
> me gdb typically prints a bunch of lines immediately before the frames,
> explaining what went wrong - not sure why it's missing here.

Do you mean this?

...
Loaded symbols for /lib64/libnss_files-2.12.so
Core was generated by `postgres: autovacuum worker process gtt '.
Program terminated with signal 11, Segmentation fault.
#0 pfree (pointer=0x298c740) at mcxt.c:954
954 (*context->methods->free_p) (context, pointer);

> Perhaps some of those pointers are bogus, the memory was already pfree-d
> or something like that. You'll have to poke around and try dereferencing
> the pointers to find what works and what does not.
>
> For example what do these gdb commands do in the #0 frame?
>
> (gdb) p *(MemoryContext)context

(gdb) p *(MemoryContext)context
Cannot access memory at address 0x7474617261763a20

> (gdb) p *GetMemoryChunkContext(pointer)

(gdb) p *GetMemoryChunkContext(pointer)
No symbol "GetMemoryChunkContext" in current context.

I had to do this instead, since GetMemoryChunkContext is apparently inlined or a macro:
(gdb) p *(MemoryContext *) (((char *) pointer) - sizeof(void *))
$8 = (MemoryContext) 0x7474617261763a20
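
A tentative observation on that value, as a minimal sketch: the gdb expression above reads the pointer-sized word stored just before the chunk, and the result looks like printable text rather than an address. The sketch assumes a 64-bit little-endian machine (the /lib64 path in the gdb output suggests 64-bit; little-endian is an assumption) and only decodes the two numbers already shown in this message.

```python
# Values copied from the gdb session above.
POINTER = 0x298C740                 # argument passed to pfree()
CONTEXT_VALUE = 0x7474617261763A20  # word read from the chunk header

SIZEOF_VOID_P = 8  # assumption: 64-bit build

# Address gdb read from: (char *) pointer - sizeof(void *)
header_address = POINTER - SIZEOF_VOID_P

# Assumption: little-endian byte order, so the least significant byte of the
# value is the first byte in memory.
raw = CONTEXT_VALUE.to_bytes(SIZEOF_VOID_P, "little")

if __name__ == "__main__":
    print(hex(header_address))  # 0x298c738
    print(raw)                  # b' :varatt'
```

Under those assumptions the eight bytes are the ASCII text " :varatt". That would be consistent with the chunk header having been overwritten by string data, or with the chunk having been freed and reused, but it does not by itself show which code wrote it.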

I uploaded the corefile:
http://telsasoft.com/tmp/coredump-postgres-autovacuum-brin-summarize.gz

Justin

In response to

Re: SIGSEGV in BRIN autosummarize at 2017-10-14 23:57:14 from Tomas Vondra

Responses

Re: SIGSEGV in BRIN autosummarize at 2017-10-15 12:44:58 from Tomas Vondra
Re: SIGSEGV in BRIN autosummarize at 2017-10-15 17:08:05 from Justin Pryzby
