Re: Current CVS tip segfaulting

From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Current CVS tip segfaulting
Date: 2004-04-24 21:31:26
Message-ID: 20040424213126.GA5312@dcc.uchile.cl
Lists: pgsql-hackers

On Sat, Apr 24, 2004 at 12:27:14AM -0400, Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > It could be a bug, but if it is, it is a different fix than the one I
> > did, I think.
>
> Re-reading Alvaro's message, I wondered if cranking logging up to a
> higher-than-default setting was needed to reproduce the bug. A quick
> experiment in that line didn't show a problem, but maybe I missed the
> critical setting. Alvaro, what postgresql.conf settings are you using?

I don't touch the standard settings ... log values are from the default
installation.

In another mail you asked:

> Which PS_USE_FOO option does your platform use? (See
> src/backend/utils/misc/ps_status.c)

PS_USE_CLOBBER_ARGV AFAICS (ugh, sure uppercase is ugly) ;-)
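
For context, the "clobber argv" approach to setting the ps title amounts to
overwriting the memory the original argv strings occupy. The following is
only a minimal sketch of that idea, not the code in ps_status.c: the names
argv_area_len and clobber_title are made up for illustration, and it assumes
the argv strings sit back to back in memory.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: length of the contiguous area starting at argv[0],
 * counting only strings that directly follow one another in memory.
 */
size_t
argv_area_len(int argc, char **argv)
{
    char   *end;
    int     i;

    if (argc <= 0 || argv[0] == NULL)
        return 0;

    end = argv[0] + strlen(argv[0]);
    for (i = 1; i < argc; i++)
    {
        if (argv[i] != end + 1)
            break;              /* not contiguous, stop here */
        end = argv[i] + strlen(argv[i]);
    }
    return (size_t) (end - argv[0]);
}

/*
 * Hypothetical helper: overwrite the argv area with "title" and pad the
 * remainder with NUL bytes.  Returns the number of title bytes written.
 */
size_t
clobber_title(char *area, size_t area_len, const char *title)
{
    size_t  n = strlen(title);

    if (area_len == 0)
        return 0;
    if (n > area_len)
        n = area_len;           /* truncate to the space available */
    memcpy(area, title, n);
    memset(area + n, 0, area_len - n);
    return n;
}
```

After such an overwrite, any pointer that still refers into the old argv area
(argv[1], argv[2], ...) no longer sees the original option strings, which is
why code that needs them later has to work from copies.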

The relevant strace extract is this (3448 is the backend, 3443 is
postmaster):

3448 write(2, "FATAL:  database \"asd\" does not exist\n", 38) = 38
3448 send(10, "R\0\0\0\10\0\0\0\0E\0\0\0\217SFATAL\0C3D000\0Mdatabase \"asd\" does not exist\0F/home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c\0L264\0RInitPostgres\0\0", 153, 0) = 153
3448 --- SIGSEGV (Segmentation fault) @ 0 (0) ---
3443 <... select resumed> ) = ? ERESTARTNOHAND (To be restarted)
3443 --- SIGCHLD (Child exited) @ 0 (0) ---
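
For readers decoding the send() payload above: it is two version-3 protocol
messages back to back, an 'R' message with length 8 and an 'E' message whose
length word is octal 217 = 143, so 1 + 8 + 1 + 143 = 153 bytes, matching the
return value. A minimal sketch of pulling one field out of such an 'E'
message follows. The function name error_field is invented here, and this is
not libpq code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: given a buffer that starts at an 'E' message,
 * return the value of the field with the given one-byte code
 * (e.g. 'S', 'C', 'M', 'F', 'L', 'R'), or NULL if absent or malformed.
 */
const char *
error_field(const unsigned char *msg, size_t avail, char code)
{
    uint32_t    len;
    size_t      pos;
    size_t      end;

    if (avail < 5 || msg[0] != 'E')
        return NULL;

    /* the length word is big-endian; it counts itself but not the type byte */
    len = ((uint32_t) msg[1] << 24) | ((uint32_t) msg[2] << 16) |
          ((uint32_t) msg[3] << 8) | (uint32_t) msg[4];
    if (len < 4 || (size_t) len + 1 > avail)
        return NULL;

    end = (size_t) len + 1;     /* one past the last byte of the message */
    pos = 5;
    while (pos < end && msg[pos] != '\0')
    {
        const unsigned char *val = msg + pos + 1;
        const unsigned char *nul = memchr(val, '\0', end - (pos + 1));

        if (nul == NULL)
            return NULL;        /* unterminated field value */
        if ((char) msg[pos] == code)
            return (const char *) val;
        pos = (size_t) (nul - msg) + 1;     /* skip to the next field code */
    }
    return NULL;
}
```

Applied to the 'E' message in the trace, the 'F', 'L' and 'R' fields would be
the file, line and routine names that ereport() filled in.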

Note that the ereport() did get the line number, file and function name, the
correct database name, etc. I don't know if the code is changing the ps status
after that; it's difficult to attach a debugger to this ... huh wait, I'll try the
backend's developer switches.

... plays for a while ...

Heh, the -s switch to postmaster seems to behave oddly. The bgwriter process
shows up in T (stopped) status in ps, but the postmaster does not. If I then
send SIGCONT to the bgwriter, it continues and returns to S status, but then
the postmaster doesn't respond correctly to signals (INT or TERM don't shut
it down). Has it always been like this? I haven't used this switch before.

Anyway, this doesn't allow me to examine the dead backend. Trying

    postmaster -o "-W 60"

allows me to attach gdb to the backend before it dies:

(gdb) bt
#0 0xffffe410 in ?? ()
#1 0xbfffeda8 in ?? ()
#2 0x4025f800 in ?? () from /lib/tls/libc.so.6
#3 0xbfffec04 in ?? ()
#4 0x401cb460 in nanosleep () from /lib/tls/libc.so.6
#5 0x401cb263 in sleep () from /lib/tls/libc.so.6
#6 0x0818791e in PostgresMain (argc=6, argv=0x82dff18,
    username=0x82dfee0 "alvherre") at stdlib.h:382
#7 0x0815fab0 in BackendRun (port=0x82ed050)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2664
#8 0x0815f371 in BackendStartup (port=0x82ed050)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:2297
#9 0x0815db6e in ServerLoop ()
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:1167
#10 0x0815d157 in PostmasterMain (argc=3, argv=0x82deb80)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/postmaster/postmaster.c:928
#11 0x0812f030 in main (argc=3, argv=0x82deb80)
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/main/main.c:257
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()

Whoa! New backend, new gdb, try again:

(gdb) break InitPostgres
Breakpoint 1 at 0x81f3c3c: file /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c, line 230.
(gdb) cont
Continuing.

Breakpoint 1, InitPostgres (dbname=0xc <Address 0xc out of bounds>,
    username=0x80e2540 "U\211åSPè\222Îøÿ\200= ±*\b")
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/utils/init/postinit.c:230
230 bool bootstrap = IsBootstrapProcessingMode();
(gdb)

This surely looks suspicious ...

(gdb) p dbname
$2 = 0xc <Address 0xc out of bounds>
(gdb) frame 1
#1 0x08187581 in PostgresMain (argc=6, argv=0x82dff18,
    username=0x82dfee0 "alvherre")
    at /home/alvherre/CVS/pgsql/source/00orig/src/backend/tcop/postgres.c:2745
2745 InitPostgres(dbname, username);
(gdb) p argv
$3 = (char **) 0x82dff18
(gdb) p argv[0]
$5 = 0x8265402 "postgres"
(gdb) p argv[1]
$6 = 0x82aa301 "-W"
(gdb) p argv[2]
$7 = 0x82aa304 "60"
(gdb) p argv[3]
$8 = 0xbfffee60 "-v196608"
(gdb) p argv[4]
$9 = 0x826d97a "-p"
(gdb) p argv[5]
$10 = 0x82dfefc "asd"
(gdb) p argv[6]
$11 = 0x0
(gdb) p dbname
$12 = 0x82ea848 "asd"

-- Note that this is not the same pointer as argv[5]; it's a copy, and as far
as I can see it's set by the -p option in the switch/case in tcop/postgres.c,
line 2391, using strdup.
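
To illustrate why the copy matters: a duplicated string lives in separate
heap memory, so it keeps its contents even if the storage behind the original
argument is later overwritten. A minimal sketch follows. The function name
copy_dbname_arg is made up, it uses malloc plus memcpy to do what strdup
does, and it does not reproduce the option handling in tcop/postgres.c.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a heap copy of "arg", the way strdup() would.
 * The caller owns the result and must free() it.  Returns NULL if the
 * allocation fails.
 */
char *
copy_dbname_arg(const char *arg)
{
    size_t  n = strlen(arg) + 1;    /* include the terminating NUL */
    char   *copy = malloc(n);

    if (copy != NULL)
        memcpy(copy, arg, n);
    return copy;
}
```

With such a copy, the pointer differs from the argv entry (as with 0x82ea848
versus 0x82dfefc above) while the contents compare equal.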

What else?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.
