Re: 9.1.1 hot standby startup gets sigbus

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: 9.1.1 hot standby startup gets sigbus
Date: 2011-12-02 14:31:55
Message-ID: CAK3UJREBcyVBtr8D7vMfU=uDdkjXkrPnGcuy8erYB0tMfKe1LA@mail.gmail.com
Lists: pgsql-bugs

[Moving back on-list. Tom generously offered to look at the server in
question, since a self-contained test case seemed likely to be difficult
or impossible to produce in this case.]

On Fri, Dec 2, 2011 at 12:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Well, poking around in the process at the moment of SIGBUS, I find that
> it's computed a buffer address of
>
>        (gdb) p bufBlock
>        $12 = (Block) 0xb6c49060
>
> which appears to be perfectly reasonable: it's working with buffer
> header number 15514 and that's equal to BufferBlocks + (15514 * 8192).
> However, trying to dump that 8K area out stops before it gets to the
> end:
>
>        0xb6c4afc0:     0       0       0       0
>        0xb6c4afd0:     0       0       0       0
>        0xb6c4afe0:     0       0       0       0
>        0xb6c4aff0:     0       0       0       0
>        0xb6c4b000:     Cannot access memory at address 0xb6c4b000
>
>        (gdb) p 0xb6c4b000 - 0xb6c49060
>        $13 = 8096
>
> that is, there isn't any memory mapped beyond 0xb6c4b000, which is why
> we're getting SIGBUS.  So a significant fraction of the intended shared
> buffer array (16384 buffers) simply Ain't There.  The reason it crashes
> at this exact spot seems to be simply that this is where we've used up
> enough of the buffers to need one that's not (all) there.
>
> So the question is, has Postgres gone nuts about how big it needs its
> shared memory to be?  I don't think so.  Looking into the memory map
> for the process, we've got
>
> postgres@S04001011820ASA:~$ cat /proc/11187/maps
> 00110000-00112000 r-xp 00000000 fd:04 412754092                          /lib/i386-linux-gnu/libdl-2.13.so
> 00112000-00113000 r--p 00001000 fd:04 412754092                          /lib/i386-linux-gnu/libdl-2.13.so
> 00113000-00114000 rw-p 00002000 fd:04 412754092                          /lib/i386-linux-gnu/libdl-2.13.so
> 00247000-0026b000 r-xp 00000000 fd:04 412754075                          /lib/i386-linux-gnu/libm-2.13.so
> 0026b000-0026c000 r--p 00023000 fd:04 412754075                          /lib/i386-linux-gnu/libm-2.13.so
> 0026c000-0026d000 rw-p 00024000 fd:04 412754075                          /lib/i386-linux-gnu/libm-2.13.so
> 0026d000-003c7000 r-xp 00000000 fd:04 412754120                          /lib/i386-linux-gnu/libc-2.13.so
> 003c7000-003c8000 ---p 0015a000 fd:04 412754120                          /lib/i386-linux-gnu/libc-2.13.so
> 003c8000-003ca000 r--p 0015a000 fd:04 412754120                          /lib/i386-linux-gnu/libc-2.13.so
> 003ca000-003cb000 rw-p 0015c000 fd:04 412754120                          /lib/i386-linux-gnu/libc-2.13.so
> 003cb000-003ce000 rw-p 00000000 00:00 0
> 00611000-0062d000 r-xp 00000000 fd:04 412754069                          /lib/i386-linux-gnu/ld-2.13.so
> 0062d000-0062e000 r--p 0001b000 fd:04 412754069                          /lib/i386-linux-gnu/ld-2.13.so
> 0062e000-0062f000 rw-p 0001c000 fd:04 412754069                          /lib/i386-linux-gnu/ld-2.13.so
> 00995000-0099f000 r-xp 00000000 fd:04 412754174                          /lib/i386-linux-gnu/libnss_files-2.13.so
> 0099f000-009a0000 r--p 00009000 fd:04 412754174                          /lib/i386-linux-gnu/libnss_files-2.13.so
> 009a0000-009a1000 rw-p 0000a000 fd:04 412754174                          /lib/i386-linux-gnu/libnss_files-2.13.so
> 00e57000-00e58000 r-xp 00000000 00:00 0                                  [vdso]
> 08048000-08548000 r-xp 00000000 fd:04 419563409                          /home/postgres/runtime/bin/postgres
> 08548000-08549000 r--p 004ff000 fd:04 419563409                          /home/postgres/runtime/bin/postgres
> 08549000-08551000 rw-p 00500000 fd:04 419563409                          /home/postgres/runtime/bin/postgres
> 08551000-08591000 rw-p 00000000 00:00 0
> 08c03000-08c5d000 rw-p 00000000 00:00 0                                  [heap]
> aed2f000-b7503000 rw-s 00000000 91:8f 1212416
> b7503000-b7703000 r--p 00000000 fd:04 412486363                          /usr/lib/locale/locale-archive
> b7703000-b7705000 rw-p 00000000 00:00 0
> b7709000-b770a000 r--p 002dd000 fd:04 412486363                          /usr/lib/locale/locale-archive
> b770a000-b770c000 rw-p 00000000 00:00 0
> bff75000-bff97000 rw-p 00000000 00:00 0                                  [stack]
> postgres@S04001011820ASA:~$
>
> ie, there's alleged to be shared memory from 0xaed2f000 up to
> 0xb7503000, and that squares exactly with where PG thinks its
> shared memory is:
>
> (gdb) p UsedShmemSegAddr
> $34 = (void *) 0xaed2f000
> (gdb) p *(PGShmemHeader *) UsedShmemSegAddr
> $35 = {magic = 679834894, creatorPID = 11185, totalsize = 142426112, freeoffset = 142125504, index = 0xaedafb48,
>  device = 37259, inode = 414974343}
> (gdb) p 0xb7503000 - 0xaed2f000
> $36 = 142426112
> (gdb)
>
> So basically, you've got a broken kernel here: it claimed to give PG
> circa 135MB of memory, but what's actually there is only about 128MB.
> I don't see any connection between those numbers and the shmmax/shmall
> settings, either --- so I think this must be some busted implementation
> of a VM-level limitation.
>
> I see no Postgres bug here.  You need to take this up with your hosting
> provider, who have given you a faulty kernel.
>
>                        regards, tom lane
>
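
For anyone who wants to re-check the arithmetic in the analysis quoted
above, here is a minimal Python sketch. Every constant is copied from the
gdb session or the /proc/11187/maps listing; the helper and variable names
are illustrative only and are not PostgreSQL identifiers. It assumes the
8192-byte block size and the 16384-buffer setting mentioned in the quote.

```python
# Sanity check of the numbers quoted above. All constants are copied from the
# gdb session and /proc/11187/maps; the helper names are illustrative only.

BLCKSZ = 8192                 # block size used in the quoted arithmetic
NBUFFERS = 16384              # intended size of the shared buffer array
BUF_BLOCK = 0xB6C49060        # gdb: p bufBlock
BUF_ID = 15514                # buffer header number from the analysis
FIRST_UNMAPPED = 0xB6C4B000   # first address gdb could not read

# The shared-memory mapping (the "rw-s" line) from /proc/11187/maps.
MAPS_LINE = "aed2f000-b7503000 rw-s 00000000 91:8f 1212416"


def parse_maps_range(line):
    """Return (start, end) addresses from one /proc/<pid>/maps line."""
    addr_range = line.split()[0]
    start_hex, end_hex = addr_range.split("-")
    return int(start_hex, 16), int(end_hex, 16)


seg_start, seg_end = parse_maps_range(MAPS_LINE)

# Base of the buffer array, from bufBlock = BufferBlocks + id * BLCKSZ.
buffer_blocks = BUF_BLOCK - BUF_ID * BLCKSZ

claimed = seg_end - seg_start                # size the mapping claims to have
accessible = FIRST_UNMAPPED - seg_start      # bytes readable before the fault
readable_in_buffer = FIRST_UNMAPPED - BUF_BLOCK
whole_buffers = (FIRST_UNMAPPED - buffer_blocks) // BLCKSZ

MIB = 1024 * 1024
print(f"BufferBlocks base      : {buffer_blocks:#x}")
print(f"claimed segment size   : {claimed} bytes ({claimed / MIB:.1f} MiB)")
print(f"accessible before fault: {accessible} bytes ({accessible / MIB:.1f} MiB)")
print(f"readable in buffer {BUF_ID}: {readable_in_buffer} of {BLCKSZ} bytes")
print(f"fully mapped buffers   : {whole_buffers} of {NBUFFERS}")
```

The claimed size comes out to 142426112 bytes, matching both totalsize and
gdb's $36, and the readable part of buffer 15514 is 8096 bytes, matching
$13. Only buffers 0 through 15513 are fully backed by memory, which is
consistent with the crash happening at the first use of buffer 15514.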

On Fri, Dec 2, 2011 at 1:03 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> ... btw, if I'm right in guessing that this is just an
> arbitrarily-imposed limit on the amount of shared memory available to
> Postgres, then knocking down shared_buffers by a quarter or so ought
> to do as a workaround, till you figure out what's going on.
>
>                        regards, tom lane
>

Wow, that's interesting, though I can't say I'm completely surprised.
You were spot on about turning down shared_buffers - I'm trying it at
96MB, down from 128MB, and the recovery process is chugging along.
I'll probably just ditch this VM and hosting provider (chvps aka
privatelayer, in case anyone wants to stay away).
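
A rough way to see why 96MB works where 128MB did not is sketched below in
Python. The totalsize value and the addresses are taken from Tom's gdb
session; the model itself is an assumption: it treats everything in the
segment other than the buffer pages as a fixed overhead, whereas in
practice part of that overhead also shrinks with shared_buffers. Treat it
as an estimate only. All names are illustrative.

```python
# Rough headroom estimate for the shared_buffers workaround. TOTALSIZE and the
# addresses come from the gdb session; the fixed-overhead model is an assumption.

MIB = 1024 * 1024
TOTALSIZE = 142426112                    # PGShmemHeader.totalsize at 128MB
ACCESSIBLE = 0xB6C4B000 - 0xAED2F000     # bytes readable before the bad address

# Assumption: everything in the segment that is not buffer pages stays the
# same size when shared_buffers changes (in reality part of it shrinks too).
OVERHEAD = TOTALSIZE - 128 * MIB


def estimated_segment_size(shared_buffers_mib):
    """Estimate the segment size in bytes for a given shared_buffers in MB."""
    return shared_buffers_mib * MIB + OVERHEAD


def fits(shared_buffers_mib):
    """True if the estimated segment fits inside the accessible region."""
    return estimated_segment_size(shared_buffers_mib) <= ACCESSIBLE


for setting in (128, 96):
    size = estimated_segment_size(setting)
    print(f"shared_buffers={setting}MB -> ~{size / MIB:.1f} MiB, "
          f"fits: {fits(setting)}")
```

Under this model the 128MB setting needs more than the roughly 127 MiB
that is actually mapped, while 96MB leaves a little over 20 MiB to spare.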

Thanks for the investigation!

Josh
