Re: Storage Manager crash at mdwrite()

From: Tareq Aljabban <dee(dot)jay23(dot)me(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Storage Manager crash at mdwrite()
Date: 2012-03-27 14:33:40
Message-ID: CAGOe0aLib5pRdHu3Nz06e=C9-k0T317RSHSxT06xMyY1-5tdhQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 16, 2012 at 8:34 PM, Greg Stark <stark(at)mit(dot)edu> wrote:

> On Fri, Mar 16, 2012 at 11:29 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > There is a lot of difference between those two. In particular, it looks
> > like the problem you are seeing is coming from the background writer,
> > which is not running during initdb.
>
> The difference that comes to mind is that the postmaster forks. If the
> library opens any connections prior to forking and then uses them
> after forking that would work at first but it would get confused
> quickly once more than one backend tries to use the same connection.
> The data being sent would all be mixed together and they would see
> responses to requests other processes sent.
> You need to ensure that any network connections are opened up *after*
> the new processes are forked.
>

That's right: it turned out that the cause of the problem is that HDFS has
trouble dealing with forked processes. However, there's no clear suggestion
on how to fix this.
I attached gdb to the background writer process and got the following backtrace:

#0 0xb76f0430 in __kernel_vsyscall ()
#1 0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2 0x0840ab46 in pg_usleep (microsec=200000) at pgsleep.c:43
#3 0x0829ca9a in BgWriterNap () at bgwriter.c:642
#4 0x0829c882 in BackgroundWriterMain () at bgwriter.c:540
#5 0x0811b0ec in AuxiliaryProcessMain (argc=2, argv=0xbf982308) at bootstrap.c:417
#6 0x082a9af1 in StartChildProcess (type=BgWriterProcess) at postmaster.c:4427
#7 0x082a75de in reaper (postgres_signal_arg=17) at postmaster.c:2390
#8 <signal handler called>
#9 0xb76f0430 in __kernel_vsyscall ()
#10 0xb6b2893d in ___newselect_nocancel () at ../sysdeps/unix/syscall-template.S:82
#11 0x082a5b62 in ServerLoop () at postmaster.c:1391
#12 0x082a53e2 in PostmasterMain (argc=3, argv=0xa525c28) at postmaster.c:1092
#13 0x0822dfa8 in main (argc=3, argv=0xa525c28) at main.c:188

Any ideas?
