Re: Race conditions with checkpointer and shutdown

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Race conditions with checkpointer and shutdown
Date: 2019-04-29 17:35:59
Message-ID: 29929.1556559359@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ashwin Agrawal <aagrawal(at)pivotal(dot)io> writes:
> For Greenplum (based on 9.4 but current master code looks the same) we
> did see deadlocks recently hit in CI many times for walreceiver which
> I believe confirms above finding.

> #0 __lll_lock_wait_private () at
> ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
> #1 0x00007f0637ee72bd in _int_free (av=0x7f063822bb20 <main_arena>,
> p=0x26bb3b0, have_lock=0) at malloc.c:3962
> #2 0x00007f0637eeb53c in __GI___libc_free (mem=<optimized out>) at
> malloc.c:2968
> #3 0x00007f0636629464 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #4 0x00007f0636630720 in ?? () from /usr/lib/x86_64-linux-gnu/libgnutls.so.30
> #5 0x00007f063b5cede7 in _dl_fini () at dl-fini.c:235
> #6 0x00007f0637ea0ff8 in __run_exit_handlers (status=1,
> listp=0x7f063822b5f8 <__exit_funcs>,
> run_list_atexit=run_list_atexit(at)entry=true) at exit.c:82
> #7 0x00007f0637ea1045 in __GI_exit (status=<optimized out>) at exit.c:104
> #8 0x00000000008c72c7 in proc_exit ()
> #9 0x0000000000a75867 in errfinish ()
> #10 0x000000000089ea53 in ProcessWalRcvInterrupts ()
> #11 0x000000000089eac5 in WalRcvShutdownHandler ()
> #12 <signal handler called>
> #13 _int_malloc (av=av(at)entry=0x7f063822bb20 <main_arena>,
> bytes=bytes(at)entry=16384) at malloc.c:3802
> #14 0x00007f0637eeb184 in __GI___libc_malloc (bytes=16384) at malloc.c:2913
> #15 0x00000000007754c3 in makeEmptyPGconn ()
> #16 0x0000000000779686 in PQconnectStart ()
> #17 0x0000000000779b8b in PQconnectdb ()
> #18 0x00000000008aae52 in libpqrcv_connect ()
> #19 0x000000000089f735 in WalReceiverMain ()
> #20 0x00000000005c5eab in AuxiliaryProcessMain ()
> #21 0x00000000004cd5f1 in ServerLoop ()
> #22 0x000000000086fb18 in PostmasterMain ()
> #23 0x00000000004d2e28 in main ()

Cool --- that stack trace is *exactly* what you'd expect if this
were the problem. Thanks for sending it along!

Can you try applying a1a789eb5ac894b4ca4b7742f2dc2d9602116e46
to see if it fixes the problem for you?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-04-29 17:43:57 Re: CHAR vs NVARCHAR vs TEXT performance
Previous Message Tom Lane 2019-04-29 17:32:13 Re: "long" type is not appropriate for counting tuples