Re: XLogInsert scaling, revisited

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XLogInsert scaling, revisited
Date: 2013-06-22 11:32:46
Message-ID: 51C58B5E.7030102@vmware.com
Lists: pgsql-hackers

On 21.06.2013 21:55, Jeff Janes wrote:
> I think I'm getting an undetected deadlock between the checkpointer and a
> user process running a TRUNCATE command.
>
> This is the checkpointer:
>
> #0 0x0000003a73eeaf37 in semop () from /lib64/libc.so.6
> #1 0x00000000005ff847 in PGSemaphoreLock (sema=0x7f8c0a4eb730,
> interruptOK=0 '\000') at pg_sema.c:415
> #2 0x00000000004b0abf in WaitOnSlot (upto=416178159648) at xlog.c:1775
> #3 WaitXLogInsertionsToFinish (upto=416178159648) at xlog.c:2086
> #4 0x00000000004b657a in CopyXLogRecordToWAL (write_len=32, isLogSwitch=1
> '\001', rdata=0x0, StartPos=<value optimized out>, EndPos=416192397312)
> at xlog.c:1389
> #5 0x00000000004b6fb2 in XLogInsert (rmid=0 '\000', info=<value optimized
> out>, rdata=0x7fff00000020) at xlog.c:1209
> #6 0x00000000004b7644 in RequestXLogSwitch () at xlog.c:8748

Hmm, it looks like the xlog-switch is trying to wait for itself to
finish. The concurrent TRUNCATE is just being blocked behind the
xlog-switch, which is stuck on itself.

I wasn't able to reproduce exactly that, but I got a PANIC by running
pgbench and concurrently doing "select pg_switch_xlog()" many times in psql.

Attached is a new version that fixes at least the problem I saw. Not
sure if it fixes what you saw, but it's worth a try. How easily can you
reproduce that?

> This is using the same testing harness as in the last round of this patch.

This one?
http://www.postgresql.org/message-id/CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com

> Is there a way for me to dump the list of held/waiting lwlocks from gdb?

You can print out the held_lwlocks array. Or, to make it more friendly,
write a function that prints it out and call that from gdb. As far as I
know, there's no easy way to print out who's waiting for what.
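
A minimal sketch of such a helper follows, for illustration only. It assumes the 9.3-era layout of lwlock.c, where held_lwlocks and num_held_lwlocks are file-level statics. The stand-in declarations at the top exist only so the sketch compiles on its own; the value of MAX_SIMUL_LWLOCKS is a placeholder, and print_held_lwlocks is a hypothetical name, not an existing PostgreSQL function.

```c
#include <stdio.h>

/*
 * Stand-ins so this sketch compiles on its own (assumptions). In the real
 * lwlock.c these already exist, so only the function below would be added.
 */
#define MAX_SIMUL_LWLOCKS 100   /* placeholder value */
typedef int LWLockId;           /* an enum in the real tree */

static int      num_held_lwlocks = 0;
static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];

/*
 * Hypothetical debugging helper: write the ids of the LWLocks held by this
 * backend to stderr and return how many there are. It is deliberately not
 * static, so that gdb can find it and call it by name.
 */
int
print_held_lwlocks(void)
{
    int i;

    fprintf(stderr, "%d LWLock(s) held:", num_held_lwlocks);
    for (i = 0; i < num_held_lwlocks; i++)
        fprintf(stderr, " %d", (int) held_lwlocks[i]);
    fputc('\n', stderr);
    fflush(stderr);

    return num_held_lwlocks;
}
```

After attaching gdb to the backend, "p held_lwlocks[0]@num_held_lwlocks" prints the raw array contents without any helper, and "call print_held_lwlocks()" runs the function above; its output goes to the backend's stderr, not to the gdb console.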

Thanks for the testing!

- Heikki

Attachment Content-Type Size
xloginsert-scale-24.patch.gz application/x-gzip 26.5 KB
