Re: Postgres, fsync, and OSs (specifically linux)

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres, fsync, and OSs (specifically linux)
Date: 2018-05-22 19:58:06
Message-ID: CA+q6zcV6Ckt0r3AgCzeqt74MR78u3p0+Nr6FNv===NuD3XzCTA@mail.gmail.com
Lists: pgsql-hackers

> On 22 May 2018 at 20:59, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2018-05-22 20:54:46 +0200, Dmitry Dolgov wrote:
>> > On 22 May 2018 at 18:47, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > On 2018-05-22 08:57:18 -0700, Andres Freund wrote:
>> >> Hi,
>> >>
>> >>
>> >> On 2018-05-22 17:37:28 +0200, Dmitry Dolgov wrote:
>> >> > Thanks for the patch. Out of curiosity I tried to play with it a bit.
>> >>
>> >> Thanks.
>> >>
>> >>
>> >> > `pgbench -i -s 100` actually hung on my machine, because the
>> >> > copy process ended up waiting after `pg_uds_send_with_fd`
>> >> > had
>> >>
>> >> Hm, that had worked at some point...
>> >>
>> >>
>> >> > errno == EWOULDBLOCK || errno == EAGAIN
>> >> >
>> >> > as well as the checkpointer process.
>> >>
>> >> What do you mean by that last sentence?
>>
>> To investigate what's happening I attached gdb to two processes, the COPY
>> process from pgbench and the checkpointer (since I assumed it might be
>> involved). Both were waiting in WaitLatchOrSocket right after SendFsyncRequest.
>
> Huh? Checkpointer was in SendFsyncRequest()? Could you share the
> backtrace?

Well, that's what I've got from gdb:

#0  0x00007fae03fae9f3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x000000000077a979 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffe37529ec0, cur_timeout=-1, set=0x23cddf8) at latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x23cddf8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffe37529ec0, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=0) at latch.c:1000
#3  0x000000000077ad08 in WaitLatchOrSocket (latch=latch@entry=0x0, wakeEvents=wakeEvents@entry=4, sock=8, timeout=timeout@entry=-1, wait_event_info=wait_event_info@entry=0) at latch.c:385
#4  0x00000000007152cb in SendFsyncRequest (request=request@entry=0x7ffe37529f40, fd=fd@entry=-1) at checkpointer.c:1345
#5  0x0000000000716223 in AbsorbAllFsyncRequests () at checkpointer.c:1207
#6  0x000000000079a5f0 in mdsync () at md.c:1339
#7  0x000000000079c672 in smgrsync () at smgr.c:766
#8  0x000000000076dd53 in CheckPointBuffers (flags=flags@entry=64) at bufmgr.c:2581
#9  0x000000000051c681 in CheckPointGuts (checkPointRedo=722254352, flags=flags@entry=64) at xlog.c:9079
#10 0x0000000000523c4a in CreateCheckPoint (flags=flags@entry=64) at xlog.c:8863
#11 0x0000000000715f41 in CheckpointerMain () at checkpointer.c:494
#12 0x00000000005329f4 in AuxiliaryProcessMain (argc=argc@entry=2, argv=argv@entry=0x7ffe3752a220) at bootstrap.c:451
#13 0x0000000000720c28 in StartChildProcess (type=type@entry=CheckpointerProcess) at postmaster.c:5340
#14 0x0000000000721c23 in reaper (postgres_signal_arg=<optimized out>) at postmaster.c:2875
#15 <signal handler called>
#16 0x00007fae03fa45b3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:84
#17 0x0000000000722968 in ServerLoop () at postmaster.c:1679
#18 0x0000000000723cde in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x23a00e0) at postmaster.c:1388
#19 0x000000000068979f in main (argc=3, argv=0x23a00e0) at main.c:228

>> >> > Looks like with the default
>> >> > configuration and `max_wal_size=1GB` it writes more than reads to a
>> >> > socket, and a buffer eventually becomes full.
>> >>
>> >> That's intended to then wake up the checkpointer immediately, so it can
>> >> absorb the requests. So something isn't right yet.
>> >
>> > Doesn't hang here, but it's way too slow.
>>
>> Yep, in my case it was also getting slower, but it eventually hung.
>>
>> > Reason for that is that I've wrongly resolved a merge conflict. Attached is a
>> > fixup patch - does that address the issue for you?
>>
>> Hm... is this the correct patch? I see the same change committed in
>> 8c3debbbf61892dabd8b6f3f8d55e600a7901f2b, so I can't really apply it.
>
> Yea, sorry for that. Too many files in my patch directory... Right one
> attached.

Yes, this patch solves the problem, thanks.
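
For anyone following along, the buffer-full condition discussed above (a
sender writing to a Unix domain socket faster than the peer reads, until the
send fails with EWOULDBLOCK/EAGAIN) can be reproduced in isolation. The
following is only a minimal sketch in Python, not code from the patch: the
helper names fill_until_blocked, is_writable, drain and demo are made up for
illustration, and it uses a plain socketpair rather than anything from
PostgreSQL. It shows that a non-blocking sender stalls once the kernel buffer
is full and becomes writable again only after the other end has read.

```python
import errno
import select
import socket


def fill_until_blocked(sock, payload=b"x" * 1024):
    """Send on a non-blocking socket until the kernel buffer is full.

    Returns (bytes accepted by the kernel, errno of the failing send).
    """
    sent = 0
    while True:
        try:
            sent += sock.send(payload)
        except BlockingIOError as exc:
            # Raised for EAGAIN / EWOULDBLOCK on a non-blocking socket.
            return sent, exc.errno


def is_writable(sock, timeout=0.0):
    """Report whether select() considers the socket writable within timeout."""
    _, writable, _ = select.select([], [sock], [], timeout)
    return bool(writable)


def drain(sock, nbytes):
    """Read up to nbytes from a blocking socket and discard the data."""
    remaining = nbytes
    while remaining > 0:
        chunk = sock.recv(min(65536, remaining))
        if not chunk:
            break
        remaining -= len(chunk)
    return nbytes - remaining


def demo():
    # Hypothetical stand-ins: "sender" plays the backend, "receiver" the
    # process that is supposed to absorb the requests.
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    sender.setblocking(False)
    try:
        sent, err = fill_until_blocked(sender)
        blocked_before = not is_writable(sender)
        drained = drain(receiver, sent)
        writable_after = is_writable(sender, timeout=1.0)
        return sent, err, blocked_before, drained, writable_after
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(demo())
```

If nothing ever reads from the receiving end, a sender that waits for
writability with no timeout would wait forever, which matches the symptom of
a process sitting in a socket wait after a failed send.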
