Re: 9.4 regression

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: 9.4 regression
Date: 2013-08-09 06:20:02
Message-ID: 20130809062002.GN14729@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-08-08 22:58:42 -0500, Jon Nelson wrote:
> On Thu, Aug 8, 2013 at 9:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-08-08 16:12:06 -0500, Jon Nelson wrote:
> ...
>
> >> At this point I'm convinced that the issue is a pathological case in
> >> ext4. The performance impact disappears as soon as the unwritten
> >> extent(s) are written to with real data. Thus, even though allocating
> >> files with posix_fallocate is - frequently - orders of magnitude
> >> quicker than doing it with write(2), the subsequent re-write can be
> >> more expensive. At least, that's what I'm gathering from the various
> >> threads.
> >
> >
> >> Why this issue didn't crop up in earlier testing and why I
> >> can't seem to make test_fallocate do it (even when I modify
> >> test_fallocate to write to the newly-allocated file in a mostly-random
> >> fashion) has me baffled.
> >
> > It might be kernel-version-specific, and concurrency seems to play a
> > role. If you reproduce the problem, could you run a "perf record -ga" to
> > collect a systemwide profile?
>
> Finally, an excuse to learn how to use 'perf'! I'll try to provide
> that info when I am able.

Running perf record as above during the first minute of the benchmark,
then running perf report > somefile (redirecting the output gets you the
non-interactive text version), should get you started.
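For anyone trying to reproduce this outside PostgreSQL, here is a rough sketch of the pattern Jon describes: reserve a file either with posix_fallocate() or by writing explicit zeros, then time the first block-by-block overwrite with an fdatasync() after each block. It is Linux-oriented (os.posix_fallocate and os.fdatasync are not available on every platform), the function names are hypothetical, and the 8 KiB block size is an assumption. It is not what test_fallocate or PostgreSQL actually do.

```python
import os
import time

BLOCK = 8192  # assumption: 8 KiB writes, PostgreSQL's default block size


def preallocate(path: str, size: int, use_fallocate: bool) -> float:
    """Create `path` with `size` bytes reserved (size should be a multiple
    of BLOCK) and return the elapsed seconds.

    use_fallocate=True reserves space with posix_fallocate(); on ext4 this
    leaves the extents marked unwritten. False writes explicit zeros instead.
    """
    start = time.perf_counter()
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        if use_fallocate:
            os.posix_fallocate(fd, 0, size)
        else:
            zeros = bytes(BLOCK)
            for offset in range(0, size, BLOCK):
                os.pwrite(fd, zeros, offset)
        os.fsync(fd)  # make the allocation itself durable before timing stops
    finally:
        os.close(fd)
    return time.perf_counter() - start


def rewrite(path: str, size: int) -> float:
    """Overwrite the file block by block, calling fdatasync() after every
    block, i.e. the first time real data lands in the reserved range.
    Returns the elapsed seconds."""
    payload = b"x" * BLOCK
    start = time.perf_counter()
    fd = os.open(path, os.O_RDWR)
    try:
        for offset in range(0, size, BLOCK):
            os.pwrite(fd, payload, offset)
            os.fdatasync(fd)
    finally:
        os.close(fd)
    return time.perf_counter() - start
```

Comparing preallocate() for both modes shows the cheap allocation; comparing the two rewrite() timings shows whether the first overwrite of unwritten extents costs more. Given Jon's experience with test_fallocate, a single sequential writer like this may well not show the effect on its own.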

> > There's some more things to test:
> > - is the slowdown dependent on the scale? I.e., is it visible with -j 1
> > -c 1?
>
> scale=1 (-j 1 -c 1):
> with fallocate: 685 tps
> without: 727 tps
>
> scale=20:
> with fallocate: 129 tps
> without: 402 tps
>
> scale=40:
> with fallocate: 163 tps
> without: 511 tps

Ok, so there's a clear correlation with the number of writers.
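To put a number on that correlation, the fraction of the non-fallocate throughput retained with fallocate can be computed directly from the figures Jon reported. A trivial sketch; the labels are copied verbatim from his message and the helper name is made up:

```python
# Throughput in tps as reported in the thread: (with fallocate, without).
results = {
    "scale=1 (-j 1 -c 1)": (685, 727),
    "scale=20": (129, 402),
    "scale=40": (163, 511),
}


def relative_throughput(with_fallocate: float, without: float) -> float:
    """Fraction of the non-fallocate throughput retained with fallocate."""
    return with_fallocate / without


for label, (with_fa, without) in results.items():
    print(f"{label}: {relative_throughput(with_fa, without):.0%} of baseline")
```

This gives roughly 94%, 32% and 32%: a small penalty with a single client, and about two thirds of the throughput lost at the higher settings.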

> > - Does it also occur in synchronous_commit=off configurations? Those
> > don't fdatasync() from so many backends, that might play a role.
>
> With synchronous_commit=off, the performance is vastly improved.
> Interestingly, the fallocate case is (immaterially) faster than the
> non-fallocate case: 3766 tps vs. 3700 tps.

That's interesting, because in the synchronous_commit=off case most of
the writing and syncing should be done by the WAL writer. So that's
another hint that there's some scalability issue at play, presumably in
the kernel.
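To separate "many backends each calling fdatasync()" from "one process doing the syncing", a variant with a configurable number of concurrent syncers might help. Another hedged sketch: threads stand in for backend processes (PostgreSQL uses separate processes), the function name is hypothetical, and the target file must already exist and be preallocated, e.g. with os.posix_fallocate or by writing zeros.

```python
import os
import threading
import time

BLOCK = 8192  # assumption: 8 KiB writes


def concurrent_sync_writes(path: str, writers: int, total_blocks: int) -> float:
    """Fill `total_blocks` blocks of an existing, preallocated file using
    `writers` threads. Each thread claims the next free block, writes it and
    calls fdatasync() itself, loosely mimicking many backends flushing WAL at
    commit. Returns the elapsed seconds."""
    payload = b"x" * BLOCK
    lock = threading.Lock()
    next_block = 0
    fd = os.open(path, os.O_RDWR)

    def worker() -> None:
        nonlocal next_block
        while True:
            # Claim blocks sequentially so the file fills front to back.
            with lock:
                block = next_block
                if block >= total_blocks:
                    return
                next_block += 1
            # pwrite/fdatasync release the GIL, so the syscalls can overlap.
            os.pwrite(fd, payload, block * BLOCK)
            os.fdatasync(fd)

    threads = [threading.Thread(target=worker) for _ in range(writers)]
    start = time.perf_counter()
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)
    return time.perf_counter() - start
```

Running it with writers=1 and with, say, writers=20, once against a file preallocated with posix_fallocate and once against a zero-filled one, would show whether the slowdown tracks the number of concurrent syncers rather than the amount of data written.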

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
