Re: Why we are going to have to go DirectIO

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jonathan Corbet <corbet(at)lwn(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why we are going to have to go DirectIO
Date: 2013-12-04 19:07:04
Message-ID: 529F7D58.1060301@agliodbs.com
Lists: pgsql-hackers

On 12/04/2013 07:33 AM, Jonathan Corbet wrote:
> Wow, Josh, I'm surprised to hear this from you.

Well, I figured it was too angry to propose for an LWN article. ;-)

> The active/inactive list mechanism works great for the vast majority of
> users. The second-use algorithm prevents a lot of pathological behavior,
> like wiping out your entire cache by copying a big file or running a
> backup. We *need* that kind of logic in the kernel.

There's a large body of research on 2Q algorithms going back to the 80s,
which is what this is. As far as I can tell, the modification was
performed without any reading of this research, since that would have
easily shown that 50/50 was unlikely to be a good division, and that in
fact there is nothing which would work except a tunable setting, because
workloads are different. Certainly the "what happens if a single file
is larger than the entire recency bucket" question is addressed and debated.

As an example, PostgreSQL would want to shrink the frequency list to 0%,
because we already implement our own frequency list, and we already
demonstrated back in version 8.1 that a 3-list system was ineffective.
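
To make the split-ratio argument concrete, here is a minimal sketch of a
two-list (inactive/active) cache with a tunable division. It is a toy
model, not the kernel's code: the names TwoListCache, active_fraction
and scan_survivors are hypothetical, and "promote on second access" is
an assumed simplification of the second-use behavior discussed above.

```python
from collections import OrderedDict


class TwoListCache:
    """Toy two-list page cache; hypothetical, not the Linux implementation."""

    def __init__(self, capacity, active_fraction=0.5):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        if not 0.0 <= active_fraction < 1.0:
            raise ValueError("active_fraction must be in [0, 1)")
        # The split is a parameter here; 0.5 mimics a fixed 50/50 division.
        self.active_cap = int(capacity * active_fraction)
        self.inactive_cap = capacity - self.active_cap
        self.active = OrderedDict()    # pages seen at least twice
        self.inactive = OrderedDict()  # pages seen once so far

    def access(self, page):
        """Touch a page and return True if it was already cached."""
        if page in self.active:
            self.active.move_to_end(page)
            return True
        if page in self.inactive:
            if self.active_cap == 0:
                # No active list at all: degenerate to plain LRU.
                self.inactive.move_to_end(page)
                return True
            # Second use: promote from the inactive to the active list.
            del self.inactive[page]
            self.active[page] = None
            if len(self.active) > self.active_cap:
                # Demote the coldest active page instead of dropping it.
                demoted, _ = self.active.popitem(last=False)
                self._insert_inactive(demoted)
            return True
        self._insert_inactive(page)
        return False

    def _insert_inactive(self, page):
        self.inactive[page] = None
        if len(self.inactive) > self.inactive_cap:
            self.inactive.popitem(last=False)  # evict the oldest page


def scan_survivors(active_fraction):
    """Count hot pages still cached after a scan larger than the cache."""
    cache = TwoListCache(capacity=8, active_fraction=active_fraction)
    hot = ["h0", "h1", "h2"]
    for _ in range(2):            # touch every hot page twice
        for page in hot:
            cache.access(page)
    for i in range(100):          # one-pass scan of 100 distinct pages
        cache.access(("scan", i))
    return sum(cache.access(page) for page in hot)
```

In this toy model scan_survivors(0.5) returns 3 (the hot pages survive
the scan) and scan_survivors(0.0) returns 0 (plain LRU, the scan wipes
them out). Which setting is better depends on whether the application
already keeps its own hot set, which is the case for a tunable.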

I can save Johannes some time: don't implement ARC. Not only is it
under IBM patent, it's not effective in real-world situations. Both
Postgres and Apache tried it in the early aughts.

However, this particular issue concerns me less than the general
attitude that it's OK to push in experimental IO changes which can't be
disabled by users into release kernels, as exemplified by several
problematic and inadequately tested IO changes in the 3.X kernels --
most notably the pdflush bug. It speaks of a policy that the Linux IO
stack is not production software, and it's OK to tinker with it in ways
that break things for many users.

I also wasn't exaggerating the reception I got when I tried to talk
about IO and PostgreSQL at LinuxCon and other events. The majority of
Linux hackers I've talked to simply don't want to be bothered with
PostgreSQL's performance needs, and I've heard similar things from my
colleagues at the MySQL variants. Greg KH was the only real exception.

Heck, I went to a meeting of filesystem geeks at LinuxCon and the main
feedback I received, from Linux FS developers (Chris and Ted), was
"PostgreSQL should implement its own storage and use DirectIO, we don't
know why you're even trying to use the Linux IO stack." That's why I
gave up on working through community channels; I face enough uphill
battles in *this* project.

> This code has been a bit slow getting into the mainline for a few reasons,
> but one of the chief ones is this: nobody is saying from the sidelines
> that they need it! If somebody were saying "Postgres would work a lot
> better with this code in place" and had some numbers to demonstrate that,
> we'd be far more likely to see it get into an upcoming release.

Well, Citus did that; do you need more evidence?

> In the end, Linux is quite responsive to the people who participate in its
> development, even as testers and bug reporters. It responds rather less
> well to people who find problems in enterprise kernels years later,
> granted.

All infrastructure software, including Postgres, has the issue that most
enterprise users are using a version which was released years ago. As a
result, some performance issues simply aren't going to be found until
that version has been out for a couple of years. This leads to a
Catch-22: enterprise users are reluctant to upgrade because of potential
performance regressions, and as a result the median "enterprise" version
gets further and further behind current development, and as a result the
performance regressions are never fixed.

We encounter this in PostgreSQL (I have customers who are still on 8.4
or 9.1 because of specific regressions), and it's even worse in the
Linux world, where RHEL is still on 2.6. We work really hard to avoid
performance regressions in Postgres versions, because we know we can't
test for them adequately, and often can't fix them in release versions
after the fact.

But you know what? 2.6, overall, still performs better than any kernel
in the 3.X series, at least for Postgres.

> The amount of automated testing, including performance testing, has
> increased markedly in the last couple of years. I bet that it would not
> be hard at all to get somebody like Fengguang Wu to add some
> Postgres-oriented I/O tests to his automatic suite:
>
> https://lwn.net/Articles/571991/
>
> Then we would all have a much better idea of how kernel releases are
> affecting one of our most important applications; developers would pay
> attention to that information.

Oh, good! I was working with Greg on having an automated pgBench run,
but doing it on Wu's testing platform would be even better. I still
need to get some automated stats digestion, since I want to at least
make sure that the tests would show the three major issues which we
encountered in recent Linux kernels so far. Of course, I have a "free
time" issue, which is being discussed on the other fork of this thread.

In addition to testing, though, I have yet to find a way to learn about
new changes to IO or memory performance in the Linux kernel without
reading all of the traffic on LKML and all Linux commit messages and
filtering them myself. If there were a better way to look for this
information, Linux would be more likely to get feedback in a timely
fashion. And yeah, I know that Postgres has the same issue.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
