Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Jeff Layton <jlayton(at)redhat(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jan Kara <jack(at)suse(dot)cz>, Andres Freund <andres(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Trond Myklebust <trondmy(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Dave Chinner <david(at)fromorbit(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, Bottomley James <James(dot)Bottomley(at)hansenpartnership(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-16 13:20:05
Message-ID: 20140116082005.68e865ac@tlielax.poochiereds.net
Lists: pgsql-hackers

On Wed, 15 Jan 2014 21:37:16 -0500
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Jan 15, 2014 at 8:41 PM, Jan Kara <jack(at)suse(dot)cz> wrote:
> > On Wed 15-01-14 10:12:38, Robert Haas wrote:
> >> On Wed, Jan 15, 2014 at 4:35 AM, Jan Kara <jack(at)suse(dot)cz> wrote:
> >> > Filesystems could in theory provide facility like atomic write (at least up
> >> > to a certain size say in MB range) but it's not so easy and when there are
> >> > no strong usecases fs people are reluctant to make their code more complex
> >> > unnecessarily. OTOH without widespread atomic write support I understand
> >> > application developers have similar stance. So it's kind of chicken and egg
> >> > problem. BTW, e.g. ext3/4 has quite a bit of the infrastructure in place
> >> > due to its data=journal mode so if someone on the PostgreSQL side wanted to
> >> > research on this, knitting some experimental ext4 patches should be doable.
> >>
> >> Atomic 8kB writes would improve performance for us quite a lot. Full
> >> page writes to WAL are very expensive. I don't remember what
> >> percentage of write-ahead log traffic that accounts for, but it's not
> >> small.
> > OK, and do you need atomic writes on per-IO basis or per-file is enough?
> > It basically boils down to - is all or most of IO to a file going to be
> > atomic or it's a smaller fraction?
>
> The write-ahead log wouldn't need it, but data files writes would. So
> we'd need it a lot, but not for absolutely everything.
>
> For any given file, we'd either care about writes being atomic, or we wouldn't.
>

Just getting caught up on this thread. One point that is only now
coming up here is that the different types of files in the DB have
different needs.

It might be good to outline each type of file (WAL, data files, tmp
files), what sort of I/O patterns are typically done to them, and what
sort of "special needs" they have (atomicity or whatever). Then we
could treat each file type as a separate problem, which may make some
of these problems easier to solve.
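
A rough sketch of what such an outline could look like as data, using
only what has been said in this thread so far. The class and field
names are made up for illustration, `None` means "not stated in this
thread", and the "candidate" column is a guess at a starting point for
discussion, not a recommendation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FileClass:
    """One row of the per-file-type outline (hypothetical structure)."""
    name: str
    io_pattern: Optional[str]            # None = not stated in this thread
    needs_atomic_writes: Optional[bool]  # None = not stated in this thread
    candidate: str                       # idea to evaluate, not a conclusion


# Values taken from the discussion above; anything not said there is None.
FILE_CLASSES = [
    FileClass("WAL", "fairly sequential", False, "possibly direct I/O"),
    FileClass("data files", "random", True, "atomic 8kB writes (fs or hardware)"),
    FileClass("temp files", None, None, "possibly tmpfs"),
]


def classes_needing_atomic_writes(classes: List[FileClass]) -> List[str]:
    """Names of the file classes for which atomic writes were requested."""
    return [c.name for c in classes if c.needs_atomic_writes]


if __name__ == "__main__":
    for c in FILE_CLASSES:
        print(f"{c.name:12} pattern={c.io_pattern!r} atomic={c.needs_atomic_writes!r}")
    print("need atomic writes:", classes_needing_atomic_writes(FILE_CLASSES))
```

Filling in the `None` entries is exactly the part that needs input from
the PostgreSQL side.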

For instance, I/O to a WAL is typically fairly sequential, whereas
I/O to the data files is almost certainly random. It may make sense to
consider direct I/O (O_DIRECT) for some of these use-cases, even if
it's not suitable everywhere.
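
To make the direct I/O idea concrete, here is a minimal sketch of
writing fixed-size blocks with O_DIRECT on Linux. It is not how
PostgreSQL writes its WAL; the function name, the 8192-byte block size
and the fallback policy are assumptions for illustration. O_DIRECT
requires an aligned buffer, length and file offset, so the sketch uses
an anonymous mmap (page-aligned) and block-aligned offsets, and it
falls back to buffered I/O where the filesystem rejects O_DIRECT with
EINVAL.

```python
import errno
import mmap
import os

BLOCK = 8192  # assumed block size; a multiple of common logical block sizes
O_DIRECT = getattr(os, "O_DIRECT", 0)  # not defined on every platform


def write_block(path, index, payload):
    """Write one zero-padded BLOCK at block number `index`.

    Tries O_DIRECT first and falls back to buffered I/O on EINVAL.
    Returns True if the write went through O_DIRECT, False otherwise.
    """
    if len(payload) > BLOCK:
        raise ValueError("payload larger than BLOCK")
    buf = mmap.mmap(-1, BLOCK)       # anonymous mapping: page-aligned, zeroed
    buf[:len(payload)] = payload
    attempts = [O_DIRECT, 0] if O_DIRECT else [0]
    try:
        for flag in attempts:
            try:
                fd = os.open(path, os.O_WRONLY | os.O_CREAT | flag, 0o600)
            except OSError as exc:
                if flag and exc.errno == errno.EINVAL:
                    continue         # filesystem refuses O_DIRECT at open
                raise
            try:
                written = os.pwrite(fd, buf, index * BLOCK)
                if written != BLOCK:
                    raise OSError(errno.EIO, "short write")
                # O_DIRECT bypasses the page cache but is not a durability
                # guarantee by itself, so flush explicitly in both cases.
                os.fsync(fd)
                return bool(flag)
            except OSError as exc:
                if flag and exc.errno == errno.EINVAL:
                    continue         # O_DIRECT write rejected: retry buffered
                raise
            finally:
                os.close(fd)
        raise RuntimeError("no write attempt succeeded")
    finally:
        buf.close()
```

Calling `write_block(path, 0, b"...")`, then `write_block(path, 1, b"...")`
and so on gives the sequential pattern described above; the return
value tells you whether O_DIRECT was actually in effect on that
filesystem.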

For temp files, it may make sense to consider housing those on tmpfs.
They wouldn't go to disk at all that way, but if there is memory
pressure they could get swapped out (maybe this is standard practice
already -- I don't know).
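
As a small aid for checking whether a given temp directory already
lives on tmpfs, the sketch below looks a path up in the contents of
/proc/mounts (fields: device, mount point, filesystem type, ...). The
function name is made up, and the sketch ignores escaped characters in
mount points (such as \040 for a space) and does not resolve symlinks.

```python
import os


def fs_type_for(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is the text of /proc/mounts. The longest matching mount
    point wins; for equal mount points the later entry wins, because a
    later mount shadows an earlier one at the same location.
    """
    path = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue                      # skip malformed or empty lines
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


if __name__ == "__main__":
    with open("/proc/mounts") as f:       # Linux only
        print(fs_type_for("/dev/shm", f.read()))
```

A result of "tmpfs" means files created there are backed by memory and
swap rather than by a disk filesystem.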

> > As Dave notes, unless there is HW support (which is coming with newest
> > solid state drives), ext4/xfs will have to implement this by writing data
> > to a filesystem journal and after transaction commit checkpointing them to
> > a final location. Which is exactly what you do with your WAL logs so
> > it's not clear it will be a performance win. But it is easy enough to code
> > for ext4 that I'm willing to try...
>
> Yeah, hardware support would be great.
>

--
Jeff Layton <jlayton(at)redhat(dot)com>
