Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance

From: Jan Kara <jack(at)suse(dot)cz>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Andres Freund <andres(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, "lsf-pc(at)lists(dot)linux-foundation(dot)org" <lsf-pc(at)lists(dot)linux-foundation(dot)org>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Dave Chinner <david(at)fromorbit(dot)com>, Joshua Drake <jd(at)commandprompt(dot)com>, Bottomley James <James(dot)Bottomley(at)HansenPartnership(dot)com>, Claudio Freire <klaussfreire(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mel Gorman <mgorman(at)suse(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Trond Myklebust <trondmy(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject: Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance
Date: 2014-01-15 09:35:44
Message-ID: 20140115093544.GB6732@quack.suse.cz
Lists: pgsql-hackers

On Wed 15-01-14 10:27:26, Heikki Linnakangas wrote:
> On 01/15/2014 06:01 AM, Jim Nasby wrote:
> >For the sake of completeness... it's theoretically silly that Postgres
> >is doing all this stuff with WAL when the filesystem is doing something
> >very similar with its journal. And an SSD drive (and next generation
> >spinning rust) is doing the same thing *again* in its own journal.
> >
> >If all 3 communities (or even just 2 of them!) could agree on the
> >necessary interface a tremendous amount of this duplicated technology
> >could be eliminated.
> >
> >That said, I rather doubt the Postgres community would go this route,
> >not so much because of the presumably massive changes needed, but more
> >because our community is not a fan of restricting our users to things
> >like "Thou shalt use a journaled FS or risk all thy data!"
>
> The WAL is also used for continuous archiving and replication, not
> just crash recovery. We could skip full-page-writes, though, if we
> knew that the underlying filesystem/storage is guaranteeing that a
> write() is atomic.
>
> It might be useful for PostgreSQL to somehow tell the filesystem that
> we're taking care of WAL-logging, so that the filesystem doesn't
> need to.

Well, a journalling fs generally cares about the consistency of its own
metadata. We give much weaker guarantees for file data because stronger
guarantees come at a cost most people don't want to pay.

Filesystems could in theory provide a facility like atomic write (at least
up to a certain size, say in the MB range), but it's not so easy, and when
there are no strong use cases fs people are reluctant to make their code
more complex unnecessarily. OTOH, without widespread atomic write support,
I understand application developers take a similar stance. So it's kind of
a chicken-and-egg problem. BTW, ext3/4, for example, has quite a bit of the
infrastructure in place due to its data=journal mode, so if someone on the
PostgreSQL side wanted to research this, knitting together some
experimental ext4 patches should be doable.
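
To make concrete what an atomic write would buy, here is a minimal sketch of
the torn-page problem that full-page writes guard against. It is an
illustration only, not PostgreSQL or ext4 code: the page layout (a CRC32
header in front of the payload), the assumption that storage writes 512-byte
sectors atomically, and all function names are hypothetical. With an atomic
8 kB write() the torn state below could not occur, so the logged full-page
image would not be needed for recovery.

```python
import zlib

PAGE_SIZE = 8192    # PostgreSQL's default block size
SECTOR_SIZE = 512   # ASSUMPTION: the unit the storage writes atomically


def make_page(payload: bytes) -> bytes:
    """Build a page: 4-byte CRC32 header, then the zero-padded payload."""
    body = payload.ljust(PAGE_SIZE - 4, b"\0")
    return zlib.crc32(body).to_bytes(4, "big") + body


def page_is_intact(page: bytes) -> bool:
    """A page is intact when its stored CRC matches its body."""
    return int.from_bytes(page[:4], "big") == zlib.crc32(page[4:])


def torn_write(old: bytes, new: bytes, sectors_written: int) -> bytes:
    """Simulate a crash after only the first sectors of `new` reached disk."""
    cut = sectors_written * SECTOR_SIZE
    return new[:cut] + old[cut:]


def recover(on_disk: bytes, full_page_image: bytes) -> bytes:
    """Fall back to the logged full-page image if the on-disk page is torn."""
    return on_disk if page_is_intact(on_disk) else full_page_image


old = make_page(b"A" * 4000)
new = make_page(b"B" * 4000)

# Crash after 3 of 16 sectors: the page is a mix of new and old contents.
torn = torn_write(old, new, sectors_written=3)
print("torn page intact?", page_is_intact(torn))
print("recovered equals new?", recover(torn, new) == new)
```

The checksum only detects the torn page; repairing it still needs a full
copy of the page from somewhere, which is the cost an atomic write facility
in the filesystem would remove.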

Honza
--
Jan Kara <jack(at)suse(dot)cz>
SUSE Labs, CR
