Re: WAL CPU overhead/optimization (was Master-slave visibility order)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL CPU overhead/optimization (was Master-slave visibility order)
Date: 2013-08-30 00:02:43
Message-ID: 20130830000243.GH4283@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-08-30 02:53:54 +0300, Ants Aasma wrote:
> On Fri, Aug 30, 2013 at 1:30 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-08-30 01:10:40 +0300, Ants Aasma wrote:
> >> On Fri, Aug 30, 2013 at 12:33 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> > FWIW, WAL is still the major bottleneck for INSERT heavy workloads. The
> >> > per CPU overhead actually minimally increased (at least in my tests), it
> >> > just scales noticeably better than before.
> >>
> >> Interesting. Do you have any insight what is behind the CPU overhead?
> >> Maybe the solution is to make WAL insertion cheap enough to not
> >> matter. That won't be easy, but neither are the alternatives.
> >
> > Funnily by far the biggest thing I have seen in benchmarks is the CRC32
> > computation. I plan to brush up my ~3 year old CRC32 reimplementation
> > patch sometime, but afair you had a much better one?
> >
> > I have some doubts about weakening the hash function by also using FNV
> > or similar here, so I'd first like to try how much of a difference a
> > better CRC32 implementation can make with the current XLogInsert()
> > implementation.
>
> The CRC32 implementations mostly differ by the amount of lookups that
> are done in parallel. Postgresql does 1 lookup, IIRC zlib
> implementation does 4, Intel has a paper that recommends going up to
> 8. The tradeoff is that each level requires a 4KB lookup table - for
> small records the additional cache misses will probably kill any
> speedup.
>
> A quick overview of the hot cache large buffer performance of a few
> interesting options:
> [interesting data]

I am not sure "hot cache large buffer performance" is really the
interesting case. Most of the XLogInsert()s are pretty small in the
common workloads. I vaguely recall trying 8 parallel lookups and getting
worse performance on many workloads, but that might have been a problem
of my implementation.
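
For readers unfamiliar with the "lookups in parallel" distinction, here is
a minimal sketch contrasting one lookup per byte with four lookups per
32-bit word (often called slicing-by-4). It is not PostgreSQL's code and
not either of the patches mentioned in this thread. The function names,
the use of the common reflected polynomial 0xEDB88320, the initial and
final XOR with 0xFFFFFFFF, and the little-endian byte assembly are all
assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tables: 4 x 256 32-bit entries, i.e. 1KB each, 4KB total. */
static uint32_t crc_tab[4][256];

/* Build table 0 bit by bit, then derive tables 1..3 from it. */
static void crc_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_tab[0][i] = c;
    }
    for (uint32_t i = 0; i < 256; i++)
        for (int t = 1; t < 4; t++)
            crc_tab[t][i] = (crc_tab[t - 1][i] >> 8) ^
                            crc_tab[0][crc_tab[t - 1][i] & 0xFF];
}

/* One table lookup per input byte; each step depends on the previous one. */
static uint32_t crc32_bytewise(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}

/* Four independent lookups per 4 input bytes, then a bytewise tail. */
static uint32_t crc32_slice4(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len >= 4) {
        /* Assemble the word explicitly so host endianness does not matter. */
        crc ^= (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
               ((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
        crc = crc_tab[3][crc & 0xFF] ^
              crc_tab[2][(crc >> 8) & 0xFF] ^
              crc_tab[1][(crc >> 16) & 0xFF] ^
              crc_tab[0][crc >> 24];
        p += 4;
        len -= 4;
    }
    while (len--)
        crc = (crc >> 8) ^ crc_tab[0][(crc ^ *p++) & 0xFF];
    return crc ^ 0xFFFFFFFFu;
}
```

After crc_init(), both functions return the same value for the same
input. The second one touches four tables instead of one, which is where
the extra cache footprint comes from. On a short record the main loop
runs only a few times, so any gain depends on those tables already being
in cache.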

The reason I'd like to go for a faster CRC32 implementation as a first
step is that it's easy. Easy to verify, easy to analyze, easy to
back out. I personally don't have enough interest/time in the 9.4 cycle
to pursue conversion to a different algorithm (I find the idea of using
different ones on 32/64bit pretty bad), but I obviously won't stop
somebody else ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
