On Sunday 30 May 2010 04:56:09 Greg Stark wrote:
> This sounds familiar. If you search back in the archives around 2004
> or so I think you'll find a similar discussion when we replaced the
> crc32 implementation with what we have now. We put a fair amount of
> effort into searching for faster implementations so if you've found
> one 3x faster I'm pretty startled.
None of those attempts considered processing more than one byte at a time.
Most if not all current architectures are more or less superscalar (explicitly
via the compiler or implicitly via somewhat intelligent silicon), but the
current algorithm has an ordering restriction that prevents any benefit from
that: it needs the CRC of the previous byte before it can process the next one.
The zlib/my version computes 4 bytes independently and then combines the
results, which gives much better overall utilization of the CPU.
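
To illustrate the idea, here is a minimal sketch of the 4-bytes-at-a-time
("slicing-by-4") table lookup, not the actual patch. It assumes the reflected
CRC-32 polynomial 0xEDB88320 as used by zlib; all function and table names are
made up for the example. The four lookups in the main loop do not depend on
each other, so the CPU can issue them in parallel, and the leftover 1-3 bytes
go through the old byte-at-a-time loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names, for illustration only.
 * crc_tab[0] is the classic byte-at-a-time table; crc_tab[k][n] is the CRC
 * of byte n followed by k zero bytes. */
static uint32_t crc_tab[4][256];

static void crc_slice4_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_tab[0][n] = c;
    }
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = crc_tab[0][n];
        for (int k = 1; k < 4; k++) {
            c = crc_tab[0][c & 0xff] ^ (c >> 8);
            crc_tab[k][n] = c;
        }
    }
}

/* Old scheme: each step needs the CRC produced by the previous byte. */
static uint32_t crc_update_bytewise(uint32_t crc, const unsigned char *p,
                                    size_t len)
{
    while (len--)
        crc = crc_tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
    return crc;
}

/* New scheme: four independent lookups per 32-bit word, XORed together. */
static uint32_t crc_update_slice4(uint32_t crc, const unsigned char *p,
                                  size_t len)
{
    while (len >= 4) {
        /* Assemble the word byte by byte so host endianness does not matter. */
        uint32_t w = crc ^ ((uint32_t) p[0]
                            | ((uint32_t) p[1] << 8)
                            | ((uint32_t) p[2] << 16)
                            | ((uint32_t) p[3] << 24));

        crc = crc_tab[3][w & 0xff]
            ^ crc_tab[2][(w >> 8) & 0xff]
            ^ crc_tab[1][(w >> 16) & 0xff]
            ^ crc_tab[0][w >> 24];
        p += 4;
        len -= 4;
    }
    /* Tail of 0-3 bytes: same algorithm as the old version. */
    return crc_update_bytewise(crc, p, len);
}

/* Convenience wrapper using the usual CRC-32 init and final XOR values. */
static uint32_t crc32_slice4_example(const unsigned char *p, size_t len)
{
    return crc_update_slice4(0xFFFFFFFFu, p, len) ^ 0xFFFFFFFFu;
}
```

Call crc_slice4_init() once before use. Both update functions return the same
value for any input, so the bytewise one can serve as a cross-check.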
> Are you sure it's faster on all
> architectures and not a win sometimes and a loss other times? And are
> you sure it's faster in our use case where we're crcing small
> sequences of data often and not crcing a large block?
I tried on several and it was never a loss at 16+ bytes, never worse at 8, and
most of the time equal if not better at 4. Sizes of 1-4 are somewhat slower, as
they use the same algorithm as the old version but incur an additional jump.
That's a difference of about 3-4 cycles.
I will try to put together an updated patch in the next few days.