Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC][PATCH]: CRC32 is limiting at COPY/CTAS/INSERT ... SELECT + speeding it up
Date: 2010-05-30 02:33:16
Message-ID: 201005300233.o4U2XGp14143@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Added to TODO:

Consider a faster CRC32 algorithm

* http://archives.postgresql.org/pgsql-hackers/2010-05/msg01112.php

---------------------------------------------------------------------------

Andres Freund wrote:
> Hi,
>
> I started to analyze XLogInsert because it was the major bottleneck when
> creating some materialized view/cached tables/whatever.
> Analyzing it I could see that content of the COMP_CRC32 macro was taking most
> of the time which isn't immediately obvious when you profile because it
> obviously doesn't show up as a separate function.
> I first put it into functions to make it easier to profile. I couldn't measure
> any difference for COPY, CTAS and a simple pgbench run on 3 kinds of hardware
> (Core2, older Xeon, older Sparc systems).
>
> I looked a bit around for faster implementations of CRC32 and found one in
> zlib. After adapting it (pg uses slightly different computation (non-
> inverted)) I found that it increases the speed of the CRC32 calculation itself
> 3 fold.
> It does that by not only using one lookup table but four (one for each byte of
> a word). Those four calculations are independent and thus are considerably
> faster on somewhat recent hardware.
> Also it does memory lookups in 4 byte steps instead of 1 byte as the pg
> version (thats only about ~8% benefit in itself).
>
> I wrote a preliminary patch which includes both, the original implementation
> and the new one switchable via an #define.
>
>
> I tested performance differences in a small number of scenarios:
> - CTAS/INSERT ... SELECT (8-30%)
> - COPY (3-20%)
> - pgbench (no real difference unless directly after a checkpoint)
>
> Setup:
>
> CREATE TABLE blub (ai int, bi int, aibi int);
> CREATE TABLE speedtest (ai int, bi int, aibi int);
>
>
> INSERT ... SELECT:
>
> Statement:
> INSERT INTO blub SELECT a.i, b.i, a.i *b.i FROM generate_series(1, 10000)
> a(i), generate_series(1, 1000) b(i);
>
> legacy crc:
>
> 11526.588
> 11406.518
> 11412.182
> 11430.245
>
> zlib:
> 9977.394
> 9945.408
> 9840.907
> 9842.875
>
>
> COPY:
> Statement:
> ('blub' enlarged here 4 times, as otherwise the variances were to large)
>
> COPY blub TO '/tmp/b' BINARY;
> ...
> CHECKPOINT;TRUNCATE speedtest; COPY speedtest FROM '/tmp/b' BINARY;
>
> legacy:
> 44835.840
> 44832.876
>
> zlib:
> 39530.549
> 39365.109
> 39295.167
>
> The performance differences are bigger if the table rows are significantly
> bigger.
>
> Do you think something like that is sensible? If yes, I will make it into a
> proper patch and such.
>
> Thanks,
>
> Andres
>
> INSERT ... SELECT profile before patch:
>
> 20.22% postgres postgres [.] comp_crc32
> 5.77% postgres postgres [.] XLogInsert
> 5.55% postgres postgres [.] LWLockAcquire
> 5.21% postgres [kernel. [k] copy_user_generic_string
> 4.64% postgres postgres [.] LWLockRelease
> 4.39% postgres postgres [.] ReadBuffer_common
> 2.75% postgres postgres [.] heap_insert
> 2.22% postgres libc-2.1 [.] memcpy
> 2.09% postgres postgres [.] UnlockReleaseBuffer
> 1.85% postgres postgres [.] hash_any
> 1.77% postgres [kernel. [k] clear_page_c
> 1.69% postgres postgres [.] hash_search_with_hash_value
> 1.61% postgres postgres [.] heapgettup_pagemode
> 1.50% postgres postgres [.] PageAddItem
> 1.42% postgres postgres [.] MarkBufferDirty
> 1.28% postgres postgres [.] RelationGetBufferForTuple
> 1.15% postgres postgres [.] ExecModifyTable
> 1.06% postgres postgres [.] RelationPutHeapTuple
>
>
> After:
>
> 9.97% postgres postgres [.] comp_crc32
> 5.95% postgres [kernel. [k] copy_user_generic_string
> 5.94% postgres postgres [.] LWLockAcquire
> 5.64% postgres postgres [.] XLogInsert
> 5.11% postgres postgres [.] LWLockRelease
> 4.63% postgres postgres [.] ReadBuffer_common
> 3.45% postgres postgres [.] heap_insert
> 2.54% postgres libc-2.1 [.] memcpy
> 2.03% postgres postgres [.] UnlockReleaseBuffer
> 1.94% postgres postgres [.] hash_search_with_hash_value
> 1.84% postgres postgres [.] hash_any
> 1.73% postgres [kernel. [k] clear_page_c
> 1.68% postgres postgres [.] PageAddItem
> 1.62% postgres postgres [.] heapgettup_pagemode
> 1.52% postgres postgres [.] RelationGetBufferForTuple
> 1.47% postgres postgres [.] MarkBufferDirty
> 1.30% postgres postgres [.] ExecModifyTable
> 1.23% postgres postgres [.] RelationPutHeapTuple

[ Attachment, skipping... ]

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-05-30 02:56:06 Re: small exclusion constraints patch
Previous Message Bruce Momjian 2010-05-30 02:23:46 Re: Specification for Trusted PLs?