Re: Enabling Checksums

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Enabling Checksums
Date: 2012-12-19 23:30:41
Message-ID: 1355959841.24766.286.camel@sussancws0025
Lists: pgsql-hackers

On Tue, 2012-12-04 at 01:03 -0800, Jeff Davis wrote:
> > 4. We need some general performance testing to show whether this is
> > insane or not.

I ran a few tests.

Test 1 - find worst-case overhead for the checksum calculation on write:

fsync = off
bgwriter_lru_maxpages = 0
shared_buffers = 1024MB
checkpoint_segments = 64
autovacuum = off

The idea is to prevent interference from the bgwriter or autovacuum.
Also, I turn off fsync so that the test measures the calculation
overhead, not the effort of actually writing to disk.
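
Not part of the test itself, but a quick way to double-check that those
settings are actually in effect on the running server:

select name, setting from pg_settings
where name in ('fsync', 'bgwriter_lru_maxpages', 'shared_buffers',
               'checkpoint_segments', 'autovacuum');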

drop table foo;
create table foo(i int, j int) with (fillfactor=50);
create index foo_idx on foo(i);
insert into foo select g%25, -1 from generate_series(1,10000000) g;
checkpoint;
-- during the following sleep, issue an OS "sync"
-- to make test results more consistent
select pg_sleep(30);
\timing on
update foo set j=-1 where i = 0;
select pg_sleep(2);
checkpoint;
update foo set j=-1 where i = 0;
select pg_sleep(2);
checkpoint;
update foo set j=-1 where i = 0;
select pg_sleep(2);
checkpoint;
\timing off

I am measuring the time of the CHECKPOINT command, not the update. The
update is just to dirty all of the pages (they should all be HOT
updates). Without checksums, it takes about 400ms. With checksums, it
takes about 500ms. That overhead is quite low, considering that the
bottleneck is almost always somewhere else (like actually writing to
disk).
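
As a sanity check (not part of the timed run), pg_stat_user_tables
should show that essentially all of the updates were HOT:

-- n_tup_hot_upd should be (nearly) equal to n_tup_upd
select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables
where relname = 'foo';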

Test 2 - worst-case overhead for calculating the checksum while reading data

Same configuration as above. This time, just load a big table:

drop table foo;
create table foo(i int, j int) with (fillfactor=50);
insert into foo select g%25, -1 from generate_series(1,10000000) g;
-- make sure hint bits and PD_ALL_VISIBLE are set everywhere
select count(*) from foo;
vacuum;
vacuum;
vacuum;
select relfilenode from pg_class where relname='foo';
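
A shortcut is pg_relation_filepath(), which returns the path relative to
the data directory directly, so you don't have to assemble it by hand:

select pg_relation_filepath('foo');  -- something like base/12055/XXXX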

Then shut down the server and restart it, and run "cat
data/base/12055/XXXX* > /dev/null" to load the table into the OS buffer
cache. Then do:

\timing on
SELECT COUNT(*) FROM foo;

So, shared buffers are cold, but the OS cache is warm. This should test
the overhead of going from the OS cache into shared buffers, which
requires verifying the checksum on each page as it is read in. Without
checksums it's around 820ms; with checksums, around 970ms. Again, this
is quite reasonable, because I would expect the bottleneck to be reading
from the disk rather than the calculation itself.
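
To verify that shared buffers really were cold after the restart, the
pg_buffercache contrib module can count how many of foo's pages are
resident; a sketch (not something the timings depend on):

-- requires: create extension pg_buffercache;
select count(*) as foo_buffers from pg_buffercache
where relfilenode = pg_relation_filenode('foo');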

Test 3 - worst-case WAL overhead

For this test, I also left fsync off, because I didn't want to measure
the effort of flushing WAL (which shouldn't really be required for this
test anyway). The procedure was simpler:

drop table foo;
create table foo(i int, j int) with (fillfactor=50);
insert into foo select g%25, -1 from generate_series(1,10000000) g;
checkpoint;
select pg_sleep(1);
checkpoint;
select pg_sleep(30); -- do an OS "sync" while this is running
\timing on
SELECT COUNT(*) FROM foo;

Without checksums, it takes about 1000ms. With checksums, about 2350ms.
I also tested with checksums but without the CHECKPOINT commands above,
and it was also 1000ms. That makes sense: without an intervening
checkpoint, the pages had already been modified (and WAL-logged) since
the last checkpoint, so setting the hint bits during the count generated
no new full-page images.

This test is a more plausible workload than the other two, so it's the
one most likely to be a real problem. So, the biggest cost of checksums
is, by far, the extra full-page images in WAL, which matches our
expectations: with checksums, even a hint-bit-only change must be
WAL-logged to protect against torn pages, and the first change to each
page after a checkpoint emits a full-page image.
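
To quantify the extra WAL directly, 9.2's pg_xlog_location_diff() can
bracket the COUNT(*); a sketch (I didn't run this for the numbers above,
and the locations shown are made up):

select pg_current_xlog_insert_location();  -- note the result, e.g. 0/A0000000
SELECT COUNT(*) FROM foo;
select pg_current_xlog_insert_location();  -- e.g. 0/B2000000
-- difference in bytes:
select pg_xlog_location_diff('0/B2000000', '0/A0000000');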

Regards,
Jeff Davis
