Re: Decreasing WAL size effects

From:	Aidan Van Dyk <aidan(at)highrise(dot)ca>
To:	Greg Smith <gsmith(at)gregsmith(dot)com>
Cc:	Kyle Cordes <kyle(at)kylecordes(dot)com>, pgsql <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Decreasing WAL size effects
Date:	2008-10-31 19:11:48
Message-ID:	20081031191148.GE20934@yugib.highrise.ca
Lists:	pgsql-general pgsql-hackers

* Greg Smith <gsmith(at)gregsmith(dot)com> [081001 00:00]:

> The overhead of clearing out the whole thing is just large enough that it
> can be disruptive on systems generating lots of WAL traffic, so you don't
> want the main database processes bothering with that. A related fact is
> that there is a noticeable slowdown to clients that need a segment switch
> on a newly initialized and fast system that has to create all its WAL
> segments, compared to one that has been active long enough to only be
> recycling them. That's why this sort of thing has been getting pushed
> into the archive_command path; nothing performance-sensitive that can
> slow down clients is happening there, so long as your server is powerful
> enough to handle that in parallel with everything else going on.

> Now, it would be possible to have that less sensitive archive code path
> zero things out, but you'd need to introduce a way to note when it's been
> done (so you don't do it for a segment twice) and a way to turn it off so
> everybody doesn't go through that overhead (which probably means another
> GUC). That's a bit much trouble to go through just for a feature with a
> fairly limited use-case that can easily live outside of the engine
> altogether.
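A minimal C sketch of the bookkeeping described above: an on/off switch plus a record of which segments have already been zeroed. `zero_archived_tail` and `claim_segment_for_zeroing` are hypothetical names, not an existing GUC or PostgreSQL function. The in-memory bitmap is only a stand-in for whatever persistent, non-wrapping record a real implementation would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the extra GUC; not a real PostgreSQL setting. */
static bool zero_archived_tail = true;

/* Hypothetical bookkeeping: one bit per segment number, in a small window
   that wraps around (a real implementation would need persistent state). */
#define TRACKED_SEGMENTS 4096
static uint8_t zeroed_bits[TRACKED_SEGMENTS / 8];

/* Returns true if the caller should zero this segment now, and records
   that it has been done so the same segment is not zeroed twice. */
static bool claim_segment_for_zeroing(uint32_t segno)
{
    uint32_t slot = segno % TRACKED_SEGMENTS;
    uint8_t mask = (uint8_t)(1u << (slot % 8));

    if (!zero_archived_tail)
        return false;               /* feature switched off: no overhead */
    if (zeroed_bits[slot / 8] & mask)
        return false;               /* already zeroed once */
    zeroed_bits[slot / 8] |= mask;  /* mark as done */
    return true;
}
```

With the switch off, the function returns false without touching the bitmap, so disabling the feature costs one branch per segment.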

Remember that the place where this benefit is big is on a generally idle
server. Is it possible to make the "time based WAL switch" zero the tail? You
don't even need to fsync it for durability (although you may want to, hopefully
preventing a larger fsync delay on the next commit).
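A rough sketch of what zeroing the tail at a time-based switch amounts to, under stated assumptions. `zero_segment_tail` is a hypothetical helper, not the code in the attached patch. It writes zeros with plain POSIX `pwrite()` in 8 KB chunks and deliberately skips fsync, as suggested above.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define ZERO_CHUNK 8192  /* assumed write size; matches the default XLOG_BLCKSZ */

/* Hypothetical helper: overwrite [offset, seg_size) of an open WAL segment
   with zeros. No fsync is issued; the zeroed tail is not needed for
   durability. Returns the number of bytes written, or -1 on error. */
static long zero_segment_tail(int fd, off_t offset, off_t seg_size)
{
    static const char zeros[ZERO_CHUNK];  /* static storage starts zeroed */
    long written = 0;

    while (offset < seg_size) {
        size_t want = (size_t)(seg_size - offset);
        if (want > ZERO_CHUNK)
            want = ZERO_CHUNK;

        ssize_t n = pwrite(fd, zeros, want, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;                 /* interrupted: retry this chunk */
            return -1;
        }
        offset += n;
        written += n;
    }
    return written;
}
```

Called with the current write offset within the segment (presumably what openLogOff tracks in xlog.c) and a 16 MB segment size, it writes the same byte counts that the ZEROING lines further down report.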

<timid experience=none>
How about something like the attached? It's been spun quickly, and it passed the
regression tests and some simple hand tests on REL8_3_STABLE. It seems like
HEAD can't initdb on my machine (quad Opteron with SW RAID1); I tried a few
revisions in the last few days, and initdb dies on them all...

I'm not an expert in the PG code; I just grepped around what looked like reasonable
functions in xlog.c until I (hopefully) figured out the basic flow of switching
to new xlog segments. I *think* I'm using openLogFile and openLogOff
correctly.
</timid>

After setting up archiving with an archive_timeout of 30s and running a few
pg_start_backup/pg_stop_backup calls by hand, you can see it *really* does make
things compressible...

Its output looks like this:
Archiving 000000010000000000000002
Archiving 000000010000000000000003
Archiving 000000010000000000000004
Archiving 000000010000000000000005
Archiving 000000010000000000000006
LOG: checkpoints are occurring too frequently (10 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
Archiving 000000010000000000000007
Archiving 000000010000000000000008
Archiving 000000010000000000000009
LOG: checkpoints are occurring too frequently (7 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
Archiving 00000001000000000000000A
Archiving 00000001000000000000000B
Archiving 00000001000000000000000C
LOG: checkpoints are occurring too frequently (6 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
Archiving 00000001000000000000000D
LOG: ZEROING xlog file 0 segment 14 from 12615680 - 16777216 [4161536 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 00000001000000000000000E
Archiving 00000001000000000000000E.00C07098.backup
LOG: ZEROING xlog file 0 segment 15 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 00000001000000000000000F
Archiving 00000001000000000000000F.00000C60.backup
LOG: ZEROING xlog file 0 segment 16 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 000000010000000000000010.00000F58.backup
Archiving 000000010000000000000010
LOG: ZEROING xlog file 0 segment 17 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_stop_backup();
Archiving 000000010000000000000011
Archiving 000000010000000000000011.00000020.backup
LOG: ZEROING xlog file 0 segment 18 from 6815744 - 16777216 [9961472 bytes]
Archiving 000000010000000000000012
LOG: ZEROING xlog file 0 segment 19 from 8192 - 16777216 [16769024 bytes]
Archiving 000000010000000000000013
LOG: ZEROING xlog file 0 segment 20 from 16384 - 16777216 [16760832 bytes]
Archiving 000000010000000000000014
LOG: ZEROING xlog file 0 segment 23 from 8192 - 16777216 [16769024 bytes]
STATEMENT: SELECT pg_switch_xlog();
Archiving 000000010000000000000017
LOG: ZEROING xlog file 0 segment 24 from 8192 - 16777216 [16769024 bytes]
Archiving 000000010000000000000018
LOG: ZEROING xlog file 0 segment 25 from 8192 - 16777216 [16769024 bytes]
Archiving 000000010000000000000019
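The ZEROING lines give the log file and segment numbers in decimal, while the archived file names encode them in hex, so "xlog file 0 segment 14" is the file archived as 00000001000000000000000E. A small sketch of the naming, mirroring the "%08X%08X%08X" format of the XLogFileName macro in the 8.3 sources and assuming timeline 1 as in this test. `wal_file_name` is an illustrative name, not a PostgreSQL function.

```c
#include <stdint.h>
#include <stdio.h>

/* Build a WAL segment file name: timeline, log id and segment number as
   three 8-digit upper-case hex fields (24 characters plus the NUL). */
static void wal_file_name(char *buf, size_t len,
                          uint32_t tli, uint32_t log, uint32_t seg)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned) tli, (unsigned) log, (unsigned) seg);
}
```

Read this way, segments 21 and 22 (0x15 and 0x16) have no ZEROING line in the log, and they are also the two large (about 5 MB) files in the compressed listing.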

You can see that when DB activity was heavy enough to fill an xlog segment
before the timeout (or an interactive forced switch), it didn't zero anything. It
only zeroed on a timeout switch or a forced switch (pg_switch_xlog/pg_stop_backup).
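The rule visible in the log, sketched in C under the assumption of the default 16 MB segment size. `enum switch_reason` and `bytes_to_zero` are illustrative names, not code from the patch.

```c
#include <stdint.h>

#define WAL_SEG_SIZE 16777216u  /* default 16 MB segment, as in the log */

/* Hypothetical classification of why the server moved to a new segment. */
enum switch_reason {
    SWITCH_SEGMENT_FULL,  /* segment filled by normal WAL traffic */
    SWITCH_TIMEOUT,       /* archive_timeout expired */
    SWITCH_FORCED         /* pg_switch_xlog() or pg_stop_backup() */
};

/* Bytes to zero: nothing when the segment filled up on its own,
   otherwise everything from the current write offset to the end. */
static uint32_t bytes_to_zero(enum switch_reason why, uint32_t offset)
{
    if (why == SWITCH_SEGMENT_FULL || offset >= WAL_SEG_SIZE)
        return 0;
    return WAL_SEG_SIZE - offset;
}
```

For example, a forced switch at offset 12615680 gives 16777216 - 12615680 = 4161536 bytes, matching the segment 14 line, and a switch at offset 8192 gives the 16769024 bytes seen on most of the other lines.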

And here are the compressed xlog segments:
-rw-r--r-- 1 mountie mountie 18477 2008-10-31 14:44 000000010000000000000010.gz
-rw-r--r-- 1 mountie mountie 16394 2008-10-31 14:44 000000010000000000000011.gz
-rw-r--r-- 1 mountie mountie 2721615 2008-10-31 14:52 000000010000000000000012.gz
-rw-r--r-- 1 mountie mountie 16588 2008-10-31 14:52 000000010000000000000013.gz
-rw-r--r-- 1 mountie mountie 19230 2008-10-31 14:52 000000010000000000000014.gz
-rw-r--r-- 1 mountie mountie 4920063 2008-10-31 14:52 000000010000000000000015.gz
-rw-r--r-- 1 mountie mountie 5024705 2008-10-31 14:52 000000010000000000000016.gz
-rw-r--r-- 1 mountie mountie 18082 2008-10-31 14:52 000000010000000000000017.gz
-rw-r--r-- 1 mountie mountie 18477 2008-10-31 14:52 000000010000000000000018.gz
-rw-r--r-- 1 mountie mountie 16394 2008-10-31 14:52 000000010000000000000019.gz
-rw-r--r-- 1 mountie mountie 2721615 2008-10-31 15:02 00000001000000000000001A.gz
-rw-r--r-- 1 mountie mountie 16588 2008-10-31 15:02 00000001000000000000001B.gz
-rw-r--r-- 1 mountie mountie 19230 2008-10-31 15:02 00000001000000000000001C.gz

And yes, even the non-zeroed segments compress well here, because
my test load is pretty simple:
CREATE TABLE TEST
(
a numeric,
b numeric,
c numeric,
i bigint not null
);

INSERT INTO test (a,b,c,i)
SELECT random(),random(),random(),s FROM generate_series(1,1000000) s;

--
Aidan Van Dyk Create like a god,
aidan(at)highrise(dot)ca command like a king,
http://www.highrise.ca/ work like a slave.

Attachment	Content-Type	Size
wip-xlog-switch-zero.patch	text/x-diff	1.6 KB

In response to

Re: Decreasing WAL size effects at 2008-10-30 21:10:08 from Greg Smith

Responses

Re: Decreasing WAL size effects at 2008-10-31 19:15:29 from Aidan Van Dyk
Re: Decreasing WAL size effects at 2008-10-31 21:02:11 from Aidan Van Dyk

Browse pgsql-general by date

	From	Date	Subject
Next Message	Aidan Van Dyk	2008-10-31 19:15:29	Re: Decreasing WAL size effects
Previous Message	Scott Marlowe	2008-10-31 18:27:27	Re: Connections getting stuck sending data to client

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Aidan Van Dyk	2008-10-31 19:15:29	Re: Decreasing WAL size effects
Previous Message	Simon Riggs	2008-10-31 19:08:49	Re: Enabling archive_mode without restart