Re: WAL log performance/efficiency question

From: Keaton Adams <kadams(at)mxlogic(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: WAL log performance/efficiency question
Date: 2007-05-17 16:01:34
Message-ID: 1179417694.26604.29.camel@MXLRMT-208.corp.mxlogic.com
Lists: pgsql-performance


OK, I understand.

So one clarifying question on WAL contents:

On an insert of a 100-byte row that is logged, what goes into the WAL?
Is it 100 bytes, 132 bytes (row plus overhead), or something else? Are
just the row contents logged, or the contents plus all of the
associated overhead? I understand that after a checkpoint the first
change to a page requires the entire 8K page to be written to the WAL,
so do subsequent inserts follow the same storage pattern in the WAL as
the row layout on the data page, or is the byte count smaller?
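
(Back-of-envelope, treating the usual on-disk layout as an assumption
rather than a version-specific fact: a char(100) value is stored as a
4-byte length word plus 100 data bytes, and a heap tuple adds a header
of roughly 28 bytes, which is where the 132-byte row figure above comes
from. Each WAL record then carries its own fixed header of a few dozen
bytes:

    column data:        4 + 100 = 104 bytes
    tuple header:       ~28 bytes (approximate, version-dependent)
    on-page row:        ~132 bytes
    WAL record header:  ~30 bytes (approximate, version-dependent)
    total per insert:   ~160 bytes

so a logged 100-byte insert plausibly costs ~160 bytes of WAL, not 100.)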

-K

>
> 8K data page (8192 bytes)
> Less page header and row overhead leaves ~8000 usable bytes
> At ~100 bytes per row = ~80 rows/page
> Rows loaded: 250,000 / 80 rows/page = 3,125 data pages
> 3,125 pages * 8192 bytes = 25,600,000 bytes / 1,048,576 = ~24.4 MB of
> data page space.
>

On Thu, 2007-05-17 at 08:04 -0600, Keaton Adams wrote:

> I sent this to pgsql-admin but didn't receive a response. Would this
> be a WAL log performance/efficiency issue?
>
> Thanks,
>
> Keaton
>
>
> Given these postgresql.conf settings:
>
> #---------------------------------------------------------------------------
> # WRITE AHEAD LOG
> #---------------------------------------------------------------------------
>
> # - Settings -
>
> fsync = on                    # turns forced synchronization on or off
> wal_sync_method = fsync       # the default is the first option
>                               # supported by the operating system:
>                               #   open_datasync
>                               #   fdatasync
>                               #   fsync
>                               #   fsync_writethrough
>                               #   open_sync
> full_page_writes = on         # recover from partial page writes
> wal_buffers = 32              # min 4, 8KB each
> commit_delay = 100000         # range 0-100000, in microseconds
> commit_siblings = 1000        # range 1-1000
>
> # - Checkpoints -
>
> checkpoint_segments = 500     # in logfile segments, min 1, 16MB each
> checkpoint_timeout = 300      # range 30-3600, in seconds
> checkpoint_warning = 120      # in seconds, 0 is off
>
> # - Archiving -
> archive_command = '/mnt/logship/scripts/archivemaster.sh %p %f'
>                               # command to use to archive a logfile segment
>
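> As a sanity check, the values the server is actually running with can
> be read back through the pg_settings view; a minimal query (nothing
> here beyond the stock catalog):
>
> SELECT name, setting
>   FROM pg_settings
>  WHERE name IN ('fsync', 'wal_buffers', 'commit_delay',
>                 'commit_siblings', 'checkpoint_segments',
>                 'full_page_writes');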
>
>
> And these tables to load data into:
>
> List of relations
> Schema | Name | Type | Owner
> --------+-----------+-------+----------
> public | testload | table | postgres
> public | testload2 | table | postgres
> public | testload3 | table | postgres
> (3 rows)
>
> postgres=# \d testload
> Table "public.testload"
> Column | Type | Modifiers
> --------+----------------+-----------
> name | character(100) |
>
> postgres=# \d testload2
> Table "public.testload2"
> Column | Type | Modifiers
> --------+----------------+-----------
> name | character(100) |
>
> postgres=# \d testload3
> Table "public.testload3"
> Column | Type | Modifiers
> --------+----------------+-----------
> name | character(100) |
>
> There are no indexes on the tables.
>
>
> Using an 8K data page:
>
> 8K data page (8192 bytes)
> Less page header and row overhead leaves ~8000 usable bytes
> At ~100 bytes per row = ~80 rows/page
> Rows loaded: 250,000 / 80 rows/page = 3,125 data pages
> 3,125 pages * 8192 bytes = 25,600,000 bytes / 1,048,576 = ~24.4 MB of
> data page space.
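>
> One way to sanity-check the ~80 rows/page estimate after a load is to
> ask the catalog; a minimal check, assuming VACUUM ANALYZE has
> refreshed the statistics first:
>
> VACUUM ANALYZE testload;
> SELECT relname, relpages, reltuples,
>        reltuples::numeric / relpages AS rows_per_page
>   FROM pg_class
>  WHERE relname = 'testload';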
>
> The test file is shown here (250,000 rows all the same):
> -bash-3.1$ more datafile.txt
> AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
> AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
> AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEEFFFFFFFFFFGGGGGGGGGGHHHHHHHHHHIIIIIIIIIIJJJJJJJJJJ
>
> The load script:
> -bash-3.1$ more loaddata.sql
> copy testload from '/home/kadams/logship/datafile.txt' delimiter '|';
> copy testload2 from '/home/kadams/logship/datafile.txt' delimiter '|';
> copy testload3 from '/home/kadams/logship/datafile.txt' delimiter '|';
>
> So the one load process does a COPY into each of the three tables:
> 24.4 MB * 3 tables = ~73.2 MB of data page space.
>
> This is the only process running on the database. No other
> loads/users are on the system.
>
> psql -f sql/loaddata.sql >/dev/null 2>&1 &
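>
> For repeat runs it can also help to time each pass and count the
> archived segments in the same breath; a throwaway sketch (the archive
> directory below is a placeholder for wherever archivemaster.sh
> actually writes):
>
> time psql -f sql/loaddata.sql > /dev/null 2>&1
> ls /path/to/archive/dir | wc -l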
>
> It seems that 112 MB of WAL file space (16 MB * 7 segments per pass)
> is required for 73.2 MB of loaded data, an extra 38.8 MB (roughly 35%
> of the WAL volume) just to log/archive the COPY commands:
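>
> Working backwards from those figures (plain arithmetic on the numbers
> above; reading the remainder as header overhead is an assumption):
>
> 112 MB WAL / 750,000 rows loaded = ~157 bytes of WAL per row
> char(100) column data            = ~104 bytes (length word + 100 bytes)
> remainder                        = ~53 bytes/row of tuple-header plus
>                                    WAL-record overhead
>
> which is roughly consistent with fixed per-row and per-record headers
> rather than anything pathological.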
>
> First pass:
> LOG: transaction ID wrap limit is 2147484146, limited by database
> "postgres"
> LOG: archived transaction log file "00000001000000010000005E"
> LOG: archived transaction log file "00000001000000010000005F"
> LOG: archived transaction log file "000000010000000100000060"
> LOG: archived transaction log file "000000010000000100000061"
> LOG: archived transaction log file "000000010000000100000062"
> LOG: archived transaction log file "000000010000000100000063"
> LOG: archived transaction log file "000000010000000100000064"
>
> # of logs in pg_xlog: 9
>
> Second pass:
> LOG: archived transaction log file "000000010000000100000065"
> LOG: archived transaction log file "000000010000000100000066"
> LOG: archived transaction log file "000000010000000100000067"
> LOG: archived transaction log file "000000010000000100000068"
> LOG: archived transaction log file "000000010000000100000069"
> LOG: archived transaction log file "00000001000000010000006A"
> LOG: archived transaction log file "00000001000000010000006B"
>
> # of logs in pg_xlog: 15
>
> Third pass:
> LOG: archived transaction log file "00000001000000010000006C"
> LOG: archived transaction log file "00000001000000010000006D"
> LOG: archived transaction log file "00000001000000010000006E"
> LOG: archived transaction log file "00000001000000010000006F"
> LOG: archived transaction log file "000000010000000100000070"
> LOG: archived transaction log file "000000010000000100000071"
> LOG: archived transaction log file "000000010000000100000072"
>
> # of logs in pg_xlog: 22
>
> Fourth pass:
> LOG: archived transaction log file "000000010000000100000073"
> LOG: archived transaction log file "000000010000000100000074"
> LOG: archived transaction log file "000000010000000100000075"
> LOG: archived transaction log file "000000010000000100000076"
> LOG: archived transaction log file "000000010000000100000077"
> LOG: archived transaction log file "000000010000000100000078"
> LOG: archived transaction log file "000000010000000100000079"
>
> # of logs in pg_xlog: 29
>
> PostgreSQL continued to add log files in pg_xlog, so my assumption is
> that checkpoints did not come into play during the load process,
> correct? (My understanding is that frequent checkpoints would have
> added even more WAL overhead, since full_page_writes = on forces a
> full 8K page image into the WAL for each page touched after a
> checkpoint.)
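>
> One way to confirm that is to look for the checkpoint_warning message
> in the server log; a quick check (the exact message text and log
> location vary by setup):
>
> grep -i "checkpoints are occurring too frequently" /path/to/serverlog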
>
> So is there anything I can do to reduce the ~35% overhead in WAL file
> space when loading data? Do you see any glaring mistakes in the
> calculations themselves, and would you agree with this overhead
> figure?
>
> We are running on PostgreSQL 8.1.4 and are planning to move to 8.3
> when it becomes available. Are there space utilization/performance
> improvements in WAL logging in the upcoming release?
>
> Thanks,
>
> Keaton
>
