Re: Archive log compression keeping physical log available in the crash recovery

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-03-27 17:22:37
Message-ID: 200703271722.l2RHMbg23221@momjian.us
Lists: pgsql-hackers
Where are we on this patch idea?

---------------------------------------------------------------------------

Koichi Suzuki wrote:
> Sorry for the late response.
> 
> Gzip can reduce the archive log size to about one fourth.   My point is 
> that it can still be quite large.    Removing physical log records (by 
> replacing them with logical log records) from the archive log will 
> shrink the archive log to about one twentieth, in the case of a 
> pgbench test of about ten hours (3,600,000 transactions) with a database 
> size of about 2GB.   In the case of gzip, maybe because of the higher CPU 
> load, total throughput is less than just copying WAL to the archive.  In 
> our case, throughput seems to be slightly higher than just copying 
> (preserving physical log) or gzip.   I'll gather the measurement results 
> and try to post them.
> 
> The size of archive log seems not affected by the size of the database, 
> but just by the number of transactions.  In the case of 
> full_page_writes=on and full_page_compress=on, compressed archive log 
> size seems to be dependent only on the number of transactions and 
> transaction characteristics.
> 
> Our evaluation result is as follows:
> Database size: 2GB
> WAL size (after 10hours pgbench run): 48.3GB
> gzipped size: 8.8GB
> removal of the physical log: 2.36GB
> full_page_writes=off log size: 2.42GB
> 
> The archive log in our case is slightly smaller than with 
> full_page_writes=off because we remove not only the physical logs 
> but also each page header and the dummy part at the tail of each log 
> segment.
> 
> Further, we can apply gzip to this archive (2.36GB).   Final size is 
> 0.75GB, less than one sixtieth of the original WAL.
> 
> Gzipping the WAL (48.3GB to 8.8GB) took about 4000sec overall, 
> and our compression to 2.36GB needed about 1010sec, slightly less than 
> a plain cat command (1386sec).   When gzip is combined with our 
> compression (48.3GB to 0.75GB), total duration was about 1330sec.
> 
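
The ratios quoted above can be rechecked from the reported figures. The following is only a small arithmetic sketch: the sizes and durations are copied from the message, while the function names and the GB-to-MB conversion (1 GB taken as 1024 MB) are assumptions made for illustration.

```python
# Figures reported in the message (sizes in GB, durations in seconds).
WAL_GB = 48.3

RESULTS = {
    "gzip only": (8.8, 4000),
    "physical log removal": (2.36, 1010),
    "physical log removal + gzip": (0.75, 1330),
    "plain copy (cat)": (48.3, 1386),
}


def reduction_factor(original_gb, archived_gb):
    """How many times smaller the archive is than the original WAL."""
    return original_gb / archived_gb


def input_rate_mb_per_sec(original_gb, seconds):
    """WAL consumed per second, assuming 1 GB = 1024 MB."""
    return original_gb * 1024 / seconds


def summarize(results=RESULTS, wal_gb=WAL_GB):
    """Return {method: (reduction factor, MB of WAL processed per second)}."""
    return {
        name: (reduction_factor(wal_gb, size_gb),
               input_rate_mb_per_sec(wal_gb, secs))
        for name, (size_gb, secs) in results.items()
    }


if __name__ == "__main__":
    for name, (factor, rate) in summarize().items():
        print(f"{name:30s} {factor:6.1f}x smaller  {rate:6.1f} MB/s")
```

With these inputs, physical log removal alone gives a bit more than a twentyfold reduction, and combining it with gzip gives more than sixtyfold, which matches the "one twentieth" and "one sixtieth" statements.
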
> This shows that physical log removal is a good choice for the following 
> cases:
> 
> 1) Need the same crash recovery possibility as full_page_writes=on, and
> 2) Need to shrink the size of the archive log to store it for a longer period.
> 
> Of course, if we care about crash recovery in a PITR slave, we still need 
> physical log records in the archive log.   In this case, because the archive 
> log is not intended to be kept long, its size will not be an issue.
> 
> I'm planning to do archive log size evaluation with other benchmarks 
> such as DBT-2 as well.
> 
> Materials for this have already been posted to HACKERS and PATCHES.   I 
> hope you try this.
> 
> 
> Jim Nasby wrote:
> > I thought the drive behind full_page_writes = off was to reduce the 
> > amount of data being written to pg_xlog, not to shrink the size of a 
> > PITR log archive.
> > 
> > ISTM that if you want to shrink a PITR log archive you'd be able to get 
> > good results by (b|g)zip'ing the WAL files in the archive. A quick test 
> > on my laptop shows over a 4x reduction in size. Presumably that'd be 
> > even larger if you increased the size of WAL segments.
> > 
> > On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote:
> > 
> >> This is a proposal for archive log compression keeping physical log in 
> >> WAL.
> >>
> >> In PostgreSQL 8.2, the full_page_writes option came back to cut out
> >> physical log both from WAL and archive log.   To deal with partial writes
> >> during an online backup, physical log is written only during the online
> >> backup.
> >>
> >> Although this dramatically reduces the log size, it can risk the crash
> >> recovery.   If any page is inconsistent because of a fault, crash
> >> recovery doesn't work because full page images are necessary to recover
> >> the page in such a case.  For critical use, especially in commercial use,
> >> we don't like to risk the crash recovery chance, while reducing the
> >> archive log size will be crucial too for larger databases.    WAL size
> >> itself may be less critical, because the segments are reused cyclically.
> >>
> >> Here, I have a simple idea to reduce archive log size while keeping
> >> physical log in xlog:
> >>
> >> 1. Create new GUC: full_page_compress,
> >>
> >> 2. Turn on both the full_page_writes and full_page_compress: physical
> >> log will be written to WAL at the first write to a page after the
> >> checkpoint, just as conventional full_page_writes ON.
> >>
> >> 3. Unless physical log is written during the online backup, this can be
> >> removed from the archive log.   One bit in XLR_BKP_BLOCK_MASK
> >> (XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
> >> three of them are in use) and this mark can be set in XLogInsert().
> >> With both full_page_writes and full_page_compress on, both logical
> >> log and physical log will also be written to WAL with the
> >> XLR_BKP_REMOVABLE flag on.  Having both physical and logical log in the
> >> same WAL is not harmful in the crash recovery.  In the crash recovery,
> >> physical log is used if it's available.  Logical log is used in the
> >> archive recovery, as the corresponding physical log will be removed.
> >>
> >> 4. The archive command (separate binary) removes physical logs if the
> >> XLR_BKP_REMOVABLE flag is on.   Physical logs will be replaced by a
> >> very small amount of minimal information, which is used to restore the
> >> physical log to keep other log records' LSNs consistent.
> >>
> >> 5. The restore command (separate binary) restores removed physical log
> >> using the dummy record and restores LSN of other log records.
> >>
> >> 6. We need to rewrite redo functions so that they ignore the dummy
> >> record inserted in 5.  The amount of code modification will be very 
> >> small.
> >>
> >> As a result, size of the archive log becomes as small as the case with
> >> full_page_writes off, while the physical log is still available in the
> >> crash recovery, maintaining the crash recovery chance.
> >>
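
Steps 4 to 6 of the proposal above can be illustrated with a toy model. This is only a sketch under loud assumptions: the `Record` layout, the `DUMMY_STORED_BYTES` value and all function names are hypothetical and have nothing to do with PostgreSQL's real WAL format or with the posted patch. The `removable` field merely stands in for the XLR_BKP_REMOVABLE bit. The point it shows is that a small placeholder which remembers the original record length is enough to put every other record back on its original LSN.

```python
from dataclasses import dataclass

DUMMY_STORED_BYTES = 16  # assumed placeholder size; not taken from the proposal


@dataclass(frozen=True)
class Record:
    lsn: int                 # byte position in the toy WAL stream
    kind: str                # "logical", "physical" or "dummy"
    length: int              # bytes the record occupies in the original WAL
    removable: bool = False  # stands in for the XLR_BKP_REMOVABLE bit


def build_wal(specs, start_lsn=0):
    """Lay records out back to back, assigning each one its LSN."""
    records, lsn = [], start_lsn
    for kind, length, removable in specs:
        records.append(Record(lsn, kind, length, removable))
        lsn += length
    return records


def archive(wal):
    """Step 4: swap removable physical records for small placeholders."""
    return [
        Record(r.lsn, "dummy", r.length)
        if r.kind == "physical" and r.removable else r
        for r in wal
    ]


def stored_bytes(records):
    """Bytes needed to store the records; a placeholder costs only a few."""
    return sum(DUMMY_STORED_BYTES if r.kind == "dummy" else r.length
               for r in records)


def restore(archived, start_lsn=0):
    """Step 5: pad placeholders to their original length so LSNs line up."""
    return build_wal([(r.kind, r.length, r.removable) for r in archived],
                     start_lsn)


def redo(records):
    """Step 6: replay everything except the dummy records."""
    return [r for r in records if r.kind != "dummy"]
```

For example, a log built from a removable 8192-byte physical record, a logical record, a non-removable physical record (as written during an online backup) and another logical record keeps the same LSNs after `restore(archive(wal))`, while `stored_bytes(archive(wal))` is smaller than `stored_bytes(wal)`.
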
> >> Comments, questions and any input are welcome.
> >>
> >> -----
> >> Koichi Suzuki, NTT Open Source Center
> >>
> >> --Koichi Suzuki
> >>
> >> ---------------------------(end of broadcast)---------------------------
> >> TIP 6: explain analyze is your friend
> >>
> > 
> > -- 
> > Jim Nasby                                            jim(at)nasby(dot)net
> > EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
> > 
> > 
> > 
> 
> 
> -- 
> Koichi Suzuki
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
>        message can get through to the mailing list cleanly

-- 
  Bruce Momjian  <bruce(at)momjian(dot)us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
