Where are we on this patch idea?
Koichi Suzuki wrote:
> Sorry for the late response;
> Gzip can reduce the archive log size to about one fourth. My point is
> that it can still be quite large. Removing physical log records (by
> replacing them with logical log records) from the archive log
> shrinks the archive log to about one twentieth, in the case of a
> pgbench test of about ten hours (3,600,000 transactions) with a database
> size of about 2GB. With gzip, maybe because of the higher CPU load,
> total throughput is less than when just copying WAL to the archive. In
> our case, throughput seems to be slightly higher than with just copying
> (preserving the physical log) or gzip. I'll gather the measurement results
> and try to post them.
> The size of the archive log seems to be affected not by the size of the
> database, but only by the number of transactions. In the case of
> full_page_writes=on and full_page_compress=on, compressed archive log
> size seems to be dependent only on the number of transactions and
> transaction characteristics.
> Our evaluation results are as follows:
> Database size: 2GB
> WAL size (after 10-hour pgbench run): 48.3GB
> gzipped size: 8.8GB
> removal of the physical log: 2.36GB
> full_page_writes=off log size: 2.42GB
> The archive log in our case is slightly smaller than with
> full_page_writes=off because we remove not only the physical logs
> but also each page header and the dummy part at the tail of each log.
> Further, we can apply gzip to this archive (2.36GB). Final size is
> 0.75GB, less than one sixtieth of the original WAL.
> Overall duration to gzip the WAL (48.3GB to 8.8GB) was about 4000sec,
> and our compression to 2.36GB needed about 1010sec, slightly less than
> a plain cat command (1386sec). When gzip is combined with our compression
> (48.3GB to 0.75GB), total duration was about 1330sec.
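
The ratios quoted above can be rechecked with a few lines of Python. This is only a sketch that reuses the sizes and durations from the message; the helper names are made up for illustration, and 1 GB is assumed to mean 10**9 bytes.

```python
# Figures quoted in the message above (sizes in GB, durations in seconds).
WAL_GB = 48.3
TRANSACTIONS = 3_600_000

sizes_gb = {
    "gzip only": 8.8,
    "physical log removed": 2.36,
    "full_page_writes=off": 2.42,
    "physical log removed + gzip": 0.75,
}
durations_sec = {
    "cat (plain copy)": 1386,
    "gzip only": 4000,
    "physical log removed": 1010,
    "physical log removed + gzip": 1330,
}


def reduction_factor(original_gb, reduced_gb):
    """How many times smaller the reduced archive is than the original WAL."""
    return original_gb / reduced_gb


def bytes_per_transaction(size_gb, transactions):
    """Average archive bytes per transaction (assumes 1 GB = 10**9 bytes)."""
    return size_gb * 10**9 / transactions


if __name__ == "__main__":
    for name, gb in sizes_gb.items():
        factor = reduction_factor(WAL_GB, gb)
        print(f"{name:30s} {gb:6.2f} GB  {factor:5.1f}x smaller")
    for name, sec in durations_sec.items():
        # Rate at which the 48.3GB of WAL input was consumed.
        print(f"{name:30s} {sec:5d} s  {WAL_GB * 1000 / sec:6.1f} MB/s")
```

The factors agree with the "one twentieth" and "less than one sixtieth" figures in the message, and the per-transaction average is a rough way to see why archive size tracks transaction count and not database size.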
> This shows that physical log removal is a good choice when you
> 1) need the same crash recovery capability as full_page_writes=on, and
> 2) need to shrink the archive log so it can be stored for a longer period.
> Of course, if we care about crash recovery in a PITR slave, we still need
> physical log records in the archive log. In this case, because the archive
> log is not intended to be kept long, its size will not be an issue.
> I'm planning to do archive log size evaluation with other benchmarks
> such as DBT-2 as well.
> Materials for this have already been posted to HACKERS and PATCHES. I
> hope you try this.
> Jim Nasby wrote:
> > I thought the drive behind full_page_writes = off was to reduce the
> > amount of data being written to pg_xlog, not to shrink the size of a
> > PITR log archive.
> > ISTM that if you want to shrink a PITR log archive you'd be able to get
> > good results by (b|g)zip'ing the WAL files in the archive. A quick test
> > on my laptop shows over a 4x reduction in size. Presumably that'd be
> > even larger if you increased the size of WAL segments.
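
For reference, gzipping segments on their way into the archive could look roughly like the sketch below. It is a minimal illustration, not a tested archiving tool: the function names and the `.gz`/`.tmp` naming are assumptions, and only the Python standard library is used.

```python
import gzip
import shutil
import sys
from pathlib import Path


def archive_segment(src_path, archive_dir):
    """Gzip one WAL segment file into archive_dir and return the new path."""
    src = Path(src_path)
    dst = Path(archive_dir) / (src.name + ".gz")
    if dst.exists():
        # Never overwrite a segment that has already been archived.
        raise FileExistsError(str(dst))
    tmp = dst.with_name(dst.name + ".tmp")
    with open(src, "rb") as fin, gzip.open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    tmp.rename(dst)  # publish only after the whole file has been written
    return dst


def restore_segment(archived_path, dst_path):
    """Decompress an archived segment back to dst_path."""
    with gzip.open(archived_path, "rb") as fin, open(dst_path, "wb") as fout:
        shutil.copyfileobj(fin, fout)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: archive_wal.py <segment path> <archive dir>
    archive_segment(sys.argv[1], sys.argv[2])
```

A script like this could be called from archive_command with %p as the segment path; the script name and archive directory would be site-specific choices.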
> > On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote:
> >> This is a proposal for archive log compression keeping physical log in
> >> WAL.
> >> In PostgreSQL 8.2, the full_page_writes option came back to cut out
> >> physical log both from WAL and archive log. To deal with the partial write
> >> during the online backup, physical log is written only during the online
> >> backup.
> >> Although this dramatically reduces the log size, it can put crash
> >> recovery at risk. If any page is inconsistent because of a fault, crash
> >> recovery doesn't work, because full page images are necessary to recover
> >> the page in such a case. For critical use, especially commercial use,
> >> we don't want to risk losing crash recovery, while reducing the
> >> archive log size is crucial too for larger databases. WAL size
> >> itself may be less critical, because WAL files are reused cyclically.
> >> Here, I have a simple idea to reduce archive log size while keeping
> >> physical log in xlog:
> >> 1. Create new GUC: full_page_compress,
> >> 2. Turn on both the full_page_writes and full_page_compress: physical
> >> log will be written to WAL at the first write to a page after the
> >> checkpoint, just as conventional full_page_writes ON.
> >> 3. Unless physical log is written during the online backup, this can be
> >> removed from the archive log. One bit in XLR_BKP_BLOCK_MASK
> >> (XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
> >> three of them are in use) and this mark can be set in XLogInsert().
> >> With both full_page_writes and full_page_compress on, both the logical
> >> log and the physical log will be written to WAL with the
> >> XLR_BKP_REMOVABLE flag on. Having both physical and logical log in the
> >> same WAL is not harmful in crash recovery. In crash recovery, the
> >> physical log is used if it's available. The logical log is used in
> >> archive recovery, as the corresponding physical log will be removed.
> >> 4. The archive command (separate binary) removes physical logs if the
> >> XLR_BKP_REMOVABLE flag is on. Physical logs will be replaced by a
> >> minimum amount of information of very small size, which is used to
> >> restore the physical log to keep other log records' LSNs consistent.
> >> 5. The restore command (separate binary) restores removed physical log
> >> using the dummy record and restores LSN of other log records.
> >> 6. We need to rewrite redo functions so that they ignore the dummy
> >> record inserted in 5. The amount of code modification will be very
> >> small.
> >> As a result, size of the archive log becomes as small as the case with
> >> full_page_writes off, while the physical log is still available in the
> >> crash recovery, maintaining the crash recovery chance.
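
As a toy illustration of steps 3 to 6 above, the sketch below models stripping removable page images at archive time and refilling them with dummy bytes at restore time so that record positions (standing in for LSNs) do not move. Everything here is an assumption made for illustration: the bit value given to XLR_BKP_REMOVABLE, the `Record` layout, the marker size, and the function names. Real WAL records carry headers, CRCs, and page boundaries that are not modeled.

```python
from dataclasses import dataclass, replace

XLR_BKP_REMOVABLE = 0x01  # hypothetical bit value, chosen only for this sketch
MARKER_BYTES = 4          # assumed size of the placeholder left in the archive


@dataclass(frozen=True)
class Record:
    info: int                # flag bits set at insert time
    logical: bytes           # logical log, always kept
    page_image: bytes = b""  # physical log (full page image), may be empty
    stripped_len: int = 0    # length of an image removed by the archiver
    dummy: bool = False      # True when page_image is filler added on restore

    def wal_size(self):
        """Bytes the record occupies in WAL address space."""
        return len(self.logical) + len(self.page_image) + self.stripped_len

    def stored_size(self):
        """Bytes the record occupies in the file it currently lives in."""
        marker = MARKER_BYTES if self.stripped_len else 0
        return len(self.logical) + len(self.page_image) + marker


def start_lsns(records):
    """Start offset of each record, used here as a stand-in for its LSN."""
    lsns, pos = [], 0
    for rec in records:
        lsns.append(pos)
        pos += rec.wal_size()
    return lsns


def strip_removable(records):
    """Archive side: drop page images flagged removable, remember their length."""
    out = []
    for rec in records:
        if rec.info & XLR_BKP_REMOVABLE and rec.page_image:
            rec = replace(rec, page_image=b"", stripped_len=len(rec.page_image))
        out.append(rec)
    return out


def restore_dummies(records):
    """Restore side: refill stripped images with zero bytes marked as dummy."""
    out = []
    for rec in records:
        if rec.stripped_len:
            rec = replace(rec, page_image=bytes(rec.stripped_len),
                          stripped_len=0, dummy=True)
        out.append(rec)
    return out
```

In this model a redo routine would skip any image with `dummy` set and replay the logical part instead, which corresponds to step 6. An image written during online backup would be inserted without the removable flag and so survives archiving unchanged.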
> >> Comments, questions and any input are welcome.
> >> -----
> >> Koichi Suzuki, NTT Open Source Center
> >> --Koichi Suzuki
> > --
> > Jim Nasby jim(at)nasby(dot)net
> > EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
> Koichi Suzuki
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
+ If your life is a hard drive, Christ can be your backup. +