Re: Archive log compression keeping physical log available in the crash recovery

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-02-09 05:14:42
Message-ID: 45CC0342.7020200@oss.ntt.co.jp
Lists: pgsql-hackers

Further information about the following evaluation:

Pgbench throughput was as follows:
Full WAL archiving (full_page_writes=on), 48.3GB archive: 123 TPS
Gzip WAL compression, 8.8GB archive: 145 TPS
Physical log removal, 2.36GB archive: 148 TPS
full_page_writes=off, 2.42GB archive: 161 TPS
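The throughput figures above can be put in relative terms with a short Python calculation. The TPS values are copied from the list; the percentages are derived here and are not quoted from the original post.

```python
# Pgbench throughput (TPS) per archiving configuration, as reported above.
TPS = {
    "full WAL archiving": 123,
    "gzip WAL compression": 145,
    "physical log removal": 148,
    "full_page_writes=off": 161,
}

def gain_vs_baseline(tps, baseline=TPS["full WAL archiving"]):
    """Percentage throughput change relative to plain full WAL archiving."""
    return (tps / baseline - 1.0) * 100.0

for name, tps in TPS.items():
    print(f"{name:22s} {tps:4d} TPS  {gain_vs_baseline(tps):+5.1f}%")
```

On these numbers, physical log removal runs about 20% faster than plain archiving and about 2% faster than gzip. full_page_writes=off is fastest (about 31% faster), but it gives up full page images for crash recovery.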

Koichi Suzuki wrote:
> Sorry for the late response.
>
> Gzip can reduce the archive log to about one fourth of its size. My point
> is that this can still be quite large. Removing physical log records (by
> replacing them with logical log records) from the archive log shrinks the
> archive log to one twentieth of its original size, in the case of a
> pgbench test of about ten hours (3,600,000 transactions) with a database
> size of about 2GB. In the case of gzip, maybe because of higher CPU load,
> total throughput is lower than when just copying WAL to the archive. In
> our case, throughput seems to be slightly higher than both plain copying
> (preserving the physical log) and gzip. I'll gather the measurement
> results and try to post them.
>
> The size of the archive log does not seem to be affected by the size of
> the database, only by the number of transactions. With
> full_page_writes=on and full_page_compress=on, the compressed archive log
> size seems to depend only on the number of transactions and the
> transaction characteristics.
>
> Our evaluation results are as follows:
> Database size: 2GB
> WAL size (after a 10-hour pgbench run): 48.3GB
> gzipped size: 8.8GB
> With the physical log removed: 2.36GB
> Log size with full_page_writes=off: 2.42GB
>
> Our archive log is slightly smaller than the full_page_writes=off log
> because we remove not only the physical log records but also each page
> header and the dummy padding at the tail of each log segment.
>
> Further, we can apply gzip to this archive (2.36GB). The final size is
> 0.75GB, less than one sixtieth of the original WAL.
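The size figures above can be checked with a little arithmetic. This Python sketch uses the reported sizes and transaction count. It assumes 1 GB = 10**9 bytes, because the post does not state which unit was used.

```python
# Archive sizes in GB from the evaluation above (10-hour pgbench run,
# about 3,600,000 transactions, ~2GB database).
WAL_GB = 48.3
SIZES_GB = {
    "gzip": 8.8,
    "physical log removal": 2.36,
    "full_page_writes=off": 2.42,
    "physical log removal + gzip": 0.75,
}
TRANSACTIONS = 3_600_000

def reduction_factor(size_gb):
    """How many times smaller than the original WAL an archive is."""
    return WAL_GB / size_gb

def bytes_per_transaction(size_gb):
    # Assumption: decimal GB (10**9 bytes); the post does not say.
    return size_gb * 10**9 / TRANSACTIONS

for name, gb in SIZES_GB.items():
    print(f"{name:28s} 1/{reduction_factor(gb):4.1f} of WAL, "
          f"~{bytes_per_transaction(gb):,.0f} bytes/transaction")
```

This gives about 1/20.5 for physical log removal and about 1/64.4 with gzip on top, which matches the "one twentieth" and "less than one sixtieth" figures. On this run, gzip alone reaches about 1/5.5.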
>
> The overall duration to gzip the WAL (48.3GB to 8.8GB) was about 4000 sec,
> while our compression to 2.36GB needed about 1010 sec, less than a plain
> cat command (1386 sec). When gzip is combined with our compression
> (48.3GB to 0.75GB), the total duration was about 1330 sec.
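The durations above can be converted into average input rates with a short Python calculation. It assumes decimal units (1 GB = 1000 MB). It also assumes the combined pipeline's time is simply the removal stage plus a gzip stage, which the post does not state.

```python
# Wall-clock durations (seconds) to process 48.3GB of WAL, as reported above.
WAL_MB = 48.3 * 1000  # assumption: decimal GB -> MB
DURATION_S = {
    "gzip (to 8.8GB)": 4000,
    "physical log removal (to 2.36GB)": 1010,
    "cat (plain copy)": 1386,
    "removal + gzip (to 0.75GB)": 1330,
}

def input_rate_mb_s(seconds):
    """Average rate at which the 48.3GB of input WAL is consumed."""
    return WAL_MB / seconds

for name, secs in DURATION_S.items():
    print(f"{name:34s} {secs:5d} s  {input_rate_mb_s(secs):5.1f} MB/s")

# Implied cost of gzipping the already-reduced 2.36GB archive, assuming
# the two stages simply add up.
gzip_stage_s = (DURATION_S["removal + gzip (to 0.75GB)"]
                - DURATION_S["physical log removal (to 2.36GB)"])
print(f"implied gzip stage on 2.36GB: ~{gzip_stage_s} s")
```

Under that assumption, gzipping the reduced 2.36GB archive costs only about 320 s. Gzipping the full 48.3GB of WAL takes about 4000 s.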
>
> This shows that physical log removal is a good choice in the following
> case:
>
> 1) We need the same crash recovery capability as full_page_writes=on, and
> 2) We need to shrink the archive log so it can be stored for a longer period.
>
> Of course, if we care about crash recovery on a PITR slave, we still need
> physical log records in the archive log. In this case, because the archive
> log is not intended to be kept long, its size will not be an issue.
>
> I'm planning to evaluate archive log size with other benchmarks
> such as DBT-2 as well.
>
> Materials for this have already been posted to HACKERS and PATCHES. I
> hope you will try them.
>
>
> Jim Nasby wrote:
>> I thought the drive behind full_page_writes = off was to reduce the
>> amount of data being written to pg_xlog, not to shrink the size of a
>> PITR log archive.
>>
>> ISTM that if you want to shrink a PITR log archive you'd be able to
>> get good results by (b|g)zip'ing the WAL files in the archive. A quick
>> test on my laptop shows over a 4x reduction in size. Presumably that'd
>> be even larger if you increased the size of WAL segments.
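Jim's suggestion of gzipping WAL files in the archive can be sketched as a small archive helper. This is a minimal Python sketch, not something from the thread. The script name gzip_archive.py and the archive directory are hypothetical. Only the %p (segment path) and %f (segment file name) placeholders are standard archive_command substitutions. The helper refuses to overwrite an existing archive file and writes through a temporary name. It does not fsync, which a production script should.

```python
import gzip
import os
import shutil
import sys

def archive_segment(wal_path, wal_name, archive_dir):
    """Gzip one WAL segment into archive_dir as <wal_name>.gz.

    Refuses to overwrite an existing archive file, and writes to a
    temporary name first so a crash never leaves a truncated .gz
    under the final name.
    """
    dest = os.path.join(archive_dir, wal_name + ".gz")
    if os.path.exists(dest):
        raise FileExistsError(dest)
    tmp = dest + ".tmp"
    with open(wal_path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp, dest)  # atomic rename within one POSIX filesystem
    return dest

def restore_segment(archive_dir, wal_name, target_path):
    """Inverse operation, e.g. called from a restore_command."""
    src_path = os.path.join(archive_dir, wal_name + ".gz")
    with gzip.open(src_path, "rb") as src, open(target_path, "wb") as dst:
        shutil.copyfileobj(src, dst)

if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage:
    #   archive_command = 'python3 gzip_archive.py %p %f /mnt/archive'
    try:
        archive_segment(sys.argv[1], sys.argv[2], sys.argv[3])
    except Exception as exc:
        print(exc, file=sys.stderr)
        sys.exit(1)  # non-zero exit tells the server archiving failed
```

Refusing to overwrite matters because the server may retry a segment. Silently replacing an archived file could hide a configuration mistake.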
>>
>> On Jan 29, 2007, at 2:15 AM, Koichi Suzuki wrote:
>>
>>> This is a proposal for archive log compression keeping physical log
>>> in WAL.
>>>
>>> In PostgreSQL 8.2, the full_page_writes option came back to cut the
>>> physical log out of both WAL and the archive log. To deal with partial
>>> writes during an online backup, the physical log is still written while
>>> an online backup is in progress.
>>>
>>> Although this dramatically reduces the log size, it puts crash recovery
>>> at risk. If any page is inconsistent because of a fault, crash recovery
>>> doesn't work, because full page images are needed to recover the page in
>>> such a case. For critical use, especially commercial use, we don't want
>>> to risk losing crash recovery, while reducing the archive log size is
>>> also crucial for larger databases. WAL size itself may be less critical,
>>> because WAL segments are reused cyclically.
>>>
>>> Here, I have a simple idea to reduce archive log size while keeping
>>> physical log in xlog:
>>>
>>> 1. Create a new GUC: full_page_compress.
>>>
>>> 2. Turn on both full_page_writes and full_page_compress: physical
>>> log will be written to WAL at the first write to a page after a
>>> checkpoint, just as with conventional full_page_writes ON.
>>>
>>> 3. Unless a physical log record is written during an online backup, it
>>> can be removed from the archive log. One bit in XLR_BKP_BLOCK_MASK
>>> (XLR_BKP_REMOVABLE) is available to indicate this (out of four, only
>>> three are in use), and this mark can be set in XLogInsert().
>>> With both full_page_writes and full_page_compress on, both the logical
>>> log and the physical log will be written to WAL with the
>>> XLR_BKP_REMOVABLE flag on. Having both physical and logical log in the
>>> same WAL is not harmful in crash recovery: there the physical log is used
>>> if it's available. The logical log is used in archive recovery, as the
>>> corresponding physical log will have been removed.
>>>
>>> 4. The archive command (a separate binary) removes physical logs if the
>>> XLR_BKP_REMOVABLE flag is on. Physical logs will be replaced by a very
>>> small record carrying the minimum information needed to restore them
>>> later, so that the LSNs of other log records stay consistent.
>>>
>>> 5. The restore command (a separate binary) restores the removed physical
>>> log from the dummy record, which restores the LSNs of other log records.
>>>
>>> 6. We need to rewrite redo functions so that they ignore the dummy
>>> record inserted in 5. The amount of code modification will be very
>>> small.
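Steps 3 to 6 can be illustrated with a toy model in Python. This is not the actual patch or the real XLOG format. Record, compress_for_archive, restore_from_archive and redo are hypothetical names, and records are laid out back to back with no page headers. The flag values are an assumption based on the 8.2 xl_info layout: 0x0E covers the three backup-block bits, and the spare 0x01 bit serves as XLR_BKP_REMOVABLE.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed flag layout, modelled on PostgreSQL 8.2's xl_info bits:
# 0x0E marks up to three backup blocks; the spare 0x01 bit is used
# here as the proposed XLR_BKP_REMOVABLE flag.
XLR_BKP_BLOCK_MASK = 0x0E
XLR_BKP_REMOVABLE = 0x01

@dataclass
class Record:
    """Toy stand-in for an XLOG record (not the real on-disk format)."""
    info: int
    logical: bytes                   # logical redo payload
    backup: bytes = b""              # full-page images, if any
    dummy_len: Optional[int] = None  # set once backup blocks are stripped

    def size(self) -> int:
        return len(self.logical) + len(self.backup)

def lsns(records: List[Record]) -> List[int]:
    """Start offset (toy LSN) of each record when laid out back to back."""
    out, pos = [], 0
    for r in records:
        out.append(pos)
        pos += r.size()
    return out

def compress_for_archive(records: List[Record]) -> List[Record]:
    """Step 4: drop removable full-page images, remembering their length."""
    out = []
    for r in records:
        if r.info & XLR_BKP_BLOCK_MASK and r.info & XLR_BKP_REMOVABLE:
            out.append(Record(r.info, r.logical, b"", len(r.backup)))
        else:
            out.append(r)  # e.g. images taken during an online backup
    return out

def restore_from_archive(records: List[Record]) -> List[Record]:
    """Step 5: pad stripped records back to their original size."""
    out = []
    for r in records:
        if r.dummy_len is not None:
            # Filler of the original length keeps every later LSN unchanged.
            out.append(Record(r.info, r.logical, b"\0" * r.dummy_len,
                              r.dummy_len))
        else:
            out.append(r)
    return out

def redo(record: Record, apply_full_page, apply_logical) -> None:
    """Step 6: use a real full-page image if present; never apply filler."""
    if record.backup and record.dummy_len is None:
        apply_full_page(record.backup)
    else:
        apply_logical(record.logical)
```

The key invariant is that restoring puts back exactly as many bytes as were stripped, so every later record keeps its LSN. The filler carries no page data, which is why redo must fall back to the logical record for it.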
>>>
>>> As a result, the size of the archive log becomes as small as with
>>> full_page_writes off, while the physical log is still available in
>>> crash recovery, preserving the ability to recover from a crash.
>>>
>>> Comments, questions and any input are welcome.
>>>
>>> -----
>>> Koichi Suzuki, NTT Open Source Center
>>>
>>> --Koichi Suzuki
>>>
>>>
>>
>> --
>> Jim Nasby jim(at)nasby(dot)net
>> EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
>>
>>
>>
>
>

--
Koichi Suzuki
