Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Gezeala M(dot) Bacuño II <gezeala(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #7521: Cannot disable WAL log while using pg_dump
Date: 2012-09-07 18:29:50
Message-ID: CABOikdN8TSnAcAn53_A5E6xgZCS64Bvs0TabOLyBwtBWRm5jwQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

(Adding -hackers. Did not realize it got dropped.)

On Fri, Sep 7, 2012 at 11:25 PM, Gezeala M. Bacuño II <gezeala(at)gmail(dot)com> wrote:

> On Fri, Sep 7, 2012 at 7:17 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
> wrote:
> >
> >
> > On Fri, Sep 7, 2012 at 7:00 PM, Marie Bacuno II <gezeala(at)gmail(dot)com>
> wrote:
> >>
> >>
> >> On Sep 7, 2012, at 2:19, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
> wrote:
> >>
> >>
> >> > or have long running transactions ?
> >>
> >> Yes, but I don't think there were any when the snapshot was taken. Does
> >> the pg_xlog_location_diff() result from the latest and prior checkpoint
> >> upon start-up indicate the size of replayed changes?
> >>
> >
> > That's the amount of additional WAL generated after you started the
> > server.
> >
> >>
> >> >
> >> > BTW, the following query returns ~60GB. That's the amount of WAL
> >> > written between server start and the end of pg_dump (I
> >> > don't think pg_xlog_location_diff() is available in the older
> >> > releases).
> >> >
> >> > postgres=# select pg_xlog_location_diff('4450/7A14F280',
> >> > '4441/5E681F38')/(2^30);
> >> > ?column?
> >> > ------------------
> >> > 60.1980484202504
> >>
> >> It'd be great to know what the WALs modified.
> >
> >
> > You would need something like xlogdump to decipher them. I quickly tried
> > this and it seems to work against the 8.4 version that you are running.
> > https://github.com/snaga/xlogdump
> >
> > Download the source code, compile it, and run it against one of the most
> > recent WAL files in the cluster against which you ran pg_dump. You would
> > need to set PATH so that it picks up the pg_config of the server you are
> > running. Please post the output.
> >
> > Thanks,
> > Pavan
> >
> >
>
> Here you go:
>
> ## last WAL
> $ xlogdump -S /dbpool/data/pg_xlog/00000001000044500000007A
>
> /dbpool/data/pg_xlog/00000001000044500000007A:
>
> Unexpected page info flags 0003 at offset 0
> Skipping unexpected continuation record at offset 0
> ReadRecord: record with zero len at 17488/7A14F310
> Unexpected page info flags 0001 at offset 150000
> Skipping unexpected continuation record at offset 150000
> Unable to read continuation page?
> ** maybe continues to next segment **
> ---------------------------------------------------------------
> TimeLineId: 1, LogId: 17488, LogSegment: 122
>
> Resource manager stats:
> [0]XLOG : 3 records, 120 bytes (avg 40.0 bytes)
> checkpoint: 3, switch: 0, backup end: 0
> [1]Transaction: 0 record, 0 byte (avg 0.0 byte)
> commit: 0, abort: 0
> [2]Storage : 0 record, 0 byte (avg 0.0 byte)
> [3]CLOG : 0 record, 0 byte (avg 0.0 byte)
> [4]Database : 0 record, 0 byte (avg 0.0 byte)
> [5]Tablespace: 0 record, 0 byte (avg 0.0 byte)
> [6]MultiXact : 0 record, 0 byte (avg 0.0 byte)
> [7]Reserved 7: 0 record, 0 byte (avg 0.0 byte)
> [8]Reserved 8: 0 record, 0 byte (avg 0.0 byte)
> [9]Heap2 : 2169 records, 43380 bytes (avg 20.0 bytes)
> [10]Heap : 0 record, 0 byte (avg 0.0 byte)
> ins: 0, upd/hot_upd: 0/0, del: 0
> [11]Btree : 0 record, 0 byte (avg 0.0 byte)
> [12]Hash : 0 record, 0 byte (avg 0.0 byte)
> [13]Gin : 0 record, 0 byte (avg 0.0 byte)
> [14]Gist : 0 record, 0 byte (avg 0.0 byte)
> [15]Sequence : 0 record, 0 byte (avg 0.0 byte)
>
> Backup block stats: 2169 blocks, 16551816 bytes (avg 7631.1 bytes)
>
>
I think both my theories are holding up. The Heap2 resource manager is used
only for vacuum freeze, lazy vacuum, or HOT prune. Given your access
pattern, I bet it's the third activity that's kicking in on your database.
You have many pages with dead tuples, and they are getting cleaned at the
first opportunity, which happens to be the pg_dump that runs immediately
after the server restart. You can see this in the fact that all 2169 WAL
records in the file are attributed to the Heap2 RM above.
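
If you want a rough sense of how much of this cleanup work is pending
before a dump, something like this should do (a minimal sketch;
pg_stat_user_tables and n_dead_tup do exist in 8.4, but the counts are only
estimates maintained by the statistics collector):

postgres=# SELECT relname, n_live_tup, n_dead_tup
postgres-#   FROM pg_stat_user_tables
postgres-#  ORDER BY n_dead_tup DESC
postgres-#  LIMIT 10;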

What's additionally happening is that each of these records touches a
different heap page. The cleanup activity dirties those pages. Since each of
these pages is being dirtied for the first time after a recent checkpoint
and full_page_writes is turned ON, the entire page is backed up in the WAL
record. You can see the exact number of backup blocks in the stats above.
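
The arithmetic checks out, too:

postgres=# select 16551816 / 2169.0;
     ?column?
------------------
 7631.08...

That matches the "avg 7631.1 bytes" xlogdump reports, just under the
8192-byte page size (a backup block omits the unused hole in the middle of
the page, which is why the average sits a little below a full page).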

I don't think we have any mechanism to control or stop HOT from doing what
it wants to do, short of running a modified server. But you can at least
bring down the WAL volume by turning full_page_writes OFF.
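
full_page_writes only needs a reload, not a restart, so a minimal sketch
would be (with the usual caveat: if the server crashes mid page-write with
this off, torn pages cannot be repaired by recovery, so only run this way
on storage that guarantees atomic 8K writes):

# in postgresql.conf:
full_page_writes = off

postgres=# select pg_reload_conf();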

Thanks,
Pavan
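
PS: For anyone who wants to reproduce the xlogdump run above, it boils down
to roughly this. The clone URL, the -S flag, and the WAL path are from the
messages above; the build steps are my assumption of a standard
pg_config-driven build, so check the project's README:

$ git clone https://github.com/snaga/xlogdump
$ cd xlogdump
$ export PATH=/path/to/pg84/bin:$PATH   # illustrative; must point at the 8.4 pg_config
$ make && make install
$ xlogdump -S /dbpool/data/pg_xlog/00000001000044500000007A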
