Re: WARNINGs after starting backup server created with PITR

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Erik Jones <erik(at)myemma(dot)com>
Cc: General postgres mailing list <pgsql-general(at)postgresql(dot)org>
Subject: Re: WARNINGs after starting backup server created with PITR
Date: 2008-01-19 00:58:19
Message-ID: 14838.1200704299@sss.pgh.pa.us
Lists: pgsql-general

Erik Jones <erik(at)myemma(dot)com> writes:
>>> 2008-01-17 21:47:34 CST 7598 :WARNING: relation "table_name" page
>>> 5728 is uninitialized --- fixing
>>
>> If you do a vacuum on the master, do you get the same warnings?

> /me runs VACUUM VERBOSE on the two tables that would matter.

> Nope. What worries me is, that since I have a verified case of rsync
> thinking it had successfully transferred a WAL, the same may have
> happened with these files during the base backup. Does that warning,
> in fact, entail that there were catalog entries for those files, but
> that the file was not there, and by "fixing" it the server just
> created empty files?

Not necessarily. What the warning actually means is that VACUUM found
an all-zeroes page within a table. There are scenarios where this is
not unexpected, particularly after a crash on the master. The reason
is that adding a page to a table is a two-step process. First we
write() a page of zeroes at the current EOF; this is basically to make
the filesystem reserve the space. We don't want to report that we've
committed a page-full of new rows and then discover there's no disk
space for them. Then we initialize the page (i.e., set up the page header)
and start putting rows into it. But these latter operations happen
inside a shared buffer, and might not reach disk until the next
checkpoint. Now, the insertions of the rows are recorded in the WAL,
and once the first such WAL entry has reached disk, the page will
be re-initialized by WAL replay if there's a crash. But there's an
interval between the filesystem's extension of a table with zeroes and
the first WAL entry related to the page reaching disk. If you get a
crash in that interval then the all-zeroes page will still be there
after recovery, and will go unused until VACUUM reclaims it (and
produces the WARNING).
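
To make the timing window concrete, here is a toy Python model of the sequence described above. It is only a sketch, not PostgreSQL code: the class ToyTable, its methods, and the "HDR" page marker are hypothetical names invented for illustration, and the 8192-byte page size assumes the default block size.

```python
PAGE_SIZE = 8192  # assumption: the default PostgreSQL block size


class ToyTable:
    """Toy model of one table file, its shared buffers, and the WAL.

    Hypothetical illustration only; none of these names exist in PostgreSQL.
    """

    def __init__(self):
        self.disk = []          # pages as they currently exist on disk
        self.buffers = {}       # page number -> page contents held in memory
        self.wal = []           # WAL records still in memory
        self.wal_on_disk = []   # WAL records that have reached disk

    def extend(self):
        # Step 1: write a page of zeroes at EOF so the filesystem reserves space.
        self.disk.append(bytes(PAGE_SIZE))
        page_no = len(self.disk) - 1
        # Step 2: initialize the page header, but only in the shared buffer.
        self.buffers[page_no] = b"HDR" + bytes(PAGE_SIZE - 3)
        return page_no

    def insert_row(self, page_no, row):
        # The insertion is logged in WAL; the on-disk page is not touched yet.
        self.wal.append((page_no, row))

    def flush_wal(self):
        # Pending WAL records reach disk.
        self.wal_on_disk.extend(self.wal)
        self.wal.clear()

    def crash_and_recover(self):
        # A crash loses the shared buffers and any unflushed WAL.
        self.buffers.clear()
        self.wal.clear()
        # Replay re-initializes every page that has a WAL record on disk.
        for page_no, _row in self.wal_on_disk:
            self.disk[page_no] = b"HDR" + bytes(PAGE_SIZE - 3)

    def zero_pages(self):
        # What VACUUM would warn about: pages that are still all zeroes.
        return [n for n, page in enumerate(self.disk) if not any(page)]


table = ToyTable()
first = table.extend()
table.insert_row(first, "row a")
table.flush_wal()                  # WAL for the first page reached disk

second = table.extend()
table.insert_row(second, "row b")  # WAL for the second page did not
table.crash_and_recover()

print(table.zero_pages())  # [1]
```

Only the page whose first WAL record never reached disk is left zeroed, which is why this mechanism tends to account for a handful of pages near the end of a table rather than many.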

So this would explain some zero pages (though not large numbers of
them) if you'd had crashes on the master. I'm not sure offhand whether
there's any case in which bringing up a PITR slave is close enough to
crash recovery that the same mechanism could apply to produce a zero
page on the slave where there had been none on the master.

In any case, 125 different zeroed pages is pretty hard to explain
by such a mechanism (especially if they were scattered rather than
in contiguous clumps). I tend to agree that it sounds like there
was something wrong with the rsync mirroring process.
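
One way to check whether the zeroed pages are scattered or clumped is to scan a copy of the relation's data file directly. The following is a rough sketch under stated assumptions: find_zero_pages and group_runs are hypothetical helper names, the page size is assumed to be the default 8192 bytes, and the page numbers reported are relative to the file given (large tables are stored as multiple segment files). It should be run against a copy of the file, not one a live server is writing to.

```python
PAGE_SIZE = 8192  # assumption: the default PostgreSQL block size


def find_zero_pages(path, page_size=PAGE_SIZE):
    """Return the numbers of all pages in the file that are entirely zeroes."""
    zero = bytes(page_size)
    found = []
    with open(path, "rb") as f:
        page_no = 0
        while True:
            page = f.read(page_size)
            if not page:
                break
            # A trailing partial page never compares equal, so it is skipped.
            if page == zero:
                found.append(page_no)
            page_no += 1
    return found


def group_runs(page_numbers):
    """Collapse sorted page numbers into (first, last) contiguous runs."""
    runs = []
    for n in page_numbers:
        if runs and n == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], n)
        else:
            runs.append((n, n))
    return runs
```

For example, group_runs([3, 4, 5, 9]) gives [(3, 5), (9, 9)]. A few long runs would suggest missing or truncated chunks of a file, while many single-page runs would look more like the scattered pattern mentioned above.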

regards, tom lane
