Re: pg_upgrade and rsync

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: pg_upgrade and rsync
Date: 2015-01-22 23:43:31
Message-ID: 20150122234331.GA15865@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 22, 2015 at 10:48:37PM +0200, Heikki Linnakangas wrote:
> >>> * If we need to protect hint bit updates from torn writes, WAL-log a
> >>> * full page image of the page. This full page image is only necessary
> >>> * if the hint bit update is the first change to the page since the
> >>> * last checkpoint.
> >>> *
> >>> * We don't check full_page_writes here because that logic is included
> >>> * when we call XLogInsert() since the value changes dynamically.
> >>> */
> >>> if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
> >>> {
> >>> /*
> >>> * If we're in recovery we cannot dirty a page because of a hint.
> >>> * We can set the hint, just not dirty the page as a result so the
> >>> * hint is lost when we evict the page or shutdown.
> >>> *
> >>> * See src/backend/storage/page/README for longer discussion.
> >>> */
> >>> if (RecoveryInProgress())
> >>> return;
> >
> >What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.
>
> Then the page will be updated without writing a WAL record. Just
> like in the master, if wal_log_hints is off. wal_log_hints works the
> same on the master or the standby.

[ see below for why this entire idea might not work ]

OK, I was confused by your previous "no". It means we do update hint
bits on read-only slave queries --- we just don't WAL log them. In
fact, we can't update hint bits on the standby if we are wal logging
them ---- is that right?

My text was saying:

these differences can be reduced by using a fresh standby and by
enabling <xref linkend="guc-wal-log-hints">. (While
<varname>wal_log_hints</> transfers hint bits from the primary to
standbys, additional hint bits are still set on the standbys by
read-only queries.)

meaning if you don't run any read-only queries on the standby, the files
will be same on master/standby because the hint bits will be the same,
and rsync will not copy the files.

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?

I would really like to get a way to pg_upgrade the standbys but we have
never really be able to get a solution. Ideally we would update just
the system table files, and if the order of pg_upgrade file renames is
exactly the same, everything else would match, but I can't imagine what
such an API would look like. Have pg_upgrade spit out a list of files
to be copied?

In fact, these are the relfilenodes pg_upgrade preserves:

* While pg_class.oid and pg_class.relfilenode are initially the same
* in a cluster, they can diverge due to CLUSTER, REINDEX, or VACUUM
* FULL. In the new cluster, pg_class.oid and pg_class.relfilenode will
* be the same and will match the old pg_class.oid value. Because of
* this, old/new pg_class.relfilenode values will not match if CLUSTER,
* REINDEX, or VACUUM FULL have been performed in the old cluster.
*
* We control all assignments of pg_type.oid because these oids are stored
* in user composite type values.
*
* We control all assignments of pg_enum.oid because these oids are stored
* in user tables as enum values.
*
* We control all assignments of pg_authid.oid because these oids are stored
* in pg_largeobject_metadata.

so if the table/index relfilenodes no longer match the oid on the old
cluster, due to CLUSTER, REINDEX, or VACUUM FULL, the file name will not
match on the new cluster and rsync will copy the entire file. In fact,
rsync is going to copy it to the wrong file name, and delete the right
file.

I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well. However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2015-01-22 23:46:36 Re: Back-branch update releases scheduled
Previous Message Andreas Karlsson 2015-01-22 23:40:51 Re: Using 128-bit integers for sum, avg and statistics aggregates