Re: pg_rewind copy so much data

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Hung Phan <hungphan227(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-general(at)postgresql(dot)org>
Subject: Re: pg_rewind copy so much data
Date: 2017-09-15 06:55:02
Message-ID: CAB7nPqRswO1Y5dGyqRPv8+fD-znZGHmxN0ytYO2AeR52v_zNCA@mail.gmail.com
Lists: pgsql-general

On Fri, Sep 15, 2017 at 2:57 PM, Hung Phan <hungphan227(at)gmail(dot)com> wrote:
> [...]

Please do not top-post. This breaks the logic of the thread.

> I use ver 9.5.3.

You should update to the latest minor version available; there have
been quite a few bug fixes in Postgres since 9.5.3.

> I have just run again and get the debug log. It is very long so I attach in mail

In this case the LSN where the promoted standby and the rewound node
diverged is clear:
servers diverged at WAL position 2/D69820C8 on timeline 12
rewinding from last common checkpoint at 2/D6982058 on timeline 12
The last segment on timeline 13 is 0000000D00000002000000E0, which may
be a recycled segment. Still, that's up to 160MB worth of data...
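To make the 160MB estimate concrete, here is a minimal sketch of the
arithmetic. It assumes the default 16MB WAL segment size (the thread
does not say whether the cluster uses it), and the helper names are
made up for illustration; they are not part of pg_rewind.

```python
# Assumption: default WAL segment size of 16MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
# Number of segments covered by one "xlogid" (the high 32 bits of an LSN).
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def lsn_to_segno(lsn):
    """Return the absolute segment number containing an LSN like '2/D69820C8'."""
    high, low = lsn.split("/")
    position = (int(high, 16) << 32) | int(low, 16)
    return position // WAL_SEGMENT_SIZE


def parse_segment_name(name):
    """Split a 24-character WAL file name into (timeline, absolute segment number)."""
    timeline = int(name[0:8], 16)
    xlogid = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, xlogid * SEGMENTS_PER_XLOGID + segment


divergence_segno = lsn_to_segno("2/D69820C8")
last_timeline, last_segno = parse_segment_name("0000000D00000002000000E0")
gap_mb = (last_segno - divergence_segno) * WAL_SEGMENT_SIZE // (1024 * 1024)
print(f"timeline {last_timeline}: {gap_mb}MB between divergence and last segment")
```

The divergence point falls in segment D6 and the last segment is E0,
so the two are ten segments apart.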

And from what I can see a lot of the data comes from WAL segments from
past timelines, close to 1.3GB. The rest is more or less completely
coming from relation files from a different tablespace than the
default, tables with OID 16665 and 16683 covering the largest part of
it. What is strange to begin with is that there are many segments from
past timelines. Those should not stick around.
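One way to see how many segments each timeline contributes is to tally
WAL file names by their timeline prefix. This is only a sketch: it
assumes 16MB segments, and apart from 0000000D00000002000000E0 the
file names below are invented for illustration, not taken from the log.

```python
import re
from collections import Counter

# Assumption: default WAL segment size of 16MB.
WAL_SEGMENT_SIZE_MB = 16
# A WAL segment file name is 24 uppercase hex digits; the first 8 are the timeline.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def segments_per_timeline(file_names):
    """Count WAL segments per timeline ID, skipping non-segment files."""
    counts = Counter()
    for name in file_names:
        if SEGMENT_NAME.match(name):
            counts[int(name[:8], 16)] += 1
    return dict(counts)


# Hypothetical directory listing of pg_xlog.
names = [
    "0000000B00000002000000D0",
    "0000000C00000002000000D5",
    "0000000C00000002000000D6",
    "0000000D00000002000000E0",
    "0000000D.history",
]

for timeline, count in sorted(segments_per_timeline(names).items()):
    print(f"timeline {timeline}: {count} segment(s), {count * WAL_SEGMENT_SIZE_MB}MB")
```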

Could you check if the relfilenodes 16665 and 16683 exist on the source
server but do *not* exist on the target server? When issuing a rewind,
no action is taken on a relation file that exists on both (see
process_source_file in filemap.c), and only a set of blocks is
registered. Based on what comes from your log file, the file is being
copied from the source to the target, not its blocks:
pg_tblspc/16386/PG_9.5_201510051/16387/16665 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16665.1 (COPY)
pg_tblspc/16386/PG_9.5_201510051/16387/16665_fsm (COPY)
And this leads to an increase of the data included in what is rewound.
So are you, for example, re-creating a new database after the standby
is promoted, or something like that?
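The check asked for above could be scripted roughly as follows. This is
a sketch, not something pg_rewind provides: it assumes the debug log
keeps the "path (ACTION)" line format shown in the excerpt, the helper
names are hypothetical, and an empty temporary directory stands in for
the real target data directory.

```python
import os
import re
import tempfile

# Matches file map debug lines such as "some/relative/path (COPY)".
FILEMAP_LINE = re.compile(r"^(?P<path>\S+) \((?P<action>[A-Z_]+)\)$")


def copied_paths(log_lines):
    """Return the relative paths that the log marks as COPY."""
    paths = []
    for line in log_lines:
        match = FILEMAP_LINE.match(line.strip())
        if match and match.group("action") == "COPY":
            paths.append(match.group("path"))
    return paths


def missing_on_target(paths, target_pgdata):
    """Return the paths that do not exist under the target data directory."""
    return [p for p in paths
            if not os.path.exists(os.path.join(target_pgdata, p))]


log_excerpt = [
    "servers diverged at WAL position 2/D69820C8 on timeline 12",
    "pg_tblspc/16386/PG_9.5_201510051/16387/16665 (COPY)",
    "pg_tblspc/16386/PG_9.5_201510051/16387/16665.1 (COPY)",
    "pg_tblspc/16386/PG_9.5_201510051/16387/16665_fsm (COPY)",
]

# Stand-in for the target's data directory; replace with the real path.
with tempfile.TemporaryDirectory() as target_pgdata:
    missing = missing_on_target(copied_paths(log_excerpt), target_pgdata)

for path in missing:
    print(f"not on target: {path}")
```

Any path reported as missing is a file that gets copied whole rather
than block by block.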
--
Michael
