
Re: Teaching pg_receivexlog to follow timeline switches

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-15 14:05:57
Message-ID: 50F56245.8050802@vmware.com
Now that a standby server can follow timeline switches through streaming 
replication, we should teach pg_receivexlog to do the same. Patch
attached.

I made one change to the way the START_STREAMING command works, to better
support this. When a standby server reaches the end of the timeline it's
streaming from the master, it stops streaming, fetches any missing timeline
history files, and parses the history file of the latest timeline to
figure out where to continue. However, I don't want to parse timeline
history files in pg_receivexlog. Better to keep it simple. So instead, I
modified the server-side code for START_STREAMING to return the next
timeline's ID at the end, and used that in pg_receivexlog. I also
modified BASE_BACKUP to return not only the start XLogRecPtr, but also
the corresponding timeline ID. Otherwise we might try to start streaming
from the wrong timeline if you issue a BASE_BACKUP at the same moment the
server switches to a new timeline.
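
To illustrate the client-side loop this enables, here is a minimal sketch in Python (the real patch is C in receivelog.c, and this is not that code). `stream_timeline` is a hypothetical stand-in for issuing the streaming command and reading WAL until the server ends the stream. The only behavior taken from the description above is that the server reports the next timeline's ID at the end. Resuming at the start of the segment containing the switch point, and the 16 MB segment size, are assumptions of the sketch.

```python
from typing import Callable, List, Optional, Tuple

SEG_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB WAL segments

# Hypothetical callback: streams WAL on `timeline` from `start_pos` and
# returns (end_pos, next_timeline). next_timeline is None when the server
# did not report a timeline switch.
StreamFn = Callable[[int, int], Tuple[int, Optional[int]]]


def follow_timelines(stream_timeline: StreamFn, timeline: int,
                     start_pos: int) -> List[Tuple[int, int, int]]:
    """Stream one timeline after another without parsing history files.

    Returns a (timeline, start_pos, end_pos) tuple per timeline streamed.
    """
    visited = []
    while True:
        end_pos, next_timeline = stream_timeline(timeline, start_pos)
        visited.append((timeline, start_pos, end_pos))
        if next_timeline is None:
            return visited
        # Assumption: restart at the beginning of the segment in which the
        # old timeline ended, so the new timeline's segment is complete.
        timeline = next_timeline
        start_pos = end_pos - (end_pos % SEG_SIZE)


def fake_server(timeline: int, start_pos: int) -> Tuple[int, Optional[int]]:
    """Toy server: timeline 1 ends in the middle of segment 8, then 2 follows."""
    script = {1: (0x8800000, 2), 2: (0xA000000, None)}
    return script[timeline]


history = follow_timelines(fake_server, 1, 0x6000000)
for tli, start, end in history:
    print("timeline %d: %X -> %X" % (tli, start, end))
```

The point of the design is that the client never needs to know why the timeline ended or what the history file says; it only needs the next timeline's ID.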

When pg_receivexlog switches timeline, what to do with the partial file 
on the old timeline? When the timeline changes in the middle of a WAL 
segment, the segment on the old timeline is only half-filled. For
example, when timeline changes from 1 to 2, you'll have this in pg_xlog:

000000010000000000000006
000000010000000000000007
000000010000000000000008
000000020000000000000008
00000002.history

The segment 000000010000000000000008 is only half-filled, as the 
timeline changed in the middle of that segment. The beginning portion of 
that file is duplicated in 000000020000000000000008, with the 
timeline-changing checkpoint record right after the duplicated portion.
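
For readers less familiar with the naming scheme, a small Python sketch of how these file names decompose: 8 hex digits of timeline ID followed by two 8-digit fields identifying the segment. The `WalFile` type and `parse_wal_name` helper are made up for illustration, and the optional `.partial` suffix is included because pg_receivexlog uses it for the segment it is still writing.

```python
import re
from typing import NamedTuple, Optional

# 8 hex digits each for timeline, log id and segment id, optional ".partial".
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})(\.partial)?$")


class WalFile(NamedTuple):
    timeline: int
    log: int
    seg: int
    partial: bool


def parse_wal_name(name: str) -> Optional[WalFile]:
    """Split a WAL segment file name into its fields.

    Returns None for anything that is not a segment, such as a timeline
    history file.
    """
    match = WAL_NAME.match(name)
    if match is None:
        return None
    tli, log, seg, partial = match.groups()
    return WalFile(int(tli, 16), int(log, 16), int(seg, 16), partial is not None)


for name in ("000000010000000000000008",
             "000000020000000000000008",
             "00000002.history"):
    print(name, "->", parse_wal_name(name))
```

The two segment files in the listing differ only in the timeline field, which is why they can coexist in the same directory.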

When we stream that with pg_receivexlog, and hit the timeline switch, 
we'll have this situation in the client:

000000010000000000000006
000000010000000000000007
000000010000000000000008.partial

What to do with the partial file? One option is to rename it to 
000000010000000000000008. However, if you then kill pg_receivexlog 
before it has finished streaming a full segment from the new timeline, 
on restart it will try to begin streaming WAL segment 
000000010000000000000009, because it sees that segment 
000000010000000000000008 is already completed. That'd be wrong.
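
A simplified Python model of that restart behavior, to make the failure concrete. This is not the actual receivelog.c logic; `restart_point` is a hypothetical helper that only captures the rule described above (resume after the last completed segment found in the target directory), and it assumes the default 16 MB segment size.

```python
from typing import Iterable, Optional, Tuple

HEX = set("0123456789ABCDEF")
SEGMENTS_PER_LOG = 256  # assumption: default 16 MB WAL segment size


def restart_point(names: Iterable[str]) -> Optional[Tuple[int, int]]:
    """Return (timeline, segment number) at which streaming would resume."""
    # Only names made of exactly 24 hex digits count as completed segments;
    # anything with a suffix (.partial, .history) is ignored.
    completed = sorted(n for n in names if len(n) == 24 and set(n) <= HEX)
    if not completed:
        return None
    last = completed[-1]
    timeline = int(last[0:8], 16)
    segno = int(last[8:16], 16) * SEGMENTS_PER_LOG + int(last[16:24], 16)
    return timeline, segno + 1


kept_partial = ["000000010000000000000006",
                "000000010000000000000007",
                "000000010000000000000008.partial"]
renamed = ["000000010000000000000006",
           "000000010000000000000007",
           "000000010000000000000008"]

print(restart_point(kept_partial))  # resumes at segment 8
print(restart_point(renamed))       # skips ahead to segment 9 on timeline 1
```

With the suffix kept, the scan resumes at segment 8; after the rename it moves on to segment 9 of timeline 1, which is the wrong place to continue.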

The best option seems to be to just leave the .partial file in place, so 
as streaming progresses, you end up with:

000000010000000000000006
000000010000000000000007
000000010000000000000008.partial
000000020000000000000008
000000020000000000000009
00000002000000000000000A.partial

It feels a bit confusing to have that old partial file there, but that 
seems like the most correct solution. That file is indeed partial. This 
also ensures that if the server running on timeline 1 continues to 
generate new WAL, and it fills 000000010000000000000008, we won't 
confuse the partial segment with that name with a full one.
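
The listing above can be reproduced with a toy model of the proposed policy, sketched here in Python. The `Archive` class and its method names are invented for illustration: a segment is written under a `.partial` name and renamed only when it is completely filled, and a timeline switch simply never triggers that rename. The 16 MB segment size is assumed.

```python
from typing import List, Set

SEGMENTS_PER_LOG = 256  # assumption: default 16 MB WAL segment size


def wal_name(timeline: int, segno: int, partial: bool = False) -> str:
    """Build a WAL segment file name, optionally with the .partial suffix."""
    name = "%08X%08X%08X" % (timeline, segno // SEGMENTS_PER_LOG,
                             segno % SEGMENTS_PER_LOG)
    return name + ".partial" if partial else name


class Archive:
    """Toy model of the target directory of the streaming client."""

    def __init__(self) -> None:
        self.files: Set[str] = set()

    def start_segment(self, timeline: int, segno: int) -> None:
        self.files.add(wal_name(timeline, segno, partial=True))

    def finish_segment(self, timeline: int, segno: int) -> None:
        # Rename only when the segment has been filled completely.
        self.files.remove(wal_name(timeline, segno, partial=True))
        self.files.add(wal_name(timeline, segno))


def simulate() -> List[str]:
    archive = Archive()
    for segno in (6, 7):
        archive.start_segment(1, segno)
        archive.finish_segment(1, segno)
    # Timeline 1 ends in the middle of segment 8: no rename happens.
    archive.start_segment(1, 8)
    for segno in (8, 9):
        archive.start_segment(2, segno)
        archive.finish_segment(2, segno)
    archive.start_segment(2, 10)
    return sorted(archive.files)


for name in simulate():
    print(name)
```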

- Heikki
Attachment: teach-receivexlog-to-switch-timelines-1.patch
Description: text/x-diff (50.3 KB)
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-15 18:22:03
Message-ID: CAHGQGwEzkyyemeEn7T-c-Xm+TcQbAX_GEDMOPAFi5o-fU2Vw-g@mail.gmail.com
On Tue, Jan 15, 2013 at 11:05 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> Now that a standby server can follow timeline switches through streaming
> replication, we should teach pg_receivexlog to do the same. Patch
> attached.
>
> I made one change to the way the START_STREAMING command works, to better
> support this. When a standby server reaches the end of the timeline it's
> streaming from the master, it stops streaming, fetches any missing timeline
> history files, and parses the history file of the latest timeline to figure
> out where to continue. However, I don't want to parse timeline history files
> in pg_receivexlog. Better to keep it simple. So instead, I modified the
> server-side code for START_STREAMING to return the next timeline's ID at the
> end, and used that in pg_receivexlog. I also modified BASE_BACKUP to return
> not only the start XLogRecPtr, but also the corresponding timeline ID.
> Otherwise we might try to start streaming from the wrong timeline if you
> issue a BASE_BACKUP at the same moment the server switches to a new timeline.
>
> When pg_receivexlog switches timeline, what to do with the partial file on
> the old timeline? When the timeline changes in the middle of a WAL segment,
> the segment on the old timeline is only half-filled. For example, when
> timeline changes from 1 to 2, you'll have this in pg_xlog:
>
> 000000010000000000000006
> 000000010000000000000007
> 000000010000000000000008
> 000000020000000000000008
> 00000002.history
>
> The segment 000000010000000000000008 is only half-filled, as the timeline
> changed in the middle of that segment. The beginning portion of that file is
> duplicated in 000000020000000000000008, with the timeline-changing
> checkpoint record right after the duplicated portion.
>
> When we stream that with pg_receivexlog, and hit the timeline switch, we'll
> have this situation in the client:
>
> 000000010000000000000006
> 000000010000000000000007
> 000000010000000000000008.partial
>
> What to do with the partial file? One option is to rename it to
> 000000010000000000000008. However, if you then kill pg_receivexlog before it
> has finished streaming a full segment from the new timeline, on restart it
> will try to begin streaming WAL segment 000000010000000000000009, because it
> sees that segment 000000010000000000000008 is already completed. That'd be
> wrong.

Can't we rename .partial file safely after we receive a full segment
of the WAL file
with new timeline and the same logid/segmentid?

Regards,

-- 
Fujii Masao


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-16 16:08:31
Message-ID: 50F6D07F.9010207@vmware.com
On 15.01.2013 20:22, Fujii Masao wrote:
> On Tue, Jan 15, 2013 at 11:05 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com>  wrote:
>> Now that a standby server can follow timeline switches through streaming
>> replication, we should teach pg_receivexlog to do the same. Patch
>> attached.
>>
>> I made one change to the way the START_STREAMING command works, to better
>> support this. When a standby server reaches the end of the timeline it's
>> streaming from the master, it stops streaming, fetches any missing timeline
>> history files, and parses the history file of the latest timeline to figure
>> out where to continue. However, I don't want to parse timeline history files
>> in pg_receivexlog. Better to keep it simple. So instead, I modified the
>> server-side code for START_STREAMING to return the next timeline's ID at the
>> end, and used that in pg_receivexlog. I also modified BASE_BACKUP to return
>> not only the start XLogRecPtr, but also the corresponding timeline ID.
>> Otherwise we might try to start streaming from the wrong timeline if you
>> issue a BASE_BACKUP at the same moment the server switches to a new timeline.
>>
>> When pg_receivexlog switches timeline, what to do with the partial file on
>> the old timeline? When the timeline changes in the middle of a WAL segment,
>> the segment on the old timeline is only half-filled. For example, when
>> timeline changes from 1 to 2, you'll have this in pg_xlog:
>>
>> 000000010000000000000006
>> 000000010000000000000007
>> 000000010000000000000008
>> 000000020000000000000008
>> 00000002.history
>>
>> The segment 000000010000000000000008 is only half-filled, as the timeline
>> changed in the middle of that segment. The beginning portion of that file is
>> duplicated in 000000020000000000000008, with the timeline-changing
>> checkpoint record right after the duplicated portion.
>>
>> When we stream that with pg_receivexlog, and hit the timeline switch, we'll
>> have this situation in the client:
>>
>> 000000010000000000000006
>> 000000010000000000000007
>> 000000010000000000000008.partial
>>
>> What to do with the partial file? One option is to rename it to
>> 000000010000000000000008. However, if you then kill pg_receivexlog before it
>> has finished streaming a full segment from the new timeline, on restart it
>> will try to begin streaming WAL segment 000000010000000000000009, because it
>> sees that segment 000000010000000000000008 is already completed. That'd be
>> wrong.
>
> Can't we rename .partial file safely after we receive a full segment
> of the WAL file
> with new timeline and the same logid/segmentid?

I'd prefer to leave the .partial suffix in place, as the segment really 
isn't complete. It doesn't make a difference when you recover to the 
latest timeline, but if you have a more complicated scenario with 
multiple timelines that are still "alive", ie. there's a server still 
actively generating WAL on that timeline, you'll easily get confused.

As an example, imagine that you have a master server, and one standby. 
You maintain a WAL archive for backup purposes with pg_receivexlog, 
connected to the standby. Now, for some reason, you get a split-brain 
situation and the standby server is promoted with new timeline 2, while 
the real master is still running. The DBA notices the problem, and kills 
the standby and pg_receivexlog. He deletes the XLOG files belonging to 
timeline 2 in pg_receivexlog's target directory, and re-points 
pg_receivexlog to the master while he re-builds the standby server from
backup. At that point, pg_receivexlog will start streaming from the end 
of the zero-padded segment, not knowing that it was partial, and you 
have a hole in the archived WAL stream. Oops.

The DBA could avoid that by also removing the last WAL segment on 
timeline 1, the one that was partial. But it's really not obvious that 
there's anything wrong with that segment. Keeping the .partial suffix 
makes it clear.
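
A small Python sketch of the cleanup in that scenario, showing what the suffix buys the DBA. Both helpers (`remove_timeline`, `incomplete_segments`) are hypothetical names for illustration, not part of any PostgreSQL tool.

```python
from typing import Iterable, List

PARTIAL_SUFFIX = ".partial"


def remove_timeline(names: Iterable[str], timeline: int) -> List[str]:
    """Drop every file belonging to `timeline`, including its history file."""
    prefix = "%08X" % timeline
    return [n for n in names if not n.startswith(prefix)]


def incomplete_segments(names: Iterable[str]) -> List[str]:
    """Return the segment names that are still flagged as partial."""
    return [n[:-len(PARTIAL_SUFFIX)] for n in names if n.endswith(PARTIAL_SUFFIX)]


archive = ["000000010000000000000006",
           "000000010000000000000007",
           "000000010000000000000008.partial",
           "000000020000000000000008",
           "000000020000000000000009",
           "00000002000000000000000A.partial"]

# The DBA deletes everything from the bogus timeline 2 ...
remaining = remove_timeline(archive, 2)
# ... and what is left still says which timeline-1 segment is unfinished.
print(remaining)
print(incomplete_segments(remaining))
```

Had the file been renamed to 000000010000000000000008, nothing in the remaining listing would distinguish it from a fully streamed segment.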

- Heikki


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-16 17:06:48
Message-ID: CAHGQGwGN9QMLJ8Xb7G5v77OsohGhQNuiB-pmceBW5JEUTxe+-w@mail.gmail.com
On Thu, Jan 17, 2013 at 1:08 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 15.01.2013 20:22, Fujii Masao wrote:
>>
>> On Tue, Jan 15, 2013 at 11:05 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com>  wrote:
>>>
>>> Now that a standby server can follow timeline switches through streaming
>>> replication, we should teach pg_receivexlog to do the same. Patch
>>> attached.
>>>
>>> I made one change to the way the START_STREAMING command works, to better
>>> support this. When a standby server reaches the end of the timeline it's
>>> streaming from the master, it stops streaming, fetches any missing
>>> timeline history files, and parses the history file of the latest
>>> timeline to figure out where to continue. However, I don't want to parse
>>> timeline history files in pg_receivexlog. Better to keep it simple. So
>>> instead, I modified the server-side code for START_STREAMING to return
>>> the next timeline's ID at the end, and used that in pg_receivexlog. I
>>> also modified BASE_BACKUP to return not only the start XLogRecPtr, but
>>> also the corresponding timeline ID. Otherwise we might try to start
>>> streaming from the wrong timeline if you issue a BASE_BACKUP at the same
>>> moment the server switches to a new timeline.
>>>
>>> When pg_receivexlog switches timeline, what to do with the partial file
>>> on the old timeline? When the timeline changes in the middle of a WAL
>>> segment, the segment on the old timeline is only half-filled. For
>>> example, when timeline changes from 1 to 2, you'll have this in pg_xlog:
>>>
>>> 000000010000000000000006
>>> 000000010000000000000007
>>> 000000010000000000000008
>>> 000000020000000000000008
>>> 00000002.history
>>>
>>> The segment 000000010000000000000008 is only half-filled, as the timeline
>>> changed in the middle of that segment. The beginning portion of that file
>>> is duplicated in 000000020000000000000008, with the timeline-changing
>>> checkpoint record right after the duplicated portion.
>>>
>>> When we stream that with pg_receivexlog, and hit the timeline switch,
>>> we'll have this situation in the client:
>>>
>>> 000000010000000000000006
>>> 000000010000000000000007
>>> 000000010000000000000008.partial
>>>
>>> What to do with the partial file? One option is to rename it to
>>> 000000010000000000000008. However, if you then kill pg_receivexlog before
>>> it has finished streaming a full segment from the new timeline, on restart
>>> it will try to begin streaming WAL segment 000000010000000000000009,
>>> because it sees that segment 000000010000000000000008 is already
>>> completed. That'd be wrong.
>>
>>
>> Can't we rename .partial file safely after we receive a full segment
>> of the WAL file
>> with new timeline and the same logid/segmentid?
>
>
> I'd prefer to leave the .partial suffix in place, as the segment really
> isn't complete. It doesn't make a difference when you recover to the latest
> timeline, but if you have a more complicated scenario with multiple
> timelines that are still "alive", ie. there's a server still actively
> generating WAL on that timeline, you'll easily get confused.
>
> As an example, imagine that you have a master server, and one standby. You
> maintain a WAL archive for backup purposes with pg_receivexlog, connected to
> the standby. Now, for some reason, you get a split-brain situation and the
> standby server is promoted with new timeline 2, while the real master is
> still running. The DBA notices the problem, and kills the standby and
> pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in
> pg_receivexlog's target directory, and re-points pg_receivexlog to the master
> while he re-builds the standby server from backup. At that point,
> pg_receivexlog will start streaming from the end of the zero-padded segment,
> not knowing that it was partial, and you have a hole in the archived WAL
> stream. Oops.
>
> The DBA could avoid that by also removing the last WAL segment on timeline
> 1, the one that was partial. But it's really not obvious that there's
> anything wrong with that segment. Keeping the .partial suffix makes it
> clear.

Thanks for elaborating the reason why .partial suffix should be kept.
I agree that keeping the .partial suffix would be safer.

Regards,

-- 
Fujii Masao


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-16 22:28:35
Message-ID: m28v7srcdo.fsf@2ndQuadrant.fr
Fujii Masao <masao(dot)fujii(at)gmail(dot)com> writes:
> Thanks for elaborating the reason why .partial suffix should be kept.
> I agree that keeping the .partial suffix would be safer.

+1 to both points.  So +2 I guess :)

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-17 14:56:49
Message-ID: CA+TgmobvbibBQkuDHPo6StedLKya9rMqEtAFOzyvk9ibrkj07Q@mail.gmail.com
On Wed, Jan 16, 2013 at 11:08 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> I'd prefer to leave the .partial suffix in place, as the segment really
> isn't complete. It doesn't make a difference when you recover to the latest
> timeline, but if you have a more complicated scenario with multiple
> timelines that are still "alive", ie. there's a server still actively
> generating WAL on that timeline, you'll easily get confused.
>
> As an example, imagine that you have a master server, and one standby. You
> maintain a WAL archive for backup purposes with pg_receivexlog, connected to
> the standby. Now, for some reason, you get a split-brain situation and the
> standby server is promoted with new timeline 2, while the real master is
> still running. The DBA notices the problem, and kills the standby and
> pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in
> pg_receivexlog's target directory, and re-points pg_receivexlog to the master
> while he re-builds the standby server from backup. At that point,
> pg_receivexlog will start streaming from the end of the zero-padded segment,
> not knowing that it was partial, and you have a hole in the archived WAL
> stream. Oops.
>
> The DBA could avoid that by also removing the last WAL segment on timeline
> 1, the one that was partial. But it's really not obvious that there's
> anything wrong with that segment. Keeping the .partial suffix makes it
> clear.

I shudder at the idea that the DBA is manually involved in any of this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-17 14:59:19
Message-ID: 50F811C7.4080100@vmware.com
On 17.01.2013 16:56, Robert Haas wrote:
> On Wed, Jan 16, 2013 at 11:08 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com>  wrote:
>> I'd prefer to leave the .partial suffix in place, as the segment really
>> isn't complete. It doesn't make a difference when you recover to the latest
>> timeline, but if you have a more complicated scenario with multiple
>> timelines that are still "alive", ie. there's a server still actively
>> generating WAL on that timeline, you'll easily get confused.
>>
>> As an example, imagine that you have a master server, and one standby. You
>> maintain a WAL archive for backup purposes with pg_receivexlog, connected to
>> the standby. Now, for some reason, you get a split-brain situation and the
>> standby server is promoted with new timeline 2, while the real master is
>> still running. The DBA notices the problem, and kills the standby and
>> pg_receivexlog. He deletes the XLOG files belonging to timeline 2 in
>> pg_receivexlog's target directory, and re-points pg_receivexlog to the master
>> while he re-builds the standby server from backup. At that point,
>> pg_receivexlog will start streaming from the end of the zero-padded segment,
>> not knowing that it was partial, and you have a hole in the archived WAL
>> stream. Oops.
>>
>> The DBA could avoid that by also removing the last WAL segment on timeline
>> 1, the one that was partial. But it's really not obvious that there's
>> anything wrong with that segment. Keeping the .partial suffix makes it
>> clear.
>
> I shudder at the idea that the DBA is manually involved in any of this.

The scenario I described is that you screwed up your failover 
environment, and end up with a split-brain situation by accident. The 
DBA certainly needs to be involved to recover from that.

- Heikki


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-17 15:12:09
Message-ID: CA+TgmobYPNC_n_DbOXu965RDuFKYxfLJQvXhzb-c-s_9ALpaHQ@mail.gmail.com
On Thu, Jan 17, 2013 at 9:59 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> The scenario I described is that you screwed up your failover environment,
> and end up with a split-brain situation by accident. The DBA certainly needs
> to be involved to recover from that.

OK, I agree, but I still think a lot of DBAs would have no idea how to
handle that situation.  I agree with your proposal, don't get me wrong
- I just think there's still an awful lot of room for operator error
in these more complex replication scenarios.  I don't have a clue how
to fix that, and it's certainly not the purpose of this thread to fix
that; I'm just venting.

Actually, I'm really glad to see all the work you've done to improve
the way that some of these scenarios work and eliminate various bugs
and other surprising failure modes over the last couple of months.
It's great stuff.  Alas, I think we're still some distance from being
able to provide an "easy button".

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>,Fujii Masao <masao(dot)fujii(at)gmail(dot)com>,PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-17 19:45:07
Message-ID: 20130117194506.GB4033@alvh.no-ip.org
Robert Haas wrote:

> Actually, I'm really glad to see all the work you've done to improve
> the way that some of these scenarios work and eliminate various bugs
> and other surprising failure modes over the last couple of months.
> It's great stuff.

+1

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Phil Sorber <phil(at)omniti(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-18 04:38:46
Message-ID: CADAkt-gok5P2UcKha02k3cXt4C7iPdruaeA3TeG0-3KC27sjhA@mail.gmail.com
On Tue, Jan 15, 2013 at 9:05 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> Now that a standby server can follow timeline switches through streaming
> replication, we should teach pg_receivexlog to do the same. Patch
> attached.

Is it possible to re-use walreceiver code from the backend?

I was thinking that it would actually be very useful to have the whole
replication functionality modularized and in a standalone binary that
could act as a replication proxy and WAL archiver that could run
without all the overhead of an entire PG instance.


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Phil Sorber <phil(at)omniti(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-18 12:55:19
Message-ID: 50F94637.9010908@vmware.com
On 18.01.2013 06:38, Phil Sorber wrote:
> On Tue, Jan 15, 2013 at 9:05 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com>  wrote:
>> Now that a standby server can follow timeline switches through streaming
>> replication, we should teach pg_receivexlog to do the same. Patch
>> attached.
>
> Is it possible to re-use walreceiver code from the backend?
>
> I was thinking that it would actually be very useful to have the whole
> replication functionality modularized and in a standalone binary that
> could act as a replication proxy and WAL archiver that could run
> without all the overhead of an entire PG instance

There's much sense in trying to extract that into a stand-alone module.
src/bin/pg_basebackup/receivelog.c is about 1000 lines of code at the 
moment, and it looks quite different from the corresponding code in the 
backend, because it doesn't have all the backend infrastructure available.

- Heikki


From: Phil Sorber <phil(at)omniti(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-21 15:58:22
Message-ID: CADAkt-j2pm3bq5+EjreihE06B6gGpv_jqn3nRNvHnGScfDCjvA@mail.gmail.com
On Fri, Jan 18, 2013 at 7:55 AM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 18.01.2013 06:38, Phil Sorber wrote:
>> Is it possible to re-use walreceiver code from the backend?
>>
>> I was thinking that it would actually be very useful to have the whole
>> replication functionality modularized and in a standalone binary that
>> could act as a replication proxy and WAL archiver that could run
>> without all the overhead of an entire PG instance
>
>
> There's much sense in trying to extract that into a stand-alone module.
> src/bin/pg_basebackup/receivelog.c is about 1000 lines of code at the
> moment, and it looks quite different from the corresponding code in the
> backend, because it doesn't have all the backend infrastructure available.
>
> - Heikki

That's fair.

What do you think about the idea of a full WAL proxy? Probably not for
9.3 at this point though.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-21 22:43:13
Message-ID: 20130121224313.GA28101@tornado.leadboat.com
This patch was in Needs Review status, but you committed it on 2013-01-17.  I
have marked it as such in the CF app.


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Phil Sorber <phil(at)omniti(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-22 13:02:53
Message-ID: m28v7ll69u.fsf@2ndQuadrant.fr
Phil Sorber <phil(at)omniti(dot)com> writes:
> What do you think about the idea of a full WAL proxy? Probably not for
> 9.3 at this point though.

I was thinking that a WAL proxy nowadays is called a cascading standby
with local archiving enabled. I'm not sure why you would want to trust
your archiving and WAL relaying to another piece of software…

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Phil Sorber <phil(at)omniti(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-22 13:10:17
Message-ID: 50FE8FB9.3020006@vmware.com
On 22.01.2013 15:02, Dimitri Fontaine wrote:
> Phil Sorber<phil(at)omniti(dot)com>  writes:
>> What do you think about the idea of a full WAL proxy? Probably not for
>> 9.3 at this point though.
>
> I was thinking that a WAL proxy nowadays is called a cascading standby
> with local archiving enabled. I'm not sure why you would want to trust
> your archiving and WAL relaying to another piece of software…

You might not want to keep a copy of the whole data directory around, as 
you have to in a cascading standby. I can see value in a separate WAL 
proxy software, especially if it's integrated into a larger backup 
manager program like barman or wal-e.

- Heikki


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Phil Sorber <phil(at)omniti(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-22 13:33:15
Message-ID: m2sj5tibqc.fsf@2ndQuadrant.fr
Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> You might not want to keep a copy of the whole data directory around, as you
> have to in a cascading standby. I can see value in a separate WAL proxy
> software, especially if it's integrated into a larger backup manager program
> like barman or wal-e.

+1

I somehow forgot about $PGDATA here. Time for a little break I guess :)

Another idea is to have a daemon mode pg_receivexlog where not only it
can maintain a local archive but also feed it using the replication
protocol to standbys, keeping track of their position.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


From: Phil Sorber <phil(at)omniti(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-22 14:13:51
Message-ID: CADAkt-hgE3E+kt6nBnsrXMso=Lu0uDW=omq6R3pwZP2W+oNZBQ@mail.gmail.com
On Tue, Jan 22, 2013 at 8:33 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>> You might not want to keep a copy of the whole data directory around, as you
>> have to in a cascading standby. I can see value in a separate WAL proxy
>> software, especially if it's integrated into a larger backup manager program
>> like barman or wal-e.
>
> +1
>
> I somehow forgot about $PGDATA here. Time for a little break I guess :)
>
> Another idea is to have a daemon mode pg_receivexlog where not only it
> can maintain a local archive but also feed it using the replication
> protocol to standbys, keeping track of their position.

I'm not sure if I described it well, but that's essentially what I was
asking about. It would have both WAL receiving and WAL sending
capability, along with its own local WAL storage, perhaps governed in
size by a keep_wal_segments setting, and also a longer-term archive that
you could have compressed but also pull from with an archive and restore
command. It could also act as a synchronous replication peer. I
think it has already been discussed to have pg_receivexlog do that
last one.

So yeah, a cascading standby without $PGDATA or hot_standby or large
shared_buffers resources. It seems like maybe we could add through
subtraction. Add a parameter that disables wal replay? I'm sure
there'd be more things it would have to disable, but then it's not two
separate binaries.

>
> Regards,
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


From: Craig Ringer <craig(at)2ndQuadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Teaching pg_receivexlog to follow timeline switches
Date: 2013-01-24 05:42:33
Message-ID: 5100C9C9.3010308@2ndQuadrant.com
On 01/22/2013 06:43 AM, Noah Misch wrote:
> This patch was in Needs Review status, but you committed it on 2013-01-17.  I
> have marked it as such in the CF app.
Thank you. There's a lot to keep up with :S

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



