Re: BUG #7534: walreceiver takes long time to detect n/w breakdown

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #7534: walreceiver takes long time to detect n/w breakdown
Date: 2012-10-15 16:31:09
Message-ID: CAHGQGwF9NQuqLm5GJKmEvQwxFkHQ=e2zXs4NC5zjbeoyvTustw@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Mon, Oct 15, 2012 at 11:27 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 15.10.2012 13:13, Heikki Linnakangas wrote:
>>
>> On 13.10.2012 19:35, Fujii Masao wrote:
>>>
>>> ISTM you need to update the protocol.sgml because you added
>>> the field 'replyRequested' to WalSndrMessage and StandbyReplyMessage.
>>
>>
>> Oh, I didn't remember that we've documented the specific structs that we
>> pass around. It's quite bogus anyway to explain the messages the way we
>> do currently, as they are actually dependent on the underlying
>> architecture's endianness and padding. I think we should refactor the
>> protocol to not transmit raw structs, but use pq_sendint and friends to
>> construct the messages. This was discussed earlier (see
>>
>> http://archives.postgresql.org/message-id/4FE2279C.2070506@enterprisedb.com),
>> I think there's consensus that 9.3 would be a good time to do that as we
>> changed the XLogRecPtr format anyway.
>
>
> This is what I came up with. The replication protocol is now
> architecture-independent. The WAL format itself is still
> architecture-dependent, of course, but this is useful if you want to e.g.
> use pg_receivexlog to back up a server that runs on a different platform.
>
> I chose the int64 format to transmit timestamps, even when compiled with
> --disable-integer-datetimes.
>
> Please review if you have the time.

Thanks for the patch!

When I ran pg_receivexlog, I encountered the following error.

$ pg_receivexlog -D hoge
pg_receivexlog: unexpected termination of replication stream: ERROR:
no data left in message

pg_basebackup -X stream caused the same error.

$ pg_basebackup -D hoge -X stream -c fast
pg_basebackup: could not send feedback packet: no COPY in progress
pg_basebackup: child process exited with error 1

In walreceiver.c, tmpbuf is allocated on every XLogWalRcvProcessMsg() call.
Shouldn't it be allocated just once and reused until the end, to reduce
palloc overhead?
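
To illustrate what I mean by allocating once, here is a rough standalone
sketch, not code from the patch. ReusableBuf and reusable_buf_load() are
hypothetical names, and malloc/realloc stand in for palloc/repalloc. In the
backend, a static StringInfo cleared with resetStringInfo() would presumably
give the same effect.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reusable buffer; malloc/realloc stand in for palloc/repalloc. */
typedef struct ReusableBuf
{
    char   *data;   /* storage, kept across calls */
    size_t  len;    /* bytes used by the current message */
    size_t  cap;    /* bytes allocated */
} ReusableBuf;

/* Copy one message into the buffer, growing it only when it is too small. */
int
reusable_buf_load(ReusableBuf *buf, const char *src, size_t n)
{
    assert(buf != NULL);

    if (n > buf->cap)
    {
        size_t  newcap = buf->cap ? buf->cap : 64;
        char   *p;

        while (newcap < n)
            newcap *= 2;
        p = realloc(buf->data, newcap);
        if (p == NULL)
            return -1;          /* old storage is still valid */
        buf->data = p;
        buf->cap = newcap;
    }
    if (n > 0)
        memcpy(buf->data, src, n);
    buf->len = n;               /* reset the length, keep the allocation */
    return 0;
}
```

With a zero-initialized ReusableBuf kept for the life of the process, only
the first message (or one larger than any seen before) triggers an
allocation; every later call reuses the same storage.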

+ hdrlen = sizeof(int64) + sizeof(int64) + sizeof(int64);
+ hdrlen = sizeof(int64) + sizeof(int64) + sizeof(char);

Shouldn't these be macros, to avoid calculation overhead?
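
Something like the following is what I have in mind, as a sketch only. The
macro names and the helper are hypothetical, and which message each length
belongs to is my guess from the field sizes. int64_t stands in for the
backend's int64. Since sizeof is evaluated at compile time, the main gain
would be having each length defined in a single named place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; three int64 fields, as in the first hdrlen line. */
#define WAL_DATA_HDRLEN \
    (sizeof(int64_t) + sizeof(int64_t) + sizeof(int64_t))

/* Hypothetical names; two int64 fields plus one byte, as in the second. */
#define KEEPALIVE_HDRLEN \
    (sizeof(int64_t) + sizeof(int64_t) + sizeof(char))

/* Return nonzero if a message of msglen bytes can hold the given header. */
int
message_has_header(size_t msglen, size_t hdrlen)
{
    return msglen >= hdrlen;
}
```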

+ /* Construct the the message and send it. */
+ resetStringInfo(&reply_message);
+ pq_sendbyte(&reply_message, 'h');
+ pq_sendint(&reply_message, xmin, 4);
+ pq_sendint(&reply_message, nextEpoch, 4);
+ walrcv_send(reply_message.data, reply_message.len);

You seem to have forgotten to send the sendTime.
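
For reference, a standalone sketch of the message with the timestamp
included. This is not the patch's code: put_be32(), put_be64() and
build_feedback() are hypothetical stand-ins for pq_sendint() and friends,
and placing sendTime right after the 'h' type byte is my assumption about
the intended layout. All integers are written in network byte order, so the
result does not depend on the sender's architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: type byte + int64 sendTime + int32 xmin + int32 epoch. */
#define FEEDBACK_LEN (1 + 8 + 4 + 4)

/* Write a 32-bit value in network (big-endian) byte order. */
size_t
put_be32(unsigned char *out, uint32_t v)
{
    int i;

    for (i = 0; i < 4; i++)
        out[i] = (unsigned char) (v >> (24 - 8 * i));
    return 4;
}

/* Write a 64-bit value in network (big-endian) byte order. */
size_t
put_be64(unsigned char *out, uint64_t v)
{
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (unsigned char) (v >> (56 - 8 * i));
    return 8;
}

/* Build the feedback message into out; returns the number of bytes written. */
size_t
build_feedback(unsigned char *out, size_t cap,
               int64_t send_time, uint32_t xmin, uint32_t epoch)
{
    size_t off = 0;

    assert(cap >= FEEDBACK_LEN);
    out[off++] = 'h';
    off += put_be64(out + off, (uint64_t) send_time);  /* the missing field */
    off += put_be32(out + off, xmin);
    off += put_be32(out + off, epoch);
    return off;
}
```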

Regards,

--
Fujii Masao
