Re: streaming header too small

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Selena Deckelmann <selena(at)chesnok(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: streaming header too small
Date: 2013-02-20 14:23:45
Message-ID: CABUevEz5b_=0gQoDy2MaeFpFcQznsqekU6DvCLF-XsnzytNDwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Feb 20, 2013 11:29 AM, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>
wrote:
>
> On 20.02.2013 02:11, Selena Deckelmann wrote:
>>
>> So, I just ran into a similar issue backing up a 9.2.1 server using
>> pg_basebackup version 9.2.3:
>>
>> pg_basebackup: starting background WAL receiver
>> pg_basebackup: streaming header too small: 25
>>
>>
>> I've had it happen two times in a row. I'm going to try again...
>>
>> But -- what would be helpful here? I can recompile pg_basebackup with
more
>> debugging...
>
>
> Hmm, 25 bytes would be the size of the WAL data packet, if it contains
just the header and no actual WAL data. I think pg_basebackup should accept
that - it's not unreasonable that the server might send such a packet
sometimes.
>
> Looking at the walsender code, it's not supposed to ever send such a
packet. But I suspect there's one corner-case where it might: if the
current send location is at an xlogid boundary, so that we previously sent
the last byte from the last WAL segment in the previous logical xlog file,
and the WAL flush position points to byte 0 in the beginning of the new WAL
file. Both of those positions are in fact the same thing, but we have two
different ways to represent the same position. For example, if we've
already sent up to WAL position (sentPtr in walsender.c):
>
> xlogid = 4
> xrecoff = XLogFileSize
>
> and GetFlushRecPtr() returns:
>
> xlogid = 5
> xrecoff = 0
>
> Those both point to the same position. But the check in XLogSend that
decides if there is any work to do uses XLByteLE() to check if they are
equal, and XLByteLE() treats the latter to be greater than the former. So,
in that situation, XLogSend() would decide that it has work to do, but
there actually isn't, so it would send 0 bytes of WAL data.
>
> I'm not sure how GetFlushRecPtr() could return such a position, though.
But I'm also not convinced that it can't happen.
>
> It would be fairly easy to fix walsender to not send anything in that
situation. It would also be easy to fix pg_basebackup to not treat it as an
error. We probably should do both.
>
> In 9.3, the XLogRecPtr representation changed so that there is only one
value for a boundary position like that, so this is a 9.2-only issue.

That does sound like a reasonable explanation and fix. Heck, probably
enough to just put the fix in pg_basebackup since it's gone in 9.3 anyway.

But I'd really like to confirm this is the actual situation before
considering it fixed, since it's clearly very intermittent.

Selena, was this reasonably reproducible for you? Would it be possible to
get a network trace of it to show of that's the kind of package coming
across, or by hacking up pg_basebackup to print the exact position it was
at when the problem occurred?

/Magnus

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2013-02-20 14:29:45 Re: Materialized views WIP patch
Previous Message Tom Lane 2013-02-20 12:58:23 Re: FDW for PostgreSQL