Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thomas Kellerer <spam_eater(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Date: 2016-09-29 21:48:30
Message-ID: 13903.1475185710@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

I wrote:
> But what gets my attention in this connection is that it doesn't
> seem to be taking the trouble to open the files in binary mode.
> Could that lead to the reported failure? Not sure, but it seems
> like at the least it could result in corrupted VM files.

On further thought, it definitely would lead to some sort of
hard-to-diagnose error. Assuming there's some bytes that look
like \n, Windows read() would expand them to \r\n, which not
only produces garbage vismap data but throws off the page boundaries
in the file. rewriteVisibilityMap() would not notice that, since
it contains exactly no sanity checking on the page headers,
but what it would notice is getting a short read on what it'll
think is a partial last page. Then it does this:

if ((bytesRead = read(src_fd, buffer, BLCKSZ)) != BLCKSZ)
{
close(dst_fd);
close(src_fd);
return getErrorText();
}

The read() won't have set errno, since by its lights there's nothing
wrong. So while we know that getErrorText saw errno == EINVAL, there's
no way to guess what call that might've been left over from.

BTW, just to add to the already long list of mortal sins in this
bit of code, I believe it does 100% the wrong thing on big-endian
hardware.

uint16 new_vmbits = 0;
...
/* Copy new visibility map bit to new format page */
memcpy(new_cur, &new_vmbits, BITS_PER_HEAPBLOCK);

That's going to drop the two new bytes down in exactly the wrong
order if big-endian.

Oh, and for even more fun, this bit will dump core on alignment-picky
machines, if the vmbuf local array starts on an odd boundary:

((PageHeader) new_vmbuf)->pd_checksum =
pg_checksum_page(new_vmbuf, new_blkno);

We might've caught these things earlier if the buildfarm testing
included cross-version upgrades, but of course that requires a
lot of test infrastructure that's not there ...

regards, tom lane

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2016-09-29 21:51:53 Re: [HACKERS] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Previous Message Jay Knight 2016-09-29 21:47:38 Re: Parallel query only when EXPLAIN ANALYZEd

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-09-29 21:51:53 Re: [HACKERS] pg_upgrade from 9.5 to 9.6 fails with "invalid argument"
Previous Message Michael Paquier 2016-09-29 21:43:42 Re: pg_basebackup, pg_receivexlog and data durability (was: silent data loss with ext4 / all current versions)