Re: Error while copying a large file in pg_rewind

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Error while copying a large file in pg_rewind
Date: 2017-07-20 19:31:20
Message-ID: CA+TgmoYuY5zW7JEs+1hSS1D=V5K8h1SQuESrq=bMNeo0B71Sfw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jul 20, 2017 at 2:17 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Heikki, this bug is rather bad for anybody using pg_rewind with
> relation file sizes larger than 2GB as this corrupts data of
> instances. I think that you would be the best fit as a committer to
> look at this patch as you implemented the tool first, and it would be
> a bad idea to let that sit for a too long time. Could it be possible
> to spare a bit of your time at some point to look at it?

The comment header of receiveFileChunks is updated like this:

/*----
* Runs a query, which returns pieces of files from the remote source data
* directory, and overwrites the corresponding parts of target files with
* the received parts. The result set is expected to be of format:
*
* path text -- path in the data directory, e.g "base/1/123"
- * begin int4 -- offset within the file
+ * begin int8 -- offset within the file
* chunk bytea -- file content
*----

...but down inside the function there's still this:

    if (PQftype(res, 0) != TEXTOID &&
        PQftype(res, 1) != INT4OID &&
        PQftype(res, 2) != BYTEAOID)
    {
        pg_fatal("unexpected data types in result set while fetching remote files: %u %u %u\n",
                 PQftype(res, 0), PQftype(res, 1), PQftype(res, 2));
    }

I was initially surprised that your testing managed to pass, but then
I noticed that this sanity test is using && where it should really be
using ||; it will only fail if ALL of the data types are wrong. Oops.

This code plays pretty fast and loose with off_t vs. size_t vs. uint64
vs. int64, but those definitely don't all agree in signedness and may
not even agree in width (size_t, at least, might be 4 bytes). We use
stat() to get the size of a file as an off_t, and then store it into a
uint64. Perhaps off_t is always uint64 anyway, but I'm not sure.
Then, that value gets passed to fetch_file_range(), still as a uint64,
and it then gets sent to the server to be stored into an int8 column,
which is signed rather than unsigned. That will error out if there's
a file size >= 2^63, but filesystems holding more than 8 exabytes are
unusual and will probably remain so for some time. The server sends
back an int64 which we pass to write_target_range(), whose argument is
declared as off_t, which is where we started. I'm not sure there's
any real problem with all of that, but it's a little nervous-making
that we keep switching types. Similarly, libpqProcessFileList gets
the file size as an int64 and then passes it to process_source_file()
which is expecting size_t. So we manage to use 4 different data types
to represent a file size. On most systems, all of them except SQL's
int8 are likely to be 64-bit unsigned integers, but I'm not sure what
will happen on obscure platforms.

Still, it can't be worse than the status quo, where instead of int64
we're using int and int32, so maybe we ought to back-patch it as-is
for now and look at any further cleanup that is needed as a
master-only improvement.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
