Re: WIP Incremental JSON Parser

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Jacob Champion <champion(dot)p(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-01-24 18:08:34
Message-ID: CA+TgmoZNT3SjyC4zhoqAPbrbEF21XzzCXgWsizMvhTRHzG=HDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 24, 2024 at 10:04 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> The cfbot reports an error on a 32 bit build <https://api.cirrus-ci.com/v1/artifact/task/6055909135220736/testrun/build-32/testrun/pg_combinebackup/003_timeline/log/regress_log_003_timeline>:
>
> # Running: pg_basebackup -D /tmp/cirrus-ci-build/build-32/testrun/pg_combinebackup/003_timeline/data/t_003_timeline_node1_data/backup/backup2 --no-sync -cfast --incremental /tmp/cirrus-ci-build/build-32/testrun/pg_combinebackup/003_timeline/data/t_003_timeline_node1_data/backup/backup1/backup_manifest
> pg_basebackup: error: could not upload manifest: ERROR: could not parse backup manifest: file size is not an integer
> pg_basebackup: removing data directory "/tmp/cirrus-ci-build/build-32/testrun/pg_combinebackup/003_timeline/data/t_003_timeline_node1_data/backup/backup2"
> [02:41:07.830](0.073s) not ok 2 - incremental backup from node1
> [02:41:07.830](0.000s) # Failed test 'incremental backup from node1'
>
> I have set up a Debian 12 EC2 instance following the recipe at <https://raw.githubusercontent.com/anarazel/pg-vm-images/main/scripts/linux_debian_install_deps.sh>, and ran what I think are the same tests dozens of times, but the failure did not reappear in my setup. Unfortunately, the test doesn't show the failing manifest or log the failing field, so trying to divine what happened here is more than difficult.
>
> Not sure how to address this.

Yeah, that's really odd. The backup size field is printed like this:

appendStringInfo(&buf, "\"Size\": %zu, ", size);

And parsed like this:

size = strtoul(parse->size, &ep, 10);
if (*ep)
json_manifest_parse_failure(parse->context,

"file size is not an integer");

I confess to bafflement -- how could the output of the first fail to
be parsed by the second? The manifest has to look pretty much valid in
order not to error out before it gets to this check, with just that
one field corrupted. But I don't understand how that could happen.

I agree that the error reporting could be better here, but it didn't
seem worth spending time on when I wrote the code. I figured the only
way we could end up with something like "Size": "Walrus" is if the
user was messing with us on purpose. Apparently that's not so, yet the
mechanism eludes me. Or maybe it's not some random string, but is
something like an empty string or a number with trailing garbage or a
number that's out of range. But I don't see how any of those things
can happen either.

Maybe you should adjust your patch to dump the manifests into the log
file with note(). Then when cfbot runs on it you can see exactly what
the raw file looks like. Although I wonder if it's possible that the
manifest itself is OK, but somehow it gets garbled when uploaded to
the server, either because the client code that sends it or the server
code that receives it does something that isn't safe in 32-bit mode.
If we hypothesize an off-by-one error or a buffer overrun, that could
possibly explain how one field got garbled while the rest of the file
is OK.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2024-01-24 18:13:24 Re: index prefetching
Previous Message Tomas Vondra 2024-01-24 18:08:12 Re: index prefetching