From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | David Gilman <davidgilman1(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)gmail(dot)com |
Subject: | Re: Warn when parallel restoring a custom dump without data offsets |
Date: | 2020-05-23 22:47:51 |
Message-ID: | 20200523224751.GP4472@telsasoft.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sat, May 23, 2020 at 03:54:30PM -0400, David Gilman wrote:
> I've rounded this patch out with a test and I've set up the commitfest
> website for this thread. The latest patches are attached and I think
> they are ready for review.
Thanks. https://commitfest.postgresql.org/28/2568/
I'm not sure this will be considered a bugfix, since the behavior is known.
Maybe eligible for backpatch though (?)
Your patch was encoded, so this is failing:
http://cfbot.cputube.org/david-gilman.html
Ideally CFBOT would deal with that (maybe by using git-am - adding Thomas), but
I think you sent using gmail web interface, which also reordered the patches.
(CFBOT *does* sort them, but it's a known annoyance).
> dump file was written with data offsets pg_restore can seek directly to
offsets COMMA
> pg_restore would only find the TOC if it happened to be immediately
"immediately" is wrong, no ? I thought the problem was if we seeked to D and
then looked for C, we wouldn't attempt to go backwards.
> read request only when restoring a custom dump file without data offsets.
remove "only"
> of a bunch of extra tiny reads when pg_restore starts up.
I would have thought to mention the seeks() ; but it's true that the read()s now
grow quadratically. I did run a test, but I don't know how many objects would
be unreasonable or how many it'd take to show a problem.
Maybe we should avoid fseeko(0, SEEK_SET) unless we need to wrap around after
EOF - I'm not sure.
Maybe the cleanest way would be to pre-populate a structure with all the TOC
data and loop around that instead of seeking around the file ? Can we use the
same structure as pg_dump ?
Otherwise, that makes me think of commit 42f70cd9c. Make it's not a good
parallel or example for this case, though.
+ The custom archive format may not work with the <option>-j</option>
+ option if the archive was originally created by writing the archive
+ to an unseekable output file. For the best concurrent restoration
Can I suggest something like: pg_restore with parallel jobs may fail if the
archive dump was written to an unseekable output stream, like stdout.
+ * If the input file can't be seeked we're at the mercy of the
seeked COMMA
>Subject: [PATCH 3/3] Add integration test for out-of-order TOC requests in pg_restore
Well done - thanks for that.
>Also add undocumented --disable-seeking argument to pg_dump to emulate
>writing to an unseekable output file.
Remove "also".
Is it possible to dump to stdout (or pipe to cat or dd) to avoid a new option ?
Maybe that would involve changing the test process to use the shell (system() vs
execve()), or maybe you could write:
/* sh handles output redirection and arg splitting */
'sh', '-c', 'pg_dump -Fc -Z6 --no-sync --disable-seeking postgres > $tempdir/defaults_custom_format_no_seek_parallel_restore.dump',
But I think that would need to then separately handle WIN32, so maybe it's not
worth it.
From | Date | Subject | |
---|---|---|---|
Next Message | Alvaro Herrera | 2020-05-24 02:31:55 | Re: Adding missing object access hook invocations |
Previous Message | David Gilman | 2020-05-23 19:54:30 | Re: Warn when parallel restoring a custom dump without data offsets |