Russell Smith wrote:
> As I'm interested in this topic, I thought I'd take a look at the
> patch. I have no capability to test it on high end hardware but did
> some basic testing on my workstation and basic review of the patch.
> I somehow had the impression that instead of creating a new connection
> for each restore item we would create the processes at the start and
> then send them the dumpId's they should be restoring. That would allow
> the controller to batch dumpId's together and expect the worker to
> process them in a transaction. But this is probably just an idea I
> created in my head.
Yes it is. To do that I would have to invent a protocol for talking to
the workers, etc, and there is not the slightest chance I would get that
done by November.
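For illustration only, here is roughly the shape such a scheme would take: a
parent that pre-forks its workers and feeds each one dump IDs over a pipe.
Everything in it (worker_main, NUM_WORKERS, the pipe framing) is my own
invention for this sketch, not anything in the patch:

    /*
     * Toy sketch of a pre-forked worker pool fed dump IDs over pipes.
     * Purely illustrative; none of this is pg_restore code.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #define NUM_WORKERS 4

    static void
    worker_main(int readfd)
    {
        int         dumpId;

        /* keep reading dump IDs until the parent closes the pipe */
        while (read(readfd, &dumpId, sizeof(dumpId)) == sizeof(dumpId))
            printf("worker %d: would restore dump ID %d\n",
                   (int) getpid(), dumpId);
        _exit(0);
    }

    int
    main(void)
    {
        int         pipes[NUM_WORKERS][2];
        int         i;

        for (i = 0; i < NUM_WORKERS; i++)
            if (pipe(pipes[i]) != 0)
            {
                perror("pipe");
                exit(1);
            }

        for (i = 0; i < NUM_WORKERS; i++)
        {
            if (fork() == 0)
            {
                int         j;

                /* child keeps only the read end of its own pipe */
                for (j = 0; j < NUM_WORKERS; j++)
                {
                    close(pipes[j][1]);
                    if (j != i)
                        close(pipes[j][0]);
                }
                worker_main(pipes[i][0]);
            }
        }

        /* parent uses only the write ends */
        for (i = 0; i < NUM_WORKERS; i++)
            close(pipes[i][0]);

        /* hand out some pretend dump IDs round-robin */
        for (i = 1; i <= 20; i++)
            write(pipes[(i - 1) % NUM_WORKERS][1], &i, sizeof(i));

        /* closing the write ends is the "no more work" signal */
        for (i = 0; i < NUM_WORKERS; i++)
            close(pipes[i][1]);
        while (wait(NULL) > 0)
            ;
        return 0;
    }

Even in this toy form, the protocol questions (message framing, batching
policy, error reporting) are where the real work would be.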
And I don't see the virtue in processing them all in a transaction. I've
provided a much simpler means of avoiding WAL logging of the COPY.
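For anyone wondering what that simpler means amounts to: truncate the table
in the same transaction that loads it, so that with WAL archiving turned off
the COPY can skip WAL entirely. Below is a rough libpq sketch of that
sequence; the connection string, table name, and data file are made up for
the example, and pg_restore itself drives the COPY differently (this uses a
server-side COPY, which needs appropriate privileges):

    /*
     * Sketch of the WAL-avoidance trick: TRUNCATE and COPY in one
     * transaction, so the load can skip WAL when archiving is off.
     * Connection string, table, and file path are invented for the example.
     */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=tt2");
        PGresult   *res;
        const char *cmds[] = {
            "BEGIN",
            "TRUNCATE TABLE my_table",
            "COPY my_table FROM '/tmp/my_table.dat'",
            "COMMIT",
            NULL
        };
        int         i;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        for (i = 0; cmds[i] != NULL; i++)
        {
            res = PQexec(conn, cmds[i]);
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
            {
                fprintf(stderr, "%s failed: %s", cmds[i],
                        PQerrorMessage(conn));
                PQclear(res);
                PQfinish(conn);
                return 1;
            }
            PQclear(res);
        }

        PQfinish(conn);
        return 0;
    }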
> Do we know why we experience "tuple concurrently updated" errors if we
> spawn thread too fast?
No. That's an open item.
> I completed some test restores using the pg_restore from head with the
> patch applied. The dump was a custom dump created with pg 8.2 and
> restored to an 8.2 database. To confirm this would work, I completed a
> restore using the standard single-threaded mode. The schema restored
> successfully. The only errors reported involved non-existent roles.
> When I attempt to restore using parallel restore I get out of memory
> errors reported from _PrintData. The code returning the error is:
>
>     while (blkLen != 0)
>     {
>         if (blkLen + 1 > ctx->inSize)
>         {
>             ctx->zlibIn = NULL;
>             ctx->zlibIn = (char *) malloc(blkLen + 1);
>             if (!ctx->zlibIn)
>                 die_horribly(AH, modulename, " out of memory\n");
>
>             ctx->inSize = blkLen + 1;
>             in = ctx->zlibIn;
>         }
> It appears from my debugging and looking at the code that in _PrintData:
> lclContext *ctx = (lclContext *) AH->formatData;
> the memory context is shared across all threads. Which means that it's
> possible the memory contexts are stomping on each other. My GDB skills
> are not up to reproducing this in a gdb session, as there are
> forks going on all over the place. And if you process them in a serial
> fashion, there aren't any errors. I'm not sure of the fix for this.
> But in a parallel environment it doesn't seem possible to store the
> memory context in the AH.
There are no threads, hence nothing is shared. fork() creates a new
process, not a new thread, and all they share are file descriptors.
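A trivial standalone illustration of that point (not pg_restore code): after
fork() the child gets its own copy of the struct, so whatever it does to its
buffer is invisible to the parent.

    /*
     * Demonstrates that fork() duplicates the address space rather than
     * sharing it: the child's change to the struct never reaches the parent.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    struct fakeCtx
    {
        char       *buf;
        size_t      size;
    };

    int
    main(void)
    {
        struct fakeCtx ctx = {NULL, 0};
        pid_t       pid = fork();

        if (pid == 0)
        {
            /* child: grows "its" buffer; the parent's copy is untouched */
            ctx.size = 1024;
            ctx.buf = malloc(ctx.size);
            printf("child : size = %zu\n", ctx.size);
            free(ctx.buf);
            _exit(0);
        }

        waitpid(pid, NULL, 0);
        /* parent still sees the original values: no sharing, no stomping */
        printf("parent: size = %zu\n", ctx.size);
        return 0;
    }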
> I also receive messages saying "pg_restore: [custom archiver] could not
> read from input file: end of file". I have not investigated these
> further as my current guess is they are linked to the out of memory error.
> Given that I ran into this error on my first testing attempt, I haven't
> evaluated much else yet. Now all this could be
> because I'm using the 8.2 archive, but it works fine in single restore
> mode. The dump file is about 400M compressed and an entire archive
> schema was removed from the restore path with a custom restore list.
> Command line used; PGPORT=5432 ./pg_restore -h /var/run/postgresql -m4
> --truncate-before-load -v -d tt2 -L tt.list
> /home/mr-russ/pg-index-test/timetable.pgdump 2> log.txt
> I've attached the log.txt file so you can review the errors that I saw.
> I have adjusted the "out of memory" error to include a number to work
> out which one was being triggered. So you'll see "5 out of memory" in
> the log file, which corresponds to the code above.
However, there does seem to be something odd happening with the
compression lib, which I will investigate. Thanks for the report.