Re: parallel pg_restore - WIP patch

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Russell Smith <mr-russ(at)pws(dot)com(dot)au>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: parallel pg_restore - WIP patch
Date: 2008-09-26 12:29:04
Message-ID: 48DCD590.4000207@dunslane.net
Lists: pgsql-hackers

Russell Smith wrote:
> Hi,
>
> As I'm interested in this topic, I thought I'd take a look at the
> patch. I have no capability to test it on high-end hardware, but I did
> some basic testing on my workstation and a basic review of the patch.
>
> I somehow had the impression that instead of creating a new connection
> for each restore item, we would create the processes at the start and
> then send them the dumpIds they should be restoring. That would allow
> the controller to batch dumpIds together and expect the worker to
> process them in a transaction. But this is probably just an idea I
> created in my head.
>

Yes it is. To do that I would have to invent a protocol for talking to
the workers, etc., and there is not the slightest chance I would get that
done by November.
And I don't see the virtue in processing them all in a transaction. I've
provided a much simpler means of avoiding WAL logging of the COPY.
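
For context, the WAL avoidance mentioned here relies on PostgreSQL
skipping WAL for a COPY into a table that was truncated (or created)
in the same transaction, provided WAL archiving is not enabled. Below
is a minimal standalone libpq sketch of that pattern, not the actual
pg_restore code; the connection string, table name, and data file are
hypothetical.

    /*
     * Sketch only: COPY into a table truncated in the same transaction,
     * so the server can skip WAL for the load (archiving off) and just
     * sync the table at commit.  Names below are placeholders.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    static void
    exec_or_die(PGconn *conn, const char *sql)
    {
        PGresult   *res = PQexec(conn, sql);

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
            PQclear(res);
            PQfinish(conn);
            exit(1);
        }
        PQclear(res);
    }

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("dbname=tt2");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        exec_or_die(conn, "BEGIN");
        exec_or_die(conn, "TRUNCATE TABLE my_table");
        /* Server-side COPY from a file; a real restore would stream the
         * archive data with PQputCopyData instead. */
        exec_or_die(conn, "COPY my_table FROM '/tmp/my_table.dat'");
        exec_or_die(conn, "COMMIT");

        PQfinish(conn);
        return 0;
    }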

> Do we know why we experience "tuple concurrently updated" errors if we
> spawn workers too fast?
>

No. That's an open item.

> I completed some test restores using the pg_restore from head with the
> patch applied. The dump was a custom dump created with pg 8.2 and
> restored to an 8.2 database. To confirm this would work, I completed a
> restore using the standard single-threaded mode. The schema restored
> successfully. The only errors reported involved non-existent roles.
>
> When I attempt to restore using parallel restore I get out of memory
> errors reported from _PrintData. The code returning the error is;
>
> _PrintData(...
>     while (blkLen != 0)
>     {
>         if (blkLen + 1 > ctx->inSize)
>         {
>             free(ctx->zlibIn);
>             ctx->zlibIn = NULL;
>             ctx->zlibIn = (char *) malloc(blkLen + 1);
>             if (!ctx->zlibIn)
>                 die_horribly(AH, modulename, " out of memory\n");
>
>             ctx->inSize = blkLen + 1;
>             in = ctx->zlibIn;
>         }
>
>
> It appears from my debugging and looking at the code that in _PrintData;
> lclContext *ctx = (lclContext *) AH->formatData;
>
> the memory context is shared across all threads. Which means that it's
> possible the memory contexts are stomping on each other. My GDB skills
> are not up to reproducing this in a gdb session, as there are
> forks going on all over the place. And if you process them in a serial
> fashion, there aren't any errors. I'm not sure of the fix for this.
> But in a parallel environment it doesn't seem possible to store the
> memory context in the AH.
>

There are no threads, hence nothing is shared. fork() creates a new
process, not a new thread, and all they share are file descriptors.
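
To illustrate the point, here is a small standalone sketch, not part of
the patch: after fork() the child gets a copy-on-write copy of the
parent's memory, so writes to a malloc'd buffer in one process are
invisible to the other, while a file descriptor opened before the fork
shares its file offset between the two processes. The file path is just
an arbitrary readable file chosen for the example.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <sys/wait.h>

    int
    main(void)
    {
        char       *buf = malloc(32);
        int         fd = open("/etc/passwd", O_RDONLY);
        char        c;
        pid_t       pid;

        if (fd < 0)
        {
            perror("open");
            return 1;
        }
        strcpy(buf, "parent");

        pid = fork();
        if (pid == 0)
        {
            /* child: private copy of buf, shared offset on fd */
            strcpy(buf, "child");
            read(fd, &c, 1);
            printf("child:  buf=%s\n", buf);
            _exit(0);
        }

        waitpid(pid, NULL, 0);
        read(fd, &c, 1);        /* continues past the child's read */
        printf("parent: buf=%s (unchanged by child)\n", buf);

        free(buf);
        close(fd);
        return 0;
    }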

> I also receive messages saying "pg_restore: [custom archiver] could not
> read from input file: end of file". I have not investigated these
> further as my current guess is they are linked to the out of memory error.
>
> Given I ran into this error at my first testing attempt, I haven't
> evaluated much else at this point in time. Now all this could be
> because I'm using the 8.2 archive, but it works fine in single restore
> mode. The dump file is about 400M compressed and an entire archive
> schema was removed from the restore path with a custom restore list.
>
> Command line used: PGPORT=5432 ./pg_restore -h /var/run/postgresql -m4
> --truncate-before-load -v -d tt2 -L tt.list
> /home/mr-russ/pg-index-test/timetable.pgdump 2> log.txt
>
> I've attached the log.txt file so you can review the errors that I saw.
> I have adjusted the "out of memory" error to include a number to work
> out which one was being triggered. So you'll see "5 out of memory" in
> the log file, which corresponds to the code above.
>

However, there does seem to be something odd happening with the
compression lib, which I will investigate. Thanks for the report.

cheers

andrew
