Re: WIP parallel restore patch

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP parallel restore patch
Date: 2008-11-20 16:55:51
Message-ID: 49259697.20905@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Magnus Hagander wrote:
> Andrew Dunstan wrote:
>
>> Attached is my latest parallel restore patch. I think it's functionally
>> complete for Unix.
>>
>> Many bugs have been fixed since the last patch, and the hardcoded
>> limitation to two table dependencies is removed. It seems fairly robust
>> in my recent testing.
>>
>> Remaining to be done:
>>
>> . code cleanup
>> . better error checking in a few places
>> . final decision re command line option names/defaults
>> . documentation
>> . Windows support.
>>
>
> I've looked around this a bit, and it's fairly clear where the issue
> comes in with Windows - we get heap corruption. Most likely because we
> have multiple threads working on the same data somewhere.
>
> I notice for example that we're doing a shallow copy of the
> ArchiveHandle with a simple memcpy() for each thread. But that struct
> contains a number of things like file descriptors and pointers. Have you
> verified for each and every one of those that it actually doesn't get
> modified anywhere? If not, a deep copy may be needed to make sure of that.
>
> Other than that, are there any global variables that may be addressed
> from more than one worker? If so they need to be marked as TLS.
>
> And yes, I got hit by the lack of error checking a couple of times
> during my testing - it would probably be a good idea to add that as soon
> as possible, it helps a lot with the debugging.
>
> If I run it with just a single thread, it also crashes in PQfinish()
> called from die_horribly(), when trying to free conn->pgpass, which has
> a value (non-NULL) but is not a valid pointer. This crash happens in the
> worker thread, after it has logged that "fseek is required" - that's an
> indicator something being passed down to the thread is either wrong or
> being scribbled upon after the fact.
>
> I didn't dig into these questions specifically - since you have already
> been reading up on this code to do the patch you can probably reach the
> answer to them much quicker :-) So I'll stick to the questions.
>
>
>

OK, Thanks, this will help. I thought I had caught the ArchiveHandle
things, but there might be one or two I missed.

I'll try to have a new version in a few days.

cheers

andrew

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-11-20 17:02:46 Re: Problem with Bitmap Heap Scan
Previous Message Simon Riggs 2008-11-20 16:14:01 Re: Hot Standby (commit fest version - v5)