Re: GSOC'17 project introduction: Parallel COPY execution with errors handling

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Alexey Kondratov <kondratov(dot)aleksey(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GSOC'17 project introduction: Parallel COPY execution with errors handling
Date: 2017-03-23 12:51:55
Message-ID: CAFj8pRCc6JCYQVHFn8pJswR8OwVL61UJPsOHBA==WK2X_PAdMA@mail.gmail.com
Lists: pgsql-hackers

>> 1) Is there anyone in the PG community who would be interested in such a
>> project and could be a mentor?
>> 2) These two points share a general idea – to simplify work with a large
>> amount of data from different sources, but maybe it would be better to
>> focus on a single task?
>>
>
> I spent a lot of time on an implementation of @1 - maybe I can find a
> patch somewhere. Both tasks have something in common - you have to divide
> the import into multiple batches.
>

The patch is in /dev/null :( - My implementation was based on subtransactions
of 1000 rows each. When some check failed, I rolled back the subtransaction
and imported every row from that block in its own subtransaction. It was a
prototype - I didn't look for a smarter implementation.
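
A minimal sketch of the batch-then-per-row fallback described above, not the
lost patch itself. It uses Python with SQLite savepoints as a stand-in for
PostgreSQL subtransactions; the function name copy_with_fallback, the table
t (id, val) and the choice to catch only constraint errors are assumptions
made for illustration. The connection is assumed to be opened with
isolation_level=None so that transactions are controlled explicitly.

```python
import sqlite3


def copy_with_fallback(conn, rows, batch_size=1000):
    """Load rows in batches; if a batch fails, retry it one row at a time.

    Returns the rows that were rejected by a constraint check.
    """
    insert = "INSERT INTO t (id, val) VALUES (?, ?)"  # hypothetical table
    rejected = []
    cur = conn.cursor()
    cur.execute("BEGIN")
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]

        # Fast path: the whole batch inside one savepoint.
        cur.execute("SAVEPOINT batch")
        try:
            cur.executemany(insert, batch)
            cur.execute("RELEASE SAVEPOINT batch")
            continue
        except sqlite3.IntegrityError:
            # Undo the partially loaded batch before retrying.
            cur.execute("ROLLBACK TO SAVEPOINT batch")
            cur.execute("RELEASE SAVEPOINT batch")

        # Slow path: every row of the failed batch in its own savepoint.
        for row in batch:
            cur.execute("SAVEPOINT one_row")
            try:
                cur.execute(insert, row)
            except sqlite3.IntegrityError:
                cur.execute("ROLLBACK TO SAVEPOINT one_row")
                rejected.append(row)
            cur.execute("RELEASE SAVEPOINT one_row")
    cur.execute("COMMIT")
    return rejected
```

With this shape, clean batches cost one savepoint each, and only batches that
contain a bad row pay the per-row price.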

>
>
>
>> 3) Is it realistic to mostly finish both parts during the 3+ months of
>> almost full-time work, or am I too presumptuous?
>>
>
> I think it is possible - I am not sure about all the details, but a
> basic implementation can be done in 3 months.
>

Some data and some checks depend on row order - that can be a problem in
parallel processing - you should define the corner cases.
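
One possible corner case of this kind, sketched in Python under assumptions:
a unique-key check where the first row seen wins. The function load_chunks
and the sample data are hypothetical; the "order" argument only simulates
which worker's chunk happens to be applied first, it is not real parallelism.

```python
def load_chunks(chunks, order):
    """Apply chunks in the given order with a 'first key wins' unique check.

    Returns the resulting table (a dict) and the list of rejected rows.
    """
    table = {}
    rejected = []
    for idx in order:
        for key, value in chunks[idx]:
            if key in table:
                # Duplicate key: which row lands here depends on chunk order.
                rejected.append((key, value))
            else:
                table[key] = value
    return table, rejected


chunks = [
    [(1, "a"), (2, "b")],  # chunk that a first worker would read
    [(2, "c"), (3, "d")],  # chunk that a second worker would read
]
serial_table, serial_rejected = load_chunks(chunks, order=[0, 1])
swapped_table, swapped_rejected = load_chunks(chunks, order=[1, 0])
```

Here the serial order keeps (2, "b") and rejects (2, "c"), while the swapped
order does the opposite, so the set of reported errors is not stable unless
the ordering rule is defined.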

>
>
>>
>> I would very much appreciate any comments and criticism.
>>
>>
>> P.S. I know about the very interesting ready-made projects from the PG
>> community, https://wiki.postgresql.org/wiki/GSoC_2017, but it is always
>> more interesting to solve your own problems, issues and questions, which
>> are the product of your experience with software. That's why I dare to
>> propose my own project.
>>
>> P.P.S. A few words about me: I'm a PhD student in theoretical physics
>> from Moscow, Russia, and have been highly involved in software development
>> since 2010. I guess that I have good skills in Python, Ruby, JavaScript,
>> MATLAB, C and Fortran development and a basic understanding of algorithm
>> design and analysis.
>>
>>
>> Best regards,
>>
>> Alexey
>>
>
>
