Re: How to insert a bulk of data with unique-violations very fast

From:	"Pierre C" <lists(at)peufeu(dot)com>
To:	Torsten Zühlsdorff <foo(at)meisterderspiele(dot)de>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: How to insert a bulk of data with unique-violations very fast
Date:	2010-06-09 10:51:08
Message-ID:	op.vd04fifueorkce@apollo13
Lists:	pgsql-performance

>>> Within the data to import, most rows have 20 to 50 duplicates,
>>> sometimes many more, sometimes fewer.
>> In that case (source data has lots of redundancy), after importing the
>> data chunks in parallel, you can run a first pass of de-duplication on
>> the chunks, also in parallel, something like :
>> CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;
>> or you could compute some aggregates, counts, etc. Same as before, no
>> WAL needed, and you can use all your cores in parallel.
>> From what you say this should reduce the size of your imported data
>> by a lot (and hence the time spent in the non-parallel operation).
>
> Thank you very much for this advice. I've tried it in another project
> with similar import problems. It really sped the import up.

Glad it was useful ;)
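
For reference, the two-step flow quoted above could look roughly like the sketch below. Only foo_1 and foo_1_dedup come from the thread; the target table foo and its columns item_key and payload are hypothetical names for illustration. The sketch also assumes the duplicates are identical whole rows, so that DISTINCT leaves at most one row per key. A temp table is only visible in the session that created it, so both steps for a chunk run in that chunk's own session.

```sql
-- Hypothetical target table carrying the unique constraint:
--   CREATE TABLE foo (item_key text PRIMARY KEY, payload text);

-- Step 1 (parallel, one session per chunk): collapse exact duplicate rows
-- in the already-imported chunk foo_1.
CREATE TEMP TABLE foo_1_dedup AS
SELECT DISTINCT * FROM foo_1;

-- Step 2 (one session at a time): insert only the keys that the target
-- does not contain yet, so no unique violation is raised.
INSERT INTO foo (item_key, payload)
SELECT d.item_key, d.payload
FROM foo_1_dedup d
WHERE NOT EXISTS (
    SELECT 1 FROM foo f WHERE f.item_key = d.item_key
);
```

Step 2 is the non-parallel part: if two sessions ran it at the same time, both could pass the NOT EXISTS check for the same key and one of them would still fail on the constraint. The smaller the de-duplicated chunks are, the less time this serialized step takes.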

In response to

Re: How to insert a bulk of data with unique-violations very fast at 2010-06-09 07:45:46 from Torsten Zühlsdorff
