Quick Links

Fwd: Insert performance (OT?)

From:	Yves Vindevogel <yves(dot)vindevogel(at)implements(dot)be>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	Fwd: Insert performance (OT?)
Date:	2005-07-19 10:21:08
Message-ID:	940bb98041b7df475b53f98e35c31ce2@implements.be
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

BTW: thank you for the idea

Begin forwarded message:

> From: Yves Vindevogel <yves(dot)vindevogel(at)implements(dot)be>
> Date: Tue 19 Jul 2005 12:20:34 CEST
> To: Richard Huxton <dev(at)archonet(dot)com>
> Subject: Re: [PERFORM] Insert performance (OT?)
>
>
> On 19 Jul 2005, at 11:39, Richard Huxton wrote:
>
>> Yves Vindevogel wrote:
>>> Hi,
>>> Suppose I have a table with 4 fields (f1, f2, f3, f4)
>>> I define 2 unique indexes u1 (f1, f2, f3) and u2 (f1, f2, f4)
>>> I have 3 records
>>> A, B, C, D (this will be inserted)
>>> A, B, C, E (this will pass u2, but not u1, thus not inserted)
>>> A, B, F, D (this will pass u1, but not u2, thus not inserted)
>>
>> Are you saying you want to know whether they will be inserted before
>> you try to do so?
>>
> No, that is not an issue. Problem is that when I use a big query with
> "insert into .. select" and one record is wrong (like above) the
> complete insert query is abandonned.
> Therefore, I must do it another way. Or I must be able to say, insert
> them and dump the rest.
>
>>> Now, for performance ...
>>> I have tables like this with 500.000 records where there's a new
>>> upload of approx. 20.000 records.
>>> It is only now that we say index u2 to be necessary. So, until now,
>>> I did something like insert into ... select f1, f2, f2, max(f4)
>>> group by f1, f2, f3
>>> That is ok ... and also logically ok because of the data definition
>>
>> I'm confused here - assuming you meant "select f1,f2,f3", then I
>> don't see how you guarantee the row doesn't alredy exist.
>>
> No, I meant it with max(f4) because my table has 4 fields. And no, I
> can't guarantee that, that is exactly my problem.
> But with the unique indexes, I'm certain that it will not get into my
> database
>
>>> I cannot do this with 2 group by's. I tried this on paper and I'm
>>> not succeeding.
>>
>> I don't see how you can have two group-by's, or what that would mean
>> if you did.
>>
> select from ( select from group by) as foo group by
>
>>> So, I must use a function that will check against u1 and u2, and
>>> then insert if it is ok.
>>> I know that such a function is way slower that my insert query.
>>
>> So - you have a table, called something like "upload" with 20,000
>> rows and you'd like to know whether it is safe to insert them. Well,
>> it's easy enough to identify which ones are duplicates.
>>
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f3;
>> SELECT * FROM upload JOIN main_table ON u1=f1 AND u2=f2 AND u3=f4;
>>
> That is a good idea. I can delete the ones that would fail my first
> unique index this way, and then delete the ones that would fail my
> second unique index and then upload them.
> Hmm, why did I not think of that myself.
>
>> Are you saying that deleting these rows and then inserting takes too
>> long?
>>
> This goes very fast, but not with a function that checks each record
> one by one.
>
>> --
>> Richard Huxton
>> Archonet Ltd
>>
>> ---------------------------(end of
>> broadcast)---------------------------
>> TIP 6: explain analyze is your friend
>>
>>
> Met vriendelijke groeten,
> Bien à vous,
> Kind regards,
>
> Yves Vindevogel
> Implements
>

Attachment	Content-Type	Size
Pasted Graphic 2.tiff	image/tiff	5.6 KB
Pasted Graphic 2.tiff	image/tiff	5.6 KB

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Richard Huxton	2005-07-19 10:51:51	Re: Insert performance (OT?)
Previous Message	Richard Huxton	2005-07-19 09:39:07	Re: Insert performance (OT?)