Re: SELECT INTO large FKyed table is slow

From: Mario Splivalo <mario(dot)splivalo(at)megafon(dot)hr>
To: Pierre C <lists(at)peufeu(dot)com>
Cc: mark(dot)kirkwood(at)catalyst(dot)net(dot)nz, pgsql-performance(at)postgresql(dot)org
Subject: Re: SELECT INTO large FKyed table is slow
Date: 2010-11-30 23:24:32
Message-ID: 4CF587B0.5020003@megafon.hr
Lists: pgsql-performance

On 11/29/2010 05:53 PM, Pierre C wrote:
>
>> Yes, since (sample_id, drone_id) is the primary key, postgres created a
>> composite index on those columns. Are you suggesting I add two more
>> indexes, one for drone_id and one for sample_id?
>
> (sample_id, drone_id) covers sample_id, but if you make searches on
> drone_id alone it is likely to be very slow, since you have a large number
> of sample_ids. Postgres can use any column of a multicolumn index, but it
> is only interesting performance-wise if the cardinality of the first
> (ignored) columns is low. If you often make searches on drone_id, create
> an index. But this isn't what is slowing your foreign key checks.

Again, you have a point there. When I get to SELECTs on the history
table I'll be doing most of the filtering on drone_id (but also on
sample_id, because I'll seldom drill all the way back in time; I'll
mostly be interested in specific periods), so I'll take this into
consideration.
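
If drone_id searches do turn out to be frequent, a plain single-column
index should cover them. A minimal sketch (the index name is just an
example):

  CREATE INDEX drones_history_drone_id_idx ON drones_history (drone_id);
  -- sanity check that the planner picks it up for a drone_id-only filter:
  EXPLAIN SELECT * FROM drones_history WHERE drone_id = 42;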

But, as you've said, that's not what is slowing my FK checks.

>
>>> Also it would be worthwhile for you to post the output of:
>>>
>>> EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
>>> drone_log_notice, drone_temperature, drone_pressure)
>>> SELECT * FROM tmpUpdate;
>>>
>>> to the list, so we can see what is taking the time.
>>
>> Is there a way to do so inside a plpgsql function?
>>
>> I can recreate the whole process within psql and then post the explain
>> analyze; it would just take me some time to do so. I'll post as soon
>> as I'm done.
>
> Yes, this would be interesting.
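
As an aside on the plpgsql question above: one way to see plans for
statements executed inside a function (not something we discussed in the
thread) is the auto_explain contrib module, available since 8.4. A
minimal sketch:

  LOAD 'auto_explain';
  SET auto_explain.log_min_duration = 0;         -- log a plan for every statement
  SET auto_explain.log_analyze = true;           -- include actual timings and row counts
  SET auto_explain.log_nested_statements = true; -- include statements run inside functions

The plans then show up in the server log when the function is called.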

So, I did. I ran the whole script in psql, and here is the result for
the INSERT:

realm_51=# explain analyze INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmp_drones_history;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on tmp_drones_history  (cost=0.00..81.60 rows=4160 width=48) (actual time=0.008..5.296 rows=5150 loops=1)
 Trigger for constraint drones_history_fk__drones: time=92.948 calls=5150
 Total runtime: 16779.644 ms
(3 rows)

Now, this took only 16 seconds. In this batch I inserted 5150 rows. The
batch before that, which I ran 'the usual way', inserted 9922 rows and
took 1 minute and 16 seconds.

I did not, however, enclose the process in BEGIN/END.
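
By "enclosing in BEGIN/END" I mean simply running the statement inside an
explicit transaction, along these lines:

  BEGIN;
  EXPLAIN ANALYZE INSERT INTO drones_history (sample_id, drone_id,
      drone_log_notice, drone_temperature, drone_pressure)
  SELECT * FROM tmp_drones_history;
  COMMIT;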

So, here are the results when, in psql, I first issued BEGIN:

realm_51=# explain analyze INSERT INTO drones_history (sample_id, drone_id,
drone_log_notice, drone_temperature, drone_pressure)
SELECT * FROM tmp_drones_history;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on tmp_drones_history  (cost=0.00..79.56 rows=4056 width=48) (actual time=0.008..6.490 rows=5059 loops=1)
 Trigger for constraint drones_history_fk__drones: time=120.224 calls=5059
 Total runtime: 39658.250 ms
(3 rows)

Time: 39658.906 ms

Mario
