Trying to figure out why these queries are so slow

From: Tim Uckun <timuckun(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Trying to figure out why these queries are so slow
Date: 2010-10-05 09:20:41
Message-ID: AANLkTim-BRRttUjWBDQ1_3gXztp_8RhYkhK7HBbmL6wp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have two tables. Table C has about 300K records in it. Table E has
about a million records in it. Today I tried to run this query.

update C
set result_count = X.result_count
from C
inner join (select c_id, count(c_id) as result_count
from E
where c_id is not null
group by c_id) as X
on C.id = X.c_id

All the fields mentioned are indexed. In the case of Table C it's the
primary key. In the case table E it's just an index (non unique).

I let this query run for about three hours before I cancelled it.

Next I did a CREATE TABLE of the sub query thinking the subquery was
the problem I extracted the results into a separate table and in that
table made the c_id field the primary key. That table only had about
80 thousand records. The creation of the table took just a couple of
minutes. Perfectly reasonable.

I tried to run the same thing again but this time joining the newly
created table instead of the subquery. I killed that query after an
hour.

Then I tried to be clever about it so I added a

Where id in (select id from C where result_count =0 limit 10) into
the where clause. I figure this would run fast because it's only
updating ten records. The result_count field is indexed so that part
should run very quick and the where ID IN should be quick because it's
looking up ten items in the PKEY right? Well that didn't go very well
either. I killed that query after about 15 minutes.

In the end (once again) I wrote a ruby script to pull up the records
from table E in small chunks and pull the corresponding records
records from C in small chunks and update them myself. In other words
I wrote a ruby script to do the join and the update in chunks.

I am simply baffled. What am I doing wrong. Is the database not
tuned? Do I not have enough RAM? Is there some parameter I need to
tweak someplace?

There was nothing else going on in the database when this query was
running. I shut off all applications touching the database.

The database machine is a linode host. It's got two gigs of RAM I
realize that this is not the ideal host for a database but it's what I
have to work with.

kernel.shmmax=536870912 , kernel.shmall=2097152, shared_buffers =
128MB effective_cache_size = 256MB log_min_duration_statement = 1000
max_connections = 100

This was a case of doing an update on about 80,000 records using an
inner join on two tables on using the primary keys of both records.
Why would this query take more than a few seconds to run?

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Thomas Guettler 2010-10-05 09:33:09 Re: Trying to figure out why these queries are so slow
Previous Message Simon Riggs 2010-10-05 09:08:00 Re: streaming replication question