Skip site navigation (1) Skip section navigation (2)

Slow subquery on large dataset

From: bob_bamber(at)hotmail(dot)com (Bob)
To: pgsql-novice(at)postgresql(dot)org
Subject: Slow subquery on large dataset
Date: 2004-02-19 22:07:10
Message-ID: b598ae2.0402191407.c1548ea@posting.google.com (view raw or flat)
Thread:
Lists: pgsql-novice
Hi,

I'm having some performance issues when querying a couple of tables
containing a large amount of data.

Here's the schema:

CREATE TABLE capacity_data (
  data_id SERIAL,
  data TEXT,
  modified TIMESTAMPTZ DEFAULT NOW(),
  modified_by INTEGER NOT NULL,
  CONSTRAINT capacity_data_pk PRIMARY KEY (data_id),
  CONSTRAINT capacity_data_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
);

CREATE TABLE capacities (
CREATE TABLE capacities (
  room_id BIGINT NOT NULL,
  capacity_type_id BIGINT NOT NULL,
  data_id BIGINT NOT NULL,
  modified TIMESTAMPTZ DEFAULT NOW(),
  modified_by INTEGER NOT NULL,
  CONSTRAINT capacities_pk PRIMARY KEY (room_id, data_id),
  CONSTRAINT capacities_room_id_fk FOREIGN KEY (room_id) REFERENCES
meeting_rooms(room_id) ON DELETE CASCADE,
  CONSTRAINT capacities_capacity_type_id_fk FOREIGN KEY
(capacity_type_id) REFERENCES capacity_types(capacity_type_id) ON
DELETE CASCADE,
  CONSTRAINT capacities_data_id_fk FOREIGN KEY (data_id) REFERENCES
capacity_data(data_id) ON DELETE CASCADE,
  CONSTRAINT capacities_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
);  data_id BIGINT NOT NULL,
  modified TIMESTAMPTZ DEFAULT NOW(),
  modified_by INTEGER NOT NULL,
  CONSTRAINT capacities_pk PRIMARY KEY (room_id, data_id),
  CONSTRAINT capacities_room_id_fk FOREIGN KEY (room_id) REFERENCES
meeting_rooms(room_id) ON DELETE CASCADE,
  CONSTRAINT capacities_capacity_type_id_fk FOREIGN KEY
(capacity_type_id) REFERENCES capacity_types(capacity_type_id) ON
DELETE CASCADE,
  CONSTRAINT capacities_data_id_fk FOREIGN KEY (data_id) REFERENCES
capacity_data(data_id) ON DELETE CASCADE,
  CONSTRAINT capacities_modified_by_fk FOREIGN KEY (modified_by)
REFERENCES editors(editor_id)
);

I'm using a subquery to find all the capacity_data.data_id's that are
not in capacities:

foo=# SELECT data_id FROM capacity_data WHERE data_id NOT IN (SELECT
data_id FROM capacities);

However, I have over 15,000 records in capacity_data.  Here is the
query plan:

                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on capacity_data  (cost=0.00..2086295.56 rows=7538 width=4)
   Filter: (subplan)
   SubPlan
     ->  Seq Scan on capacities  (cost=0.00..276.75 rows=15075
width=8)
(4 rows)

A little on the slow side!  I have indexes on data_id in both tables
(in capacity_data it's the primary key) how can I use them to quickly
acheive what I want?

Thanks in advance,

Bob.

Responses

pgsql-novice by date

Next:From: kynnDate: 2004-02-20 00:23:29
Subject: psql hangs after "drop table ..."
Previous:From: Joe ConwayDate: 2004-02-19 18:58:24
Subject: Re: crosstabs

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group