
Re: Slow queries on big table

From: "Tyrrill, Ed" <tyrrill_ed(at)emc(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Slow queries on big table
Date: 2007-05-18 23:16:20
Message-ID: A23190A408F7094FAF446C1538222F7603EE4416@avaexch01.avamar.com
Lists: pgsql-performance
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Thanks for the help guys!  That was my problem.  I actually need the
>> backup_location_rid index for a different query so I am going to keep
>> it.
>
> Well, you don't really *need* it; the two-column index on (record_id,
> backup_id) will serve perfectly well for queries on its leading column
> alone.  It'll be physically bigger and hence slightly slower to scan
> than a single-column index; but unless the table is almost completely
> read-only, the update overhead of maintaining all three indexes is
> probably going to cost more than you can save with it.  Try that other
> query with and without backup_location_rid and see how much you're
> really saving.
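Tom's point about the leading column can be seen even on a toy database. A minimal sketch using Python's sqlite3 standing in for PostgreSQL (index name and data are invented; only the idea carries over):

```python
import sqlite3

# A two-column index on (record_id, backup_id) is still usable for
# lookups on its leading column alone.  SQLite stand-in, toy data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE backup_location (record_id INTEGER, backup_id INTEGER)")
con.execute("CREATE INDEX bl_rid_bid ON backup_location (record_id, backup_id)")
con.executemany("INSERT INTO backup_location VALUES (?, ?)",
                [(i % 10, i) for i in range(100)])

# Query filters only on the leading column, record_id.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM backup_location WHERE record_id = 7"
).fetchall()
print(plan)  # plan reports a search using index bl_rid_bid
```

The same holds in PostgreSQL: a btree on (record_id, backup_id) supports equality lookups on record_id alone, which is why the single-column index is redundant for that access path.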

Well, the query that got me to add backup_location_rid took 105 minutes
using only the primary key index.  After I added backup_location_rid
the query was down to about 45 minutes.  Still not very good, and I am
still fiddling around with it.  The query is:

mdsdb=# explain analyze select backupobjects.record_id from
backupobjects left outer join backup_location using(record_id) where
backup_id is null;
 
                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=0.00..21408455.06 rows=11790970 width=8) (actual time=2784967.410..2784967.410 rows=0 loops=1)
   Merge Cond: ("outer".record_id = "inner".record_id)
   Filter: ("inner".backup_id IS NULL)
   ->  Index Scan using backupobjects_pkey on backupobjects  (cost=0.00..443484.31 rows=11790970 width=8) (actual time=0.073..47865.957 rows=11805996 loops=1)
   ->  Index Scan using backup_location_rid on backup_location  (cost=0.00..20411495.21 rows=162435366 width=12) (actual time=0.110..2608485.437 rows=162426837 loops=1)
 Total runtime: 2784991.612 ms
(6 rows)
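The plan above is a merge anti-join that walks all ~162 million backup_location index entries to find backupobjects rows with no match. The query's semantics, plus an equivalent NOT EXISTS formulation (which a planner may treat differently), can be sketched on toy data with Python's sqlite3 standing in for PostgreSQL (table names from the post, data invented):

```python
import sqlite3

# Anti-join: record_ids in backupobjects with no row in backup_location.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE backupobjects (record_id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE backup_location (record_id INTEGER, backup_id INTEGER)")
con.executemany("INSERT INTO backupobjects VALUES (?)", [(i,) for i in range(1, 6)])
con.executemany("INSERT INTO backup_location VALUES (?, ?)",
                [(1, 10), (2, 11), (4, 12)])  # record_ids 3 and 5 unmatched

# The formulation from the post: outer join, keep rows with no match.
left_join = con.execute("""
    SELECT backupobjects.record_id
    FROM backupobjects LEFT OUTER JOIN backup_location USING (record_id)
    WHERE backup_id IS NULL
""").fetchall()

# Equivalent anti-join written with NOT EXISTS.
not_exists = con.execute("""
    SELECT b.record_id FROM backupobjects b
    WHERE NOT EXISTS (SELECT 1 FROM backup_location l
                      WHERE l.record_id = b.record_id)
""").fetchall()

print(left_join, not_exists)  # both: [(3,), (5,)]
```

Either way the engine must prove a non-match for every outer row, which is why this query is dominated by the full scan of backup_location's index.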

It is of course the same backup_location, but backupobjects is:

mdsdb=# \d backupobjects
               Table "public.backupobjects"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 record_id      | bigint                      | not null
 dir_record_id  | integer                     |
 name           | text                        |
 extension      | character varying(64)       |
 hash           | character(40)               |
 mtime          | timestamp without time zone |
 size           | bigint                      |
 user_id        | integer                     |
 group_id       | integer                     |
 meta_data_hash | character(40)               |
Indexes:
    "backupobjects_pkey" PRIMARY KEY, btree (record_id)
    "backupobjects_meta_data_hash_key" UNIQUE, btree (meta_data_hash)
    "backupobjects_extension" btree (extension)
    "backupobjects_hash" btree (hash)
    "backupobjects_mtime" btree (mtime)
    "backupobjects_size" btree (size)

record_id in backupobjects has a many-to-many relationship to record_id
in backup_location.

Ed
