Skip site navigation (1) Skip section navigation (2)

Re: Comparative performance

From: Joe <svn(at)freedomcircle(dot)net>
To: Gavin Sherry <swm(at)alcove(dot)com(dot)au>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Comparative performance
Date: 2005-09-29 12:44:16
Message-ID: 433BE1A0.5000807@freedomcircle.net (view raw or flat)
Thread:
Lists: pgsql-performance
Gavin Sherry wrote:
> Please post the table definitions, queries and explain analyze results so
> we can tell you why the performance is poor.

I did try to post that last night but apparently my reply didn't make it to the 
list.  Here it is again:

Matthew Nuzum wrote:

 > This is the right list. Post detail and I'm sure you'll get some suggestions.


Thanks, Matthew (and Chris and Gavin).

The main table used in the query is defined as follows:

CREATE TABLE entry (
   entry_id serial PRIMARY KEY,
   title VARCHAR(128) NOT NULL,
   subtitle VARCHAR(128),
   subject_type SMALLINT,
   subject_id INTEGER REFERENCES topic,
   actor_type SMALLINT,
   actor_id INTEGER REFERENCES topic,
   actor VARCHAR(64),
   actor_role VARCHAR(64),
   rel_entry_id INTEGER,
   rel_entry VARCHAR(64),
   description VARCHAR(255),
   quote text,
   url VARCHAR(255),
   entry_date CHAR(10),
   created DATE NOT NULL DEFAULT CURRENT_DATE,
   updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP)
WITHOUT OIDS;
CREATE INDEX entry_actor_id ON entry (actor_id);
CREATE INDEX entry_subject_id ON entry (subject_id);

It has 3422 rows at this time.

The query for one of the pages is the following:

SELECT entry_id, subject_type AS type, subject_type, subject_id, actor_type, 
actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, 
description, url, quote AS main_quote, NULL AS rel_quote, substring(entry_date 
from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date 
from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created, 
updated FROM entry WHERE subject_id = 1079
UNION SELECT entry_id, actor_type AS type, subject_type, subject_id, actor_type, 
actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, 
description, url, quote AS main_quote, NULL AS rel_quote, substring(entry_date 
from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date 
from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created, 
updated FROM entry WHERE actor_id = 1079 ORDER BY type, title, subtitle;

The output of EXPLAIN ANALYZE is:

  Sort  (cost=158.98..159.14 rows=62 width=568) (actual time=16.000..16.000 
rows=59 loops=1)
    Sort Key: "type", title, subtitle
    ->  Unique  (cost=153.57..157.14 rows=62 width=568) (actual 
time=16.000..16.000 rows=59 loops=1)
          ->  Sort  (cost=153.57..153.73 rows=62 width=568) (actual 
time=16.000..16.000 rows=59 loops=1)
                Sort Key: entry_id, "type", subject_type, subject_id, 
actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, 
subtitle, description, url, main_quote, rel_quote, dom, date_ymd, date_ym, 
date_y, created, updated
                ->  Append  (cost=0.00..151.73 rows=62 width=568) (actual 
time=0.000..16.000 rows=59 loops=1)
                      ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17.21 rows=4 
width=568) (actual time=0.000..0.000 rows=3 loops=1)
                            ->  Index Scan using entry_subject_id on entry 
(cost=0.00..17.17 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                                  Index Cond: (subject_id = 1079)
                      ->  Subquery Scan "*SELECT* 2"  (cost=0.00..134.52 rows=58 
width=568) (actual time=0.000..16.000 rows=56 loops=1)
                            ->  Seq Scan on entry  (cost=0.00..133.94 rows=58 
width=568) (actual time=0.000..16.000 rows=56 loops=1)
                                  Filter: (actor_id = 1079)
  Total runtime: 16.000 ms
(13 rows)

What I don't quite understand is why it's doing a sequential scan on actor_id 
instead of using the entry_actor_id index.  Note that actor_id has 928 non-null 
values (27%), whereas subject_id has 3089 non-null values (90%).

Note that the entry_date column was originally a MySQL date but it had partial 
dates, i.e., some days and months are set to zero.  Eventually I hope to define 
a PostgreSQL datatype for it and to simplify the substring retrievals.  However, 
I don't think the extra computational time should affect the overall runtime 
significantly.

Gavin, I'm using PostgreSQL 8.0.3, Apache 1.3.28, PHP 4.3.4, MySQL 4.0.16 and 
I'm comparing both databases on XP (on a Pentium 4, 1.6 GHz, 256 MB RAM).

Thanks for any feedback.

Joe


In response to

Responses

pgsql-performance by date

Next:From: R, Rajesh (STSD)Date: 2005-09-29 13:05:14
Subject: Query in SQL statement
Previous:From: JoeDate: 2005-09-29 12:37:07
Subject: Re: Comparative performance

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group