Skip site navigation (1) Skip section navigation (2)

EXTERNAL storage and substring on long strings

From: Scott Cain <cain(at)cshl(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: EXTERNAL storage and substring on long strings
Date: 2003-07-31 19:26:40
Message-ID: 1059679600.1429.41.camel@localhost.localdomain (view raw or flat)
Thread:
Lists: pgsql-performancepgsql-sql
Hello,

A few days ago, I asked for advice on speeding up substring queries on
the GENERAL mailing list.  Joe Conway helpfully pointed out the ALTER
TABLE STORAGE EXTERNAL documentation.  After doing the alter,
the queries got slower!  Here is the background:

A freshly loaded database is VACUUM ANALYZEd and I run this query:

explain analyze select substring(residues from 1000000 for 20000)
from feature where feature_id=1;

where feature is a table with ~3 million rows, and residues is a text
column, where for the majority of the rows of feature, it is null, for a
large minority, it is shortish strings (a few thousand characters), and
for 6 rows, residues contains very long strings (~20 million characters
(it's chromosome DNA sequence from fruit flies)).

Here's the result from the ANALYZE:
 Index Scan using feature_pkey on feature  (cost=0.00..3.01 rows=1
width=152) (actual time=388.88..388.89 rows=1 loops=1)
   Index Cond: (feature_id = 1)
 Total runtime: 389.00 msec
(3 rows)

Now, I'll change the storage:

alter table feature alter column residues set storage external;

To make sure that really happens, I run an update on feature:

update feature set residues = residues where feature_id<8;

and then VACUUM ANALYZE again.  I run the same EXPLAIN ANALYZE query as
above and get this output:

 Index Scan using feature_pkey on feature  (cost=0.00..3.01 rows=1
width=153) (actual time=954.13..954.14 rows=1 loops=1)
   Index Cond: (feature_id = 1)
 Total runtime: 954.26 msec
(3 rows)

Whoa!  That's not what I expected, the time to do the query got more
that twice as long.  So I think, maybe it was just an unlucky section,
and overall performance will be much better.  So I write a perl script
to do substring queries over all of my chromosomes at various positions
and lengths (20,000 queries total).  For comparison, I also ran the same
script, extracting the chromosomes via sql and doing the substring in
perl.  Here's what happened:

substr in perl       0.014sec/query
EXTENDED storage     0.0052sec/query
default storage      0.0040sec/query

So, what am I missing?  Why doesn't EXTENDED storage improve substring
performance as it says it should in
http://www.postgresql.org/docs/7.3/interactive/sql-altertable.html ?

I am using an IDE drive on a laptop, running Postgresql 7.3.2 on RedHat
Linux 7.3 with 512M RAM.

Thanks,
Scott

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain(at)cshl(dot)org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2003-07-31 19:44:57
Subject: Re: EXTERNAL storage and substring on long strings
Previous:From: Manfred KoizarDate: 2003-07-31 19:13:07
Subject: Re: Help on my database performance

pgsql-sql by date

Next:From: Tom LaneDate: 2003-07-31 19:44:57
Subject: Re: EXTERNAL storage and substring on long strings
Previous:From: Charles HauserDate: 2003-07-31 19:18:49
Subject: RI_ConstraintTrigger_11264756

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group