Re: EXTERNAL storage and substring on long strings

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Scott Cain <cain(at)cshl(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: EXTERNAL storage and substring on long strings
Date: 2003-08-06 20:36:31
Message-ID: 7896.1060202191@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance pgsql-sql

Scott Cain <cain(at)cshl(dot)org> writes:
> A few days ago, I asked for advice on speeding up substring queries on
> the GENERAL mailing list. Joe Conway helpfully pointed out the ALTER
> TABLE STORAGE EXTERNAL documentation. After doing the alter,
> the queries got slower! Here is the background:

Ah-hah, I've sussed it ... you didn't actually change the storage
representation. You wrote:

> Now, I'll change the storage:
> alter table feature alter column residues set storage external;
> To make sure that really happens, I run an update on feature:
> update feature set residues = residues where feature_id<8;
> and then VACUUM ANALYZE again.

This sounds good --- in fact, I think we all just accepted it when we
read it --- but in fact *that update didn't decompress the toasted data*.
The tuple toaster sees that the same toasted value is being stored back
into the row, and so it just re-uses the existing toasted data; it does
not stop to notice that the column storage preference has changed.

To actually get the storage to change, you need to feed the value
through some function or operator that will decompress it. Then it
won't get recompressed when it's stored. One easy way (since this
is a text column) is

update feature set residues = residues || '' where feature_id<8;

To verify that something really happened, try doing VACUUM VERBOSE on
the table before and after. The quoted number of tuples in the toast
table should rise substantially.

I did the following comparisons on the test data you made available,
using two tables in which one has default storage and one has "external"
(not compressed) storage:

scott=# \timing
Timing is on.
scott=# select length (dna) from edna;
length
-----------
245203899
(1 row)

Time: 1.05 ms
scott=# select length (dna) from ddna;
length
-----------
245203899
(1 row)

Time: 1.11 ms
scott=# select length(substring(dna from 1000000 for 20000)) from edna;
length
--------
20000
(1 row)

Time: 30.43 ms
scott=# select length(substring(dna from 1000000 for 20000)) from ddna;
length
--------
20000
(1 row)

Time: 37383.02 ms
scott=#

So it looks like the external-storage optimization for substring() does
work as expected, once you get the data into the right format ...

regards, tom lane

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message scott.marlowe 2003-08-06 20:37:34 Re: ext3 block size
Previous Message Wilson A. Galafassi Jr. 2003-08-06 20:34:39 ext3 block size

Browse pgsql-sql by date

  From Date Subject
Next Message Joe Conway 2003-08-06 20:51:13 Re: EXTERNAL storage and substring on long strings
Previous Message Andrew Droffner 2003-08-06 18:34:33 Re: User-defined SQL function has slower query on 7.3.3 than