How are text columns stored?

From: Meetesh Karia <meetesh(dot)karia(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: How are text columns stored?
Date: 2005-06-23 10:40:15
Message-ID: fc5b04ca050623034029b7e532@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi all,

I'm running PG 8.0.3 on WinXP and I'm coming across some performance issues
related to text columns. Basically, it appears as though PG is storing the
text data inline with the rest of the row data as queries that don't touch
the text column are slower when there is data in the text column than when
there isn't. According to section 8.3 of the doc:

"Long values are also stored in background tables so they do not interfere
with rapid access to the shorter column values."

So, how long does a value have to be to be considered "long"?

If necessary, here is some more specific information about what I'm doing:

1) I create a new table and use 'COPY FROM' to populate it. When the data in
the text column is limited to a max of 60 characters, this part takes 2-3
minutes less than when the data is at its full size. The table will be
populated with ~750k rows. Here's an example of the table I create (no, I
didn't name the fields "vc_field1", "vc_field2", etc ;) ):

create table my_table_import
(
vc_field1 varchar(255),
vc_field2 varchar(255),
vc_field3 varchar(255),
f_field1 float8,
text_field1 text,
ts_field1 timestamp,
v_field4 varchar(255),
i_field1 int8,
i_field2 int8
);

2) I populate i_field1 and i_field2 from lookup tables. This step takes
about 7 mins longer with the full text data than with the limited data.

update my_table_import
set i_field1 = f.i_field1,
i_field2 = u.i_field2
from lookup1 as f, lookup2 as u
where vc_field2 = f.vc_field2
and vc_field1 = u.vc_field1;

3) I then create an index on this table and run a couple of queries on it.
Each of these queries takes about 10 minutes longer with the full text data
then without it. Here's the index that I create and an example of one of the
queries that I run:

create index idx_my_table_import_i1_i2 on my_table_import (i_field1,
i_field2);
analyze my_table_import;

select i_field1, i_field2, max(ts_field1) as ts_field1, count(*) as
dup_count
from my_table_import
where i_field1 between 0 and 9999
group by i_field1, i_field2

Thanks for the help,
Meetesh Karia

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Keith Worthington 2005-06-23 12:16:22 Re: Performance Tuning Article
Previous Message Chun Yit(Chronos) 2005-06-23 09:56:52 Postgresql7.4.5 running slow on plpgsql function