Performance problem with large resultsets (byte array 2200)

From: "Victor Adolfsson" <victor(at)optimumbiometrics(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: "Victor Adolfsson" <victor(at)optimumbiometrics(dot)com>
Subject: Performance problem with large resultsets (byte array 2200)
Date: 2007-07-31 09:56:14
Message-ID: 9a6439440707310256j1474c172g4f0ba96fbb4c30c2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi

I'm having a problem with bad performance when retrieving many rows where
each row has a 2200 long byte array (called template_compressed) and a 50
character varying (called uniqueid)

Is there a better datatype than bytea when it is important to fetch the
information from the database?

Would it be better to store a reference in the database and then store the
2200 big template in a separate file?

What is the suggested size threshold for when it is better to store in file
than in database?

What parts need to be configured and how should they be configured given the
environment described at the bottom of this email?
(short version: DualCore P4 2.8 GHz, 1 GB RAM, 2 x 80 GB SATA RAID1)

What is the expected performance when an application on the same server is
querying the database and doing data retrieval with this large recordset?

Are the use of cursors recommended when dealing with large recordsets?
(currently cursors are not used)? Would the performance differ and if so, in
what direction?

How can i decrease the difference in duration between the explain analyze
time and the duration when actually retrieving the data from postgresql to
the application?

When I do an explain analyze on the same query I get Total runtime:
3306.216 ms. (explain analyze output is available below)

This is the output from the postgresql-log file after enabling logging of
slow queries (1second).
LOG: duration: 202927.174 ms statement: SELECT p.uniqueid AS uid,
f.template_compressed AS ctemplate FROM person p INNER JOIN features f ON
p.id=f.person_id WHERE p.website_id = '11' AND p.uniqueid != 'extrababak'
AND birthdate < (NOW() - '13 YEAR'::interval) AND birthdate > (NOW() - '53
YEAR'::interval) AND f.datecreated > '1980-1-1'::date ORDER BY f.datecreatedDESC

I have created an index on all columns which may be part of the where
clause.

explain analyze SELECT p.uniqueid AS uid, f.template_compressed AS ctemplate
FROM person p INNER JOIN features f ON p.id=f.person_id WHERE p.website_id =
'11' AND p.uniqueid != 'extrababak' AND birthdate < (NOW() - '13
YEAR'::interval) AND birthdate > (NOW() - '53 YEAR'::interval) AND
f.datecreated > '1980-1-1'::date ORDER BY f.datecreated DESC;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=
36.335..3265.844 rows=63902 loops=1)
-> Index Scan Backward using idx_features_datecreated on features f
(cost=0.00..3380.64 rows=79249 width=44) (actual
time=1.090..1488.601rows=79264 loops=1)
Index Cond: (datecreated > '1980-01-01'::date)
-> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1
width=19) (actual time=0.019..0.020 rows=1 loops=79264)
Index Cond: ( p.id = f.person_id)
Filter: ((website_id = 11) AND ((uniqueid)::text <>
'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND
(birthdate > (now() - '53 years'::interval)))
Total runtime: 3306.216 ms
(7 rows)

explain analyze yet another time
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Nested Loop (cost=0.00..13420.71 rows=63686 width=55) (actual time=
0.355..1123.840 rows=63902 loops=1)
-> Index Scan Backward using idx_features_datecreated on features f
(cost=0.00..3380.64 rows=79249 width=44) (actual time=
0.072..97.846rows=79264 loops=1)
Index Cond: (datecreated > '1980-01-01'::date)
-> Index Scan using person_pkey on person p (cost=0.00..0.11 rows=1
width=19) (actual time=0.009..0.010 rows=1 loops=79264)
Index Cond: (p.id = f.person_id)
Filter: ((website_id = 11) AND ((uniqueid)::text <>
'extrababak'::text) AND (birthdate < (now() - '13 years'::interval)) AND
(birthdate > (now() - '53 years'::interval)))
Total runtime: 1163.758 ms
(7 rows)

\d person;
Table "public.person"
Column | Type |
Modifiers
--------------+--------------------------+-----------------------------------------------------
id | integer | not null default
nextval('person_id_seq'::regclass)
uniqueid | character varying(50) |
website_id | integer |
datecreated | timestamp with time zone | default now()
gender | character varying(1) | default 'U'::character varying
birthdate | date |
category_id | integer |
city | character varying(100) |
zipcode | character varying(20) |
state | character varying(100) |
country_iso2 | character varying(2) |
Indexes:
"person_pkey" PRIMARY KEY, btree (id)
"idx_person_birthdate" btree (birthdate)
"idx_person_category_id" btree (category_id)
"idx_person_city" btree (city)
"idx_person_country_iso2" btree (country_iso2)
"idx_person_datecreated" btree (datecreated)
"idx_person_gender" btree (gender)
"idx_person_state" btree (state)
"idx_person_uniqueid" btree (uniqueid)
"idx_person_website_id" btree (website_id)
"idx_person_zipcode" btree (zipcode)
Foreign-key constraints:
"person_website_id_fkey" FOREIGN KEY (website_id) REFERENCES website(id)

obl_db2=# \d features;
Table "public.features"
Column | Type |
Modifiers
---------------------+--------------------------+-------------------------------------------------------

id | integer | not null default
nextval('features_id_seq'::regclass)
datecreated | timestamp with time zone |
template | bytea |
person_id | integer |
templateversion_id | integer |
template_compressed | bytea |
Indexes:
"features_pkey" PRIMARY KEY, btree (id)
"idx_features_datecreated" btree (datecreated)
"idx_features_person_id" btree (person_id)
Foreign-key constraints:
"features_person_id_fkey" FOREIGN KEY (person_id) REFERENCES person(id)
ON DELETE CASCADE
"features_templateversion_id_fkey" FOREIGN KEY (templateversion_id)
REFERENCES templateversion(id)

[victor(at)optimum02 ~]$ free
total used free shared buffers cached
Mem: 1034772 973988 60784 0 111080 361464
-/+ buffers/cache: 501444 533328
Swap: 2031608 88 2031520

uname -a
Linux optimum02 2.6.18-1.2257.fc5smp #1 SMP Fri Dec 15 16:33:51 EST 2006
i686 i686 i386 GNU/Linux

[victor(at)optimum02 ~]$ psql --version
psql (PostgreSQL) 8.2.0
contains support for command-line editing

select version();

version
-------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.0 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC)
4.1.120060525 (Red Hat
4.1.1-1)

cat /proc/sys/kernel/shmall
2097152
cat /proc/sys/kernel/shmmni
4096
cat /proc/sys/kernel/shmmax
134217728

From the postgresql.conf file, the following settings have been changed to
these values
shared_buffers = 96MB
effective_cache_size = 512MB

--
Victor Adolfsson
Chief Technology Officer

+46 733 404 623
Optimum Biometric Labs AB
Campus Gräsvik 5
37175 Karlskrona
Sweden

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Viatcheslav Kalinin 2007-07-31 11:37:05 Polymorphic functions' weird behavior
Previous Message Nis Jørgensen 2007-07-31 07:56:47 Re: alter table table add column