Select distinct sorting all rows 8.0rc1

From: "Guy Rouillier" <guyr(at)masergy(dot)com>
To: "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: Select distinct sorting all rows 8.0rc1
Date: 2004-12-05 06:07:39
Message-ID: CC1CF380F4D70844B01D45982E671B2348E46A@mtxexch01.add0.masergy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

System is 4-way Opteron 844, 16 GB memory, SCSI. This is a trial run
for converting an Oracle DB, so system is not optimized. I have

shared_buffers = 50000
work_mem = 65536

A table has about 65 million rows (data collection system.) It has a
primary key, no other indexes, no OIDs. The primary key contains 5
columns, of which service_id is the first (i.e., higher order.) I've
run a simple ANALYZE on this table. Trying to find the unique
service_id values, I did the following. If I'm reading this right, the
sequential scan is passing all 65 million retrieved values onto the
sort, which understandably takes a long time.

This query won't be run very often, if ever, in production, so I don't
want to add additional indexes to support it. Our known queries run
very fast, about 1.2 seconds the first time through, and 20 msecs on
repeats. Can I run ANALYZE in a different way so that queries like this
can be completed in a shorter amount of time? I'm trying to anticipate
ad-hoc queries the user community might come up with.

Since there are such a small number or result rows compared to the
number of total rows, perhaps using a hash table to record unique values
would avoid passing all 65 million rows to the sort. Thanks for all
suggestions.

estat=> explain analyze select distinct(service_id) from
five_min_stats_200408;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----
Unique (cost=13578354.70..13894902.76 rows=726 width=12) (actual
time=1227906.271..1282110.055 rows=879 loops=1)
-> Sort (cost=13578354.70..13736628.73 rows=63309612 width=12)
(actual time=1227906.266..1255961.318 rows=63359396 loops=1)
Sort Key: service_id
-> Seq Scan on five_min_stats_200408 (cost=0.00..1668170.12
rows=63309612 width=12) (actual time=0.061..80398.222 rows=63359396
loops=1)
Total runtime: 1284212.556 ms
(5 rows)

Time: 1284213.359 ms

--
Guy Rouillier

Responses

Browse pgsql-general by date

  From Date Subject
Next Message John Hansen 2004-12-05 07:46:01 Re: List archives search function broken
Previous Message Stephan Szabo 2004-12-05 05:44:17 Re: Trigger problem