Quick Links

Select distinct sorting all rows 8.0rc1

From:	"Guy Rouillier" <guyr(at)masergy(dot)com>
To:	"PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject:	Select distinct sorting all rows 8.0rc1
Date:	2004-12-05 06:07:39
Message-ID:	CC1CF380F4D70844B01D45982E671B2348E46A@mtxexch01.add0.masergy.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

System is 4-way Opteron 844, 16 GB memory, SCSI. This is a trial run
for converting an Oracle DB, so system is not optimized. I have

shared_buffers = 50000
work_mem = 65536

A table has about 65 million rows (data collection system.) It has a
primary key, no other indexes, no OIDs. The primary key contains 5
columns, of which service_id is the first (i.e., higher order.) I've
run a simple ANALYZE on this table. Trying to find the unique
service_id values, I did the following. If I'm reading this right, the
sequential scan is passing all 65 million retrieved values onto the
sort, which understandably takes a long time.

This query won't be run very often, if ever, in production, so I don't
want to add additional indexes to support it. Our known queries run
very fast, about 1.2 seconds the first time through, and 20 msecs on
repeats. Can I run ANALYZE in a different way so that queries like this
can be completed in a shorter amount of time? I'm trying to anticipate
ad-hoc queries the user community might come up with.

Since there are such a small number or result rows compared to the
number of total rows, perhaps using a hash table to record unique values
would avoid passing all 65 million rows to the sort. Thanks for all
suggestions.

estat=> explain analyze select distinct(service_id) from
five_min_stats_200408;

QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----
Unique (cost=13578354.70..13894902.76 rows=726 width=12) (actual
time=1227906.271..1282110.055 rows=879 loops=1)
-> Sort (cost=13578354.70..13736628.73 rows=63309612 width=12)
(actual time=1227906.266..1255961.318 rows=63359396 loops=1)
Sort Key: service_id
-> Seq Scan on five_min_stats_200408 (cost=0.00..1668170.12
rows=63309612 width=12) (actual time=0.061..80398.222 rows=63359396
loops=1)
Total runtime: 1284212.556 ms
(5 rows)

Time: 1284213.359 ms

--
Guy Rouillier

Responses

Re: Select distinct sorting all rows 8.0rc1 at 2004-12-05 18:38:35 from Pierre-Frédéric Caillaud

Browse pgsql-general by date

	From	Date	Subject
Next Message	John Hansen	2004-12-05 07:46:01	Re: List archives search function broken
Previous Message	Stephan Szabo	2004-12-05 05:44:17	Re: Trigger problem