Quick Links

plan problem

From:	Ken Geis <kgeis(at)speakeasy(dot)org>
To:	pgsql-performance(at)postgresql(dot)org
Subject:	plan problem
Date:	2004-04-06 20:25:54
Message-ID:	40731252.2030002@speakeasy.org
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

I am trying to find an efficient way to draw a random sample from a
complex query. I also want it to be easy to use within my application.

So I've defined a view that encapsulates the query. The id in the
"driving" table is exposed, and I run a query like:

select * from stats_record_view
where id in (select id from driver_stats
order by random()
limit 30000);

driver_stats.id is unique, the primary key. The problem I'm having is
that neither the ORDER BY nor the LIMIT change the uniqueness of that
column, but the planner doesn't know that. It does a HashAggregate to
make sure the results are unique. It thinks that 200 rows will come out
of that operation, and then 200 rows is small enough that it thinks a
Nested Loop is the best way to proceed from there.

I can post more query plan, but I don't think it would be that very
helpful. I'm considering just making a sample table and creating an
analogous view around that. I'd like to be able to keep this as simple
as possible though.

Ken

Responses

Re: plan problem at 2004-04-07 08:38:11 from Richard Huxton

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Chris Browne	2004-04-06 20:57:02	Re: Raw devices vs. Filesystems
Previous Message	Aaron Werman	2004-04-06 20:22:46	Re: possible improvement between G4 and G5