Re: plan problem

From: Richard Huxton <dev(at)archonet(dot)com>
To: Ken Geis <kgeis(at)speakeasy(dot)org>, pgsql-performance(at)postgresql(dot)org
Subject: Re: plan problem
Date: 2004-04-07 12:31:30
Message-ID: 200404071331.30925.dev@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wednesday 07 April 2004 10:03, Ken Geis wrote:
> Richard Huxton wrote:
> > On Tuesday 06 April 2004 21:25, Ken Geis wrote:
> >>I am trying to find an efficient way to draw a random sample from a
> >>complex query. I also want it to be easy to use within my application.
> >>
> >>So I've defined a view that encapsulates the query. The id in the
> >>"driving" table is exposed, and I run a query like:
> >>
> >>select * from stats_record_view
> >> where id in (select id from driver_stats
> >> order by random()
> >> limit 30000);
> >
> > How about a join?
> >
> > SELECT s.*
> > FROM
> > stats_record_view s
> > JOIN
> > (SELECT id FROM driver_stats ORDER BY random() LIMIT 30000) AS r
> > ON s.id = r.id;
>
> Yes, I tried this too after I sent the first mail, and this was somewhat
> better. I ended up adding a random column to the driving table, putting
> an index on it, and exposing that column in the view. Now I can say
>
> SELECT * FROM stats_record_view WHERE random < 0.093;
>
> For my application, it's OK if the same sample is picked time after time
> and it may change if data is added.

Fair enough - that'll certainly do it.

> > Also worth checking the various list archives - this has come up in the
> > past, but some time ago.
>
> There are some messages in the archives about how to get a random
> sample. I know how to do that, and that's not why I posted my message.
> Are you saying that the planner behavior I spoke of is in the
> archives? I wouldn't know what to search on to find that thread. Does
> anyone think that the planner issue has merit to address? Can someone
> help me figure out what code I would look at?

I was assuming after getting a random subset they'd see the same problem you
are. If not, probably worth looking at. In which case, an EXPLAIN ANALYZE of
your original query would be good.

--
Richard Huxton
Archonet Ltd

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Harald Fuchs 2004-04-07 13:05:55 Re: Raw devices vs. Filesystems
Previous Message Shridhar Daithankar 2004-04-07 11:54:41 Re: good pc but bad performance,why?