| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Qi Huang <huangqiyx(at)hotmail(dot)com>, "neil(dot)conway" <neil(dot)conway(at)gmail(dot)com>, daniel <daniel(at)heroku(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com> | 
| Subject: | Re: Gsoc2012 Idea --- Social Network database schema | 
| Date: | 2012-03-21 15:34:58 | 
| Message-ID: | 1481.1332344098@sss.pgh.pa.us | 
| Lists: | pgsql-hackers | 
Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, the standard syntax apparently aims to reduce the number of
> returned rows, which ORDER BY does not.  Maybe you could do it with
> ORDER BY .. LIMIT, but the idea here I think is that we'd like to
> sample the table without reading all of it first, so that seems to
> miss the point.
I think actually the traditional locution is more like
	WHERE random() < constant
where the constant is the fraction of the table you want.  And yeah,
the presumption is that you'd like it to not actually read every row.
(Though unless the sampling density is quite a bit less than 1 row
per page, it's not clear how much you're really going to win.)
regards, tom lane
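
The parenthetical about sampling density can be checked with a little arithmetic. What follows is a minimal sketch, not anything from the thread: it assumes each row is kept independently with probability equal to the constant in `WHERE random() < constant`, and that every page holds the same number of rows. The function name `fraction_of_pages_touched` and the figure of 100 rows per page are hypothetical, chosen only for illustration.

```python
def fraction_of_pages_touched(sample_fraction: float, rows_per_page: int) -> float:
    """Expected fraction of pages that hold at least one sampled row.

    Assumes independent per-row sampling with probability sample_fraction
    and a uniform rows_per_page (both simplifications).
    """
    if not 0.0 <= sample_fraction <= 1.0:
        raise ValueError("sample_fraction must be between 0 and 1")
    if rows_per_page < 1:
        raise ValueError("rows_per_page must be at least 1")
    # A page is skipped only if every one of its rows is rejected.
    return 1.0 - (1.0 - sample_fraction) ** rows_per_page


if __name__ == "__main__":
    rows_per_page = 100  # assumed value, for illustration only
    for sample_fraction in (0.1, 0.01, 0.001):
        density = sample_fraction * rows_per_page  # expected sampled rows per page
        touched = fraction_of_pages_touched(sample_fraction, rows_per_page)
        print(f"fraction={sample_fraction}  rows/page sampled={density:g}  "
              f"pages touched={touched:.3f}")
```

Under these assumptions, a 1% sample at 100 rows per page (one sampled row per page on average) would still need roughly 63% of the pages, while a 0.1% sample would need only about 10%. That is consistent with the remark that the win is unclear unless the density is well below one row per page.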