Re: Gsoc2012 Idea --- Social Network database schema

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Qi Huang <huangqiyx(at)hotmail(dot)com>, "neil(dot)conway" <neil(dot)conway(at)gmail(dot)com>, daniel <daniel(at)heroku(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Gsoc2012 Idea --- Social Network database schema
Date: 2012-03-21 15:34:58
Message-ID: 1481.1332344098@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, the standard syntax apparently aims to reduce the number of
> returned rows, which ORDER BY does not. Maybe you could do it with
> ORDER BY .. LIMIT, but the idea here I think is that we'd like to
> sample the table without reading all of it first, so that seems to
> miss the point.

I think actually the traditional locution is more like
WHERE random() < constant
where the constant is the fraction of the table you want. And yeah,
the presumption is that you'd like it to not actually read every row.
(Though unless the sampling density is quite a bit less than 1 row
per page, it's not clear how much you're really going to win.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Qi Huang 2012-03-21 15:48:51 Re: Gsoc2012 Idea --- Social Network database schema
Previous Message Pavel Stehule 2012-03-21 15:30:17 Re: Proposal: PL/pgPSM for 9.3