Skip site navigation (1) Skip section navigation (2)

Re: multiple sampling from tables and saving output

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Orme <d(dot)orme(at)imperial(dot)ac(dot)uk>
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Re: multiple sampling from tables and saving output
Date: 2005-02-07 15:54:46
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-novice
David Orme <d(dot)orme(at)imperial(dot)ac(dot)uk> writes:
> The process I need to do is a loop of  1000 repetitions of the 
> following:

> 1) select a random subset of the data from a table
> 2) save various summaries of the randomly selected data

> I can think of various external ways of doing this - my current plan is 
> to use a shell script to resend the same set of instructions repeated 
> times using 'psql -f instruction_set.sql'  - but I was wondering if 
> there was a canonical way of doing this within pgsql.

If you want a sample of, say, 1% of the rows in a table, you can do

	select * from mytable where random() < 0.01;

and get a genuinely unbiased sample.  Keep in mind though that you can't
get an exact sample size this way --- it'll be close to 1% but probably
not spot on.

			regards, tom lane

In response to

pgsql-novice by date

Next:From: Tom LaneDate: 2005-02-07 16:00:00
Subject: Re: Percent of update completed
Previous:From: DAVANNE Eric - NTRDate: 2005-02-07 15:33:33
Subject: password expiration interval

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group