
Re: Select random lines of a table using a probability distribution

From:	"ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To:	"Jira, Marcel" <Marcel(dot)Jira(at)wu(dot)ac(dot)at>
Cc:	"'pgsql-sql(at)postgresql(dot)org'" <pgsql-sql(at)postgresql(dot)org>
Subject:	Re: Select random lines of a table using a probability distribution
Date:	2011-07-13 13:58:10
Message-ID:	20110713135810.GA1874@staff-mud-56-27.rice.edu
Lists:	pgsql-sql

On Wed, Jul 13, 2011 at 03:27:10PM +0200, Jira, Marcel wrote:
> Hi!
>
> Let's say I have a table like this:
>
> id qualification gender age income
>
> I'd like to select (for example 100) lines of this table at random, but the random mechanism has to follow a certain probability distribution.
>
> I want to use this procedure to construct a test group for another selection.
>
> Example:
>
> I filter all lines having the qualification "plumber".
> I get 50 different ids: 40 males and 10 females, with a certain age distribution.
>
> I also get some information concerning the income of the plumbers.
>
> Now I want to know if the income is more influenced by the gender and age distribution or by the qualification "plumber".
>
> Therefore I would like to select a test group (of 50 or more) without any plumbers. This test group has to follow the same age and gender distribution.
>
> Then I would be able to compare this group's income statistics with the plumbers' income statistics.
>
> Is this possible (and doable with reasonable effort) in PostgreSQL?
>
> Thank you in advance.
>
> Best regards,
>
> Marcel Jira
>
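The selection described above is essentially stratified (frequency-matched) sampling: count the plumbers in each gender and age band, then draw that many non-plumbers at random from the same band. Below is a minimal Python sketch. It assumes the rows have already been fetched from the table as tuples in the question's column order and that ages are grouped into 10-year bands. The helper names `age_band`, `stratum` and `matched_sample` are hypothetical; they are not part of PostgreSQL or PL/R.

```python
import random
from collections import Counter, defaultdict

# Hypothetical row layout mirroring the question's table:
# (id, qualification, gender, age, income)


def age_band(age, width=10):
    """Assumption: ages are grouped into fixed-width bands (default 10 years)."""
    return (age // width) * width


def stratum(row):
    """The matching key: (gender, age band)."""
    _id, _qual, gender, age, _income = row
    return (gender, age_band(age))


def matched_sample(rows, target_qual, scale=1, seed=None):
    """Draw a comparison group that excludes target_qual but reproduces
    its per-stratum (gender, age band) counts, multiplied by `scale`."""
    rng = random.Random(seed)
    targets = [r for r in rows if r[1] == target_qual]

    # Candidate pool: every row NOT having the target qualification, by stratum.
    pool = defaultdict(list)
    for r in rows:
        if r[1] != target_qual:
            pool[stratum(r)].append(r)

    # How many rows each stratum must contribute.
    need = Counter(stratum(r) for r in targets)

    sample = []
    for key, count in need.items():
        k = count * scale
        candidates = pool.get(key, [])
        if len(candidates) < k:
            raise ValueError(
                f"stratum {key}: need {k} rows, only {len(candidates)} available"
            )
        # Simple random sampling without replacement within the stratum.
        sample.extend(rng.sample(candidates, k))
    return sample


if __name__ == "__main__":
    # Synthetic demo data (not real figures).
    rows = []
    for qual in ("plumber", "clerk", "teacher"):
        for gender in ("m", "f"):
            for age in range(20, 60):
                rows.append((len(rows) + 1, qual, gender, age, 30000 + 100 * age))
    group = matched_sample(rows, "plumber", scale=2, seed=42)
    print(len(group), Counter(stratum(r) for r in group))
```

Drawing exact quotas per stratum (rather than weighting each row by a probability) guarantees that the test group has the same gender/age composition as the plumbers. The price is that a sparse stratum can run out of candidates, in which case wider age bands are the usual remedy. Inside PostgreSQL, the same per-stratum draw could plausibly be expressed with a window function such as `row_number() OVER (PARTITION BY gender, age_band ORDER BY random())` and a filter on the row number, though that query is untested here.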

You may want to take a look at PL/R, which makes the R system available to
PostgreSQL as a procedural language for writing functions.

Regards,
Ken

In response to

Select random lines of a table using a probability distribution at 2011-07-13 13:27:10 from Jira, Marcel
