Fw: Selecting random rows using weights

From: Masaru Sugawara <rk73(at)sea(dot)plala(dot)or(dot)jp>
To: gerard(at)interfold(dot)com
Cc: pgsql-novice(at)postgresql(dot)org
Subject: Fw: Selecting random rows using weights
Date: 2002-05-19 04:29:21
Message-ID: 20020519132333.6BB4.RK73@sea.plala.or.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-novice

On Sun, 12 May 2002 13:24:17 -0600
"G" <gerard(at)interfold(dot)com> wrote:

>
> Is there a way to select rows randomly using weight columns? (weighted
> random values)
>

I find out how to get what you want at random; however it's for only a row.
If you need ROWS, it's impossible to select them by using these functions
at least. Probably, another approach will be need.

-- First --
-- An id column must be unique.
create table tbl_random(id int4 unique not null, weight int4);
insert into tbl_random values(1, 1);
insert into tbl_random values(2, 10);
insert into tbl_random values(3, 0);
insert into tbl_random values(4, 3);
ALTER TABLE tbl_random ADD COLUMN r_start int4;
ALTER TABLE tbl_random ADD COLUMN r_end int4;
CREATE INDEX idx_random_r_start ON tbl_random(r_start);
CREATE INDEX idx_random_r_end ON tbl_random(r_end);

-- Second --
CREATE OR REPLACE FUNCTION fn_update_random() RETURNS boolean AS '
DECLARE
rec RECORD;
range int4 :=1;
BEGIN
FOR rec IN SELECT * FROM tbl_random WHERE weight > 0 LOOP
UPDATE tbl_random SET r_start = range,
r_end = weight + range - 1
WHERE id = rec.id;
SELECT INTO range (r_end + 1) FROM tbl_random
WHERE id = rec.id;
END LOOP;
FOR rec IN SELECT * FROM tbl_random WHERE weight = 0 LOOP
UPDATE tbl_random SET r_start = 0, r_end = 0
WHERE id = rec.id;
END LOOP;
RETURN true;
END;
' LANGUAGE 'plpgsql';

-- Third --
-- This query for updating the weighted range of r_start to r_end
-- needs to execute after rows are inserted into a target table,
-- ones are deleted from it, or the "weight" column in it is updated.
SELECT fn_update_random();

-- 4th --
-- A random number(0 to 1) is scaled by the maximum number of r_end, which
-- must be put into the subselect not to internally execute the random()
-- function twice. And if its scaled number is within the weighted range,
-- the unique row having the range will be selected.
SELECT t1.*, t2.r
FROM tbl_random AS t1,
(SELECT ceil(max(r_end) * (SELECT random())) AS r
FROM tbl_random) AS t2
WHERE t1.r_start <= t2.r AND t2.r <= t1.r_end;

-- Here is a bad example.
SELECT t1.*
FROM tbl_random AS t1
WHERE (SELECT ceil(max(r_end) * (SELECT random())) FROM tbl_random)
BETWEEN t1.r_start AND t1.r_end;

Regards,
Masaru Sugawara

Browse pgsql-novice by date

  From Date Subject
Next Message thiemo 2002-05-19 08:15:06 Forgotten the master password of db
Previous Message Tom Lane 2002-05-18 15:36:21 Re: ADD CONSTRAINT NOT NULL, how?