random dataset generator for SKYLINE operator

From: Hannes Eder <Hannes(at)HannesEder(dot)net>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: random dataset generator for SKYLINE operator
Date: 2007-10-11 14:44:18
Message-ID: 470E36C2.1040308@HannesEder.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We wrote a little contrib module, which we'd like to share. It can be
used to generate random datasets as they have been used in
[Borzsonyi2001] and related work. The code is based directly on the
code of the authors, thanks to Donald Kossmann for sharing the
code. Donald Kossmann agrees on sharing our implementation with the
community.

We use it primary for profiling and regression testing of our
implementation of the SKYLINE operator in PostgreSQL, which is work in
progress, details will be announced soon.

We use it in two ways:

(1) create table a2d1000 as (select * from pg_rand_dataset(‘anti’, 2,
1000, 0) ds(id int, d1 float, d2 float)). And then work with this table,
e.g. an additional index and query the data or

(2) directly for profiling and regression testing as: select * from
pg_rand_dataset(‘indep’, 3, 1000, 0) ds(id int, d1 float, d2 float, d3
float) skyline by d1 min, d2 min, d3 max;

It might be useful for other purposes.

We ask you for comments and suggestions.

What's the best way to share this module?

Declaration
-----------

CREATE FUNCTION pg_catalog.pg_rand_dataset(disttype text, dim int,
rows int, seed int)
RETURNS setof record
AS '$libdir/randdataset', 'pg_rand_dataset'
LANGUAGE C IMMUTABLE STRICT;

Usage example
-------------

Hannes=# select * from pg_rand_dataset('indep',1,5,0) ds(id int, d1 float);
id | d1
----+--------------------
1 | 0.170828036112165
2 | 0.749901980510867
3 | 0.0963716553972902
4 | 0.870465227342427
5 | 0.577303506702792
(5 rows)

Hannes=# select * from pg_rand_dataset('corr',2,3,0) ds(id int, d1
float, d2 float);
id | d1 | d2
----+-------------------+-------------------
1 | 0.489722997173791 | 0.431007019449241
2 | 0.385624871971487 | 0.645283734523713
3 | 0.754378154805565 | 0.82440492311745
(3 rows)

Hannes=# select * from pg_rand_dataset('anti',3,2,0) ds(id int, d1
float, d2 float, d3 float);
id | d1 | d2 | d3
----+-------------------+--------------------+-------------------
1 | 0.848072555389202 | 0.0868831583602555 | 0.656344708211334
2 | 0.447278565604571 | 0.641176215525044 | 0.523196923510816
(2 rows)

References
----------

[Borzsonyi2001] Börzsönyi, S.; Kossmann, D. & Stocker, K.:
The Skyline Operator, ICDE 2001, 421--432

Best,
-Hannes

Attachment Content-Type Size
randdataset.tar.gz application/x-gzip 4.5 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Kreen 2007-10-11 14:48:35 Patch: txid in core
Previous Message Pavel Stehule 2007-10-11 14:38:04 Re: full text search in 8.3