Re: new procedural language - PL/R

From: Joe Conway <mail(at)joeconway(dot)com>
To: cbbrowne(at)cbbrowne(dot)com
Cc: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: new procedural language - PL/R
Date: 2003-02-03 21:39:06
Message-ID: 3E3EE17A.4030107@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

cbbrowne(at)cbbrowne(dot)com wrote:
> What might be "nifty" would be to have some mappings that did Clever
> Transformations of Queries Into Views, particularly if that allowed
> harnessing the DBMS to do some of the statistical analysis behind your
> back...

I'm not quite sure what you mean here, but it does support pulling data into
the R interpreter as a "data.frame" via SPI, and returning R
matricies/vectors/data.frames as either Postgres arrays or as rows and columns
of a table function. Here's two contrived, but illustrative, examples:

create or replace function test_dtup() returns record as
'data.frame(letters[1:10],1:10)' language 'plr';
select * from test_dtup() as t(f1 text, f2 int);
f1 | f2
----+----
a | 1
b | 2
c | 3
d | 4
e | 5
f | 6
g | 7
h | 8
i | 9
j | 10
(10 rows)

create or replace function test_spi_tup(text) returns record as
'pg.spi.exec(arg1)' language 'plr';
select * from test_spi_tup('select oid, typname from pg_type where typname =
''oid'' or typname = ''text''') as t(typeid oid, typename name);
typeid | typename
--------+----------
25 | text
26 | oid
(2 rows)

You could easily perform a parameterized query via SPI, retrieve the results
into an R data.frame, do some statistical manipulations, and then return the
results as a table function. The table function itself could be wrapped in a
view to hide the whole thing from the end-user.

You can also create custom aggregates. There has been at least one thread not
too long ago regarding an aggregate to calculate median, for instance. Here it
is in plr:

create table foo(f1 text, f2 float8);
insert into foo values('cat1',1.21);
insert into foo values('cat1',1.24);
insert into foo values('cat1',1.18);
insert into foo values('cat1',1.26);
insert into foo values('cat1',1.15);
insert into foo values('cat2',1.15);
insert into foo values('cat2',1.26);
insert into foo values('cat2',1.32);
insert into foo values('cat2',1.30);
create or replace function r_median(_float8) returns float as 'median(arg1)'
language 'plr';
CREATE AGGREGATE median (sfunc = array_accum, basetype = float8, stype =
_float8, finalfunc = r_median);
select f1, median(f2) from foo group by f1 order by f1;
f1 | median
------+--------
cat1 | 1.21
cat2 | 1.28
(2 rows)

It's not as fast as the native PostgreSQL functions if you just need average
or standard deviation, but it's alot easier and faster than writing your own
for something more out-of-the-ordinary.

Joe

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2003-02-03 21:52:14 Re: Win32 Powerfail testing - results
Previous Message Jan Wieck 2003-02-03 21:20:09 Re: [mail] Re: Windows Build System