Re: Integration with Access Method interface

From: Alice Lottini <alice_lottini(at)yahoo(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Mailing List Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Integration with Access Method interface
Date: 2003-04-11 15:44:26
Message-ID: 20030411154426.10504.qmail@web13703.mail.yahoo.com
Lists: pgsql-hackers

Our task is to implement FP-Growth (an algorithm that
mines frequent itemsets, used for extracting
association rules in data mining) as a C programme and
to integrate it into PostgreSQL at a low level. We are
strictly required not to go through the SQL layer, and
even to bypass the optimiser, reading the data out of
the tables directly through the Access Methods.
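As background for readers unfamiliar with FP-Growth: its
first pass over the data counts how often each item
occurs, and each transaction is then reduced to its
frequent items, sorted by descending frequency, before
being inserted into the FP-tree. The C sketch below
illustrates only that step, under stated assumptions:
items are small integer ids, each transaction is an
in-memory array standing in for rows fetched from the
table, items within a transaction are distinct, and the
names `count_support`, `prepare_txn` and `NITEMS` are
hypothetical, not part of PostgreSQL or of the authors'
code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NITEMS 16  /* hypothetical: item ids are 0..NITEMS-1 */

/* Support counts from the first pass; also read by the comparator. */
static int support[NITEMS];

/* First scan: count how many transactions contain each item
   (assumes no duplicate items inside one transaction). */
static void count_support(const int **txns, const int *lens, int ntxns)
{
    memset(support, 0, sizeof support);
    for (int t = 0; t < ntxns; t++)
        for (int i = 0; i < lens[t]; i++) {
            assert(txns[t][i] >= 0 && txns[t][i] < NITEMS);
            support[txns[t][i]]++;
        }
}

/* Descending support; ties broken by item id so the order is total. */
static int by_support_desc(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    if (support[x] != support[y])
        return support[y] - support[x];
    return x - y;
}

/* Keep only the frequent items of one transaction, sorted for insertion
   into the FP-tree. `out` needs room for `len` ints; returns count kept. */
static int prepare_txn(const int *txn, int len, int min_support, int *out)
{
    int n = 0;
    for (int i = 0; i < len; i++)
        if (support[txn[i]] >= min_support)
            out[n++] = txn[i];
    qsort(out, n, sizeof *out, by_support_desc);
    return n;
}
```

In a real backend implementation the two loops over `txns`
would be replaced by two scans of the table; the sketch
keeps the data in memory only to stay self-contained.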

The reason for this is that all existing data mining
tools obtain their data either from flat files or from
a DBMS through SQL queries; since the amount of data
involved is usually huge, this high-level integration
results in rather poor performance.
Furthermore, FP-Growth is a recursive algorithm, and
the data structures it needs (FP-trees) are likely not
to fit in memory.
To partly solve these problems, we have studied an
optimised version of the algorithm as well as a
partitioning technique for the data structures, so
that they can be stored on disk instead of having to
be held in memory.

Now we must enable our programme to access the data
directly from the table, so that the FP-tree can be
built and then, after being partitioned according to
our strategy, stored in disk blocks (each node of our
tree should be a tuple).

We'd like to know the most suitable way to integrate
our algorithm into the server at the access method
level. If it is not possible simply to invoke the
access methods from an external programme, what could
the alternative be? Perhaps making the whole procedure
a user-defined function, like the ones in contrib, is
the most viable way...
Any suggestion would be greatly appreciated.
Thanks in advance!
Best regards, alice and lorena

--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alice Lottini <alice_lottini(at)yahoo(dot)it> writes:
> > we're developing a C programme which needs to directly
> > use the functions of the Access Methods interface.
> > In particular, our programme contains a function,
> > readFromPG, which directly calls functions such as
> > heap_open, heap_beginscan and so on in order to
> > perform a low-level retrieval of data which are to be
> > made available for further elaborations.
>
> Why?
>
> The answer to your question is simple: you can't, because those are
> internal backend operations and are just not available to client
> programs. But I'm really at a loss why you think this would be a good
> thing to do. What's wrong with a "SELECT ..." command?
>
> regards, tom lane
