Database owner installable modules patch

From: "Tom Dunstan" <pgsql(at)tomd(dot)cc>
To: "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, pgsql-patches(at)postgresql(dot)org
Subject: Database owner installable modules patch
Date: 2008-04-06 20:49:51
Message-ID: ca33c0a30804061349s41b4d8fcsa9c579454b27ecd2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Hi all

Here is a patch that provides an initial implementation of the module
idea that was kicked around over the last few days. While there
certainly wasn't consensus on list, enough people seemed interested in
the idea of database-owner-installable modules that I thought it was
worth having a play with.

The general idea, to recap, is to have modules, whether included in
the distribution a la contrib or installed separately, installed under
a directory such as $pkglib_dir/modules/foo. A typical module
directory might contain:
- foo.so/foo.dll
- install.sql
- uninstall.sql
- foo.conf
- some-other-file-needed-by-foo-module.dat
The module would be installed on the system, but the necessary scripts
to install it in a particular database have not been run. In
particular, the modules would not usually be install in template1.
Database owners themselves can then opt to enable a particular
installed module in their own database - they do not have to hassle a
sysadmin to do it for them.

Features of the patch:
- A database owner can issue the command "INSTALL MODULE foo", and
pgsql will look for a $pkglib_dir/modules/foo/install.sql file to run,
and run it.

- The install script can do pretty much anything - the user is
treated as the superuser for the duration of the script. The main and
obvious usage is to create C language functions required by the
module.

- An entry is created in a new pg_module catalog. This is mainly to
guard against someone trying to install a module twice at this point,
but it may have other uses in the future (see below).

- "UNINSTALL MODULE foo" looks for and executes
$pkglib_dir/modules/foo/uninstall.sql and cleans up the catalog.

Here is a list of things that are either still to do before I'd
consider it worthy of inclusion (should the general approach be
considered acceptable), or which I'd like some guidance on:

- Currently the script is executed in one big SPI_execute call, and
so errors and NOTICEs print out the entire script as context. I'm not
sure how to break it up without writing a full parser - we must have
something available in the backend to break a string up into multiple
statements to execute, but I'm not sure where to look. Also, is there
a better way to do this than SPI?

- I've hacked in a bit of an end-run around permissions checks to
make the current user look like a super-user while a module script is
running. Is there a better way to do this?

- I can't create C language functions from dlls under the modules
dir. I'd like to be able to specify 'modules/foo/foo' as the library
name, but the backend sees a slash and decides that must mean the path
is absolute. I see two ways to fix this: change the existing code in
dfmgr.c to *really* check for absolute/relative paths rather than the
current hack, or I could stick in a special case for when it starts
with "modules/". I thought I'd get some guidance on-list. Do people
think that sticking the dll in with other resources for the module
under $pkglib_dir is a bad thing? (I think having everything in one
place is a very good thing myself).Is the existing check written the
way it is for a particular reason, or is it just "good enough"?

- It would be nice to create the empty modules dir when we install
pgsql, but while I suppose hacking a Makefile to install it is the way
to go, I'm not sure which one would be appropriate.

- Hack pgxs to install stuff into a modules dir if we give it some
appropriate flag.

- I'd like to add pg_depend entries for stuff installed by the module
on the pd_module entry, so that you can't drop stuff required by the
module without uninstalling the module itself. There would have to be
either a function or more syntax to allow a script to do that, or some
sort of module descriptor that let the backend do it itself.

- Once the issue of loading a dll from inside the module's directory
is done, I'd like to look for an e.g. module_install() function inside
there, and execute that rather than the install.sql if found. Ditto
for uninstall.

- Maybe a basic mechanism to allow a module to require another one.
Even just a "SELECT require_module('bar')" call at the top of a
script.

- It would be nice to suppress NOTICEs when installing stuff - the
user almost certainly doesn't care.

- Pick up config files in module directories, so that a module can
install and pick up config for itself rather than getting the sysadmin
to hack the global custom_variable_classes setting.

- Should plperl etc be done as modules so that their config can live
independently as well? And to allow modules to "require" them?

Some other nice to haves for some point in the future:

- Have some sort of install module privilege, rather than just a
check for database ownership
- Allow looking for modules under some module path for e.g.
/usr/local module installs
- Convert existing contrib to modules where appropriate :)
- I really have no idea what happens if non-ascii characters are in
an install script at the moment. What happens if funky characters are
passed to an SPI_execute call?

Very far future:
- Have pgxs auto-generate rpm .spec files for modules, plus e.g. .deb
equivalent, wix files for windows etc etc.
- Versioning on modules?

General discussion:

I see this work as orthogonal to both the CPAN-style distribution /
repository discussion, and the fate-of-contrib discussion. If contrib
modules are reworked as this sort of module and left in the
distribution, they'll be easier to use and more likely to be installed
than they are now. If, as Tom suggested, they're mostly moved out of
the pgsql source tree and to e.g. pgfoundry or whatever, this
mechanism should make them (and every other extension out there) easy
to package, install and enable in a user's database.

Similarly, I don't personally care for a CPAN-style distribution setup
- on my preferred unix-like system I use yum and on windows I prefer
installers. Nonetheless, a standardised system to install and
enable/disable modules acts as an enabler for all packaging and
distribution systems.

I'm not sure about the command names - there was already a tendency in
the recent discussion to mix the notion of installation of code on the
filesystem versus installation into a particular user's database. The
convention for doing stuff in a db is CREATE/DROP, but CREATE MODULE
doesn't feel right to me, just as I don't really like CREATE LANGUAGE.
How about ENABLE/DISABLE MODULE? Makes it clear that the module is
installed, it's just not available in this database yet. Thoughts?

Anyway, discussion and feedback hereby solicited!

Cheers

Tom

Attachment Content-Type Size
modules.patch text/x-patch 24.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2008-04-06 21:17:10 Re: Adding pipelining support to set returning functions
Previous Message Tino Wildenhain 2008-04-06 18:28:59 Re: modules

Browse pgsql-patches by date

  From Date Subject
Next Message Gregory Stark 2008-04-06 22:29:50 Re: Database owner installable modules patch
Previous Message Tom Lane 2008-04-06 16:02:25 Re: updated hash functions for postgresql v1