Plugins redux (was Re: [PATCHES] PL instrumentation plugin support)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: korryd(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Plugins redux (was Re: [PATCHES] PL instrumentation plugin support)
Date: 2006-08-09 16:44:00
Message-ID: 27198.1155141840@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Yesterday I suggested that we should redesign the PL plugin patch:
http://archives.postgresql.org/pgsql-patches/2006-07/msg00291.php
in view of the recently-committed patch to add initialization/
finalization support for all dynamic libraries loaded into the backend:
http://archives.postgresql.org/pgsql-committers/2006-08/msg00176.php

That was just handwaving at the time, because I didn't have a concrete
proposal, but here is one.

Overview
--------

ISTM there are two separate bits of functionality that the core backend
needs to provide in support of PL plugins. The first is that we need a
"rendezvous" facility whereby two separately-loaded shared libraries can
connect and exchange data (a struct of function pointers for the immediate
plans for plpgsql, but in general it might be anything). The exchanged
data mustn't be limited to things known in advance to the core backend.
The facility should support loading of communicating libraries in either
order, and it should also allow for unloading of libraries. This seems
easy enough to design: the core backend should basically provide a
hashtable that maps names to "void *" pointer variables, and then two
libraries that want to communicate just agree on a name and the data
structure to be pointed to. (More details below.)

The other, probably more controversial bit of functionality is that there
needs to be a way to cause a backend to load a PL plugin shared library
without any cooperation from the connected client application. For
interactive use of a PL debugger it might be sufficient to tell people to
do "LOAD 'plpgsql_debugger'" before running their function-to-be-tested,
but if you're trying to debug some behavior that only manifests in a large
complicated application, it may be impractical to get the application to
issue such a command. We agreed while discussing this at the PG
Anniversary Conference that providing a GUC variable for this purpose is
probably sufficient, as there are multiple ways that such a variable can
be set on behalf of an application: ALTER USER SET, ALTER DATABASE SET,
"export PGOPTIONS=--var=val" before starting the app, etc. As submitted,
the plugin patch envisions defining a custom GUC variable for each PL and
having each PL add logic to force loading of any shared library mentioned
in its specific variable. I claim however that that is just duplicative,
and that a single backend-wide GUC variable should be sufficient.

Details
-------

I propose adding one function to dfmgr.c,

void ** find_rendezvous_variable(const char *varname)

Internally this function has a per-backend hashtable that associates names
with "void *" data fields. It merely looks up the given name and returns
a pointer to the data field; if the name is not already present then it's
added with a data field initialized to NULL. The hashtable entries
continue to exist for the life of the backend, and are never modified by
the core backend code.

A plugin such as a plpgsql debugger would do this in its _PG_init()
function:

static PLpgSQL_plugin my_plugin = { ... function addresses ... };

PLpgSQL_plugin **var_ptr;

var_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");
*var_ptr = &my_plugin;

and this in its _PG_fini() function:

PLpgSQL_plugin **var_ptr;

var_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");
*var_ptr = NULL;

Meanwhile, plpgsql itself would do this in its _PG_init() function:

static PLpgSQL_plugin **plugin_ptr = NULL;

plugin_ptr = (PLpgSQL_plugin **) find_rendezvous_variable("PLpgSQL_plugin");

and in the places where it wants to pass control to the plugin, it'd do

if (*plugin_ptr)
((*plugin_ptr)->function_field) (... args ...);

The principal difference between this proposal and the original patch
is that this way supports unloading of PL plugins; AFAICS, the original
patch would just crash if the plugin were unloaded and then use of plpgsql
continued. Also, this way doesn't depend on the user to correctly specify
an initialization function within the plugin: as long as you load the
right shared library, everything works.

As for forcing the library load to occur, I propose a new GUC variable
"backend_load_libraries" that is much like the postmaster's
preload_libraries, except that the requested library loads happen
at backend start time instead of in the postmaster. Then we need
write and document the code only once, and there are other possible
uses for it besides PL plugins.

We need to think about security issues in defining such a variable:
we don't want unprivileged users to be able to load arbitrary code
into the server. One solution to this is to make the variable SUSET,
ie, you must be superuser to set it. This would make it impossible
for people to debug their own functions without superuser privileges
... but IIRC actual use of the proposed plpgsql debugger requires
superuser privileges too, so this isn't necessarily a fatal objection.
Still, it'd be nice if the PGOPTIONS way of setting the variable could
be used without requiring that the client app itself run as superuser.
Plan B is to restrict which shared libraries can be loaded using the
variable --- there are different ways we could do that, but my suggestion
would be to insist that the shared library come from "$libdir/plugins/".
The DBA is then responsible for allowing only "safe" shared libraries
to be installed in that directory, and therefore we can allow unprivileged
users to set "backend_load_libraries". (We might as well also allow them
to execute LOAD for these libraries, so that plugins could be switched
intra-session without superuser privileges.)

BTW, is anyone up for renaming the existing "preload_libraries" variable
to "postmaster_load_libraries"? This would be more symmetrical with
"backend_load_libraries", and so perhaps clearer about which does what
when.

Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2006-08-09 16:45:15 Re: 8.2 features status
Previous Message Heikki Linnakangas 2006-08-09 16:40:27 Re: 8.2 features status

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2006-08-09 17:07:30 Re: [HACKERS] [PATCHES] Values list-of-targetlists patch for comments (was Re:
Previous Message David Fetter 2006-08-09 16:22:42 Re: [HACKERS] [PATCHES] Values list-of-targetlists patch for comments (was Re: