pg_background (and more parallelism infrastructure patches)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_background (and more parallelism infrastructure patches)
Date: 2014-07-25 18:11:32
Message-ID: CA+Tgmoam66dTzCP8N2cRcS6S6dBMFX+JMba+mDf68H=KAkNjPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached is a contrib module that lets you launch arbitrary command in
a background worker, and supporting infrastructure patches for core.
You can launch queries and fetch the results back, much as you could
do with a dblink connection back to the local database but without the
hassles of dealing with authentication; and you can also run utility
commands, like VACUUM. For people who have always wanted to be able
to launch a vacuum (or an autonomous transaction, or a background
task) from a procedural language ... enjoy.

Here's an example of running vacuum and then fetching the results.
Notice that the notices from the original session are propagated to
our session; if an error had occurred, it would be re-thrown locally
when we try to read the results.

rhaas=# create table foo (a int);
CREATE TABLE
rhaas=# select pg_background_launch('vacuum verbose foo');
pg_background_launch
----------------------
51101
(1 row)

rhaas=# select * from pg_background_result(51101) as (x text);
INFO: vacuuming "public.foo"
INFO: "foo": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
x
--------
VACUUM
(1 row)

Here's an overview of the attached patches:

Patches 1 and 2 add a few new interfaces to the shm_mq and dsm APIs
that happen to be convenient for later patches in the series. I'm
pretty sure I could make all this work without these, but it would
take more code and be less efficient, so here they are.

Patch 3 adds the ability for a backend to request that the protocol
messages it would normally send to the frontend get redirected to a
shm_mq. I did this by adding a couple of hook functions. The best
design is definitely arguable here, so if you'd like to bikeshed, this
is probably the patch to look at. This patch also adds a function to
help you parse an ErrorResponse or NoticeResponse and re-throw the
error or notice in the originating backend. Obviously, parallelism is
going to need this kind of functionality, but I suspect a variety of
other applications people may develop using background workers may
want it too; and it's certainly important for pg_background itself.

Patch 4 adds infrastructure that allows one session to save all of its
non-default GUC values and another session to reload those values.
This was written by Amit Khandekar and Noah Misch. It allows
pg_background to start up the background worker with the same GUC
settings that the launching process is using. I intend this as a
demonstration of how to synchronize any given piece of state between
cooperating backends. For real parallelism, we'll need to synchronize
snapshots, combo CIDs, transaction state, and so on, in addition to
GUCs. But GUCs are ONE of the things that we'll need to synchronize
in that context, and this patch shows the kind of API we're thinking
about for these sorts of problems.

Patch 5 is a trivial patch to add a function to get the authenticated
user ID. Noah pointed out to me that it's important for the
authenticated user ID, session user ID, and current user ID to all
match between the original session and the background worker.
Otherwise, pg_background could be used to circumvent restrictions that
we normally impose when those values differ from each other. The
session and current user IDs are restored by the GUC save-and-restore
machinery ("session_authorization" and "role") but the authenticated
user ID requires special treatment. To make that happen, it has to be
exposed somehow.

Patch 6 is pg_background itself. I'm quite pleased with how easily
this came together. The existing background worker, dsm, shm_toc, and
shm_mq infrastructure handles most of the heavily lifting here -
obviously with some exceptions addressed by the preceding patches.
Again, this is the kind of set-up that I'm expecting will happen in a
background worker used for actual parallelism - clearly, more state
will need to be restored there than here, but nonetheless the general
flow of the code here is about what I'm imagining, just with somewhat
more different kinds of state. Most of the work of writing this patch
was actually figuring out how to execute the query itself; what I
ended up with is mostly copied form exec_simple_query, but with some
difference here and there. I'm not sure if it would be
possible/advisable to try to refactor to reduce duplication.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
0001-Extend-shm_mq-API-with-new-functions-shm_mq_sendv-an.patch text/x-patch 7.8 KB
0002-Extend-dsm-API-with-a-new-function-dsm_unkeep_mappin.patch text/x-patch 2.0 KB
0003-Support-frontend-backend-protocol-communication-usin.patch text/x-patch 13.0 KB
0004-Add-infrastructure-to-save-and-restore-GUC-values.patch text/x-patch 12.2 KB
0005-Add-a-function-to-get-the-authenticated-user-ID.patch text/x-patch 1.5 KB
0006-pg_background-Run-commands-in-a-background-worker-an.patch text/x-patch 31.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Claudio Freire 2014-07-25 18:21:43 Re: Proposal: Incremental Backup
Previous Message Magnus Hagander 2014-07-25 17:20:32 Re: implement subject alternative names support for SSL connections