WIP patch: convert SQL-language functions to return tuplestores

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: WIP patch: convert SQL-language functions to return tuplestores
Date: 2008-10-27 01:49:49
Message-ID: 28998.1225072189@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We have an open TODO item to support SQL-language functions that return
the output of a RETURNING clause attached to an INSERT/UPDATE/DELETE query
within the function. This is something that was left undone in the 8.2
development cycle after this thread analyzed the problem:
http://archives.postgresql.org/pgsql-hackers/2006-10/msg00665.php
The basic conclusion was that the only sane way to do it is to have the
SQL function execute the DML command to completion and then return the
emitted rows in a tuplestore. Which is fine, but it would be pretty messy
to do unless we make set-returning SQL functions return tuplestores all
the time, and there was worry that that might lose performance compared to
the existing value-per-call protocol.

Attached is a draft patch that converts set-returning SQL functions to
return tuplestores. It's pretty incomplete --- it doesn't actually add
the RETURNING feature, and there's a lot of ugly stuff to clean up ---
but it passes regression tests and it's close enough for performance
testing. What I find is that the performance hit doesn't seem too bad.
The test case I'm using looks like this:

regression=# create function foo(int) returns setof int as
'select generate_series(1,$1)' language sql;
CREATE FUNCTION
regression=# select count(*) from (select foo(NNN)) ss;

This example is chosen with malice aforethought to stress the tuplestore
performance as much as possible. The internal generate_series() call is
about as cheap a set-generating function as possible, and it returns
through value-per-call mechanism so there is no added tuplestore in the
way. In the outer query we again avoid the tuplestore that would be
created by nodeFunctionscan.c, and we use an upper count(*) to avoid
shipping all the rows to the client. I should note also that the function
is intentionally declared VOLATILE to prevent its being inlined into the
calling query.

What I find on my development machine is that CVS HEAD processes this
query at about 1.33 microsec/row. With the attached patch, the speed is
about 1.0 usec/row if the tuplestore stays within work_mem; about 1.3
usec/row if it spills "to disk" but doesn't overflow available kernel disk
cache; and about 1.56 usec/row in cases considerably larger than available
RAM, when we actually have to write the data to disk and read it back.
This is on my development workstation, which is a dual 2.8GHz Xeon EM64T
with your typical junk consumer-grade single ATA disk drive, running
Fedora 9. (BTW, the test seems to be mostly CPU-bound even when spilling
to disk.)

So I'm concluding that we can easily afford to switch to tuplestore-always
operation, especially if we are willing to put any effort into tuplestore
optimization. (I note that the current tuplestore code writes 24 bytes
per row for this example, which is a shade on the high side for only 4 bytes
payload. It looks like it would be pretty easy to knock 10 bytes off that
for a 40% savings in I/O volume.)

I'm putting up this patch mostly so that anyone who's worried about the
performance issue can do their own tests. It's definitely not meant for
style or completeness critiques ;-)

BTW, the patch also removes the existing limitation of not being able
to call set-returning plpgsql functions in a SELECT targetlist...

regards, tom lane

Attachment Content-Type Size
functions-via-tuplestore-1.patch.gz application/octet-stream 11.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Klint Gore 2008-10-27 02:11:36 Re: again... (win32 logging errors)
Previous Message Robert Haas 2008-10-27 00:37:44 Re: BufferAccessStrategy for bulk insert