From: | Steve Singer <ssinger_pg(at)sympatico(dot)ca> |
---|---|
To: | Jan Urbański <wulczer(at)wulczer(dot)org> |
Cc: | Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: plpython SPI cursors |
Date: | 2011-11-20 18:14:17 |
Message-ID: | BLU0-SMTP4530C1907601DFB9AEAE778ECA0@phx.gbl |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11-10-15 07:28 PM, Jan Urbański wrote:
> Hi,
>
> attached is a patch implementing the usage of SPI cursors in PL/Python.
> Currently when trying to process a large table in PL/Python you have
> slurp it all into memory (that's what plpy.execute does).
>
> J
I found a few bugs (see my testing section below) that will need fixing
+ a few questions about the code
Overview & Feature Review
-----------
This patch adds cursor support to plpython. SPI cursors will allow
a plpython function to read process a large results set without having to
read it all into memory at once. This is a good thing. Without this
patch I think you could accomplish the same with using SQL DECLARE CURSOR
and SQL fetch. This feature allows you to use a python cursor as
an iterator resulting in much cleaner python code than the SQL FETCH
approach. I think the feature is worth having
Usability Review
----------------------
The patch adds the user methods
cursor=plpy.cursor(query_or_plan)
cursor.fetch(100)
cursor.close()
Do we like the name plpy.cursor or would we rather call it something like
plpy.execute_cursor(...) or plpy.cursor_open(...) or plpy.create_cursor(...)
Since we will be mostly stuck with the API once we release 9.2 this is worth
some opinions on. I like cursor() but if anyone disagrees now is the time.
This patch does not provide a wrapper around SPI_cursor_move. The patch
is useful without that and I don't see anything that preculdes someone else
adding that later if they see a need.
Documentation Review
---------------------
The patch includes documentation updates that describes the new feature.
The Database Access page doesn't provide a API style list of database access
functions like the plperl
http://www.postgresql.org/docs/9.1/interactive/plperl-builtins.html page
does. I think the organization of the perl page is
clearer than the python one and we should think about a doing some
documentaiton refactoring. That should be done as a seperate patch and
shouldn't be a barrier to committing this one.
Code Review
----------------
in PLy_cursor_plan line 4080
+ PG_TRY();
+ {
+ Portal portal;
+ char *volatile nulls;
+ volatile int j;
+
+ if (nargs > 0)
+ nulls = palloc(nargs * sizeof(char));
+ else
+ nulls = NULL;
+
+ for (j = 0; j < nargs; j++)
+ {
+ PyObject *elem;
I am probably not seeing a code path or misunderstanding something
about the setjmp/longjump usages but I don't see why nulls and j need to be
volatile here.
line 444
PLy_cursor(PyObject *self, PyObject *args)
+ {
+ char *query;
+ PyObject *plan;
+ PyObject *planargs = NULL;
+
+ if (PyArg_ParseTuple(args, "s", &query))
+ return PLy_cursor_query(query);
+
Should query be freed with PyMem_free()
Testing
-----------
I tested both python 2.6 and 3 on a Linux system
create or replace function x() returns text as $$
cur=None
try:
with plpy.subtransaction():
cur=plpy.cursor('select generate_series(1,1000)')
rows=cur.fetch(10);
plpy.execute('select f()')
except plpy.SPIError:
rows=cur.fetch(10);
return rows[0]['generate_series']
return 'none'
$$ language plpythonu;
select x();
crashes the backend
test=# select x();
The connection to the server was lost. Attempting reset: LOG: server
process (PID 3166) was terminated by signal 11: Segmentation fault
The below test gives me a strange error message:
create or replace function x1() returns text as $$
plan=None
try:
with plpy.subtransaction():
plpy.execute('CREATE TEMP TABLE z AS select generate_series(1,1000)')
plan=plpy.prepare('select * FROM z')
plpy.execute('select * FROM does_not_exist')
except plpy.SPIError, e:
cur=plpy.cursor(plan)
rows=cur.fetch(10)
return rows[0]['generate_series']
return '1'
$$ language plpythonu;
select x1();
test=# select x1()
test-# ;
ERROR: TypeError: Expected sequence of 82187072 arguments, got 0: <NULL>
CONTEXT: Traceback (most recent call last):
PL/Python function "x1", line 9, in <module>
cur=plpy.cursor(plan)
PL/Python function "x1"
STATEMENT: select x1()
I was expecting an error from the function just a bit more useful one.
Performance
-------------------
I did not do any specific performance testing but I don't see this patch
as having any impact to performance
From | Date | Subject | |
---|---|---|---|
Next Message | Jan Urbański | 2011-11-20 18:22:23 | Re: plpython SPI cursors |
Previous Message | Kevin Grittner | 2011-11-20 16:33:47 | Re: testing ProcArrayLock patches |