Re: Psycopg2 and LIXA

From: Christian Ferrari <camauz(at)yahoo(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: Psycopg2 and LIXA
Date: 2012-02-12 22:02:58
Message-ID: 1329084178.24559.YahooMailNeo@web29501.mail.ird.yahoo.com
Lists: psycopg

Hi Daniele,
I greatly appreciate your answer. There is a lot of stuff to work on.
At first glance it seems some of the standards followed by LIXA could conflict with some Python de facto standards, but I think a solution can be found.

>> Thinking about integration between LIXA and Psycopg2 I'm proposing three different paths:
[...]

>> 3. it could be very easy to overload the "psycopg2.connect()" method: if it accepted a "PGconn *" too, the integration would be straightforward like: psycopg2.connect([...], lixa.lixa_pq_get_conn() )

> The big shortcoming I see in this method is that it takes a Python
> object which is a wrapper to the PGconn, to be accessed from C. Swig
> is not the only way to create such a wrapper, so I wouldn't like to
> bind the psycopg C implementation specifically to swig (note that
> psycopg should unpack manually the swig wrapper: it wouldn't be
> automatic as psycopg is not swig-generated). A more portable method
> would be to have connect() just receive an integer which would be a
> pointer to the PGconn... but I wouldn't define it as a robust
> interface! An entirely different wrapper for dynamic libraries is the
> one provided by ctypes <http://docs.python.org/library/ctypes.html>,
> which is part of the Python standard library and requires no code
> generation beforehand (two reasons for which it could be considered a
> somewhat blessed wrapper).

At this time I'm looking at SWIG just because it could help me wrap LIXA for Python, PHP, Perl and Ruby; without a tool like SWIG, the effort would probably be too great.
I don't know ctypes (unfortunately I'm not a Python expert at all), so in the meantime I'm going to explore what ctypes can do.
I agree with you: if Psycopg2 is not based on SWIG, it would be a bad choice to bind it to SWIG just for the sake of this integration.
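
A minimal sketch of the "integer which would be a pointer" idea mentioned above, using only ctypes from the standard library. The buffer is a stand-in for memory owned by a C library; no real PGconn or libpq call is involved, and the variable names are illustrative assumptions.

```python
import ctypes

# Stand-in for memory owned by a C library; a real PGconn would come from libpq.
fake_conn = ctypes.create_string_buffer(b"PGconn-placeholder")

# The "integer which would be a pointer": just a plain Python int.
address = ctypes.addressof(fake_conn)

# The receiving side rebuilds a pointer from the integer. Nothing verifies
# that the integer refers to live memory of the expected type.
pointer = ctypes.c_void_p(address)
recovered = ctypes.string_at(pointer.value, len(fake_conn.value))

print(isinstance(address, int), recovered)
```

The round trip works, but the integer carries no type or lifetime information, which is why such an interface would be hard to call robust.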
The idea behind

psycopg2.connect([...], lixa.lixa_pq_get_conn() )

is this: if Psycopg2 could accept a PostgreSQL (libpq-fe) connection supplied by a third party through an overloaded method, I could integrate LIXA with Psycopg2 without strange tricks and hacks inside LIXA. The type of object accepted by an overloaded psycopg2.connect() should be whatever is best for Psycopg2 while remaining practical for LIXA.
We could discuss this point in a later reply, because I'm more interested in the following issues.

>> What's your opinions and suggestions?
>
> I guess it depends on how do you want your library to be used by the
> Python code, what level or transparency you require from it, or
> conversely how much explicit you want using lixa to be.

I think this is the right time to explain some design choices of LIXA.
When I started developing LIXA (3 years ago), I was principally interested in the XA protocol. I studied the official document published by X/Open and discovered XA is a *system* interface: it was designed as a standard API (and protocol) between one Transaction Manager and many Resource Managers. It was not designed to be used by the developers of an Application Program (I'm using the same terminology used by X/Open documentation).
I discovered there was a standard for the interface exposed by a Transaction Manager and invoked by an Application Program: its name is "TX (Transaction Demarcation) Specification".
In my honest opinion TX is not a marvel, but incidentally it was supported by Encina (now IBM TXSeries) and by Tuxedo (once BEA, now Oracle).
I chose to avoid reinventing the wheel and stuck to the TX standard.
The TX API is not complete and does not solve some issues, but it has two interesting features: it's easy to understand (and implement) and it doesn't impose too many restrictions (some issues can be bypassed). Speaking about TX, I would say "it just works".
The TX API was designed for the C and COBOL languages: I suppose no one at that time could imagine a crazy guy would try to extend it to Python, PHP, Perl and Ruby.

> One thing I
> notice, I don't know how much do you know about it, is that all Python
> database modules implement the same basic interface, called DBAPI
> <http://www.python.org/dev/peps/pep-0249/>: this interface also
> defines how to perform 2-phase commit, and it declares to follow the
> same XA X/Open standard lixa implements... although it does with
> different methods.

I briefly examined DBAPI, but unfortunately there is a major drawback: DBAPI supplies an API that is equivalent to XA and can be used to implement a Transaction Manager, but it should not be used directly by an Application Program. If an Application Program had to deal with the "prepare" and "recover" verbs, it would be implementing a Transaction Manager itself.
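
To illustrate the point, here is a minimal sketch of what an application ends up writing when it drives the DBAPI two-phase commit methods itself. The method names (xid, tpc_begin, tpc_prepare, tpc_commit, tpc_rollback) come from PEP 249; FakeTpcConnection and run_distributed are hypothetical stand-ins that only record calls, so no database is required. The sketch has no durable log and no recovery, which a real Transaction Manager needs.

```python
class FakeTpcConnection:
    """Hypothetical stand-in that records calls to the PEP 249 TPC methods."""

    def __init__(self, name, fail_on_prepare=False):
        self.name = name
        self.fail_on_prepare = fail_on_prepare
        self.calls = []

    def xid(self, format_id, gtrid, bqual):
        return (format_id, gtrid, bqual)

    def tpc_begin(self, xid):
        self.calls.append("tpc_begin")

    def tpc_prepare(self):
        if self.fail_on_prepare:
            raise RuntimeError(f"{self.name}: prepare failed")
        self.calls.append("tpc_prepare")

    def tpc_commit(self):
        self.calls.append("tpc_commit")

    def tpc_rollback(self):
        self.calls.append("tpc_rollback")


def run_distributed(connections, gtrid, work):
    """Coordinate one global transaction: this is Transaction Manager logic."""
    for index, conn in enumerate(connections):
        conn.tpc_begin(conn.xid(1, gtrid, f"branch-{index}"))
    try:
        for conn in connections:
            work(conn)
        for conn in connections:  # phase 1: every branch must prepare
            conn.tpc_prepare()
    except Exception:
        for conn in connections:  # any failure rolls back every branch
            conn.tpc_rollback()
        return False
    for conn in connections:      # phase 2: commit every prepared branch
        conn.tpc_commit()
    return True


pg = FakeTpcConnection("postgresql")
my = FakeTpcConnection("mysql")
ok = run_distributed([pg, my], "gtrid-1", lambda conn: None)

pg2 = FakeTpcConnection("postgresql")
my2 = FakeTpcConnection("mysql", fail_on_prepare=True)
failed = run_distributed([pg2, my2], "gtrid-2", lambda conn: None)
```

Even in this toy form, the coordination loop lives in the application, which is exactly the role TX assigns to the Transaction Manager.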

> I'm afraid I'm no 2PC expert (although I've
> implemented the support for such methods in psycopg). So in first
> place I wonder if there is a different level at which the libraries
> may interoperate, using the DBAPI 2PC-related method. BTW, if lixa
> could operate with the generic DBAPI 2-phase interface, it could work
> for free not only with psycopg but with any driver implementing such
> interface (well, to be honest I don't know how many of them exist
> yet...).

LIXA already implements all the Transaction Manager logic, and all the
code is C code: extending that logic to deal with the Python API is a
complex task with many risks.

> Also note that if you force your database connection to have a
> specific interface of yours, you would make harder for Python users to
> use such connection in conjunction with already written code or third
> party libraries: you'd have much more success if you could map the xa
> methods to dbapi methods, which means no tx_close() and such (good
> Python interfaces tend to differ from good C interface, that's why
> swig-generated wrappers are usually poor ones by themselves and
> require further wrapping to stop being painful).

I don't think this is a real issue: LIXA supports Distributed Transaction Processing using the TX API. There's no way to take an Application Program designed for one-phase commit and convert it to two-phase commit without some changes. The scenario I imagine is the following one:
1. there is an Application Program designed for only one Resource Manager, for example PostgreSQL
2. the same Application Program must be re-engineered to deal with two Resource Managers (PostgreSQL and MySQL) and some transactions must change data inside both Resource Managers (INSERT INTO PostgreSQL, UPDATE MySQL).
If the data were critical, the developer would use a Transaction Manager with two-phase commit support: LIXA might be a choice.

> Before suggesting any specific solution, I'd like to know what is
> between the client function I understand the lixa user should invoke
> (lixa_pq_get_conn()) and the function that actually creates a libpq
> connection (lixa_pq_open()):

Using TX API (supplied by LIXA), there are four steps:
1. tx_open()
2. tx_begin()
3. connection handler retrieval
4. business logic (using connection handler, PGconn * for PostgreSQL)

(2. and 3. can be swapped if necessary)
This is one of the key aspects of TX: the connection must be opened by the Transaction Manager, and the configuration necessary to open the connection must be managed by the system engineers in charge of the Transaction Manager.
When you use TX, you don't have to know how the Resource Managers will be reached. The TX standard does not specify how the Transaction Manager implements this behavior; it only specifies that it is the Transaction Manager's task.
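
The four steps above can be sketched from the Python side as follows. FakeTx is a hypothetical stand-in, not the LIXA binding: its method names mirror the TX functions, get_conn() plays the role of lixa_pq_get_conn(), and the ordering checks are assumptions made for illustration.

```python
class FakeTx:
    """Hypothetical stand-in for a Python binding of the TX functions."""

    def __init__(self):
        self.opened = False
        self.in_transaction = False
        self.log = []

    def tx_open(self):
        # The Transaction Manager opens the Resource Managers here,
        # using its own configuration.
        self.opened = True
        self.log.append("tx_open")

    def tx_begin(self):
        if not self.opened:
            raise RuntimeError("tx_begin() requires tx_open()")
        self.in_transaction = True
        self.log.append("tx_begin")

    def get_conn(self):
        # Stand-in for lixa_pq_get_conn(): the application receives a
        # handle and never supplies connection parameters itself.
        if not self.opened:
            raise RuntimeError("no connection before tx_open()")
        self.log.append("get_conn")
        return object()

    def tx_commit(self):
        if not self.in_transaction:
            raise RuntimeError("tx_commit() requires tx_begin()")
        self.in_transaction = False
        self.log.append("tx_commit")

    def tx_close(self):
        self.opened = False
        self.log.append("tx_close")


tx = FakeTx()
tx.tx_open()            # step 1
tx.tx_begin()           # step 2
conn = tx.get_conn()    # step 3: connection handler retrieval
# step 4: business logic would use conn here
tx.tx_commit()
tx.tx_close()
```

Note that the application code contains no host name, database name or credentials: those belong to the Transaction Manager configuration.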
The LIXA implementation uses a flexible approach: a configuration file contains some profiles, and every profile references a set of Resource Managers and their configurations. If the Application Program does not specify the LIXA_PROFILE environment variable, it will use the default (first) available profile; if the Application Program specifies the LIXA_PROFILE environment variable, it will use the desired profile. Some commercial Transaction Managers do not allow such flexibility: they behave like LIXA with a single configured profile.
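
The selection rule just described can be sketched like this. It is not LIXA code: the profile names, the list-of-tuples layout and the error raised for an unknown profile are assumptions made for illustration; only the LIXA_PROFILE variable and the "first profile is the default" rule come from the description above.

```python
def select_profile(profiles, environ):
    """Pick a profile from (name, resource_managers) pairs in file order."""
    wanted = environ.get("LIXA_PROFILE")
    if wanted is None:
        return profiles[0]  # default: first profile in the configuration
    for profile in profiles:
        if profile[0] == wanted:
            return profile
    # Assumption: an unknown profile name is treated as an error.
    raise KeyError(f"profile {wanted!r} is not configured")


# Hypothetical configuration content.
profiles = [
    ("pg_only", ["PostgreSQL"]),
    ("pg_and_mysql", ["PostgreSQL", "MySQL"]),
]

default_profile = select_profile(profiles, {})
chosen_profile = select_profile(profiles, {"LIXA_PROFILE": "pg_and_mysql"})
```

In real use the second argument would be os.environ; a plain dict is passed here so the example does not depend on the caller's environment.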
The X/Open (TX) standard specifies that the connection must be opened by the Transaction Manager for many reasons; at least two of them are really important:
1. the Resource Manager must not be used independently of the Transaction Manager (an Application Program could create a session, perform some work and then pass it to the Transaction Manager, creating a potentially inconsistent state)
2. the Transaction Manager inspects the Resource Managers at tx_open() time to perform automatic recovery of previously prepared/recovery pending transactions

The method "lixa_pq_get_conn()" is a work-around necessary for PostgreSQL and MySQL (lixa_my_get_conn()). Oracle and DB2 do not need such a work-around: they supply a specific API; PostgreSQL and MySQL *do* *not* *implement* *standard* *XA*, but only some proprietary extensions that can be used to arrange an XA-like interface (LIXA provides those stubs too).

> I suspect a good solution could be for
> the lixa code to create a psycopg connection (which in turn calls
> PQconnectdb) and get the PGconn from there, then returning the python
> object to the invoking python code. Also, because lixa seems modular,
> couldn't you create a "psycopg" module, which would largely use the
> same "postgresql" module implementation but would offer methods that
> are meaningful for a Python user (i.e. return a connectionObject
> instead of a PGconn)?

Wrapping "lixa_pq_get_conn()" (that's C code) with a method returning a different type could be done at the C level or at the Python level. Where can I find the exact "connectionObject" specification? Could you point me in the right direction as a first step?

> I believe it is possible for the two libraries to interoperate, but I
> think the implementation is only a detail, easy to solve: I'd rather
> try to understand the way lixa would  be used from Python (creation,
> usage, finalization of the connections, of the transactions and the
> relation between the two) and derive the implementation from such use
> case.

> -- Daniele

There's probably another interesting detail: the LIXA implementation of the TX API is thread safe and "thread related".
Each thread must make its own "tx_open()/tx_close()" calls.
The transactional state must not be shared between distinct threads because the state is indexed by thread id (TX functions do not take a reference to the state, so the API must manage it implicitly).
This is an example:
Thread1                  Thread2
tx_open()                tx_open()
tx_begin()               tx_begin()
some stuff               some stuff
tx_commit()              tx_rollback()
tx_begin()               tx_close()
some stuff
tx_rollback()
tx_close()
The connection handler must not be passed by Thread1 to Thread2 or vice versa.

Thanks in advance.
Ch.
