Re: Implementing "thick"/"fat" databases

From: Chris Travers <chris(dot)travers(at)gmail(dot)com>
To: Sim Zacks <sim(at)compulab(dot)co(dot)il>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Implementing "thick"/"fat" databases
Date: 2011-07-25 15:24:38
Message-ID: CAKt_ZfsERr+4v8CA8zFvvmm+Xdf__oyth58FsgasQtDA6FgmSQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sun, Jul 24, 2011 at 11:53 PM, Sim Zacks <sim(at)compulab(dot)co(dot)il> wrote:

>
> The goal is to make our system client agnostic, Most of our GUI is written
> in wxpython, we also have some web functions and even a barcode terminal
> function, written in C#. We would like to use an application server, so that
> all the code is run on the server but we don't want to be web-based. I don't
> want to have the same business logic written into various clients as that
> causes maintenance headaches.

Ok. So we are talking about a multi-application database, and you are
trying to provide some degree of consistency in business logic across
the software applications (and hence uses of your data). So far, so
good. Agreed to put anything that might need to be in common.

> The way that postgresql works is that each session is its own process. That
> means that from the server perspective (at least in most ways), my plpython
> function is not connected to any other call on the database. My goal is to
> allow any client to request functionality and have it execute the same way
> every time.

So.... I assume that means using Pl/Mono to make your barcode stuff
work in the db too?
>
> Data logic is built in by use of constraints and triggers and some
> functions, business logic is built only into functions.

I guess I am using "data logic" more broadly than you. Any select,
update, or insert statement against a relation (except, of course,
simple function calls) is part of that data logic, and there are
supporting dependencies too, such as security and permissions.

IOW, I think a lot of 'business logic' is data logic. I separate this
with what you actually use the data for.

IOW, Generate a report with the following format is data logic (even
perhaps, and email it to the head of product ordering), but use this
report to decide what inventory to order is application logic (you
probably want people making these decisions).

Similarly "Generate me a list of invoices to be paid in the following
format": is data logic, but selecting those invoices to be paid,
entering the data, determining what bank account you want to pay them
from etc. is application logic. Recording the payments is again data
logic. Application logic might be better thought of as workflow
logic.
>
> In other words, our postgresql server is a hybrid database/application
> server.

Right. One thing we are seeing in LedgerSMB as we go through this is
that the database does indeed become an application server. The areas
which are taking the new approach are mostly SQL and HTML templates
(in TemplateToolkit), while the Perl code is shrinking pretty fast.

I guess what I am wondering is if you are using it as an application
server, is there a benefit to trying to put everything in the
database? Or just those parts which necessarily follow from the
database independent of workflow, etc or are likely to be common
across diverse applications hitting your server?

If so, what is it?
>

> As I mentioned, I am still playing around and testing this, and have
> The concept is that each client has a SocketServer thread that listens on a
> specific port (8080 in this case).

Ok, so just pointing out here that if you go this route you are
starting to break the client-agnosticism requirement, unless I
misunderstand what you are trying to do :-).

> My database function is called python_prompt pasted below.
> It has a timeout of 2 minutes after which the function errors out with a
> timeout. The user will get a message that the function did not complete. The
> important thing is that it has a timeout. If the timeout is 20 minutes, that
> is also ok. Depending on the query that calls this function, there do not
> have to be any locks being used and as this is its own process and it is
> basically sleeping while waiting for the answer, it doesn't use resources
> (aside from a connection and the work mem in the calling query, of course).

Now, if for sake of argument this happens after an insert or update,
that means things are locked for that time, right? This would be more
likely to cause deadlocks and if the person steps up entering the
data, there would be a rollback, right?

>
> Using this, a plpgsql query can prompt the user, "You are about to use more
> stock then you have, are you sure you want to do this?"
>
> CREATE OR REPLACE FUNCTION python_prompt(v_question text)
>  RETURNS boolean AS
> $BODY$
> import time
> import socket
> ipaddr=plpy.execute("select inet_client_addr()")[0]["inet_client_addr"]
> HOST, PORT = str(ipaddr), 8080
> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
> sock.settimeout(120)
> sock.connect((HOST, PORT))
> sock.send(v_question + "\n")
> ans=sock.recv(1024)
> sock.close()
> if ans=="yes":
>    return true
> else:
>    return false
> $BODY$
>  LANGUAGE plpythonu VOLATILE;

I guess I am wondering what you get by requiring that a client listens
on port 8080 AND knows how to communicate with the server on it that
you don't get from assuming that the client can manage the workflow
(breaking the routines into two separate ones and prompting for input
between). I also think there is a better way to solve this problem
(see below).
>
> I haven't fully decided to use this yet, but one of my goals is to make my
> GUI as dumb as possible.
>
Probably a personal preference but I don't like to try to push
something as far as possible :-)

Anyway, here's an alternative to your interactive query approach, the
one we use for LedgerSMB user creation for 1.3. it';s the "if you
don't tell me to do it anyway, I am raising an exception" approach.

Basically the problem is that db users are used to enforce application
permissions, and db users are cluster-wide while application users are
specific to a database. So what happens when you are running two
companies (which have to have complete data separation and different
permissions) but a single user account is needed to access both of
them?

What we decided to do is have a stored procedure which creates the
user, sets the password (valid for 1 day by default, until changed by
the user, at which point it is valid for an admin-specified time).

Now, one parameter to the function is whether this is an import of an
existing db role or not (i.e. not == creation of new user/role). In
the event you specify that this is an import, if you also specify a
password, you get an error unconditionally. Don't want to change
password on import. However, if you don't specify that this is an
import and the role exists you get a different exception. Exception
strings in this case are short but descriptive ('No password allowed,'
and 'Duplicate user'), and these are handed back up to the application
to process. The application then can flag that this role exists, and
the user of the software can resubmit as an import. Now there's no
reason the application if it wasn't a web-based app couldn't just pop
up a confirmation box saying "This role exists. Do you want to
import? Yes/no" and handle it there.

Similarly if you add a do_anyway arg to your invoice item query, you
can raise an exception "Negative Inventory" and the application can
handle that, flag the user, and let the user decide whether to do it
anyway.

It seems to me that as you have timeouts, the application has to be
prepared to retry functions anyway.

Also one huge cost to putting too much interactive logic in the
database in some environments. I haven't yet found a decent way of
localizing strings in database queries. That means that if you want
to support multiple languages, it's a LOT easier to deal with these
strings if they are in the client codebase than if they are in the
server codebase and interpolated in the client (gettext throws errors
with variable interpolation if you identify the string on the server
and mix localization of client-side things with server-side strings).
I suppose you can patch gettext to ignore these errors.......

Best Wishes,
Chris Travers

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Chris Travers 2011-07-25 15:35:01 Re: Implementing "thick"/"fat" databases
Previous Message Tom Lane 2011-07-25 14:47:47 Re: streaming replication does not work across datacenter with 20ms latency?