Re: Report search_path value back to the client.

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexey Klyukin <alexk(at)hintbits(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Report search_path value back to the client.
Date: 2015-03-02 13:55:11
Message-ID: CAFh8B=kYfZekb7tVZZSNWQzXSYMKmtJeQLqbLZbhjdLujENRcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

2015-02-20 16:19 GMT+01:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> 2. Semantics. The existing GUC_REPORT variables are all things that
> directly relate to client-visible behavior, eg how values of type
> timestamp will be interpreted and printed. search_path is no such thing,
> so it's hard to make a principled argument for reporting it that doesn't
> lead to the conclusion that you want *everything* reported. (In
> particular, I don't believe at all your argument that this would help
> pgbouncer significantly.)
>

Well, I will try to explain in details why do we want to use pgbouncer in
front of postgres and what stops us from doing this.
Usually our applications don't access data directly but via stored
procedures. One database can be accessed by different applications in
different versions. Different versions of applications may use different
sets and versions of stored procedures. It achieved by keeping stored
procedures in different schemas (we call them API schema). For example at
the same time 'app-v1' is working with stored procedures located in schema
'app_api_v1', 'app-v2' with 'app_api_v2' and 'other-app-v3' is working with
'other_app_api_v3'. This is very convenient. Before deploying new version
of application we create new 'api schema' for it and then create all stored
procedures in this schema. Directly after connecting to database
application executes 'set search_path to app_api_vX, public' and after this
point it always call stored procedures without specifying schema name. This
way we are able to keep different versions of the same stored procedure in
database.
And here the problem come. Sometimes we have dozens of application
instances. Each application instance might keep more than one connection to
database. In total there can be hundreds connections but most of the time
they are not in state 'active'. For such cases connection pools have been
invented. It's cheep to keep even thousands open connections to event-based
connection pool comparing with the real database. The first idea that comes
into mind - lets put pgbouncer with pool_mode = transaction in front of
database. But here our good old friend 'set search_path to app_api_vX,
public' does weird things with pgbouncer. The same server connection might
be reused by different applications in different versions and soon or later
application will get: 'ERROR: function foo() does not exist' because
search_path for this connection between pgbouncer and postgres has been
changed by another application.

My first idea was - it should be possible to fix pgbouncer and make it
aware of current search_path value of connections between application <->
pgbouncer and pgbouncer <-> postgres. I started to look into it's source
code and yea, this can be done relatively easy, because pgbouncer already
does similar job for "client_encoding", "DateStyle", "TimeZone",
"standard_conforming_strings" and "application_name", which doesn't relate
to the 'application visible behaviour' more than the search_path. Pgbouncer
always keeps current values of these variables for each connection and when
it takes a connection from the pool, it compares the values of
client_connection and server_connection parameters. If they don't match -
it changes the value on the server side by calling 'set variable to
client_value'.

Execution of statement 'set some_variable to some_value' from the client
side is not the unique way to change value of variable. As you already
mentioned such statement can be executed from stored procedure as well. So
scanning of client statements is not an option. Only postgres is able to
tell at any time current value of search_path and asking it every time for
current value also is not an option, because this will double amount of
queries executed by database. Luckily, postgres already has this wonderful
mechanism as GUC_REPORT...

If the postgres will report the value of search_path on connect and on
change - pgbouncer could be really easy patched:

diff --git a/include/varcache.h b/include/varcache.h
index 4984b01..916fa01 100644
--- a/include/varcache.h
+++ b/include/varcache.h
@@ -5,6 +5,7 @@ enum VarCacheIdx {
VTimeZone,
VStdStr,
VAppName,
+ VSearchPath,
NumVars
};

diff --git a/lib b/lib
index fe4dd02..c0111d4 160000
--- a/lib
+++ b/lib
@@ -1 +1 @@
-Subproject commit fe4dd0255ea796e64599c0fc28e6900fbc07e6aa
+Subproject commit c0111d42712202df5ccb01c4e07e88ab5c8b94b8
diff --git a/src/varcache.c b/src/varcache.c
index 6321dc5..cb3f559 100644
--- a/src/varcache.c
+++ b/src/varcache.c
@@ -35,6 +35,7 @@ static const struct var_lookup lookup [] = {
{"TimeZone", VTimeZone },
{"standard_conforming_strings", VStdStr },
{"application_name", VAppName },
+ {"search_path", VSearchPath },
{NULL},
};

We could possibly alleviate problem #1 by changing the behavior of guc.c
> so it doesn't report every single transition of flagged variables, but
> only (say) once just before ReadyForQuery if the variable changed in
> the just-finished command. That's not exactly a one-line fix though.
>

Probably for some variables this really make sense. Inside stored
procedures any of GUC_REPORT variables can be changed the same way as
search_path (thousands of times), but not all of these variables directly
relate to the application visible behaviour.

Regards,
Alexander Kukushkin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kouhei Kaigai 2015-03-02 14:07:36 Re: Join push-down support for foreign tables
Previous Message Thom Brown 2015-03-02 13:50:42 Re: Join push-down support for foreign tables