Re: PROXY protocol support

From: Julien Riou <julien(at)riou(dot)xyz>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org,Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: PROXY protocol support
Date: 2019-05-19 20:53:26
Message-ID: 45EE9893-5885-476E-85B6-F8F8E6AE9FA3@riou.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On May 19, 2019 5:59:04 PM GMT+02:00, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>Greetings,
>
>* Julien Riou (julien(at)riou(dot)xyz) wrote:
>> Nowadays, PostgreSQL is often used behind proxies. Some are
>PostgreSQL
>> protocol aware (Pgpool, PgBouncer), some are pure TCP (HAProxy). From
>> the database instance point of view, all clients come from the proxy.
>>
>> There are two major problems with this topology:
>>
>> * It neutralizes the host based authentication. Every client shares
>> the same source. Either we allow this source or not but we cannot
>allow
>> clients on a more fine-grained basis, or not by the IP address.
>
>You can instead have the IP-based checking done at the pooler.
>
>> * It makes debugging harder. If we have a DDL or a slow query logged,
>we
>> cannot use the source to identify who is responsible.
>
>Protocol-level poolers are able to do this, and pgbouncer does (see
>application_name_add_host).
>
>> On one hand, we can move the authentication and logging mechanisms to
>> PostgreSQL based proxies but they will never be as complete as
>> PostgreSQL itself. And they don't have features like HTTP health
>checks
>> to redirect trafic to nodes (health, role, whatever behind the URL).
>On
>> the other hand, those features are not implemented at all because
>they
>> don't know the PostgreSQL protocol, they simply forward requests.
>>
>> In the HTTP reverse proxies world, there's a "dirty hack" to identify
>> the source IP address: add an HTTP header "X-Forwared-For" to the
>> request. It's the destination duty to do whatever they want with this
>> information. With this feature in mind, someone from HAProxy has
>> implemented this mechanism at the protocol level. It's called the
>PROXY
>> protocol.
>
>Someone from HAProxy could certainly implement something similar by
>having HAProxy understand PostgreSQL's protocol.
>
>> With this piece of logic at the beginning of the protocol, we could
>> implement a totally transparent proxy and benefit from the great
>> features of PostgreSQL regarding clients. Note that MariaDB support
>the
>> PROXY protocol in MaxScale (proxy) and MariaDB Server in recent
>> versions.
>
>pgbouncer is already a transparent proxy that understands the PG
>protocol, and, even better, it has support for transaction-level
>pooling
>(as well as connection-level), which is really critical for larger PG
>deployments as PG backend startup is (relatively) expensive.
>
>> PS: I've already sent this message to a wrong mailing list. Stephen
>> Frost said it's implemented in pgbouncer but all I can find is an
>open
>> issue: https://github.com/pgbouncer/pgbouncer/issues/241.
>
>That would be some *other* proxy system (Amazon's ELB) that apparently
>also doesn't understand the PG protocol and therefore doesn't have a
>feature similar to pgbouncer's application_name_add_host.
>
>I haven't looked very closely at if it'd be possible to interpret the
>PROXY protocol thing that Amazon's ELB can do without confusing it with
>a regular PG authentication startup and I'm not sure if we'd really
>want
>to wed ourselves to something like that. Certainly, what pgbouncer
>does
>works quite well and is about as transparent to clients as possible.
>
>You'd almost certainly want something like pgbouncer after the ELB
>anyway to avoid having tons of connections to PG and avoid spinning up
>new backends constantly.
>
>Thanks,
>
>Stephen

It could be proprietary Amazon load balancers I don't have experience with, or simple HAProxy coupled with a Patroni HTTP API to tell if a backend is healthy or not.

The PgBouncer approach is interesting. I'm already using the application name as a workaround to identify containerized applications but didn't used it for setting the source IP.

If we take a look at the MariaDB implementation, they check for errors in the startup packet then run the PROXY protocol decoding then return a real error if it doesn't work. As our bouncers are all behind a pool of HAProxy, and if we consider PgBouncer as a trusted extension of PostgreSQL, maybe implementing it in PgBouncer first will be easier.

Thanks for your insightful comments.
Julien

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-05-19 21:38:26 Re: Why is infinite_recurse test suddenly failing?
Previous Message Tom Lane 2019-05-19 19:48:02 Re: Emacs vs pg_indent's weird indentation for function declarations