Re: [PATCH] Allow Postgres to pick an unused port to listen

From: Yurii Rashkovskii <yrashk(at)gmail(dot)com>
To: Aleksander Alekseev <aleksander(at)timescale(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PATCH] Allow Postgres to pick an unused port to listen
Date: 2023-04-20 04:30:46
Message-ID: CA+RLCQzSS5hk03w22acZkt9KnVTkkzbs+RMZaO-jiycS_fM39A@mail.gmail.com
Lists: pgsql-hackers

Aleksander,

On Wed, Apr 19, 2023 at 11:44 PM Aleksander Alekseev <
aleksander(at)timescale(dot)com> wrote:

> Hi,
>
> Here are my two cents.
>
> > > I would like to suggest a patch against master (although it may be
> > > worth backporting it) that makes it possible to listen on any unused
> > > port.
> >
> > I think this is a bad idea, mainly because this:
> >
> > > Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> > > port) and retrieve the assigned port from postmaster.pid
> >
> > is a horrid way to find out what was picked, and yet there could
> > be no other.
>
> What personally I dislike about this approach is the fact that it is
> not guaranteed to work in the general case.
>
> Let's say the test framework started Postgres on a random port. Then
> the framework started to do something else, building a Docker
> container for instance. While the framework is busy PostgreSQL crashes
> (crazy, I know, but not impossible). Both PID and the port will be
> reused eventually by another process. How soon is the implementation
> detail of the given OS and its setting.
>

Let's say Postgres crashed, and the port was not reused. In this case, the
connection will fail. The test bench script can then, at the very least,
check the log files for any indication of a crash and report it if one
occurred. If the port was reused by something other than Postgres, the
script should (ideally) fail to communicate with it using the Postgres
protocol. If it was reused by another Postgres instance, it gets a bit
tougher, but then the test bench can, upon connection, verify that it is
the same system by comparing the system identifier on the file system
(retrieved using pg_controldata) and over the wire (retrieved
using `select system_identifier from pg_control_system()`).
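
A minimal sketch of that identifier comparison, in Python. It assumes
pg_controldata was run with an English locale (so the label reads
"Database system identifier:") and that the test bench has already fetched
the over-the-wire value with whatever client library it uses. The function
names are made up for illustration.

```python
import re
from typing import Optional


def parse_system_identifier(pg_controldata_output: str) -> Optional[int]:
    """Extract the system identifier from pg_controldata's text output.

    Assumes the English label "Database system identifier:"; returns None
    if the line is not present.
    """
    match = re.search(
        r"^Database system identifier:\s*(\d+)\s*$",
        pg_controldata_output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


def is_same_cluster(pg_controldata_output: str, wire_identifier: int) -> bool:
    """Compare the on-disk identifier with the one returned over the wire.

    wire_identifier is the result of
    `select system_identifier from pg_control_system()`, obtained by the
    caller through its own connection.
    """
    on_disk = parse_system_identifier(pg_controldata_output)
    return on_disk is not None and on_disk == int(wire_identifier)
```

If the two values differ, the script is talking to a different cluster than
the one whose data directory it inspected.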

I also suspect that this problem has a bigger scope than port retrieval. If
one is to use postmaster.pid only for PID retrieval, then there's still no
guarantee that, between the time we retrieved the PID from the file and the
time we used it, Postgres didn't crash and the PID wasn't reused by a
different process, potentially even another postgres process launched in
parallel by the test bench.

The tools I mentioned earlier in the thread can show which ports a given
PID has open, and one can use them as an extra check that we're still on
the right track. These tools can also report the process name.

Ultimately, there's no transactionality in the POSIX API, so we're always
exposed to the chance of discrepancies between the inspection time and the
next step.

>
> A bullet-proof approach would be (approximately) for the test
> framework to lease the ports on the given machine, for instance by
> using a KV value with CAS support like Consul or etcd (or another
> PostgreSQL instance), as this is done for leader election in
> distributed systems (so called leader lease). After leasing the port
> the framework knows no other testing process on the given machine will
> use it (and also it keeps telling the KV storage that the port is
> still leased) and specifies it in postgresql.conf as usual.
>

The approach you suggest introduces a significant amount of complexity but
seemingly fails to address one of the core issues: using a KV store to
lease a port does not guarantee the port's availability. I don't believe
this is a sound way to address this issue, let alone a bulletproof one.

Also, I don't think there's a case for distributed systems here because
we're only managing a single computer's resource: the allocation of local
ports.

If I were to go for a more bulletproof approach, I would probably consider
a different technique that would not necessitate provisioning and running
additional software for port leasing.

For example, I'd suggest adding an option to Postgres to receive the
sockets it should listen on from a UNIX socket (using an SCM_RIGHTS
message) and then have another program acquire the sockets using whatever
algorithm (picking a pre-set port, an unused wildcard port, etc.) and start
Postgres, passing the sockets over the aforementioned UNIX socket. This
program will be your leaseholder and can perhaps print out the PID so that
the testing scripts can immediately use it. The leaseholder should watch
for the Postgres process to crash. This is still a fairly complicated
solution that needs some refining, but it does allocate ports flawlessly,
relying on the OS being the actual leaseholder and not requiring fighting
against race conditions.
I didn't go for anything like this because of the sheer complexity of it.
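
To make the mechanism concrete, here is a rough Python sketch (3.9+, Unix
only) of the descriptor-passing part. Postgres has no option to receive
sockets this way today, so the receiving side merely stands in for it. All
function names are hypothetical, and both ends run in one process purely
for demonstration.

```python
import socket


def acquire_listener(host: str = "127.0.0.1") -> socket.socket:
    """Bind to port 0 so the OS picks an unused port.

    The socket stays open, so the port cannot be handed to anyone else.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen()
    return listener


def pass_listener(channel: socket.socket, listener: socket.socket) -> None:
    """Send the listening descriptor over a UNIX socket (SCM_RIGHTS)."""
    socket.send_fds(channel, [b"L"], [listener.fileno()])


def receive_listener(channel: socket.socket) -> socket.socket:
    """Receive one descriptor and wrap it back into a socket object."""
    _data, fds, _flags, _addr = socket.recv_fds(channel, 1, 1)
    return socket.socket(fileno=fds[0])


def demo():
    """Pass a listener across a socketpair and check the port survives."""
    holder_end, server_end = socket.socketpair(socket.AF_UNIX,
                                               socket.SOCK_STREAM)
    listener = acquire_listener()
    port = listener.getsockname()[1]
    pass_listener(holder_end, listener)
    received = receive_listener(server_end)
    same_port = received.getsockname()[1] == port
    for s in (received, listener, holder_end, server_end):
        s.close()
    return port, same_port
```

The point of the design is that the port is never released between being
picked and being listened on, so there is no window in which another
process can grab it.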

The proposed solution is, I believe, a simple one that gets you there in
the overwhelming majority of cases. If one starts running into error cases
such as port reuse or listener disappearance, the logic I described above
may get them a step further.
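
For illustration, reading the port back in a test script could look roughly
like this. The sketch relies on the existing postmaster.pid layout (PID on
line 1, data directory on line 2, port on line 4) and assumes that, with
the patch, the port line holds the port actually assigned rather than `0`.
The names are hypothetical.

```python
from typing import NamedTuple


class PostmasterInfo(NamedTuple):
    pid: int
    data_dir: str
    port: int


def parse_postmaster_pid(contents: str) -> PostmasterInfo:
    """Parse the text of postmaster.pid into PID, data directory and port.

    Raises ValueError if the file is still incomplete, which can happen
    while the server is starting up.
    """
    lines = [line.strip() for line in contents.splitlines()]
    if len(lines) < 4:
        raise ValueError("postmaster.pid is incomplete; retry later")
    return PostmasterInfo(pid=int(lines[0]), data_dir=lines[1],
                          port=int(lines[3]))
```

A caller would read `<data directory>/postmaster.pid`, pass its text to
`parse_postmaster_pid`, and retry on ValueError until the server has
written the port line.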
