Re: [PATCH] Allow Postgres to pick an unused port to listen

From Yurii Rashkovskii
Subject Re: [PATCH] Allow Postgres to pick an unused port to listen
Date
Msg-id CA+RLCQzSS5hk03w22acZkt9KnVTkkzbs+RMZaO-jiycS_fM39A@mail.gmail.com
In response to Re: [PATCH] Allow Postgres to pick an unused port to listen  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: [PATCH] Allow Postgres to pick an unused port to listen  (Denis Laxalde <denis.laxalde@dalibo.com>)
List pgsql-hackers
Aleksander,

On Wed, Apr 19, 2023 at 11:44 PM Aleksander Alekseev <aleksander@timescale.com> wrote:
> Hi,
>
> Here are my two cents.
>
> > > I would like to suggest a patch against master (although it may be worth
> > > backporting it) that makes it possible to listen on any unused port.
> >
> > I think this is a bad idea, mainly because this:
> >
> > > Instead, with this patch, one can specify `port` as `0` (the "wildcard"
> > > port) and retrieve the assigned port from postmaster.pid
> >
> > is a horrid way to find out what was picked, and yet there could
> > be no other.
>
> What personally I dislike about this approach is the fact that it is
> not guaranteed to work in the general case.
>
> Let's say the test framework started Postgres on a random port. Then
> the framework started to do something else, building a Docker
> container for instance. While the framework is busy PostgreSQL crashes
> (crazy, I know, but not impossible). Both PID and the port will be
> reused eventually by another process. How soon is the implementation
> detail of the given OS and its setting.

Let's say Postgres crashed, and the port was not reused. In this case, the connection will fail. The test bench script can then, at the very least, check the log files for any indication of a crash and report it if one occurred. If the port was reused by something other than Postgres, the script should (ideally) fail to communicate with it using the Postgres protocol. If it was reused by another Postgres instance, it gets a bit tougher, but then the test bench can, upon connection, verify that it is the same system by comparing the system identifier on the file system (retrieved using pg_controldata) with the one obtained over the wire (retrieved using `select system_identifier from pg_control_system()`).
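To make the check above a bit more concrete, here is a minimal sketch of the file-side half of it in Python. It assumes the usual postmaster.pid layout (PID on line 1, data directory on line 2, port on line 4) and that pg_controldata was run in an English/C locale, so that the label reads "Database system identifier:". The function names are hypothetical, and the over-the-wire value is simply passed in as an argument, since how the test bench connects is up to it.

```python
import re


def parse_postmaster_pid(text):
    """Extract PID, data directory and port from postmaster.pid contents.

    Assumes the usual layout: line 1 = PID, line 2 = data directory,
    line 4 = port number.
    """
    lines = text.splitlines()
    if len(lines) < 4:
        raise ValueError("postmaster.pid is incomplete (server still starting?)")
    return {
        "pid": int(lines[0]),
        "data_dir": lines[1],
        "port": int(lines[3]),
    }


def parse_system_identifier(controldata_output):
    """Pull the system identifier out of pg_controldata's text output.

    Assumes an English/C locale for the label.
    """
    match = re.search(
        r"^Database system identifier:\s*(\d+)\s*$",
        controldata_output,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("no system identifier found in pg_controldata output")
    return int(match.group(1))


def same_cluster(controldata_output, wire_identifier):
    """Compare the on-disk identifier with the one returned by
    `select system_identifier from pg_control_system()`."""
    return parse_system_identifier(controldata_output) == int(wire_identifier)
```

A test bench would read the port with `parse_postmaster_pid`, connect to it, run the query, and refuse to proceed if `same_cluster` returns false.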

I also suspect that this problem has a bigger scope than port retrieval. If one is to use postmaster.pid only for PID retrieval, then there's still no guarantee that, between the time we retrieved the PID from the file and the time we used it, Postgres didn't crash and the PID wasn't reused by a different process, potentially even another postgres process launched in parallel by the test bench.

There are tools I mentioned previously in the thread that allow inspecting which ports are opened by a given PID, and one can use those as an extra check on whether we're still on the right track. These tools can also tell us what the process name is.

Ultimately, there's no transactionality in the POSIX API, so we're always exposed to the chance of discrepancies between the inspection time and the next step.

> A bullet-proof approach would be (approximately) for the test
> framework to lease the ports on the given machine, for instance by
> using a KV value with CAS support like Consul or etcd (or another
> PostgreSQL instance), as this is done for leader election in
> distributed systems (so called leader lease). After leasing the port
> the framework knows no other testing process on the given machine will
> use it (and also it keeps telling the KV storage that the port is
> still leased) and specifies it in postgresql.conf as usual.

The approach you suggest introduces a significant amount of complexity but seemingly fails to address one of the core issues: using a KV store to lease a port does not guarantee the port's availability. I don't believe this is a sound way to address this issue, let alone a bulletproof one.

Also, I don't think there's a case for distributed systems here because we're only managing a single computer's resource: the allocation of local ports.

If I were to go for a more bulletproof approach, I would probably consider a different technique that would not necessitate provisioning and running additional software for port leasing. 

For example, I'd suggest adding an option to Postgres to receive the sockets it should listen on over a UNIX socket (using an SCM_RIGHTS message). Another program would then acquire the sockets using whatever algorithm it likes (picking a pre-set port, an unused wildcard port, etc.) and start Postgres, passing it the sockets over that UNIX socket. This program would be your leaseholder and could perhaps print out the PID so that the testing scripts can use it immediately. The leaseholder should watch for the Postgres process crashing. This is still a fairly complicated solution that needs some refining, but it does allocate ports flawlessly, relying on the OS as the actual leaseholder and not requiring any fight against race conditions. I didn't go for anything like this because of its sheer complexity.
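For illustration only, the descriptor-passing part of that idea can be sketched in Python (3.9 or later, Unix only), where `socket.send_fds` and `socket.recv_fds` wrap the SCM_RIGHTS mechanics. This is not something Postgres supports today; the function names are hypothetical, and both ends run in one process here purely so the sketch is self-contained. In a real setup the receiving end would be the server process.

```python
import socket


def acquire_listener(host="127.0.0.1"):
    """Leaseholder side: bind to port 0 so the OS picks an unused port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen()
    return listener


def send_listener(channel, listener):
    """Pass the listening descriptor over a UNIX socket (SCM_RIGHTS).

    The port number travels as the ordinary payload of the same message.
    """
    port = listener.getsockname()[1]
    socket.send_fds(channel, [str(port).encode("ascii")], [listener.fileno()])
    return port


def receive_listener(channel):
    """Server side: rebuild a socket object from the received descriptor."""
    payload, fds, _flags, _addr = socket.recv_fds(channel, 64, 1)
    return socket.socket(fileno=fds[0]), int(payload.decode("ascii"))


def demo():
    """Run both ends in one process and check the passed socket still works."""
    holder_end, server_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with holder_end, server_end:
        listener = acquire_listener()
        port = send_listener(holder_end, listener)
        inherited, announced = receive_listener(server_end)
        # The leaseholder can drop its own copy; the passed one stays open.
        listener.close()
        with inherited:
            bound = inherited.getsockname()[1]
            # A client can connect through the inherited descriptor.
            with socket.create_connection(("127.0.0.1", port), timeout=5):
                conn, _peer = inherited.accept()
                conn.close()
    return port, announced, bound
```

Because the port is already bound when the descriptor changes hands, there is no window in which another process could grab it, which is the property the leaseholder design relies on.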

The proposed solution is, I believe, a simple one that gets you there in the vast majority of cases. If one starts running into error cases like port reuse or listener disappearance, the logic I described above may get them a step further.
 
