Thread: PG17beta1: Unable to test Postgres on Fedora due to fatal Error in psql: undefined symbol: PQsocketPoll

1. Problem and reproducibility


After the release of PG17b1 I wanted to test it on a newly installed machine:

I installed Fedora 40 Server on x86-64 and did a full dnf update (as of 23 May 2024).


To compile from source, I first installed the build dependencies:


sudo dnf group install "C Development*" "Development*"

sudo dnf install wine mold meson perl zstd lz4 lz4-devel libzstd-devel
(This may be more than strictly necessary; I did not bother to work out a minimal set of packages.)

I built Postgres 17 beta1 from source with meson and installed it with ninja install.
I started the server and created a new database with psql.
I restored a test database from a dump with pg_restore.

When I tried to connect to the restored database with psql \c I got:


[root@localhost local]# sudo -u postgres pgbeta/bin/psql -h /tmp -p 5431
psql (17beta1)
Type "help" for help.

postgres=# select version ();
                              version
--------------------------------------------------------------------
 PostgreSQL 17beta1 on x86_64-linux, compiled by gcc-14.1.1, 64-bit
(1 row)

postgres=# \l
                                                List of databases
   Name    |  Owner   | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+---------+-------+--------+-----------+-----------------------
 cpsdb     | postgres | UTF8     | builtin         | C       | C     | C      |           |
 postgres  | postgres | UTF8     | builtin         | C       | C     | C      |           |
 template0 | postgres | UTF8     | builtin         | C       | C     | C      |           | =c/postgres          +
           |          |          |                 |         |       |        |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | builtin         | C       | C     | C      |           | =c/postgres          +
           |          |          |                 |         |       |        |           | postgres=CTc/postgres
(4 rows)

postgres=# \c cpsdb
pgbeta/bin/psql: symbol lookup error: pgbeta/bin/psql: undefined symbol: PQsocketPoll
[root@localhost local]#

----------------------------------------------------------------------- ERROR ^^^^^^^^^^^^^^^^^^^^^^

So it was not possible to use the database locally with psql.

2. Analysis

(To my understanding) the problem comes from incompatible libpq.so libraries on the system.
Installing all the development packages also pulled in:

[root@localhost lib64]# dnf list installed libpq
Installed Packages
libpq.x86_64                                         16.1-4.fc40                                          @fedora

This is the older version provided by Fedora (from Nov 2023! It has not been updated even though 16.3 was released in May 2024).
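
One way to confirm this diagnosis is to ask the dynamic loader which libpq it resolves for the freshly built psql, and whether that library exports PQsocketPoll. The following is a minimal sketch, assuming a glibc-based Linux with ldd and binutils' nm available. The helper names resolved_lib and exports_symbol and the PSQL variable are made up for illustration; the default path is the relative one used in the session above.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path the dynamic loader would pick
# for a library (matched by name prefix) when starting the given binary.
resolved_lib() {
    # $1 = binary, $2 = library name prefix (e.g. libpq)
    ldd "$1" 2>/dev/null |
        awk -v lib="$2" 'index($1, lib) == 1 && $3 ~ /^\// { print $3 }'
}

# Hypothetical helper: exit 0 if the shared object defines the symbol,
# exit 1 otherwise (including when the file cannot be read).
exports_symbol() {
    # $1 = shared object, $2 = symbol name
    nm -D --defined-only "$1" 2>/dev/null |
        awk -v sym="$2" '{ sub(/@.*/, "", $NF); if ($NF == sym) found = 1 }
                         END { exit !found }'
}

# Assumption: psql lives at the relative path shown in the session above.
PSQL=${PSQL:-pgbeta/bin/psql}
if [ -x "$PSQL" ]; then
    lib=$(resolved_lib "$PSQL" libpq)
    echo "psql resolves libpq to: ${lib:-<not found>}"
    if [ -n "$lib" ]; then
        if exports_symbol "$lib" PQsocketPoll; then
            echo "PQsocketPoll is exported"
        else
            echo "PQsocketPoll is missing (pre-v17 libpq)"
        fi
    fi
fi
```

If the reported path is under /usr/lib64 rather than the beta installation's lib64 directory, psql is binding to the distribution's libpq.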

3. Questions

- Why doesn't psql use the just created lib64/libpq.so.5.17 from ninja install?

The loading of the locally available libpq.so should always have priority over a system wide in /usr/lib64

- Why is the Fedora supplied library 2 minor versions behind?

- How can the new libpq be installed system-wide so that it is usable by all applications (not only the clients supplied with Postgres)?
To my understanding, libpq is normally backward compatible, so it may be possible to use libpq 17 against all supported older releases.

4. Proposals

- Distributors like Fedora, Debian, Ubuntu etc. should be encouraged to update to new minor versions IN A TIMELY FASHION, as with any other upgrade: minor versions normally don't break anything and often contain security fixes and important bug fixes valuable to all users willing to update. Perhaps somebody more deeply involved can support the distributors here.

- PGDG should provide the newest libpq.so (beta or stable) in its common repository, starting with the first beta release of a major version.
Then everybody can install it separately and test it against their own applications. This should ease real-world testing a lot.

- Due to the backward compatibility of libpq and the difficulty of handling multiple versions on the same machine, I propose to always provide the newest libpq (highest stable version and latest beta version) for separate installation.
This should be installable independently of the main packages, much like the Visual runtime libraries for applications under Windows.

The user can choose between the latest stable version (at this time libpq 16.3), the latest beta (at this time libpq 17beta) or the version belonging to their major/minor version. This should be documented and easily changeable.

The buildfarm could always be run with the most current version. libpq could be thought of as a separate product that serves as the base for all client applications.

5. Solving the problem

I don't know how to solve the problem in an official way.

I haven't tried manual changes to the system (copying binaries, changing symbolic links, building my own packages etc.).

I am able to access the test database from outside (from a Windows psql client of pg17b1), but this is not very practical.

The same problem occurs on my other machine running Fedora 39 and probably occurs on many other distributions as well.

I think a self-compiled version of Postgres should be self-contained and ready to run for testing.


Any thoughts?

Hans Buschmann

Hans Buschmann <buschmann@nidsa.net> writes:
> When I tried to connect to the restored database with psql \c I got:
> ...
> postgres=# \c cpsdb
> pgbeta/bin/psql: symbol lookup error: pgbeta/bin/psql: undefined symbol: PQsocketPoll

> (To my understanding) the problem comes from incompatible libpq.so libraries on the system.

Right, you must have a v16-or-earlier libpq lying around somewhere,
and psql has bound to that not to the beta-test version.
PQsocketPoll is new in v17.

> - Why doesn't psql use the just created lib64/libpq.so.5.17 from ninja install?

It's on you to ensure that happens, especially on Linux systems which
have a strong bias towards pulling libraries from /usr/lib[64].
Normally our --enable-rpath option is sufficient; while that's
default in an autoconf-based build, I'm not sure that it is
in a meson build.  Also, if your beta libpq is not where the
rpath option expected it to get installed, the linker will silently
fall back to /usr/lib[64].
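
A rough way to check the rpath point above is to look at the search path embedded in the binary and, as a per-invocation workaround, put the beta lib64 directory first in LD_LIBRARY_PATH. This is only a sketch, assuming a Linux system with binutils' readelf and ldd. The helper names embedded_rpath and prepend_path and the PGBETA variable are hypothetical, and the install prefix is assumed to be a pgbeta directory under the current working directory, as in the session quoted earlier.

```shell
#!/bin/sh
# Hypothetical helper: print the RPATH/RUNPATH entries embedded in an ELF
# binary (empty output means no rpath was recorded at link time).
embedded_rpath() {
    readelf -d "$1" 2>/dev/null |
        awk '/\((RPATH|RUNPATH)\)/ { gsub(/^\[|\]$/, "", $NF); print $NF }'
}

# Hypothetical helper: prepend a directory to a colon-separated search path.
prepend_path() {
    # $1 = directory to put first, $2 = existing path (may be empty)
    printf '%s\n' "$1${2:+:$2}"
}

# Assumption: the beta was installed under ./pgbeta with libraries in lib64.
PGBETA=${PGBETA:-$PWD/pgbeta}
if [ -x "$PGBETA/bin/psql" ]; then
    echo "embedded rpath: $(embedded_rpath "$PGBETA/bin/psql")"
    # Show which libpq the loader picks once the beta lib64 is searched first.
    LD_LIBRARY_PATH=$(prepend_path "$PGBETA/lib64" "${LD_LIBRARY_PATH:-}") \
        ldd "$PGBETA/bin/psql" | grep libpq || true
fi
```

An empty rpath, or one that points somewhere other than the directory holding the new libpq, would be consistent with the silent fallback to /usr/lib[64] described above.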

> The loading of the locally available libpq.so should always have priority over a system wide in /usr/lib64

Tell it to the Linux developers --- they think the opposite.
Likewise, all of your other proposals need to be addressed to
the various distros' packagers; this is not the place to complain.

The main thing that is bothering me about the behavior you
describe is that it didn't fail until psql actually tried to
call PQsocketPoll.  (AFAICT from a quick look, that occurs
during \c but not during the startup connection.)  I had thought
that we select link options that result in early binding and
hence startup-time failure for a case like this.  I can confirm
though that this acts as described on my RHEL8 box if I force
current psql to link to v16 libpq, so either we've broken that
or it never did apply to frontend programs.  But it doesn't
seem to me to be a great thing for it to behave like this.
You could easily miss that you have a broken setup until
after you deploy it.
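
Independently of how psql is linked, a mismatch like this can be surfaced at startup by asking the loader to resolve all symbols eagerly. The sketch below assumes glibc's dynamic loader, which honors the LD_BIND_NOW environment variable. The helper name binds_eagerly and the PSQL variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: run a command with eager symbol binding and return its
# exit status; output is discarded so only success or failure is reported.
binds_eagerly() {
    LD_BIND_NOW=1 "$@" >/dev/null 2>&1
}

# Assumption: psql lives at the relative path shown in the session above.
PSQL=${PSQL:-pgbeta/bin/psql}
if [ -x "$PSQL" ]; then
    if binds_eagerly "$PSQL" --version; then
        echo "all symbols resolved at startup"
    else
        echo "startup failed under LD_BIND_NOW: check which libpq is loaded"
    fi
fi
```

With eager binding, a psql bound to a v16 libpq should fail immediately instead of at the first \c.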

            regards, tom lane