Обсуждение: BUG #17923: Excessive warnings of collation version mismatch in logs

Поиск
Список
Период
Сортировка

BUG #17923: Excessive warnings of collation version mismatch in logs

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17923
Logged by:          Hans Buschmann
Email address:      buschmann@nidsa.net
PostgreSQL version: 15.2
Operating system:   Fedora 38 x86-64
Description:

We recently upgraded our production system from FEDORA 37 to FEDORA 38
(x86-64).

This also upgraded the glibc from 2.36 to 2.37.

Quite a period later we observed excessive logging (every 30 seconds) of the
following messages:

LSAM  2023-04-18 17:10:08 CEST  01000  1:> WARNING:  database "template1"
has a collation version mismatch
LSAM  2023-04-18 17:10:08 CEST  01000  2:> DETAIL:  The database was created
using collation version 2.36, but the operating system provides version
2.37.
LSAM  2023-04-18 17:10:08 CEST  01000  3:> HINT:  Rebuild all objects in
this database that use the default collation and run ALTER DATABASE
template1 REFRESH COLLATION VERSION, or build PostgreSQL with the right
library version.
LSAM  2023-04-18 17:10:38 CEST  01000  1:> WARNING:  database "postgres" has
a collation version mismatch
LSAM  2023-04-18 17:10:38 CEST  01000  2:> DETAIL:  The database was created
using collation version 2.36, but the operating system provides version
2.37.
LSAM  2023-04-18 17:10:38 CEST  01000  3:> HINT:  Rebuild all objects in
this database that use the default collation and run ALTER DATABASE postgres
REFRESH COLLATION VERSION, or build PostgreSQL with the right library
version.
... etc.

Our production database has collation c, so it is not mentioned in the
logs.

Unfortunately the frequent repetition of these messages caused more the
48000 identical entries in about 14 days.

The following questions arise:

1. Why isn't it sufficient to report this message only at server startup ?
The messages are sent to psql too, but only when logging into one of these
mostly unused databases, so it is not observed in a regular routine.

2. Why is this message reported for databases with no objects or similar
cases?
These empty databases have no objects, other databases may have only objects
where a collation version change doesn't matter, such as indexes on
non-string data types like integers. In these cases no warning is
necessary.

3. Is there a real collation change in glibc 2.37 or is it only a version
number mismatch with changes only in other parts of the library?

4. Could the version information be shown in \l output of psql?

5. Wouldn't it make sense to automatically update the collation version
number of such databases (without any collation-related objects) at server
startup?
This resembles the praxis of initdb, which quietly sets the collation
version number at creation time mostly without any user intervention.

Thanks for looking

Hans Buschmann


Re: BUG #17923: Excessive warnings of collation version mismatch in logs

От
Peter Eisentraut
Дата:
On 07.05.23 12:13, PG Bug reporting form wrote:
> 1. Why isn't it sufficient to report this message only at server startup ?
> The messages are sent to psql too, but only when logging into one of these
> mostly unused databases, so it is not observed in a regular routine.

Each database is separate.  The server doesn't look at each database on 
startup.  This is only done when you connect to a given database.

> 2. Why is this message reported for databases with no objects or similar
> cases?

When you connect to a database, the server doesn't scan all objects in 
the database to see if they might be affected.  That would be too slow.

If you have no objects in the database, you can just run the provided 
REFRESH command to make the warning go away.

> 3. Is there a real collation change in glibc 2.37 or is it only a version
> number mismatch with changes only in other parts of the library?

We don't know, because glibc isn't reliable at telling us about it.  So 
we take the safe course.

> 4. Could the version information be shown in \l output of psql?

That's a valid idea.

> 5. Wouldn't it make sense to automatically update the collation version
> number of such databases (without any collation-related objects) at server
> startup?

See #2.  We are not going to start scanning each database on each 
connection to check whether it might be empty.




AW: BUG #17923: Excessive warnings of collation version mismatch in logs

От
Hans Buschmann
Дата:

Hello Peter,


Thank you for looking at.


I think, there is a little misunderstanding:


The problem is not at connection time (I only mentioned it for info).


The problem is the endless repeating of these log entries without user login (cut a little bit at line end):


17:09:08 CEST  01000  1:> WARNING:  database "template1" has a collation version mismatch
17:09:08 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:09:08 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:09:38 CEST  01000  1:> WARNING:  database "postgres" has a collation version mismatch
17:09:38 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:09:38 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:10:08 CEST  01000  1:> WARNING:  database "template1" has a collation version mismatch
17:10:08 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:10:08 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:10:38 CEST  01000  1:> WARNING:  database "postgres" has a collation version mismatch
17:10:38 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:10:38 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:11:08 CEST  01000  1:> WARNING:  database "template1" has a collation version mismatch
17:11:08 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:11:08 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:11:38 CEST  01000  1:> WARNING:  database "postgres" has a collation version mismatch
17:11:38 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:11:38 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:12:08 CEST  01000  1:> WARNING:  database "template1" has a collation version mismatch
17:12:08 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:12:08 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:12:38 CEST  01000  1:> WARNING:  database "postgres" has a collation version mismatch
17:12:38 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:12:38 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul
17:13:08 CEST  01000  1:> WARNING:  database "template1" has a collation version mismatch
17:13:08 CEST  01000  2:> DETAIL:  The database was created using collation version 2.36,
17:13:08 CEST  01000  3:> HINT:  Rebuild all objects in this database that use the defaul

This is repeated for about 48000 cases!! in a period of roughly 20 days!!!!!


Let me explain my suggestions (Please remind that I am not familiar with the source code, internal data structures or extensive C hackking, so I'll present it in a more conceptual way):

1. Current situation:
The server checks every minute for every database if there is a collation version mismatch.
If it doesn't match, it emits every minute a warning (without user login).

This hits performance and floods the logs.

2. My simple suggestion (as by question 1):
Check this mismatch only at server startup (the glic cannot be exchanged on a running server!)
and report the warnings only once per database.
It should be investigated, by which algorithm the server checks that every minute and this should be avoided if possible.

3. Further suggestion:
When the mismatch occurs at server startup, check if the database has any objects (more selective: if it has any indexes or materialized views)
This is a very short one-time operation only at server startup.
If there are no such objects in the mentioned database, change the collation version silently to the current version (do the ALTER DATABASE xxxxx REFRESH COLLATION VERSION automatically)

4. Elaborated premium solution with focus to easy administration:
With the ever growing databases easy recovery of collation problems is not easy for the user (see other discussions on ICU defaults etc. on hackers).

So for every collation a hash value could be computed and stored in the catalogs.

If there is now a mismatch, the warnings can inform the users of the changed collations (not every collation is subject to frequent changes!).
This circumvents the indeterminable changes of a glibc version change.

A query to the catalogs for such a database can then provide the objects where the altered collations are used and the user can take appropriate action (reindexing a big database is also very expensive!).

This certainly is much more work then only stopping the endless warnings but gives a fair amount of self-administration guide to the user.

PS: perhaps you may move the discussion to hackers to reach a broader audience.

Hans Buschmann







Von: Peter Eisentraut <peter.eisentraut@enterprisedb.com>
Gesendet: Mittwoch, 10. Mai 2023 07:30
An: Hans Buschmann; pgsql-bugs@lists.postgresql.org
Betreff: Re: BUG #17923: Excessive warnings of collation version mismatch in logs
 
On 07.05.23 12:13, PG Bug reporting form wrote:
> 1. Why isn't it sufficient to report this message only at server startup ?
> The messages are sent to psql too, but only when logging into one of these
> mostly unused databases, so it is not observed in a regular routine.

Each database is separate.  The server doesn't look at each database on
startup.  This is only done when you connect to a given database.

> 2. Why is this message reported for databases with no objects or similar
> cases?

When you connect to a database, the server doesn't scan all objects in
the database to see if they might be affected.  That would be too slow.

If you have no objects in the database, you can just run the provided
REFRESH command to make the warning go away.

> 3. Is there a real collation change in glibc 2.37 or is it only a version
> number mismatch with changes only in other parts of the library?

We don't know, because glibc isn't reliable at telling us about it.  So
we take the safe course.

> 4. Could the version information be shown in \l output of psql?

That's a valid idea.

> 5. Wouldn't it make sense to automatically update the collation version
> number of such databases (without any collation-related objects) at server
> startup?

See #2.  We are not going to start scanning each database on each
connection to check whether it might be empty.

Re: AW: BUG #17923: Excessive warnings of collation version mismatch in logs

От
Peter Eisentraut
Дата:
On 11.05.23 12:07, Hans Buschmann wrote:
> The problem is not at connection time (I only mentioned it for info).
> 
> 
> The problem is the endless repeating of these log entries without user 
> login (cut a little bit at line end):
> 
> 
> 17:09:08 CEST  01000  1:> WARNING:  database "template1" has a collation 
> version mismatch
> 17:09:08 CEST  01000  2:> DETAIL:  The database was created using 
> collation version 2.36,
> 17:09:08 CEST  01000  3:> HINT:  Rebuild all objects in this database 
> that use the defaul

I guess this is from the autovacuum background processes.  Maybe we 
should disable the warnings for those?



AW: AW: BUG #17923: Excessive warnings of collation version mismatch in logs

От
Hans Buschmann
Дата:

+1 for disabling the warnings there!


But I wonder, why this test is necessary at every autovacuum call at all?

Woudn't it be sufficient to test it at server start time only once?


Are there perhaps other places with this test?


Hans Buschmann


Re: AW: AW: BUG #17923: Excessive warnings of collation version mismatch in logs

От
Peter Eisentraut
Дата:
On 22.05.23 09:11, Hans Buschmann wrote:
> But I wonder, why this test is necessary at every autovacuum call at all?
> 
> Woudn't it be sufficient to test it at server start time only once?

The main server process (postmaster) cannot access the database 
contents.  So we can only do it when we start a session process.