Re: Why do indexes and sorts use the database collation?

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Why do indexes and sorts use the database collation?
Дата
Msg-id 13fb2032cbfda29bb4c6c55ccc41a3d01a8de717.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Why do indexes and sorts use the database collation?  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Mon, 2023-11-13 at 14:12 -0800, Andres Freund wrote:
> Why on earth are we solving this by having multiple pg_collation
> entries for
> exactly the same collation, instead of normalizing the collation-name
> during
> lookup by adding the relevant encoding name if not explicitly
> specified?  It
> makes a lot of sense to not force the user to specify the encoding
> when it
> can't differ.

I'm not aware of it being a common practical problem, so perhaps lack
of motivation. But you're right that it doesn't look very efficient.

We can even go deeper into ICU if we wanted to: lots of locales are
actually aliases to a much smaller number of actual collators. And a
lot are just aliases to the root locale. It's not trivial to reliably
tell if two collators are identical, but in principle it should be
possible: each collation is just a set of tailorings on top of the root
locale, so I suppose if those are equal it's the same collator, right?

> It's imo similarly absurd that an index with "default" collation
> cannot be
> used when specifying the equivalent collation explicitly in the query
> and vice
> versa.

The catalog representation is not ideal to treat the database collation
consistently with other collations. It would be nice to fix that.

> > > >
> Jeff was saying that textual primary keys typically don't need
> sorting and
> because of that we could default to "C", for performance. Part of my
> response
> was that I think the user's intent could be expressed by specifying
> the column
> collation as "C" - to which Jeff replied that that would change the
> semantics. Which, to me, seems to completely run counter to his
> argument that
> we could just use "C" for such indexes.

I am saying we shouldn't prematurely optimize for the case of ORDER BY
on a text PK case by making a an index with a non-"C" collation, given
the costs and risks of non-"C" indexes. Particularly because, even if
there is an ORDER BY, there are several common reasons such an index
would not help anyway.

> > > >
Regards,
    Jeff Davis





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Dolgov
Дата:
Сообщение: Re: Schema variables - new implementation for Postgres 15
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500