Discussion: Determining Indexes to Rebuild on new libc

Determining Indexes to Rebuild on new libc

From: Don Seiler
Date:
Good morning,

As we're staring down the eventuality of having to migrate to a newer OS (currently on Ubuntu 18.04 LTS), we're preparing for the collation change madness that will ensue. We're looking at logical replication, but there is a lot to unpack there as well given the number of databases and the massive size of a few of them. I had been inclined to bite the bullet and do logical replication (or dump/restore on smaller DBs), but the timeframe for the project is being pushed up, so I'm looking for shortcuts where possible (obviously without risking DB integrity). This would also give me the opportunity for other modifications, like enabling data checksums on the new DBs, which I have been sorely wanting for years now.

One question that gets asked is whether we could do physical replication, cut over, and then only rebuild indexes that "need it" in order to minimize the subsequent downtime. That is, can we determine which indexes will actually have a potential problem? For example, a lot of indexes are on text/varchar fields that hold UUID data and nothing more (just basic alphanumeric characters and hyphens). If we can be certain that these fields truly only hold this type of data, could we skip rebuilding them after the cutover to a newer OS (e.g. Ubuntu 22.04 LTS with the newer libc collation)?

Thanks,
Don.

--
Don Seiler
www.seiler.us

Re: Determining Indexes to Rebuild on new libc

From: Ron
Date:
On 8/4/22 09:29, Don Seiler wrote:
> Good morning,
>
> As we're staring down the eventuality of having to migrate to a newer OS (currently on Ubuntu 18.04 LTS), we're preparing for the collation change madness that will ensue. We're looking at logical replication, but there is a lot to unpack there as well given the number of databases and the massive size of a few of them. I had been inclined to bite the bullet and do logical replication (or dump/restore on smaller DBs), but the timeframe for the project is being pushed up, so I'm looking for shortcuts where possible (obviously without risking DB integrity). This would also give me the opportunity for other modifications, like enabling data checksums on the new DBs, which I have been sorely wanting for years now.
>
> One question that gets asked is whether we could do physical replication, cut over, and then only rebuild indexes that "need it" in order to minimize the subsequent downtime. That is, can we determine which indexes will actually have a potential problem? For example, a lot of indexes are on text/varchar fields that hold UUID data and nothing more (just basic alphanumeric characters and hyphens). If we can be certain that these fields truly only hold this type of data, could we skip rebuilding them after the cutover to a newer OS (e.g. Ubuntu 22.04 LTS with the newer libc collation)?

Join pg_indexes to information_schema to get only the indices with character fields.
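A rough sketch of that kind of query, using the system catalogs directly rather than information_schema (the schema filter and exact type list are assumptions to adjust for your environment):

-- Sketch: list indexes that include text/varchar/char columns,
-- i.e. the ones a libc collation change can affect.
SELECT n.nspname AS schema_name,
       t.relname AS table_name,
       i.relname AS index_name,
       a.attname AS column_name,
       format_type(a.atttypid, a.atttypmod) AS column_type
FROM pg_index ix
JOIN pg_class i ON i.oid = ix.indexrelid
JOIN pg_class t ON t.oid = ix.indrelid
JOIN pg_namespace n ON n.oid = t.relnamespace
JOIN pg_attribute a ON a.attrelid = t.oid AND a.attnum = ANY (ix.indkey)
WHERE a.atttypid IN ('text'::regtype, 'varchar'::regtype, 'bpchar'::regtype)
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2, 3;

Note that this join only catches plain column references; indexes on text expressions would need to be checked separately.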

If ALL your UUID indices helpfully have "uuid" embedded in their names, then you can exclude them, too.

Otherwise, you must scan each indexed text field looking for anything besides low-ASCII characters, and hope that no such data has been added since you scanned.

For example: SELECT field_1 FROM foo WHERE field_1 >= CHR(128);
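A hedged alternative (foo and field_1 are placeholder names, and this assumes a UTF-8 database encoding): any non-ASCII character makes a value's byte length exceed its character length, so this catches such characters anywhere in the string rather than only at the start of the sort order:

-- Flag values containing any multibyte (non-ASCII) character.
SELECT field_1
FROM foo
WHERE octet_length(field_1) <> length(field_1);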

CHECK constraints would of course also ensure that such data can never appear...
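For instance, a hypothetical constraint (again with placeholder names) limiting a column to the characters a UUID string can contain:

-- Reject anything outside hex digits and hyphens.
ALTER TABLE foo
  ADD CONSTRAINT foo_field_1_uuid_chars_chk
  CHECK (field_1 ~ '^[0-9a-fA-F-]+$');

Adding the constraint validates existing rows, so on a large table it can be added NOT VALID and checked later with VALIDATE CONSTRAINT.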

--
Angular momentum makes the world go 'round.

Re: Determining Indexes to Rebuild on new libc

From: Jim Mlodgenski
Date:


On Thu, Aug 4, 2022 at 10:29 AM Don Seiler <don@seiler.us> wrote:
> One question that gets asked is whether we could do physical replication, cut over, and then only rebuild indexes that "need it" in order to minimize the subsequent downtime. That is, can we determine which indexes will actually have a potential problem? For example, a lot of indexes are on text/varchar fields that hold UUID data and nothing more (just basic alphanumeric characters and hyphens). If we can be certain that these fields truly only hold this type of data, could we skip rebuilding them after the cutover to a newer OS (e.g. Ubuntu 22.04 LTS with the newer libc collation)?


I don't think you can make that assumption with the UUID data. Try this simple example, and you'll see that the sort order changes on the newer OS, which can lead to duplicates on your primary key if you don't reindex after the upgrade.

CREATE TABLE t1 (c1 varchar PRIMARY KEY);
INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
SELECT * FROM t1 ORDER BY c1;

Re: Determining Indexes to Rebuild on new libc

From: Don Seiler
Date:
On Thu, Aug 4, 2022 at 10:03 AM Jim Mlodgenski <jimmy76@gmail.com> wrote:

> I don't think you can make that assumption with the UUID data. Try this simple example, and you'll see that the sort order changes on the newer OS, which can lead to duplicates on your primary key if you don't reindex after the upgrade.
>
> CREATE TABLE t1 (c1 varchar PRIMARY KEY);
> INSERT INTO t1 VALUES ('1-a'), ('1a'), ('1-aa');
> SELECT * FROM t1 ORDER BY c1;


Looks like you're right. Testing with PG 12.11 pgdg packages for their respective Ubuntu versions:

18.04:
testdb=# SELECT * FROM t1 ORDER BY c1;
  c1
------
 1a
 1-a
 1-aa
(3 rows)


22.04:
testdb=# SELECT * FROM t1 ORDER BY c1;
  c1
------
 1-a
 1a
 1-aa
(3 rows)


So it looks like we'd have to rebuild any UUID indexes to be safe anyway, and those are used a lot for identifiers. *sadness intensifies*
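If it comes to that, one option on PG 12+ is to rebuild the affected indexes without blocking writes (index and table names below are placeholders):

-- Rebuild a single index, or everything on a table, without an exclusive lock.
REINDEX INDEX CONCURRENTLY some_uuid_idx;
REINDEX TABLE CONCURRENTLY some_table;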

--
Don Seiler
www.seiler.us

Re: Determining Indexes to Rebuild on new libc

From: Scott Ribe
Date:
> On Aug 4, 2022, at 9:48 AM, Don Seiler <don@seiler.us> wrote:
>
> So it looks like we'd have to rebuild any UUID indexes to be safe anyway, and those are used a lot for identifiers. *sadness intensifies*

Actually, given that collation change, I'd be hesitant to assume ANY text index was safe. Names, addresses, part #s... All can have hyphens.