Re: pg_dump versus hash partitioning

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: pg_dump versus hash partitioning
Дата
Msg-id CA+TgmoYfWKz9wKVNRo8QetFfFKZbk8wrbxWN5EBpw_qYWT5WLw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_dump versus hash partitioning  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: pg_dump versus hash partitioning  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: pg_dump versus hash partitioning  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
On Wed, Feb 1, 2023 at 5:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Here, you'd like to argue that it's perfectly
> > fine if we instead insert some of the rows into different tables than
> > where they were on the original system.
>
> I can agree with that argument for range or list partitioning, where
> the partitions have some semantic meaning to the user.  I don't buy it
> for hash partitioning.  It was an implementation artifact to begin
> with that a given row ended up in partition 3 not partition 11, so why
> would users care which partition it ends up in after a dump/reload?
> If they think there is a difference between the partitions, they need
> education.

I see your point. I think it's somewhat valid. However, I also think
it muddies the definition of what pg_dump is allowed to do in a way
that I do not like. I think there's a difference between the CTID or
XMAX of a tuple changing and it ending up in a totally different
partition. It feels like it makes the definition of correctness
subjective: we do think that people care about range and list
partitions as individual entities, so we'll put the rows back where
they were and complain if we can't, but we don't think they think
about hash partitions that way, so we will err on the side of making
the dump restoration succeed. That's a level of guessing what the user
meant that I think is uncomfortable. I feel like when somebody around
here discovers that sort of reasoning in some other software's code,
or in a proposed patch, it pretty often gets blasted on this mailing
list with considerable vigor.

I think you can construct plausible cases where it's not just
academic. For instance, suppose I intend to use some kind of logical
replication system, not necessarily the one built into PostgreSQL, to
replicate data between two systems. Before engaging that system, I
need to make the initial database contents match. The system is
oblivious to partitioning, and just replicates each table to a table
with a matching name. Well, if the partitions don't actually match up
1:1, I kind of need to know about that. In this use case, the rows
silently moving around doesn't meet my needs. Or, suppose I dump and
restore two databases. It works perfectly. I then run a comparison
tool of some sort that compares the two databases. EDB has such a
tool! I don't know whether it would perform the comparison via the
partition root or not, because I don't know how it works. But I find
it pretty plausible that some such tool would show differences between
the source and target databases. Now, if I had done the data migration
using --load-with-partition-root, I would expect that. I might even be
looking for it, to see what got moved around. But otherwise, it might
be unexpected.

Another subtle problem with this whole situation is: suppose that on
host A, I set up a table hash-partitioned by an enum column and make a
bunch of hash partitions. Then, on host B, I set up the same table
with a bunch of foreign table partitions, each corresponding to the
matching partition on the other node. I guess that just doesn't work,
whereas if the column were of any other data type, it would work. If
it were a collatable data type, it would need the collations and
collation definitions to match, too.

I know these things are subtle, and maybe these specific things will
never happen to anyone, or nobody will care. I don't know. I just have
a really hard time accepting that a categorical statement that this
just can't ever matter to anyone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: Weird failure with latches in curculio on v15
Следующее
От: Tom Lane
Дата:
Сообщение: Re: pg_dump versus hash partitioning