Re: Statistics Import and Export

From: Stephen Frost
Subject: Re: Statistics Import and Export
Date:
Msg-id: CAOuzzgp4vbs5wOUFyE9Ts+kc7+BU-_5oPNs2dgQ5cHhJBUvmFg@mail.gmail.com
In reply to: Re: Statistics Import and Export (Corey Huinker <corey.huinker@gmail.com>)
Responses: Re: Statistics Import and Export (Corey Huinker <corey.huinker@gmail.com>)
List: pgsql-hackers
Greetings,

On Thu, Feb 29, 2024 at 17:48 Corey Huinker <corey.huinker@gmail.com> wrote:
>> Having looked through this thread and discussed a bit with Corey
>> off-line, the approach that Tom laid out up-thread seems like it would
>> make the most sense overall- that is, eliminate the JSON bits and the
>> SPI and instead export the stats data by running queries from the new
>> version of pg_dump/server (in the FDW case) against the old server
>> with the intelligence of how to transform the data into the format
>> needed for the current pg_dump/server to accept, through function calls
>> where the function calls generally map up to the rows/information being
>> updated- a call to update the information in pg_class for each relation
>> and then a call for each attribute to update the information in
>> pg_statistic.
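
To make the shape of that concrete, here is a rough sketch of the calls
pg_dump might emit under this approach- the function names, parameters,
and values below are illustrative only, not an actual API:

    -- One call per relation to update the pg_class counters:
    SELECT pg_set_relation_stats('public.test'::regclass,
                                 relpages => 42,
                                 reltuples => 10000,
                                 relallvisible => 40);

    -- One call per attribute to update its pg_statistic row:
    SELECT pg_set_attribute_stats('public.test'::regclass, 'id',
                                  null_frac => 0.0,
                                  avg_width => 4,
                                  n_distinct => -1);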

> Thanks for the excellent summary of our conversation, though I would add that we discussed a problem with per-attribute functions: each function would be acquiring locks on both the relation (so it doesn't go away) and pg_statistic, and that lock thrashing would add up. Whether that overhead is judged significant or not is up for discussion. If it is significant, it makes sense to package up all the attributes into one call, passing in an array of some new pg_statistic-esque special type... the very issue that sent me down the JSON path.
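
A sketch of what that packaged-up form might look like, with a made-up
composite type and a hypothetical one-call-per-table function:

    -- Hypothetical composite type carrying one attribute's statistics:
    CREATE TYPE attr_stats_record AS (
        attname    name,
        null_frac  real,
        avg_width  integer,
        n_distinct real
    );

    -- One call per table; the locks on the relation and on
    -- pg_statistic are then taken once rather than once per attribute:
    SELECT pg_set_table_stats('public.test'::regclass,
                              ARRAY[
                                  ('id',  0.0,  4, -1)::attr_stats_record,
                                  ('val', 0.01, 8, 500)::attr_stats_record
                              ]);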

> I certainly see the flexibility in having per-attribute functions, but am concerned about non-binary-upgrade situations where the attnums won't line up, and if we're passing them by name then the function has to dig around looking for the right matching attnum, and that's overhead too. In the whole-table approach, we just iterate over the attributes that exist and find the matching parameter row.
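
For reference, the by-name lookup being described amounts to a catalog
probe like this on the target, once per attribute (dropped columns are
why the attnums can drift between otherwise-identical tables):

    -- Find the target-side attnum for a given column name:
    SELECT attnum
      FROM pg_attribute
     WHERE attrelid = 'public.test'::regclass
       AND attname = 'val'
       AND NOT attisdropped
       AND attnum > 0;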

That's certainly a fair point, and my initial reaction (which could certainly be wrong) is that it's unlikely to be an issue- but also, if you feel you could make it work with an array, passing all the attribute info in with one call, which I suspect would be possible but just a bit more complex to build, then sure, go for it. If it ends up being overly unwieldy then perhaps the per-attribute call would be better and we could perhaps acquire the lock before the function calls? Doing a check to see if we have already locked it would be cheaper than trying to acquire a new lock, I'm fairly sure.
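
In SQL terms, the restore could take the relation lock once up front so
that each per-attribute call finds it already held- a minimal sketch,
again with hypothetical function names and an illustrative lock mode:

    BEGIN;
    LOCK TABLE public.test IN SHARE UPDATE EXCLUSIVE MODE;

    SELECT pg_set_attribute_stats('public.test'::regclass, 'id',
                                  null_frac => 0.0, avg_width => 4,
                                  n_distinct => -1);
    SELECT pg_set_attribute_stats('public.test'::regclass, 'val',
                                  null_frac => 0.01, avg_width => 8,
                                  n_distinct => 500);

    COMMIT;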

Also, per our prior discussion, this makes sense to include in the post-data section, in my view- both because then we have the indexes we may wish to load stats for, and because it'll then be in the parallelizable part of the process, making me a bit less concerned overall about the individual timing.

Thanks!

Stephen
