On Tue, 2023-08-01 at 09:47 -0400, richard coleman wrote:
> In PostgreSQL foreign tables are not automatically analyzed and analyze must be
> specifically called on each table. In the case of sharing tables between
> PostgreSQL clusters, there is the use_remote_estimate foreign server option.
>
> In some of our multi terabyte databases, manually running analyze on all of
> the foreign tables can take more than a day. This is per database containing
> the foreign schema. Since we have certain large schema that we have centrally
> located and share to all of our other database clusters, this really adds up.
>
> use_remote_estimate isn't really a solution as it adds way too much overhead
> and processing time to every query run.
>
> Since these tables are being continuously analyzed in the database that hosts
> the data, is there some way that they statistics could be easily passed through
> the foreign server mechanism to the remote database that's calling the query?
>
> Unless I'm missing something we can either:
> 1. manually run analyze on each foreign table in each database that points to
> the host table
> 2. set use_remote_estimate = true which will cause PostgreSQL to re-obtain
> statistics on a per query basis.
>
> What I am hoping for is either:
> 1. pass through the results of analyze from the source database to the one
> where the foreign query is being run
> 2. add the ability to automatically run analyze on foreign tables just as they
> are currently run on local tables.
>
> Of the two, #1 would seem to be the easiest and least wasteful of resources.
Unfortunately, both your wishes don't look feasible:
- Transferring table statistics would mean that PostgreSQL understands statistics
from other server versions. This is complicated, and we have decided not to
do this for pg_upgrade, so I don't think we'll try to do it here.
- Autoanalyzing foreign tables would mean that we have some idea how much data
has changed on the remote server. How should we do that?
What I can imagine is that instead of reading the complete remote table during
ANALYZE, PostgreSQL applies TABLESAMPLE to fetch only part. That could be a
workable enhancement.
Yours,
Laurenz Albe