Обсуждение: Cost overestimation of foreign JOIN
Hi, While testing of Asynchronous Append feature with TPC-H queries, I found that the push-down JOIN technique is rarely used. For my servers fdw_tuple_cost = 0.2, fdw_startup_cost = 100. Exploring the code, i found in postgres_fdw, estimate_path_cost_size(), lines 2908,2909: run_cost += nrows * join_cost.per_tuple; nrows = clamp_row_est(nrows * fpinfo->joinclause_sel); Above: nrows = fpinfo_i->rows * fpinfo_o->rows; Maybe it is needed to swap lines 2908 and 2909 (see attachment)? In my case of two big partitioned tables and small join result it strongly influenced on choice of the JOIN push-down strategy. -- regards, Andrey Lepikhov Postgres Professional
Вложения
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes: > Maybe it is needed to swap lines 2908 and 2909 (see attachment)? No; as explained in the comment immediately above here, we're assuming that the join conditions will be applied on the cross product of the input relations. Now admittedly, that's a worst-case assumption, since it amounts to expecting that the remote server will do the join in the dumbest possible nested-loop way. If the remote can use a merge or hash join, for example, the cost is likely to be a lot less. But it is not the job of this code path to outguess the remote planner. It's certainly not appropriate to invent an unprincipled cost estimate as a substitute for trying to guess that. If you're unhappy with the planning results you get for this, why don't you have use_remote_estimate turned on? regards, tom lane
On 30.11.2020 22:38, Tom Lane wrote: > Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes: >> Maybe it is needed to swap lines 2908 and 2909 (see attachment)? > > No; as explained in the comment immediately above here, we're assuming > that the join conditions will be applied on the cross product of the > input relations. Thank you. Now it is clear to me. > > Now admittedly, that's a worst-case assumption, since it amounts to > expecting that the remote server will do the join in the dumbest > possible nested-loop way. If the remote can use a merge or hash > join, for example, the cost is likely to be a lot less. My goal is scaling Postgres on a set of the same servers with same postgres instances. If one server uses for the join a hash-join node, i think it is most likely that the other server will also use for this join a hash-join node (Maybe you remember, I also use the statistics copying technique to provide up-to-date statistics on partitions). Tests show good results with such an approach. But maybe this is my special case. > But it is > not the job of this code path to outguess the remote planner. It's > certainly not appropriate to invent an unprincipled cost estimate > as a substitute for trying to guess that. Agreed. > > If you're unhappy with the planning results you get for this, > why don't you have use_remote_estimate turned on? I have a mixed load model. Large queries are suitable for additional estimate queries. But for many simple SELECT's that touch a small portion of the data, the latency has increased significantly. And I don't know how to switch the use_remote_estimate setting in such case. -- regards, Andrey Lepikhov Postgres Professional
On Mon, Nov 30, 2020 at 11:56 PM Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote: > > On 30.11.2020 22:38, Tom Lane wrote: > > Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes: > >> Maybe it is needed to swap lines 2908 and 2909 (see attachment)? > > > > No; as explained in the comment immediately above here, we're assuming > > that the join conditions will be applied on the cross product of the > > input relations. > > Thank you. Now it is clear to me. > > > > Now admittedly, that's a worst-case assumption, since it amounts to > > expecting that the remote server will do the join in the dumbest > > possible nested-loop way. If the remote can use a merge or hash > > join, for example, the cost is likely to be a lot less. > > My goal is scaling Postgres on a set of the same servers with same > postgres instances. If one server uses for the join a hash-join node, i > think it is most likely that the other server will also use for this > join a hash-join node (Maybe you remember, I also use the statistics > copying technique to provide up-to-date statistics on partitions). Tests > show good results with such an approach. But maybe this is my special case. > > > But it is > > not the job of this code path to outguess the remote planner. It's > > certainly not appropriate to invent an unprincipled cost estimate > > as a substitute for trying to guess that. > > Agreed. > > > > If you're unhappy with the planning results you get for this, > > why don't you have use_remote_estimate turned on? > > I have a mixed load model. Large queries are suitable for additional > estimate queries. But for many simple SELECT's that touch a small > portion of the data, the latency has increased significantly. And I > don't know how to switch the use_remote_estimate setting in such case. You may disable use_remote_estimates for given table or a server. So if tables participating in short queries are different from those in the large queries, you could set use_remote_estimate at table level to turn it off for the first set. Otherwise, we need a FDW level GUC which can be turned on/off for a given session or a query. Generally use_remote_estimate isn't scalable and there have been discussions about eliminating the need of it. But no concrete proposal has come yet. -- Best Wishes, Ashutosh Bapat
On 12/1/20 6:17 PM, Ashutosh Bapat wrote: > On Mon, Nov 30, 2020 at 11:56 PM Andrey Lepikhov > <a.lepikhov@postgrespro.ru> wrote: >> >> On 30.11.2020 22:38, Tom Lane wrote: >>> Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes: >>> If you're unhappy with the planning results you get for this, >>> why don't you have use_remote_estimate turned on? >> >> I have a mixed load model. Large queries are suitable for additional >> estimate queries. But for many simple SELECT's that touch a small >> portion of the data, the latency has increased significantly. And I >> don't know how to switch the use_remote_estimate setting in such case. > > You may disable use_remote_estimates for given table or a server. So > if tables participating in short queries are different from those in > the large queries, you could set use_remote_estimate at table level to > turn it off for the first set. Otherwise, we need a FDW level GUC > which can be turned on/off for a given session or a query. Currently I implemented another technique: - By default, use_remote_estimate is off. - On the estimate_path_cost_size() some estimation criteria is checked. If true, we force remote estimation for this JOIN. This approach solves the push-down problem in my case - TPC-H test with 6 servers/instances. But it is not so scalable, as i want. > > Generally use_remote_estimate isn't scalable and there have been > discussions about eliminating the need of it. But no concrete proposal > has come yet. > Above I suggested to use results of cost calculation on local JOIN, assuming that in the case of postgres_fdw wrapper very likely, that foreign server will use the same type of join (or even better, if it has some index, for example). If this approach is of interest, I can investigate it. -- regards, Andrey Lepikhov Postgres Professional