Обсуждение: Can we rely on the ordering of paths in pathlist?

Поиск
Список
Период
Сортировка

Can we rely on the ordering of paths in pathlist?

От
Richard Guo
Дата:
As the comment above add_path() says, 'The pathlist is kept sorted by
total_cost, with cheaper paths at the front.'  And it seems that
get_cheapest_parallel_safe_total_inner() relies on this ordering
(without being mentioned in the comments, which I think we should do).
I'm wondering if this is the right thing to do, as in other places
cheapest total cost is found by compare_path_costs(), which would
consider startup cost if paths have the same total cost, and that seems
more sensible.

Attach a trivial patch to make get_cheapest_parallel_safe_total_inner
act this way.

Thanks
Richard
Вложения

Re: Can we rely on the ordering of paths in pathlist?

От
Andy Fan
Дата:


On Tue, Apr 11, 2023 at 11:03 AM Richard Guo <guofenglinux@gmail.com> wrote:
As the comment above add_path() says, 'The pathlist is kept sorted by
total_cost, with cheaper paths at the front.'  And it seems that
get_cheapest_parallel_safe_total_inner() relies on this ordering
(without being mentioned in the comments, which I think we should do).

I think the answer for ${subject} should be yes. Per the comments in
add_partial_path, we have

 * add_partial_path
 *
 *  As in add_path, the partial_pathlist is kept sorted with the cheapest
 *  total path in front.  This is depended on by multiple places, which
 *  just take the front entry as the cheapest path without searching.
 *
I'm wondering if this is the right thing to do, as in other places
cheapest total cost is found by compare_path_costs(), which would
consider startup cost if paths have the same total cost, and that seems
more sensible.

Attach a trivial patch to make get_cheapest_parallel_safe_total_inner
act this way.


Did you run into any real issue with the current coding? If we have to
"consider startup cost if paths have the same total cost", we still can
rely on the ordering and stop iterating when the total_cost becomes
bigger to avoid scanning all the paths in pathlist. 

But if you are complaining the function prototype, where is the pathlist
may be not presorted,  I think the better way maybe changes it from:
Path *get_cheapest_parallel_safe_total_inner(List *paths)  to 
Path *get_cheapest_parallel_safe_total_inner(RelOptInfo *rel);
and scan the rel->pathlist in the function body. 

--
Best Regards
Andy Fan

Re: Can we rely on the ordering of paths in pathlist?

От
Richard Guo
Дата:

On Tue, Apr 11, 2023 at 11:43 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
On Tue, Apr 11, 2023 at 11:03 AM Richard Guo <guofenglinux@gmail.com> wrote:
As the comment above add_path() says, 'The pathlist is kept sorted by
total_cost, with cheaper paths at the front.'  And it seems that
get_cheapest_parallel_safe_total_inner() relies on this ordering
(without being mentioned in the comments, which I think we should do).

I think the answer for ${subject} should be yes. Per the comments in
add_partial_path, we have

 * add_partial_path
 *
 *  As in add_path, the partial_pathlist is kept sorted with the cheapest
 *  total path in front.  This is depended on by multiple places, which
 *  just take the front entry as the cheapest path without searching.
 *

I'm not sure about this conclusion.  Surely we can depend on that the
partial_pathlist is kept sorted by total_cost ASC.  This is emphasized
in the comment of add_partial_path, and also leveraged in practice, such
as in many places we just use linitial(rel->partial_pathlist) as the
cheapest partial path.

But get_cheapest_parallel_safe_total_inner works on pathlist not
partial_pathlist.  And for pathlist, I'm not sure if it's a good
practice to depend on its ordering.  Because

1) the comment of add_path only mentions that add_path_precheck relies
on this ordering, but it does not mention that other functions also do;

2) other than add_path_precheck, I haven't observed any other functions
that rely on this ordering.  The only exception as far as I notice is
get_cheapest_parallel_safe_total_inner.

On the other hand, if we declare that we can rely on the pathlist being
sorted in ascending order by total_cost, we should update the comment
for add_path to highlight this aspect.  We should also include a comment
for get_cheapest_parallel_safe_total_inner to clarify why an early exit
is possible, similar to what we do for add_path_precheck.  Additionally,
in several places, we can optimize our code by taking advantage of this
fact and terminate the iteration through the pathlist early when looking
for the cheapest path of a certain kind.

Thanks
Richard