Обсуждение: unrecognized node type while displaying a Path due to dangling pointer

Поиск
Список
Период
Сортировка

unrecognized node type while displaying a Path due to dangling pointer

От
Jeevan Chalke
Дата:
Hello,

While debugging one issue, we have added the below line in postgresGetForeignPlan() to see the foreignrel details.

@@ -1238,6 +1238,8 @@ postgresGetForeignPlan(PlannerInfo *root,
    bool        has_limit = false;
    ListCell   *lc;
 
+   elog(INFO, "foreignrel: %s", nodeToString(foreignrel));
+

And with this change, we ran the postgres_fdw regression (executing sql/postgres_fdw.sql) suite. We observed the below warnings that seem strange.

+WARNING:  could not dump unrecognized node type: 0
+WARNING:  could not dump unrecognized node type: 26072088
+WARNING:  could not dump unrecognized node type: 26438448
+WARNING:  could not dump unrecognized node type: 368

Of course, with multiple runs, we see some random node types listed above. Thanks to my colleague Suraj Kharage for this and for working parallel with me.

Does anybody have any idea about these?

After debugging one random query from the above-failed case, what we have observed is (we might be wrong, but worth noting here):

1. This warning ended up while displaying RelOptInfo->pathlist.
2. In create_ordered_paths(), input_rel has two paths, and it loops over both paths to get the best-sorted path.
3. First path was unsorted, and thus we add a sort node on top of it, and adds that to the ordered_rel.
4. However, 2nd path was already sorted and passed as is to the add_path().
5. add_path() decides to reject this new path on some metrics. However, in the end, it pfree() this passed in path. It seems wrong as its references do present elsewhere. For example, in the first path's parent rels path list.
6. So, while displaying the parent's path, we end up with these warnings.

I tried to get a fix for this but no luck so far.
One approach was to copy the path before passing it to the add_path(). However, there is no easy way to copy a path due to its circular references.

To see whether this warning goes or not, I have commented code in add_path() that does pfree() on the new_path. And with that, I don't see any warnings. But removing that code doesn't seem to be the correct fix.

Suggestions?

Thanks

--
Jeevan Chalke
Senior Staff SDE, Database Architect, and Manager
Product Development




edbpostgres.com

Re: unrecognized node type while displaying a Path due to dangling pointer

От
Alvaro Herrera
Дата:
On 2023-Jul-11, Jeevan Chalke wrote:

> 4. However, 2nd path was already sorted and passed as is to the add_path().
> 5. add_path() decides to reject this new path on some metrics. However, in
> the end, it pfree() this passed in path. It seems wrong as its references
> do present elsewhere. For example, in the first path's parent rels path
> list.
> 6. So, while displaying the parent's path, we end up with these warnings.

In other words, this is use-after-free, with add_path freeing the
passed-in Path pointer, but one particular case in which this Path is
still used afterwards.

> I tried to get a fix for this but no luck so far.

I proposed to add an add_path_extended() function that adds 'bool
free_input_path' argument, and pass it false in that one place in
create_ordered_paths.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: unrecognized node type while displaying a Path due to dangling pointer

От
Jeevan Chalke
Дата:


On Tue, Jul 11, 2023 at 1:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2023-Jul-11, Jeevan Chalke wrote:

> 4. However, 2nd path was already sorted and passed as is to the add_path().
> 5. add_path() decides to reject this new path on some metrics. However, in
> the end, it pfree() this passed in path. It seems wrong as its references
> do present elsewhere. For example, in the first path's parent rels path
> list.
> 6. So, while displaying the parent's path, we end up with these warnings.

In other words, this is use-after-free, with add_path freeing the
passed-in Path pointer, but one particular case in which this Path is
still used afterwards.

> I tried to get a fix for this but no luck so far.

I proposed to add an add_path_extended() function that adds 'bool
free_input_path' argument, and pass it false in that one place in
create_ordered_paths.

Yeah, this can be a way.

However, I am thinking the other way around now. What if we first added the unmodified input path as it is to the ordered_rel first?

If we do so, then while adding the next path, add_path() may decide to remove the older one as the newer path is the best one. The remove_old logic in add_path() will free the path (unsorted one), and we end up with the same error.

And if we conditionally remove that path (remove_old logic one), then we need to pass false in every add_path() call in create_ordered_paths().

Am I missing something?

Thanks
 

--
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/


--
Jeevan Chalke
Senior Staff SDE, Database Architect, and Manager
Product Development




edbpostgres.com

Re: unrecognized node type while displaying a Path due to dangling pointer

От
Jeevan Chalke
Дата:


On Tue, Jul 11, 2023 at 2:58 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


On Tue, Jul 11, 2023 at 1:19 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2023-Jul-11, Jeevan Chalke wrote:

> 4. However, 2nd path was already sorted and passed as is to the add_path().
> 5. add_path() decides to reject this new path on some metrics. However, in
> the end, it pfree() this passed in path. It seems wrong as its references
> do present elsewhere. For example, in the first path's parent rels path
> list.
> 6. So, while displaying the parent's path, we end up with these warnings.

In other words, this is use-after-free, with add_path freeing the
passed-in Path pointer, but one particular case in which this Path is
still used afterwards.

> I tried to get a fix for this but no luck so far.

I proposed to add an add_path_extended() function that adds 'bool
free_input_path' argument, and pass it false in that one place in
create_ordered_paths.

Yeah, this can be a way.

However, I am thinking the other way around now. What if we first added the unmodified input path as it is to the ordered_rel first?

If we do so, then while adding the next path, add_path() may decide to remove the older one as the newer path is the best one. The remove_old logic in add_path() will free the path (unsorted one), and we end up with the same error.

And if we conditionally remove that path (remove_old logic one), then we need to pass false in every add_path() call in create_ordered_paths().

Attached patch.
 

Am I missing something?

Thanks
 

--
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/


--
Jeevan Chalke
Senior Staff SDE, Database Architect, and Manager
Product Development




edbpostgres.com


--
Jeevan Chalke
Senior Staff SDE, Database Architect, and Manager
Product Development




edbpostgres.com
Вложения

Re: unrecognized node type while displaying a Path due to dangling pointer

От
Tom Lane
Дата:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
> Attached patch.

I would be astonished if this fixes anything.  The code still doesn't
know which paths are referenced by which other ones, and so the place
where we free a previously-added path can't know what to do.

I've speculated about adding some form of reference counting to paths
(maybe just a "pin" flag rather than a full refcount) so that we could
be smarter about this.  The existing kluge for "don't free IndexPaths"
could be replaced by setting the pin mark on any IndexPath that we
make a bitmap path from.  Up to now it hasn't seemed necessary to
generalize that hack, but maybe it's time.  Can you show a concrete
case where we are freeing a still-referenced path?

            regards, tom lane



Re: unrecognized node type while displaying a Path due to dangling pointer

От
Jeevan Chalke
Дата:
Hi Tom,

On Tue, Jul 11, 2023 at 4:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
> Attached patch.

I would be astonished if this fixes anything.  The code still doesn't
know which paths are referenced by which other ones, and so the place
where we free a previously-added path can't know what to do.

I've speculated about adding some form of reference counting to paths
(maybe just a "pin" flag rather than a full refcount) so that we could
be smarter about this.  The existing kluge for "don't free IndexPaths"
could be replaced by setting the pin mark on any IndexPath that we
make a bitmap path from.  Up to now it hasn't seemed necessary to
generalize that hack, but maybe it's time.  Can you show a concrete
case where we are freeing a still-referenced path?

As mentioned earlier, while debugging some issues, we have put an elog displaying the foreignrel contents using nodeToString(). Like below:

@@ -1238,6 +1238,8 @@ postgresGetForeignPlan(PlannerInfo *root,
    bool        has_limit = false;
    ListCell   *lc;
 
+   elog(INFO, "foreignrel: %s", nodeToString(foreignrel));
+

And ran the postgres_fdw regression and observed many warnings saying "could not dump unrecognized node type".  Here are the queries retrieved and adjusted from postgres_fdw.sql

CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'postgres', port '5432');
CREATE USER MAPPING FOR CURRENT_USER SERVER loopback;
CREATE TABLE t1 (c1 int NOT NULL, c2 int NOT NULL, CONSTRAINT t1_pkey PRIMARY KEY (c1));
INSERT INTO t1 SELECT id, id % 10 FROM generate_series(1, 1000) id;
ANALYZE t1;
CREATE FOREIGN TABLE ft2 (c1 int NOT NULL, c2 int NOT NULL) SERVER loopback OPTIONS (schema_name 'public', table_name 't1');

explain (verbose, costs off)
select c2, sum(c1) from ft2 group by c2 having avg(c1) < 500 and sum(c1) < 49800 order by c2;


With the above elog() in place, we can see the warning. And the pathlist has a second path as empty ({}). Which got freed but has a reference in this foreignrel.

Thanks
 

                        regards, tom lane


--
Jeevan Chalke
Senior Staff SDE, Database Architect, and Manager
Product Development




edbpostgres.com

Re: unrecognized node type while displaying a Path due to dangling pointer

От
Tom Lane
Дата:
So what's going on here is that create_ordered_paths() does this:

    foreach(lc, input_rel->pathlist)
    {
        Path       *input_path = (Path *) lfirst(lc);

        if (/* input path is suitably sorted already */)
            sorted_path = input_path;
        else
            /* build a sorted path atop input_path */

        /* Add projection step if needed */
        if (sorted_path->pathtarget != target)
            sorted_path = apply_projection_to_path(root, ordered_rel,
                                                   sorted_path, target);

        add_path(ordered_rel, sorted_path);
    }

Thus, if the input RelOptInfo has a path that already has the correct
ordering and output target, we'll try to add that path directly to
the output RelOptInfo.  This is cheating in at least two ways:

1. The path's parent link isn't right: it still points to the input rel.

2. As per Jeevan's report, we now potentially have two different links
to the path.  add_path could reject and free the path immediately,
or it could do so later while comparing it to some path offered later
for the output RelOptInfo, and either case leads to a dangling pointer
in the input RelOptInfo's pathlist.

Now, the reason we get away with #2 is that nothing looks at the lower
RelOptInfo's pathlist anymore after create_ordered_paths: we will only
be interested in paths that contribute to a surviving Path in the
output RelOptInfo, and those will be linked directly from the upper
Path.  However, that's clearly kind of fragile, plus it's a bit
surprising that nobody has complained about #1.

We could probably fix this by creating a rule that you *must*
wrap a Path for a lower RelOptInfo into some sort of wrapper
Path before offering it as a candidate for an upper RelOptInfo.
(This could be cross-checked by having add_path Assert that
new_path->parent == parent_rel.  The wrapper could be a do-nothing
ProjectionPath, perhaps.)  But I think there are multiple places
taking similar shortcuts, so I'm a bit worried about how much overhead
we'll add for what seems likely to be only a debugging annoyance.

A low-cost fix perhaps could be to unlink the lower rel's whole
path list (set input_rel->pathlist = NIL, also zero the related
fields such as cheapest_path) once we've finished selecting the
paths we want for the upper rel.  That's not great for debuggability
either, but maybe it's the most appropriate compromise.

I don't recall how clearly I understood this while writing the
upper-planner-pathification patch years ago.  I think I did
realize the code was cheating, but if so I failed to document
it, so far as I can see.

            regards, tom lane



Re: unrecognized node type while displaying a Path due to dangling pointer

От
David Rowley
Дата:
On Wed, 12 Jul 2023 at 08:46, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A low-cost fix perhaps could be to unlink the lower rel's whole
> path list (set input_rel->pathlist = NIL, also zero the related
> fields such as cheapest_path) once we've finished selecting the
> paths we want for the upper rel.  That's not great for debuggability
> either, but maybe it's the most appropriate compromise.

I've not taken the time to fully understand this, but from reading the
thread, I'm not immediately understanding why we can't just shallow
copy the Path from the other RelOptInfo and replace the parent before
using it in the upper RelOptInfo.  Can you explain?

David



Re: unrecognized node type while displaying a Path due to dangling pointer

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> I've not taken the time to fully understand this, but from reading the
> thread, I'm not immediately understanding why we can't just shallow
> copy the Path from the other RelOptInfo and replace the parent before
> using it in the upper RelOptInfo.  Can you explain?

I did think about that, but "shallow copy a Path" seems nontrivial
because the Path structs are all different sizes.  Maybe it is worth
building some infrastructure to support that?

            regards, tom lane



Re: unrecognized node type while displaying a Path due to dangling pointer

От
David Rowley
Дата:
On Wed, 12 Jul 2023 at 14:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I did think about that, but "shallow copy a Path" seems nontrivial
> because the Path structs are all different sizes.  Maybe it is worth
> building some infrastructure to support that?

It seems a reasonable thing to have to do.  It seems the minimum thing
we could do to ensure each Path is only mentioned in at most 1
RelOptInfo.

I see GetExistingLocalJoinPath() in foreign.c might be related to this
problem, per:

> * If the inner or outer subpath of the chosen path is a ForeignScan, we
> * replace it with its outer subpath.  For this reason, and also because the
> * planner might free the original path later, the path returned by this
> * function is a shallow copy of the original.  There's no need to copy
> * the substructure, so we don't.

so that function could probably disappear if we had this.

David



Re: unrecognized node type while displaying a Path due to dangling pointer

От
David Rowley
Дата:
On Wed, 12 Jul 2023 at 14:50, David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Wed, 12 Jul 2023 at 14:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I did think about that, but "shallow copy a Path" seems nontrivial
> > because the Path structs are all different sizes.  Maybe it is worth
> > building some infrastructure to support that?
>
> It seems a reasonable thing to have to do.  It seems the minimum thing
> we could do to ensure each Path is only mentioned in at most 1
> RelOptInfo.

I've attached a draft patch which adds copyObjectFlat() and supports
all Node types asides from the ones mentioned in @extra_tags in
gen_node_support.pl.  This did require including all the node header
files in copyfuncs.c, which that file seems to have avoided until now.

I also didn't do anything about ExtensibleNode types. I assume just
copying the ExtensibleNode isn't good enough. To flat copy the actual
node I think would require adding a new function to
ExtensibleNodeMethods.

I was also unsure what we should do when shallow copying a List. The
problem there is if we just do a shallow copy, a repalloc() on the
elements array would end up pfreeing memory that might be used by a
shallow copied clone.  Perhaps List is not unique in that regard?
Maybe the solution there is to add a special case and list_copy()
Lists like what is done in copyObjectImpl().

I'm hoping the attached patch will at least assist in moving the
discussion along.

David

Вложения

Re: unrecognized node type while displaying a Path due to dangling pointer

От
Tom Lane
Дата:
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 12 Jul 2023 at 14:50, David Rowley <dgrowleyml@gmail.com> wrote:
>> On Wed, 12 Jul 2023 at 14:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I did think about that, but "shallow copy a Path" seems nontrivial
>>> because the Path structs are all different sizes.  Maybe it is worth
>>> building some infrastructure to support that?

>> It seems a reasonable thing to have to do.  It seems the minimum thing
>> we could do to ensure each Path is only mentioned in at most 1
>> RelOptInfo.

> ...
> I also didn't do anything about ExtensibleNode types. I assume just
> copying the ExtensibleNode isn't good enough. To flat copy the actual
> node I think would require adding a new function to
> ExtensibleNodeMethods.

Yeah, the problem I've got with this approach is that flat-copying
FDW and Custom paths would require extending the respective APIs.
While that's a perfectly reasonable ask if we only need to do this
in HEAD, it would be a nonstarter for released branches.  Is it
okay to only fix this issue in HEAD?

> I was also unsure what we should do when shallow copying a List.

The proposal is to shallow-copy a Path node.  List is not a kind
of Path, so how does List get into it?  (Lists below Paths would
not get copied, by definition.)

            regards, tom lane



Re: unrecognized node type while displaying a Path due to dangling pointer

От
David Rowley
Дата:
On Mon, 17 Jul 2023 at 15:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I also didn't do anything about ExtensibleNode types. I assume just
> > copying the ExtensibleNode isn't good enough. To flat copy the actual
> > node I think would require adding a new function to
> > ExtensibleNodeMethods.
>
> Yeah, the problem I've got with this approach is that flat-copying
> FDW and Custom paths would require extending the respective APIs.
> While that's a perfectly reasonable ask if we only need to do this
> in HEAD, it would be a nonstarter for released branches.  Is it
> okay to only fix this issue in HEAD?

CustomPaths, I didn't think about those. That certainly makes it more
complex.  I also now see the header comment for struct CustomPath
mentioning that we don't copy Paths:

 * Core code must avoid assuming that the CustomPath is only as large as
 * the structure declared here; providers are allowed to make it the first
 * element in a larger structure.  (Since the planner never copies Paths,
 * this doesn't add any complication.)  However, for consistency with the
 * FDW case, we provide a "custom_private" field in CustomPath; providers
 * may prefer to use that rather than define another struct type.

Are there any legitimate reasons to look at the input_rel's pathlist
again aside from debugging? I can't think of any. Perhaps back
branches can be fixed by just emptying the path lists and NULLifying
the cheapest paths as you mentioned last week.

> > I was also unsure what we should do when shallow copying a List.
>
> The proposal is to shallow-copy a Path node.  List is not a kind
> of Path, so how does List get into it?  (Lists below Paths would
> not get copied, by definition.)

The patch contained infrastructure to copy any Node type. Not just
Paths. Perhaps that's more than what's needed, but it seemed more
effort to limit it just to Path types than to make it "work" for all
Node types.

David.



Re: unrecognized node type while displaying a Path due to dangling pointer

От
Ashutosh Bapat
Дата:
On Tue, Jul 18, 2023 at 5:05 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 17 Jul 2023 at 15:31, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > I also didn't do anything about ExtensibleNode types. I assume just
> > > copying the ExtensibleNode isn't good enough. To flat copy the actual
> > > node I think would require adding a new function to
> > > ExtensibleNodeMethods.
> >
> > Yeah, the problem I've got with this approach is that flat-copying
> > FDW and Custom paths would require extending the respective APIs.
> > While that's a perfectly reasonable ask if we only need to do this
> > in HEAD, it would be a nonstarter for released branches.  Is it
> > okay to only fix this issue in HEAD?
>
> CustomPaths, I didn't think about those. That certainly makes it more
> complex.  I also now see the header comment for struct CustomPath
> mentioning that we don't copy Paths:

Somewhere upthread Tom suggested using a dummy projection path. Add a
projection path on top of input path and add the projection path to
output rel's list. That will work right?

There's some shallow copying code in reparameterize_path_by_childrel()
but that's very specific to the purpose there and doesn't consider
Custom or Foreign paths.

--
Best Wishes,
Ashutosh Bapat



Re: unrecognized node type while displaying a Path due to dangling pointer

От
Ashutosh Bapat
Дата:
On Tue, Jul 11, 2023 at 4:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jeevan Chalke <jeevan.chalke@enterprisedb.com> writes:
> > Attached patch.
>
> I would be astonished if this fixes anything.  The code still doesn't
> know which paths are referenced by which other ones, and so the place
> where we free a previously-added path can't know what to do.
>
> I've speculated about adding some form of reference counting to paths
> (maybe just a "pin" flag rather than a full refcount) so that we could
> be smarter about this.  The existing kluge for "don't free IndexPaths"
> could be replaced by setting the pin mark on any IndexPath that we
> make a bitmap path from.  Up to now it hasn't seemed necessary to
> generalize that hack, but maybe it's time.  Can you show a concrete
> case where we are freeing a still-referenced path?

Set of patches in [1] add infrastructure to reference, link and unlink
paths.The patches are raw and have some TODOs there. But I think that
infrastructure will solve this problem as a side effect. Please take a
look and let me know if this is as per your speculation. It's more
than just pinning though.

The patch set uses references to free memory consumed by paths which
remain unused. The memory consumed is substantial when partitionwise
join is used and there are thousands of partitions.

[1] https://www.postgresql.org/message-id/CAExHW5tUcVsBkq9qT%3DL5vYz4e-cwQNw%3DKAGJrtSyzOp3F%3DXacA%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat