Обсуждение: Parent/child context relation in pg_get_backend_memory_contexts()

Поиск
Список
Период
Сортировка

Parent/child context relation in pg_get_backend_memory_contexts()

От
Melih Mutlu
Дата:
Hi hackers,

pg_get_backend_memory_contexts() (and pg_backend_memory_contexts view)
does not display parent/child relation between contexts reliably.
Current version of this function only shows the name of parent context
for each context. The issue here is that it's not guaranteed that
context names are unique. So, this makes it difficult to find the
correct parent of a context.

How can knowing the correct parent context be useful? One important
use-case can be that it would allow us to sum up all the space used by
a particular context and all other subcontexts which stem from that
context.
Calculating this sum is helpful since currently
(total/used/free)_bytes returned by this function does not include
child contexts. For this reason, only looking into the related row in
pg_backend_memory_contexts does not help us to understand how many
bytes that context is actually taking.

Simplest approach to solve this could be just adding two new fields,
id and parent_id, in pg_get_backend_memory_contexts() and ensuring
each context has a unique id. This way allows us to build a correct
memory context "tree".

Please see the attached patch which introduces those two fields.
Couldn't find an existing unique identifier to use. The patch simply
assigns an id during the execution of
pg_get_backend_memory_contexts() and does not store those id's
anywhere. This means that these id's may be different in each call.

With this change, here's a query to find how much space used by each
context including its children:

> WITH RECURSIVE cte AS (
>     SELECT id, total_bytes, id as root, name as root_name
>     FROM memory_contexts
> UNION ALL
>     SELECT r.id, r.total_bytes, cte.root, cte.root_name
>     FROM memory_contexts r
>     INNER JOIN cte ON r.parent_id = cte.id
> ),
> memory_contexts AS (
>     SELECT * FROM pg_backend_memory_contexts
> )
> SELECT root as id, root_name as name, sum(total_bytes)
> FROM cte
> GROUP BY root, root_name
> ORDER BY sum DESC;


You should see that TopMemoryContext is the one with highest allocated
space since all other contexts are simply created under
TopMemoryContext.


Also; even though having a correct link between parent/child contexts
can be useful to find out many other things as well by only writing
SQL queries, it might require complex recursive queries similar to the
one in case of total_bytes including children. Maybe, we can also
consider adding such frequently used and/or useful information as new
fields in pg_get_backend_memory_contexts() too.


I appreciate any comment/feedback on this.

Thanks,
-- 
Melih Mutlu
Microsoft

Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Melih Mutlu
Дата:
Hi hackers,


Melih Mutlu <m.melihmutlu@gmail.com>, 16 Haz 2023 Cum, 17:03 tarihinde şunu yazdı:
With this change, here's a query to find how much space used by each
context including its children:

> WITH RECURSIVE cte AS (
>     SELECT id, total_bytes, id as root, name as root_name
>     FROM memory_contexts
> UNION ALL
>     SELECT r.id, r.total_bytes, cte.root, cte.root_name
>     FROM memory_contexts r
>     INNER JOIN cte ON r.parent_id = cte.id
> ),
> memory_contexts AS (
>     SELECT * FROM pg_backend_memory_contexts
> )
> SELECT root as id, root_name as name, sum(total_bytes)
> FROM cte
> GROUP BY root, root_name
> ORDER BY sum DESC;

Given that the above query to get total bytes including all children is still a complex one, I decided to add an additional info in pg_backend_memory_contexts.
The new "path" field displays an integer array that consists of ids of all parents for the current context. This way it's easier to tell whether a context is a child of another context, and we don't need to use recursive queries to get this info.

Here how pg_backend_memory_contexts would look like with this patch:

postgres=# SELECT name, id, parent, parent_id, path
FROM pg_backend_memory_contexts
ORDER BY total_bytes DESC LIMIT 10;
          name           | id  |      parent      | parent_id |     path
-------------------------+-----+------------------+-----------+--------------
 CacheMemoryContext      |  27 | TopMemoryContext |         0 | {0}
 Timezones               | 124 | TopMemoryContext |         0 | {0}
 TopMemoryContext        |   0 |                  |           |
 MessageContext          |   8 | TopMemoryContext |         0 | {0}
 WAL record construction | 118 | TopMemoryContext |         0 | {0}
 ExecutorState           |  18 | PortalContext    |        17 | {0,16,17}
 TupleSort main          |  19 | ExecutorState    |        18 | {0,16,17,18}
 TransactionAbortContext |  14 | TopMemoryContext |         0 | {0}
 smgr relation table     |  10 | TopMemoryContext |         0 | {0}
 GUC hash table          | 123 | GUCMemoryContext |       122 | {0,122}
(10 rows)



An example query to calculate the total_bytes including its children for a context (say CacheMemoryContext) would look like this:

WITH contexts AS (
  SELECT * FROM pg_backend_memory_contexts
)
SELECT sum(total_bytes) 
FROM contexts 
WHERE ARRAY[(SELECT id FROM contexts WHERE name = 'CacheMemoryContext')] <@ path;


We still need to use cte since ids are not persisted and might change in each run of pg_backend_memory_contexts. Materializing the result can prevent any inconsistencies due to id change. Also it can be even good for performance reasons as well.

Any thoughts?

Thanks,
--
Melih Mutlu
Microsoft
Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Andres Freund
Дата:
Hi,

On 2023-08-04 21:16:49 +0300, Melih Mutlu wrote:
> Melih Mutlu <m.melihmutlu@gmail.com>, 16 Haz 2023 Cum, 17:03 tarihinde şunu
> yazdı:
> 
> > With this change, here's a query to find how much space used by each
> > context including its children:
> >
> > > WITH RECURSIVE cte AS (
> > >     SELECT id, total_bytes, id as root, name as root_name
> > >     FROM memory_contexts
> > > UNION ALL
> > >     SELECT r.id, r.total_bytes, cte.root, cte.root_name
> > >     FROM memory_contexts r
> > >     INNER JOIN cte ON r.parent_id = cte.id
> > > ),
> > > memory_contexts AS (
> > >     SELECT * FROM pg_backend_memory_contexts
> > > )
> > > SELECT root as id, root_name as name, sum(total_bytes)
> > > FROM cte
> > > GROUP BY root, root_name
> > > ORDER BY sum DESC;
> >
> 
> Given that the above query to get total bytes including all children is
> still a complex one, I decided to add an additional info in
> pg_backend_memory_contexts.
> The new "path" field displays an integer array that consists of ids of all
> parents for the current context. This way it's easier to tell whether a
> context is a child of another context, and we don't need to use recursive
> queries to get this info.

I think that does make it a good bit easier. Both to understand and to use.



> Here how pg_backend_memory_contexts would look like with this patch:
> 
> postgres=# SELECT name, id, parent, parent_id, path
> FROM pg_backend_memory_contexts
> ORDER BY total_bytes DESC LIMIT 10;
>           name           | id  |      parent      | parent_id |     path
> -------------------------+-----+------------------+-----------+--------------
>  CacheMemoryContext      |  27 | TopMemoryContext |         0 | {0}
>  Timezones               | 124 | TopMemoryContext |         0 | {0}
>  TopMemoryContext        |   0 |                  |           |
>  MessageContext          |   8 | TopMemoryContext |         0 | {0}
>  WAL record construction | 118 | TopMemoryContext |         0 | {0}
>  ExecutorState           |  18 | PortalContext    |        17 | {0,16,17}
>  TupleSort main          |  19 | ExecutorState    |        18 | {0,16,17,18}
>  TransactionAbortContext |  14 | TopMemoryContext |         0 | {0}
>  smgr relation table     |  10 | TopMemoryContext |         0 | {0}
>  GUC hash table          | 123 | GUCMemoryContext |       122 | {0,122}
> (10 rows)

Would we still need the parent_id column?


> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>context_id</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Current context id
> +      </para></entry>
> +     </row>

I think the docs here need to warn that the id is ephemeral and will likely
differ in the next invocation.


> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>parent_id</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Parent context id
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>path</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Path to reach the current context from TopMemoryContext
> +      </para></entry>
> +     </row>

Perhaps we should include some hint here how it could be used?


>      </tbody>
>     </tgroup>
>    </table>
> diff --git a/src/backend/utils/adt/mcxtfuncs.c b/src/backend/utils/adt/mcxtfuncs.c
> index 92ca5b2f72..81cb35dd47 100644
> --- a/src/backend/utils/adt/mcxtfuncs.c
> +++ b/src/backend/utils/adt/mcxtfuncs.c
> @@ -20,6 +20,7 @@
>  #include "mb/pg_wchar.h"
>  #include "storage/proc.h"
>  #include "storage/procarray.h"
> +#include "utils/array.h"
>  #include "utils/builtins.h"
>  
>  /* ----------
> @@ -28,6 +29,8 @@
>   */
>  #define MEMORY_CONTEXT_IDENT_DISPLAY_SIZE    1024
>  
> +static Datum convert_path_to_datum(List *path);
> +
>  /*
>   * PutMemoryContextsStatsTupleStore
>   *        One recursion level for pg_get_backend_memory_contexts.
> @@ -35,9 +38,10 @@
>  static void
>  PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
>                                   TupleDesc tupdesc, MemoryContext context,
> -                                 const char *parent, int level)
> +                                 const char *parent, int level, int *context_id,
> +                                 int parent_id, List *path)
>  {
> -#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS    9
> +#define PG_GET_BACKEND_MEMORY_CONTEXTS_COLS    12
>  
>      Datum        values[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
>      bool        nulls[PG_GET_BACKEND_MEMORY_CONTEXTS_COLS];
> @@ -45,6 +49,7 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
>      MemoryContext child;
>      const char *name;
>      const char *ident;
> +    int  current_context_id = (*context_id)++;
>  
>      Assert(MemoryContextIsValid(context));
>  
> @@ -103,13 +108,29 @@ PutMemoryContextsStatsTupleStore(Tuplestorestate *tupstore,
>      values[6] = Int64GetDatum(stat.freespace);
>      values[7] = Int64GetDatum(stat.freechunks);
>      values[8] = Int64GetDatum(stat.totalspace - stat.freespace);
> +    values[9] = Int32GetDatum(current_context_id);
> +
> +    if(parent_id < 0)
> +        /* TopMemoryContext has no parent context */
> +        nulls[10] = true;
> +    else
> +        values[10] = Int32GetDatum(parent_id);
> +
> +    if (path == NIL)
> +        nulls[11] = true;
> +    else
> +        values[11] = convert_path_to_datum(path);
> +
>      tuplestore_putvalues(tupstore, tupdesc, values, nulls);
>  
> +    path = lappend_int(path, current_context_id);
>      for (child = context->firstchild; child != NULL; child = child->nextchild)
>      {
> -        PutMemoryContextsStatsTupleStore(tupstore, tupdesc,
> -                                         child, name, level + 1);
> +        PutMemoryContextsStatsTupleStore(tupstore, tupdesc, child, name,
> +                                         level+1, context_id,
> +                                         current_context_id, path);
>      }
> +    path = list_delete_last(path);
>  }
>  
>  /*
> @@ -120,10 +141,15 @@ Datum
>  pg_get_backend_memory_contexts(PG_FUNCTION_ARGS)
>  {
>      ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
> +    int context_id = 0;
> +    List *path = NIL;
> +
> +    elog(LOG, "pg_get_backend_memory_contexts called");
>  
>      InitMaterializedSRF(fcinfo, 0);
>      PutMemoryContextsStatsTupleStore(rsinfo->setResult, rsinfo->setDesc,
> -                                     TopMemoryContext, NULL, 0);
> +                                     TopMemoryContext, NULL, 0, &context_id,
> +                                     -1, path);
>  
>      return (Datum) 0;
>  }
> @@ -193,3 +219,26 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
>  
>      PG_RETURN_BOOL(true);
>  }
> +
> +/*
> + * Convert a list of context ids to a int[] Datum
> + */
> +static Datum
> +convert_path_to_datum(List *path)
> +{
> +    Datum       *datum_array;
> +    int            length;
> +    ArrayType  *result_array;
> +    ListCell   *lc;
> +
> +    length = list_length(path);
> +    datum_array = (Datum *) palloc(length * sizeof(Datum));
> +    length = 0;
> +    foreach(lc, path)
> +    {
> +        datum_array[length++] = Int32GetDatum((int) lfirst_int(lc));

The "(int)" in front of lfirst_int() seems redundant?


I think it'd be good to have some minimal test for this. E.g. checking that
there's multiple contexts below cache memory context or such.

Greetings,

Andres Freund



Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Stephen Frost
Дата:
Greetings,

* Melih Mutlu (m.melihmutlu@gmail.com) wrote:
> Melih Mutlu <m.melihmutlu@gmail.com>, 16 Haz 2023 Cum, 17:03 tarihinde şunu
> yazdı:
>
> > With this change, here's a query to find how much space used by each
> > context including its children:
> >
> > > WITH RECURSIVE cte AS (
> > >     SELECT id, total_bytes, id as root, name as root_name
> > >     FROM memory_contexts
> > > UNION ALL
> > >     SELECT r.id, r.total_bytes, cte.root, cte.root_name
> > >     FROM memory_contexts r
> > >     INNER JOIN cte ON r.parent_id = cte.id
> > > ),
> > > memory_contexts AS (
> > >     SELECT * FROM pg_backend_memory_contexts
> > > )
> > > SELECT root as id, root_name as name, sum(total_bytes)
> > > FROM cte
> > > GROUP BY root, root_name
> > > ORDER BY sum DESC;
>
> Given that the above query to get total bytes including all children is
> still a complex one, I decided to add an additional info in
> pg_backend_memory_contexts.
> The new "path" field displays an integer array that consists of ids of all
> parents for the current context. This way it's easier to tell whether a
> context is a child of another context, and we don't need to use recursive
> queries to get this info.

Nice, this does seem quite useful.

> Here how pg_backend_memory_contexts would look like with this patch:
>
> postgres=# SELECT name, id, parent, parent_id, path
> FROM pg_backend_memory_contexts
> ORDER BY total_bytes DESC LIMIT 10;
>           name           | id  |      parent      | parent_id |     path
> -------------------------+-----+------------------+-----------+--------------
>  CacheMemoryContext      |  27 | TopMemoryContext |         0 | {0}
>  Timezones               | 124 | TopMemoryContext |         0 | {0}
>  TopMemoryContext        |   0 |                  |           |
>  MessageContext          |   8 | TopMemoryContext |         0 | {0}
>  WAL record construction | 118 | TopMemoryContext |         0 | {0}
>  ExecutorState           |  18 | PortalContext    |        17 | {0,16,17}
>  TupleSort main          |  19 | ExecutorState    |        18 | {0,16,17,18}
>  TransactionAbortContext |  14 | TopMemoryContext |         0 | {0}
>  smgr relation table     |  10 | TopMemoryContext |         0 | {0}
>  GUC hash table          | 123 | GUCMemoryContext |       122 | {0,122}
> (10 rows)
>
> An example query to calculate the total_bytes including its children for a
> context (say CacheMemoryContext) would look like this:
>
> WITH contexts AS (
> SELECT * FROM pg_backend_memory_contexts
> )
> SELECT sum(total_bytes)
> FROM contexts
> WHERE ARRAY[(SELECT id FROM contexts WHERE name = 'CacheMemoryContext')] <@
> path;

I wonder if we should perhaps just include
"total_bytes_including_children" as another column?  Certainly seems
like a very useful thing that folks would like to see.  We could do that
either with C, or even something as simple as changing the view to do
something like:

WITH contexts AS MATERIALIZED (
  SELECT * FROM pg_get_backend_memory_contexts()
)
SELECT
  *,
  coalesce
  (
    (
      (SELECT sum(total_bytes) FROM contexts WHERE ARRAY[a.id] <@ path)
      + total_bytes
    ),
    total_bytes
  ) AS total_bytes_including_children
FROM contexts a;

> We still need to use cte since ids are not persisted and might change in
> each run of pg_backend_memory_contexts. Materializing the result can
> prevent any inconsistencies due to id change. Also it can be even good for
> performance reasons as well.

I don't think we really want this to be materialized, do we?  Where this
is particularly interesting is when it's being dumped to the log ( ...
though I wish we could do better than that and hope we do in the future)
while something is ongoing in a given backend and if we do that a few
times we are able to see what's changing in terms of allocations,
whereas if we materialized it (when?  transaction start?  first time
it's asked for?) then we'd only ever get the one view from whenever the
snapshot was taken.

> Any thoughts?

Generally +1 from me for working on improving this.

Thanks!

Stephen

Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Andres Freund
Дата:
Hi,

On 2023-10-18 15:53:30 -0400, Stephen Frost wrote:
> > Here how pg_backend_memory_contexts would look like with this patch:
> > 
> > postgres=# SELECT name, id, parent, parent_id, path
> > FROM pg_backend_memory_contexts
> > ORDER BY total_bytes DESC LIMIT 10;
> >           name           | id  |      parent      | parent_id |     path
> > -------------------------+-----+------------------+-----------+--------------
> >  CacheMemoryContext      |  27 | TopMemoryContext |         0 | {0}
> >  Timezones               | 124 | TopMemoryContext |         0 | {0}
> >  TopMemoryContext        |   0 |                  |           |
> >  MessageContext          |   8 | TopMemoryContext |         0 | {0}
> >  WAL record construction | 118 | TopMemoryContext |         0 | {0}
> >  ExecutorState           |  18 | PortalContext    |        17 | {0,16,17}
> >  TupleSort main          |  19 | ExecutorState    |        18 | {0,16,17,18}
> >  TransactionAbortContext |  14 | TopMemoryContext |         0 | {0}
> >  smgr relation table     |  10 | TopMemoryContext |         0 | {0}
> >  GUC hash table          | 123 | GUCMemoryContext |       122 | {0,122}
> > (10 rows)
> > 
> > An example query to calculate the total_bytes including its children for a
> > context (say CacheMemoryContext) would look like this:
> > 
> > WITH contexts AS (
> > SELECT * FROM pg_backend_memory_contexts
> > )
> > SELECT sum(total_bytes)
> > FROM contexts
> > WHERE ARRAY[(SELECT id FROM contexts WHERE name = 'CacheMemoryContext')] <@
> > path;
> 
> I wonder if we should perhaps just include
> "total_bytes_including_children" as another column?  Certainly seems
> like a very useful thing that folks would like to see.

The "issue" is where to stop - should we also add that for some of the other
columns? They are a bit less important, but not that much.


> > We still need to use cte since ids are not persisted and might change in
> > each run of pg_backend_memory_contexts. Materializing the result can
> > prevent any inconsistencies due to id change. Also it can be even good for
> > performance reasons as well.
> 
> I don't think we really want this to be materialized, do we?  Where this
> is particularly interesting is when it's being dumped to the log ( ...
> though I wish we could do better than that and hope we do in the future)
> while something is ongoing in a given backend and if we do that a few
> times we are able to see what's changing in terms of allocations,
> whereas if we materialized it (when?  transaction start?  first time
> it's asked for?) then we'd only ever get the one view from whenever the
> snapshot was taken.

I think the comment was just about the need to use a CTE, because self-joining
with divergent versions of pg_backend_memory_contexts would not always work
out well.

Greetings,

Andres Freund



Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Stephen Frost
Дата:
Greetings,

* Andres Freund (andres@anarazel.de) wrote:
> On 2023-10-18 15:53:30 -0400, Stephen Frost wrote:
> > > Here how pg_backend_memory_contexts would look like with this patch:
> > >
> > > postgres=# SELECT name, id, parent, parent_id, path
> > > FROM pg_backend_memory_contexts
> > > ORDER BY total_bytes DESC LIMIT 10;
> > >           name           | id  |      parent      | parent_id |     path
> > > -------------------------+-----+------------------+-----------+--------------
> > >  CacheMemoryContext      |  27 | TopMemoryContext |         0 | {0}
> > >  Timezones               | 124 | TopMemoryContext |         0 | {0}
> > >  TopMemoryContext        |   0 |                  |           |
> > >  MessageContext          |   8 | TopMemoryContext |         0 | {0}
> > >  WAL record construction | 118 | TopMemoryContext |         0 | {0}
> > >  ExecutorState           |  18 | PortalContext    |        17 | {0,16,17}
> > >  TupleSort main          |  19 | ExecutorState    |        18 | {0,16,17,18}
> > >  TransactionAbortContext |  14 | TopMemoryContext |         0 | {0}
> > >  smgr relation table     |  10 | TopMemoryContext |         0 | {0}
> > >  GUC hash table          | 123 | GUCMemoryContext |       122 | {0,122}
> > > (10 rows)
> > >
> > > An example query to calculate the total_bytes including its children for a
> > > context (say CacheMemoryContext) would look like this:
> > >
> > > WITH contexts AS (
> > > SELECT * FROM pg_backend_memory_contexts
> > > )
> > > SELECT sum(total_bytes)
> > > FROM contexts
> > > WHERE ARRAY[(SELECT id FROM contexts WHERE name = 'CacheMemoryContext')] <@
> > > path;
> >
> > I wonder if we should perhaps just include
> > "total_bytes_including_children" as another column?  Certainly seems
> > like a very useful thing that folks would like to see.
>
> The "issue" is where to stop - should we also add that for some of the other
> columns? They are a bit less important, but not that much.

I'm not sure the others really make sense to aggregate in this way as
free space isn't able to be moved between contexts.  That said, if
someone wants it then I'm not against that.  I'm actively in support of
adding an aggregated total though as that, at least to me, seems to be
very useful to have.

Thanks,

Stephen

Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Melih Mutlu
Дата:
Hi,

Thanks for reviewing.
Attached the updated patch v3.

Andres Freund <andres@anarazel.de>, 12 Eki 2023 Per, 19:23 tarihinde şunu yazdı:
> Here how pg_backend_memory_contexts would look like with this patch:
>
> postgres=# SELECT name, id, parent, parent_id, path
> FROM pg_backend_memory_contexts
> ORDER BY total_bytes DESC LIMIT 10;
>           name           | id  |      parent      | parent_id |     path
> -------------------------+-----+------------------+-----------+--------------
>  CacheMemoryContext      |  27 | TopMemoryContext |         0 | {0}
>  Timezones               | 124 | TopMemoryContext |         0 | {0}
>  TopMemoryContext        |   0 |                  |           |
>  MessageContext          |   8 | TopMemoryContext |         0 | {0}
>  WAL record construction | 118 | TopMemoryContext |         0 | {0}
>  ExecutorState           |  18 | PortalContext    |        17 | {0,16,17}
>  TupleSort main          |  19 | ExecutorState    |        18 | {0,16,17,18}
>  TransactionAbortContext |  14 | TopMemoryContext |         0 | {0}
>  smgr relation table     |  10 | TopMemoryContext |         0 | {0}
>  GUC hash table          | 123 | GUCMemoryContext |       122 | {0,122}
> (10 rows)

Would we still need the parent_id column?

I guess not. Assuming the path column is sorted from TopMemoryContext to the parent one level above, parent_id can be found using the path column if needed.
Removed parent_id.


> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>context_id</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Current context id
> +      </para></entry>
> +     </row>

I think the docs here need to warn that the id is ephemeral and will likely
differ in the next invocation.

Done.

> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>parent_id</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Parent context id
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>path</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Path to reach the current context from TopMemoryContext
> +      </para></entry>
> +     </row>

Perhaps we should include some hint here how it could be used?

I added more explanation but not sure if that is what you asked for. Do you want a hint that is related to a more specific use case?

> +     length = list_length(path);
> +     datum_array = (Datum *) palloc(length * sizeof(Datum));
> +     length = 0;
> +     foreach(lc, path)
> +     {
> +             datum_array[length++] = Int32GetDatum((int) lfirst_int(lc));

The "(int)" in front of lfirst_int() seems redundant?

Removed.

I think it'd be good to have some minimal test for this. E.g. checking that
there's multiple contexts below cache memory context or such.

Added new tests in sysview.sql.


Stephen Frost <sfrost@snowman.net>, 18 Eki 2023 Çar, 22:53 tarihinde şunu yazdı:
I wonder if we should perhaps just include
"total_bytes_including_children" as another column?  Certainly seems
like a very useful thing that folks would like to see.  We could do that
either with C, or even something as simple as changing the view to do
something like:

WITH contexts AS MATERIALIZED (
  SELECT * FROM pg_get_backend_memory_contexts()
)
SELECT
  *,
  coalesce
  (
    (
      (SELECT sum(total_bytes) FROM contexts WHERE ARRAY[a.id] <@ path)
      + total_bytes
    ),
    total_bytes
  ) AS total_bytes_including_children
FROM contexts a;

I added a "total_bytes_including_children" column as you suggested. Did that with C since it seemed faster than doing it by changing the view.

-- Calculating total_bytes_including_children by modifying the view
postgres=# select * from pg_backend_memory_contexts ;
Time: 30.462 ms

-- Calculating total_bytes_including_children with C
postgres=# select * from pg_backend_memory_contexts ;
Time: 1.511 ms


Thanks,
--
Melih Mutlu
Microsoft
Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
torikoshia
Дата:
Thanks for working on this improvement!

On 2023-10-23 21:02, Melih Mutlu wrote:
> Hi,
> 
> Thanks for reviewing.
> Attached the updated patch v3.

I reviewed v3 patch and here are some minor comments:

> +     <row>
> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>path</structfield> <type>int4</type>

Should 'int4' be 'int4[]'?
Other system catalog columns such as pg_groups.grolist distinguish 
whther the type is a array or not.

> +       Path to reach the current context from TopMemoryContext. 
> Context ids in
> +       this list represents all parents of the current context. This 
> can be
> +       used to build the parent and child relation.

It seems last "." is not necessary considering other explanations for 
each field end without it.

+                                const char *parent, int level, int 
*context_id,
+                                List *path, Size 
*total_bytes_inc_chidlren)

'chidlren' -> 'children'


+   elog(LOG, "pg_get_backend_memory_contexts called");

Is this message necessary?


There was warning when applying the patch:

   % git apply 
../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch
   
../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch:282: 
trailing whitespace.
   select count(*) > 0
   
../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch:283: 
trailing whitespace.
   from contexts
   warning: 2 lines add whitespace errors.

-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Melih Mutlu
Дата:
Hi,

Thanks for reviewing. Please find the updated patch attached.

torikoshia <torikoshia@oss.nttdata.com>, 4 Ara 2023 Pzt, 07:43 tarihinde şunu yazdı:
I reviewed v3 patch and here are some minor comments:

> +     <row>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>path</structfield> <type>int4</type>

Should 'int4' be 'int4[]'?
Other system catalog columns such as pg_groups.grolist distinguish
whther the type is a array or not.

Right! Done.
 

> +       Path to reach the current context from TopMemoryContext.
> Context ids in
> +       this list represents all parents of the current context. This
> can be
> +       used to build the parent and child relation.

It seems last "." is not necessary considering other explanations for
each field end without it.

Done.
 
+                                const char *parent, int level, int
*context_id,
+                                List *path, Size
*total_bytes_inc_chidlren)

'chidlren' -> 'children'

Done.


+   elog(LOG, "pg_get_backend_memory_contexts called");

Is this message necessary?

I guess I added this line for debugging and then forgot to remove. Now removed.

There was warning when applying the patch:

   % git apply
../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch

../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch:282:
trailing whitespace.
   select count(*) > 0

../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch:283:
trailing whitespace.
   from contexts
   warning: 2 lines add whitespace errors.

Fixed.

Thanks,
--
Melih Mutlu
Microsoft
Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
torikoshia
Дата:
On 2024-01-03 20:40, Melih Mutlu wrote:
> Hi,
> 
> Thanks for reviewing. Please find the updated patch attached.
> 
> torikoshia <torikoshia@oss.nttdata.com>, 4 Ara 2023 Pzt, 07:43
> tarihinde şunu yazdı:
> 
>> I reviewed v3 patch and here are some minor comments:
>> 
>>> +     <row>
>>> +      <entry role="catalog_table_entry"><para
>>> role="column_definition">
>>> +       <structfield>path</structfield> <type>int4</type>
>> 
>> Should 'int4' be 'int4[]'?
>> Other system catalog columns such as pg_groups.grolist distinguish
>> whther the type is a array or not.
> 
> Right! Done.
> 
>>> +       Path to reach the current context from TopMemoryContext.
>>> Context ids in
>>> +       this list represents all parents of the current context.
>> This
>>> can be
>>> +       used to build the parent and child relation.
>> 
>> It seems last "." is not necessary considering other explanations
>> for
>> each field end without it.
> 
> Done.
> 
>> +                                const char *parent, int level, int
>> *context_id,
>> +                                List *path, Size
>> *total_bytes_inc_chidlren)
>> 
>> 'chidlren' -> 'children'
> 
> Done.
> 
>> +   elog(LOG, "pg_get_backend_memory_contexts called");
>> 
>> Is this message necessary?
> 
> I guess I added this line for debugging and then forgot to remove. Now
> removed.
> 
>> There was warning when applying the patch:
>> 
>> % git apply
>> 
> ../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch
>> 
>> 
> ../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch:282:
>> 
>> trailing whitespace.
>> select count(*) > 0
>> 
>> 
> ../patch/pg_backend_memory_context_refine/v3-0001-Adding-id-parent_id-into-pg_backend_memory_contex.patch:283:
>> 
>> trailing whitespace.
>> from contexts
>> warning: 2 lines add whitespace errors.
> 
> Fixed.
> 
> Thanks,--
> 
> Melih Mutlu
> Microsoft

Thanks for updating the patch.

> +     <row>
> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>context_id</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Current context id. Note that the context id is a temporary id 
> and may
> +       change in each invocation
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>path</structfield> <type>int4[]</type>
> +      </para>
> +      <para>
> +       Path to reach the current context from TopMemoryContext. 
> Context ids in
> +       this list represents all parents of the current context. This 
> can be
> +       used to build the parent and child relation
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para 
> role="column_definition">
> +       <structfield>total_bytes_including_children</structfield> 
> <type>int8</type>
> +      </para>
> +      <para>
> +       Total bytes allocated for this memory context including its 
> children
> +      </para></entry>
> +     </row>

These columns are currently added to the bottom of the table, but it may 
be better to put semantically similar items close together and change 
the insertion position with reference to other system views. For 
example,

- In pg_group and pg_user, 'id' is placed on the line following 'name', 
so 'context_id' be placed on the line following 'name'
- 'path' is similar with 'parent' and 'level' in that these are 
information about the location of the context, 'path' be placed to next 
to them.

If we do this, orders of columns in the system view should be the same, 
I think.


> +   ListCell   *lc;
> +
> +   length = list_length(path);
> +   datum_array = (Datum *) palloc(length * sizeof(Datum));
> +   length = 0;
> +   foreach(lc, path)
> +   {
> +       datum_array[length++] = Int32GetDatum(lfirst_int(lc));
> +   }

14dd0f27d have introduced new macro foreach_int.
It seems to be able to make the code a bit simpler and the commit log 
says this macro is primarily intended for use in new code. For example:

|    int id;
|
|    length = list_length(path);
|    datum_array = (Datum *) palloc(length * sizeof(Datum));
|    length = 0;
|    foreach_int(id, path)
|    {
|        datum_array[length++] = Int32GetDatum(id);
|    }



-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Melih Mutlu
Дата:
Hi,

Thanks for reviewing.

torikoshia <torikoshia@oss.nttdata.com>, 10 Oca 2024 Çar, 09:37 tarihinde şunu yazdı:
> +     <row>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>context_id</structfield> <type>int4</type>
> +      </para>
> +      <para>
> +       Current context id. Note that the context id is a temporary id
> and may
> +       change in each invocation
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>path</structfield> <type>int4[]</type>
> +      </para>
> +      <para>
> +       Path to reach the current context from TopMemoryContext.
> Context ids in
> +       this list represents all parents of the current context. This
> can be
> +       used to build the parent and child relation
> +      </para></entry>
> +     </row>
> +
> +     <row>
> +      <entry role="catalog_table_entry"><para
> role="column_definition">
> +       <structfield>total_bytes_including_children</structfield>
> <type>int8</type>
> +      </para>
> +      <para>
> +       Total bytes allocated for this memory context including its
> children
> +      </para></entry>
> +     </row>

These columns are currently added to the bottom of the table, but it may
be better to put semantically similar items close together and change
the insertion position with reference to other system views. For
example,

- In pg_group and pg_user, 'id' is placed on the line following 'name',
so 'context_id' be placed on the line following 'name'
- 'path' is similar with 'parent' and 'level' in that these are
information about the location of the context, 'path' be placed to next
to them.

If we do this, orders of columns in the system view should be the same,
I think.

I've done what you suggested. Also moved "total_bytes_including_children" right after "total_bytes".


14dd0f27d have introduced new macro foreach_int.
It seems to be able to make the code a bit simpler and the commit log
says this macro is primarily intended for use in new code. For example:

Makes sense. Done.

Thanks,
--
Melih Mutlu
Microsoft
Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
torikoshia
Дата:
On 2024-01-16 18:41, Melih Mutlu wrote:
> Hi,
> 
> Thanks for reviewing.
> 
> torikoshia <torikoshia@oss.nttdata.com>, 10 Oca 2024 Çar, 09:37
> tarihinde şunu yazdı:
> 
>>> +     <row>
>>> +      <entry role="catalog_table_entry"><para
>>> role="column_definition">
>>> +       <structfield>context_id</structfield> <type>int4</type>
>>> +      </para>
>>> +      <para>
>>> +       Current context id. Note that the context id is a
>> temporary id
>>> and may
>>> +       change in each invocation
>>> +      </para></entry>
>>> +     </row>
>>> +
>>> +     <row>
>>> +      <entry role="catalog_table_entry"><para
>>> role="column_definition">
>>> +       <structfield>path</structfield> <type>int4[]</type>
>>> +      </para>
>>> +      <para>
>>> +       Path to reach the current context from TopMemoryContext.
>>> Context ids in
>>> +       this list represents all parents of the current context.
>> This
>>> can be
>>> +       used to build the parent and child relation
>>> +      </para></entry>
>>> +     </row>
>>> +
>>> +     <row>
>>> +      <entry role="catalog_table_entry"><para
>>> role="column_definition">
>>> +       <structfield>total_bytes_including_children</structfield>
>>> <type>int8</type>
>>> +      </para>
>>> +      <para>
>>> +       Total bytes allocated for this memory context including
>> its
>>> children
>>> +      </para></entry>
>>> +     </row>
>> 
>> These columns are currently added to the bottom of the table, but it
>> may
>> be better to put semantically similar items close together and
>> change
>> the insertion position with reference to other system views. For
>> example,
>> 
>> - In pg_group and pg_user, 'id' is placed on the line following
>> 'name',
>> so 'context_id' be placed on the line following 'name'
>> - 'path' is similar with 'parent' and 'level' in that these are
>> information about the location of the context, 'path' be placed to
>> next
>> to them.
>> 
>> If we do this, orders of columns in the system view should be the
>> same,
>> I think.
> 
> I've done what you suggested. Also moved
> "total_bytes_including_children" right after "total_bytes".
> 
>> 14dd0f27d have introduced new macro foreach_int.
>> It seems to be able to make the code a bit simpler and the commit
>> log
>> says this macro is primarily intended for use in new code. For
>> example:
> 
> Makes sense. Done.

Thanks for updating the patch!

> +       Current context id. Note that the context id is a temporary id 
> and may
> +       change in each invocation
> +      </para></entry>
> +     </row>

It clearly states that the context id is temporary, but I am a little 
concerned about users who write queries that refer to this view multiple 
times without using CTE.

If you agree, how about adding some description like below you mentioned 
before?

> We still need to use cte since ids are not persisted and might change 
> in
> each run of pg_backend_memory_contexts. Materializing the result can
> prevent any inconsistencies due to id change. Also it can be even good 
> for
> performance reasons as well.

We already have additional description below the table which explains 
each column of the system view. For example pg_locks:
https://www.postgresql.org/docs/devel/view-pg-locks.html


Also giving an example query something like this might be useful.

   -- show all the parent context names of ExecutorState
   with contexts as (
     select * from pg_backend_memory_contexts
   )
   select name from contexts where array[context_id] <@ (select path from 
contexts where name = 'ExecutorState');


-- 
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation



Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Michael Paquier
Дата:
On Fri, Jan 19, 2024 at 05:41:45PM +0900, torikoshia wrote:
> We already have additional description below the table which explains each
> column of the system view. For example pg_locks:
> https://www.postgresql.org/docs/devel/view-pg-locks.html

I was reading the patch, and using int[] as a representation of the
path of context IDs up to the top-most parent looks a bit strange to
me, with the relationship between each parent -> child being
preserved, visibly, based on the order of the elements in this array
made of temporary IDs compiled on-the-fly during the function
execution.  Am I the only one finding that a bit strange?  Could it be
better to use a different data type for this path and perhaps switch
to the names of the contexts involved?

It is possible to retrieve this information some WITH RECURSIVE as
well, as mentioned upthread.  Perhaps we could consider documenting
these tricks?
--
Michael

Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Melih Mutlu
Дата:
Hi,

Michael Paquier <michael@paquier.xyz>, 14 Şub 2024 Çar, 10:23 tarihinde şunu yazdı:
On Fri, Jan 19, 2024 at 05:41:45PM +0900, torikoshia wrote:
> We already have additional description below the table which explains each
> column of the system view. For example pg_locks:
> https://www.postgresql.org/docs/devel/view-pg-locks.html

I was reading the patch, and using int[] as a representation of the
path of context IDs up to the top-most parent looks a bit strange to
me, with the relationship between each parent -> child being
preserved, visibly, based on the order of the elements in this array
made of temporary IDs compiled on-the-fly during the function
execution.  Am I the only one finding that a bit strange?  Could it be
better to use a different data type for this path and perhaps switch
to the names of the contexts involved?

Do you find having the path column strange all together? Or only using temporary IDs to generate that column? The reason why I avoid using context names is because there can be multiple contexts with the same name. This makes it difficult to figure out which context, among those with that particular name, is actually included in the path. I couldn't find any other information that is unique to each context.

Thanks,
--
Melih Mutlu
Microsoft

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Andres Freund
Дата:
Hi,

On 2024-02-14 16:23:38 +0900, Michael Paquier wrote:
> It is possible to retrieve this information some WITH RECURSIVE as well, as
> mentioned upthread.  Perhaps we could consider documenting these tricks?

I think it's sufficiently hard that it's not a reasonable way to do this.

Greetings,

Andres Freund



Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
Michael Paquier
Дата:
On Wed, Apr 03, 2024 at 04:20:39PM +0300, Melih Mutlu wrote:
> Michael Paquier <michael@paquier.xyz>, 14 Şub 2024 Çar, 10:23 tarihinde
> şunu yazdı:
>> I was reading the patch, and using int[] as a representation of the
>> path of context IDs up to the top-most parent looks a bit strange to
>> me, with the relationship between each parent -> child being
>> preserved, visibly, based on the order of the elements in this array
>> made of temporary IDs compiled on-the-fly during the function
>> execution.  Am I the only one finding that a bit strange?  Could it be
>> better to use a different data type for this path and perhaps switch
>> to the names of the contexts involved?
>
> Do you find having the path column strange all together? Or only using
> temporary IDs to generate that column? The reason why I avoid using context
> names is because there can be multiple contexts with the same name. This
> makes it difficult to figure out which context, among those with that
> particular name, is actually included in the path. I couldn't find any
> other information that is unique to each context.

I've been re-reading the patch again to remember what this is about,
and I'm OK with having this "path" column in the catalog.  However,
I'm somewhat confused by the choice of having a temporary number that
shows up in the catalog representation, because this may not be
constant across multiple calls so this still requires a follow-up
temporary ID <-> name mapping in any SQL querying this catalog.  A
second thing is that array does not show the hierarchy of the path;
the patch relies on the order of the elements in the output array
instead.
--
Michael

Вложения

Re: Parent/child context relation in pg_get_backend_memory_contexts()

От
David Rowley
Дата:
On Thu, 4 Apr 2024 at 12:34, Michael Paquier <michael@paquier.xyz> wrote:
> I've been re-reading the patch again to remember what this is about,
> and I'm OK with having this "path" column in the catalog.  However,
> I'm somewhat confused by the choice of having a temporary number that
> shows up in the catalog representation, because this may not be
> constant across multiple calls so this still requires a follow-up
> temporary ID <-> name mapping in any SQL querying this catalog.  A
> second thing is that array does not show the hierarchy of the path;
> the patch relies on the order of the elements in the output array
> instead.

My view on this is that there are a couple of things with the patch
which could be considered separately:

1. Should we have a context_id in the view?
2. Should we also have an array of all parents?

My view is that we really need #1 as there's currently no reliable way
to determine a context's parent as the names are not unique.   I do
see that Melih has mentioned this is temporary in:

+      <para>
+       Current context id. Note that the context id is a temporary id and may
+       change in each invocation
+      </para></entry>

For #2, I'm a bit less sure about this. I know Andres would like to
see this array added, but equally WITH RECURSIVE would work.  Does the
array of parents completely eliminate the need for recursive queries?
I think the array works for anything that requires all parents or some
fixed (would be) recursive level, but there might be some other
condition to stop recursion other than the recursion level that
someone needs to do.    What I'm trying to get at is; do we need to
document the WITH RECURSIVE stuff anyway? and if we do, is it still
worth having the parents array?

David