Обсуждение: EXPLAIN BUFFERS and I/O timing accounting questions

Поиск
Список
Период
Сортировка

EXPLAIN BUFFERS and I/O timing accounting questions

От
Maciek Sakrejda
Дата:
Hello,

I ran across an EXPLAIN plan and had some questions about some of its details. The BUFFERS docs say

>The number of blocks shown for an upper-level node includes those used by all its child nodes.

I initially assumed this would be cumulative, but I realized it's probably not because some of the blocks affected by each child will actually overlap. But this particular plan has a Shared Hit Blocks at the root (an Aggregate) that is smaller than some of its children (three ModifyTables and a CTE Scan). This seems to contradict the documentation (since if children overlap fully in their buffers usage, the parent should still have a cost equal to the costliest child)--any idea what's up? I can send the whole plan (attached? inline? it's ~15kb) if that helps.

I also noticed the I/O Read Time (from track_io_timing) of two children in this plan is equal to the I/O Read Time in the root. Is I/O time potentially fully parallelized across children? There are no parallel workers according to the plan, so I'm surprised at this and would like to understand better.

Also, a tangential question: why is the top-level structure of a JSON plan an array? I've only ever seen one root node with a Plan key there.

Thanks,
Maciek

Re: EXPLAIN BUFFERS and I/O timing accounting questions

От
Maciek Sakrejda
Дата:
Also, I noticed that in this plan, the root (again, an Aggregate) has 0 Temp Read Blocks, but two of its children (two of the ModifyTable nodes) have non-zero Temp Read Blocks. Again, this contradicts the documentation, as these costs are stated to be cumulative. Any ideas?

Thanks,
Maciek

Re: EXPLAIN BUFFERS and I/O timing accounting questions

От
Andres Freund
Дата:
Hi,

On 2019-10-21 23:18:32 -0700, Maciek Sakrejda wrote:
> I ran across an EXPLAIN plan and had some questions about some of its
> details. The BUFFERS docs say
> 
> >The number of blocks shown for an upper-level node includes those used by
> all its child nodes.
> 
> I initially assumed this would be cumulative, but I realized it's probably
> not because some of the blocks affected by each child will actually
> overlap.

Note that the buffer access stats do *not* count the number of distinct
buffers accessed, but that they purely the number of buffer
accesses.

It'd be really expensive to count the number of distinct buffers
accessed, although I guess one could make it only expensive by using
something like hyperloglog (although that will still be hard, due to
buffer replacement etc).


> But this particular plan has a Shared Hit Blocks at the root (an
> Aggregate) that is smaller than some of its children (three ModifyTables
> and a CTE Scan).

Do you have an example?  I assume what's going on is that the cost of
the CTE is actually attributed (in equal parts or something like that)
to all places using the CTE. Do the numbers add up if you just exclude
the CTE?


> This seems to contradict the documentation (since if
> children overlap fully in their buffers usage, the parent should still have
> a cost equal to the costliest child)--any idea what's up? I can send the
> whole plan (attached? inline? it's ~15kb) if that helps.

Or just relevant top-level excerpts.


> Also, a tangential question: why is the top-level structure of a JSON plan
> an array? I've only ever seen one root node with a Plan key there.

IIRC one can get multiple plans when there's a DO ALSO rule. There might
be other ways to get there too.

Greetings,

Andres Freund



Re: EXPLAIN BUFFERS and I/O timing accounting questions

От
Maciek Sakrejda
Дата:
On Thu, Oct 24, 2019 at 2:25 PM Andres Freund <andres@anarazel.de> wrote:
> Note that the buffer access stats do *not* count the number of distinct
> buffers accessed, but that they purely the number of buffer
> accesses.

You mean, even within a single node? That is, if a node hits a block ten times, that counts as ten blocks hit? And if it reads a block and then needs it three more times, that's one read plus three hit?

> Do you have an example?

Sure, here's the "abridged" plan:

[{ "Plan": {
  "Node Type": "Aggregate",
  "Plan Rows": 1,
  "Plan Width": 8,
  "Total Cost": 26761745.14,
  "Actual Rows": 1,
  "I/O Read Time": 234129.299,
  "I/O Write Time": 0,
  "Local Hit Blocks": 0,
  "Local Read Blocks": 0,
  "Local Dirtied Blocks": 0,
  "Local Written Blocks": 0,
  "Shared Hit Blocks": 4847762,
  "Shared Read Blocks": 1626312,
  "Shared Dirtied Blocks": 541014,
  "Shared Written Blocks": 0,
  "Temp Read Blocks": 0,
  "Temp Written Blocks": 4786,
  "Plans": [
    {
      "Node Type": "ModifyTable",
      "Operation": "Delete",
      "Parent Relationship": "InitPlan",
      "Plan Rows": 13943446,
      "Plan Width": 6,
      "Total Cost": 25774594.63,
      "Actual Rows": 2178416,
      "I/O Read Time": 234129.299,
      "I/O Write Time": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Shared Hit Blocks": 4847762,
      "Shared Read Blocks": 1626312,
      "Shared Dirtied Blocks": 541014,
      "Shared Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 0,
      "Plans": "<elided>"
    },
    {
      "Node Type": "ModifyTable",
      "Operation": "Delete",
      "Parent Relationship": "InitPlan",
      "Plan Rows": 63897788,
      "Plan Width": 38,
      "Total Cost": 315448.53,
      "Actual Rows": 0,
      "I/O Read Time": 30529.231,
      "I/O Write Time": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Shared Hit Blocks": 12964205,
      "Shared Read Blocks": 83260,
      "Shared Dirtied Blocks": 48256,
      "Shared Written Blocks": 0,
      "Temp Read Blocks": 4788,
      "Temp Written Blocks": 0,
      "Plans": "<elided>"
    },
    {
      "Node Type": "ModifyTable",
      "Operation": "Delete",
      "Parent Relationship": "InitPlan",
      "Plan Rows": 45657680,
      "Plan Width": 38,
      "Total Cost": 357974.43,
      "Actual Rows": 0,
      "I/O Read Time": 24260.512,
      "I/O Write Time": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Shared Hit Blocks": 10521264,
      "Shared Read Blocks": 64450,
      "Shared Dirtied Blocks": 36822,
      "Shared Written Blocks": 0,
      "Temp Read Blocks": 4788,
      "Temp Written Blocks": 1,
      "Plans": "<elided>"
    },
    {
      "Node Type": "CTE Scan",
      "Parent Relationship": "Outer",
      "Plan Rows": 13943446,
      "Plan Width": 8,
      "Total Cost": 278868.92,
      "Actual Rows": 2178416,
      "I/O Read Time": 234129.299,
      "I/O Write Time": 0,
      "Local Hit Blocks": 0,
      "Local Read Blocks": 0,
      "Local Dirtied Blocks": 0,
      "Local Written Blocks": 0,
      "Shared Hit Blocks": 4847762,
      "Shared Read Blocks": 1626312,
      "Shared Dirtied Blocks": 541014,
      "Shared Written Blocks": 0,
      "Temp Read Blocks": 0,
      "Temp Written Blocks": 4786
    }
  ]
}}]

Let me know if I removed anything I shouldn't have and I can follow up with extra info.

>  I assume what's going on is that the cost of
> the CTE is actually attributed (in equal parts or something like that)
> to all places using the CTE. Do the numbers add up if you just exclude
> the CTE?

Not really--it looks like the full Shared Blocks Hit cost in the root is the same as the CTE by itself. This is playing around with the plan in a node console:

> p[0].Plan['Shared Hit Blocks']
4847762
> p[0].Plan.Plans.map(p => p['Node Type'])
[ 'ModifyTable', 'ModifyTable', 'ModifyTable', 'CTE Scan' ]
> p[0].Plan.Plans.map(p => p['Shared Hit Blocks'])
[ 4847762, 12964205, 10521264, 4847762 ]

> IIRC one can get multiple plans when there's a DO ALSO rule. There might
> be other ways to get there too.

Thanks, good to know.

Re: EXPLAIN BUFFERS and I/O timing accounting questions

От
Andres Freund
Дата:
Hi,

On 2019-10-24 16:31:39 -0700, Maciek Sakrejda wrote:
> On Thu, Oct 24, 2019 at 2:25 PM Andres Freund <andres@anarazel.de> wrote:
> > Note that the buffer access stats do *not* count the number of distinct
> > buffers accessed, but that they purely the number of buffer
> > accesses.
> 
> You mean, even within a single node? That is, if a node hits a block ten
> times, that counts as ten blocks hit? And if it reads a block and then
> needs it three more times, that's one read plus three hit?

Correct. It's basically the number of lookups in the buffer
pool. There's some nodes that will kind repeatedly use the same buffer,
without increasing the count. E.g. a seqscan will keep the current page
pinned until all the tuples on the page have been returned.

Consider e.g. an nested loop indexscan - how would we determine whether
we've previously looked at a buffer within the indexscan, without
drastically increasing the resources?


> > Do you have an example?
> 
> Sure, here's the "abridged" plan:
> 
> [{ "Plan": {
>   "Node Type": "Aggregate",
>   "Plan Rows": 1,
>   "Plan Width": 8,
>   "Total Cost": 26761745.14,
>   "Actual Rows": 1,
>   "I/O Read Time": 234129.299,
>   "I/O Write Time": 0,
>   "Local Hit Blocks": 0,
>   "Local Read Blocks": 0,
>   "Local Dirtied Blocks": 0,
>   "Local Written Blocks": 0,
>   "Shared Hit Blocks": 4847762,
>   "Shared Read Blocks": 1626312,
>   "Shared Dirtied Blocks": 541014,
>   "Shared Written Blocks": 0,
>   "Temp Read Blocks": 0,
>   "Temp Written Blocks": 4786,
>   "Plans": [
>     {
>       "Node Type": "ModifyTable",
>       "Operation": "Delete",
>       "Parent Relationship": "InitPlan",
>       "Plan Rows": 13943446,
>       "Plan Width": 6,
>       "Total Cost": 25774594.63,
>       "Actual Rows": 2178416,
>       "I/O Read Time": 234129.299,
>       "I/O Write Time": 0,
>       "Local Hit Blocks": 0,
>       "Local Read Blocks": 0,
>       "Local Dirtied Blocks": 0,
>       "Local Written Blocks": 0,
>       "Shared Hit Blocks": 4847762,
>       "Shared Read Blocks": 1626312,
>       "Shared Dirtied Blocks": 541014,
>       "Shared Written Blocks": 0,
>       "Temp Read Blocks": 0,
>       "Temp Written Blocks": 0,
>       "Plans": "<elided>"
>     },
...

I think this may be partially confusing due to the way the json output
looks. Which is so bad that it's imo fair to call it a bug.  Here's text
output to a similar-ish query:


Aggregate  (cost=112.50..112.51 rows=1 width=8) (actual time=35.893..35.894 rows=1 loops=1)
  Output: count(*)
  Buffers: shared hit=6015 dirtied=15
  CTE foo
    ->  Delete on public.p  (cost=0.00..45.00 rows=3000 width=6) (actual time=0.235..28.239 rows=3000 loops=1)
          Output: p.data
          Delete on public.p
          Delete on public.c1
          Delete on public.c2
          Buffers: shared hit=6015 dirtied=15
          ->  Seq Scan on public.p  (cost=0.00..15.00 rows=1000 width=6) (actual time=0.161..1.375 rows=1000 loops=1)
                Output: p.ctid
                Buffers: shared hit=5 dirtied=5
          ->  Seq Scan on public.c1  (cost=0.00..15.00 rows=1000 width=6) (actual time=0.147..1.314 rows=1000 loops=1)
                Output: c1.ctid
                Buffers: shared hit=5 dirtied=5
          ->  Seq Scan on public.c2  (cost=0.00..15.00 rows=1000 width=6) (actual time=0.145..1.170 rows=1000 loops=1)
                Output: c2.ctid
                Buffers: shared hit=5 dirtied=5
  ->  CTE Scan on foo  (cost=0.00..60.00 rows=3000 width=0) (actual time=0.243..34.083 rows=3000 loops=1)
        Output: foo.data
        Buffers: shared hit=6015 dirtied=15
Planning Time: 0.508 ms
Execution Time: 36.512 ms

Note that the node below the Aggregate is actually the CTE, and that
that the DELETEs are below that. But the json, slightly abbreviated,
looks like:

[
  {
    "Plan": {
      "Node Type": "Aggregate",
      "Strategy": "Plain",
      "Shared Hit Blocks": 6015,
      "Shared Read Blocks": 0,
      "Shared Dirtied Blocks": 15,
      "Shared Written Blocks": 0,
      "Plans": [
        {
          "Node Type": "ModifyTable",
          "Operation": "Delete",
          "Parent Relationship": "InitPlan",
          "Subplan Name": "CTE foo",
          "Output": ["p.data"],
          "Target Tables": [
            {
              "Relation Name": "p",
              "Schema": "public",
              "Alias": "p"
            },
            {
              "Relation Name": "c1",
              "Schema": "public",
              "Alias": "c1"
            },
            {
              "Relation Name": "c2",
              "Schema": "public",
              "Alias": "c2"
            }
          ],
          "Shared Hit Blocks": 6015,
          "Shared Read Blocks": 0,
          "Shared Dirtied Blocks": 15,
          "Shared Written Blocks": 0,
          "Plans": [
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Member",
              "Output": ["p.ctid"],
              "Shared Hit Blocks": 5,
              "Shared Read Blocks": 0,
              "Shared Dirtied Blocks": 5,
            },
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Member",
              "Parallel Aware": false,
              "Relation Name": "c1",
              "Schema": "public",
              "Shared Hit Blocks": 5,
              "Shared Read Blocks": 0,
              "Shared Dirtied Blocks": 5,
              "Shared Written Blocks": 0,
              "Local Hit Blocks": 0,
            },
            {
              "Node Type": "Seq Scan",
              "Parent Relationship": "Member",
              "Shared Hit Blocks": 5,
              "Shared Read Blocks": 0,
              "Shared Dirtied Blocks": 5,
              "Shared Written Blocks": 0,
            }
          ]
        },
        {
          "Node Type": "CTE Scan",
          "Parent Relationship": "Outer",
          "Parallel Aware": false,
          "CTE Name": "foo",
          "Alias": "foo",
          "Startup Cost": 0.00,
          "Total Cost": 60.00,
          "Plan Rows": 3000,
          "Plan Width": 0,
          "Actual Startup Time": 0.258,
          "Actual Total Time": 12.737,
          "Actual Rows": 3000,
          "Actual Loops": 1,
          "Output": ["foo.data"],
          "Shared Hit Blocks": 6015,
          "Shared Read Blocks": 0,
          "Shared Dirtied Blocks": 15,
          "Shared Written Blocks": 0,
        }
      ]

But I still don't quite get how the IO adds up in your case.

Perhaps you could send me the full plan and query privately? And, if you
have access to that, the plain text explain?

Greetings,

Andres Freund