Hi,
This thread has been quiet for a while, but I'd like to share some
thoughts.
+1 to the idea of improving visibility into parallel worker saturation.
But overall, we should improve parallel processing visibility, so DBAs can
detect trends in parallel usage ( is the workload doing more parallel, or doing less )
and have enough data to either tune the workload or change parallel GUCs.
>> We can output this at the LOG level to avoid running the server at
>> DEBUG1 level. There are a few other cases where we are not able to
>> spawn the worker or process and those are logged at the LOG level. For
>> example, "could not fork autovacuum launcher process .." or "too many
>> background workers". So, not sure, if this should get a separate
>> treatment. If we fear this can happen frequently enough that it can
>> spam the LOG then a GUC may be worthwhile.
> I think we should definitely be afraid of that. I am in favor of a separate GUC.
Currently explain ( analyze ) will give you the "Workers Planned"
and "Workers launched". Logging this via auto_explain is possible, so I am
not sure we need additional GUCs or debug levels for this info.
-> Gather (cost=10430.00..10430.01 rows=2 width=8) (actual tim
e=131.826..134.325 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
>> What I was wondering was whether we would be better off putting this
>> into the statistics collector, vs. doing it via logging. Both
>> approaches seem to have pros and cons.
>>
>> I think it could be easier for users to process the information if it
>> is available via some view, so there is a benefit in putting this into
>> the stats subsystem.
> Unless we do this instead.
Adding cumulative stats is a much better idea.
3 new columns can be added to pg_stat_database:
workers_planned,
workers_launched,
parallel_operations - There could be more than 1 operation
per query, if for example there are multiple Parallel Gather
operations in a plan.
With these columns, monitoring tools can trend if there is more
or less parallel work happening over time ( by looking at parallel
operations ) or if the workload is suffering from parallel saturation.
workers_planned/workers_launched < 1 means there is a lack
of available worker processes.
Also, We could add this information on a per query level as well
in pg_stat_statements, but this can be taken up in a seperate
discussion.
Regards,
--
Sami Imseih
Amazon Web Services (AWS)