Обсуждение: Generate call graphs in run-time
Hi hackers,
BACKGROUND
I have a database centric system consisting of thousands of stored procedures,
where each call to any top-level function makes use of a lot of subfunctions in many layers.
I was about to manually produce call graphs covering all important parts of the system.
I realized this was a huge and boring task. Automation to the rescue.
Simply parsing the functions source code would not be sufficient as I had some functions using EXECUTE.
I also didn't know exactly which functions where top-level functions, and didn't know exactly
which functions were actually being used in production.
Because of this I decided to sample data in run-time to get a real-life picture of the system.
Any functions not being called in productions are not that important to include in the documentation anyway.
DESCRIPTION
Generates call graphs of function calls within a transaction in run-time.
The call graphs are written the first time they are seen in a session to the log.
The format of the log entry is a digraph in the GraphViz DOT language of the oids.
IMPLEMENTATION
New structures added to pgstat:
static List *call_stack
all parent functions at the current depth in the current transaction
static HTAB *call_graph
all unique pair of function calls so far in the current transaction
static HTAB *seen_call_graphs
all unique call graphs seen in the current session to only log once per call graph
typedef struct PgStat_CallGraphEdge
contains caller_oid and called_oid
In pgstat_init_function_usage(), each call to a function will add the caller->called oid pair unless already set in the hash.
When pgstat_end_function_usage() is called, the current function oid is removed from the List of parent oids.
In AtEOXact_PgStat(), called upon commit/rollback, the call graph is sorted and written to the log unless already seen in the session.
The variables are resetted in pgstat_clear_snapshot().
This functionality is probably something one would like to enable only temporarily in the production environment.
A new configuration parameter would therefore be good, just like track_functions. Perhaps track_callgraphs?
Instead of writing the call graphs to the postgres log, it would ne more useful to let the statistics collector keep track of the call graphs, to allow easier access than having to parse through the log file.
Perhaps the call graph would be represented as an oid[] array, and be the primary key (or unique) in some new statistics table,
with any interesting columns, perhaps number of calls or any of the data provided by pg_stat_user_functions and pg_stat_user_tables.
EXAMPLE
example_functions.sql contains the functions a(), ab(), ac(), aca(), acb().
I executed the functions a() and ac() and got this in my log file:
2012-01-09 19:08:30.983 CET,"joel","test",28461,"[local]",4f0b2d18.6f2d,1,"SELECT",2012-01-09 19:08:24 CET,3/0,0,LOG,00000,"digraph {37615->37616;37615->37617;37617->37618;37617->37619}",,,,,,"select a();",,,"psql"
2012-01-09 19:08:39.837 CET,"joel","test",28461,"[local]",4f0b2d18.6f2d,2,"SELECT",2012-01-09 19:08:24 CET,3/0,0,LOG,00000,"digraph {37617->37618;37617->37619}",,,,,,"select ac();",,,"psql"
The perl script pg_callgraph.pl replaces the oids with actual function names before generating the call graphs using GraphVIz:
digraph {a->ab;a->ac;ac->aca;ac->acb}
digraph {ac->aca;ac->acb}
The GraphVIz dot command is then used to generate nice call graphs in PNG.
I've also added two real-life examples of quite complex call graphs:
6edd2b9b520ff2428c461fad43f29131a6c6604f.png
607e272a35cf2ac8579173f97147a32de9784877.png
Best regards,
Joel Jacobson
Вложения
On 01/09/2012 10:00 PM, Joel Jacobson wrote: > Because of this I decided to sample data in run-time to get a real-life > picture of the system. > Any functions not being called in productions are not that important to > include in the documentation anyway. FWIW I have a similar problem - with a similar solution. I'll try to describe it, in case there is some synergy to be leveraged :) > In pgstat_init_function_usage(), each call to a function will add the > caller->called oid pair unless already set in the hash. > When pgstat_end_function_usage() is called, the current function oid is > removed from the List of parent oids. > In AtEOXact_PgStat(), called upon commit/rollback, the call graph is > sorted and written to the log unless already seen in the session. > The variables are resetted in pgstat_clear_snapshot(). > My approach was to add parent oid to the per-backend function stats structure - PgStat_BackendFunctionEntry. Also, I changed the hash key for that structure to (oid, parent) pair. This means that within the backend the function usage is always tracked with the context of calling function. This has the nice property that you get the per-parent usage stats as well. Also the additional lists for parent tracking are avoided. During pgstat_report_stat() the call graph (with stats) is output to logs and the statistics uploaded to collector -- with the parent oid removed. > This functionality is probably something one would like to enable only > temporarily in the production environment. > A new configuration parameter would therefore be good, just like > track_functions. Perhaps track_callgraphs? > I opted to always track the function parents -- this is very cheap. The logging on the other hand is quite heavy and needs to be explicitly configured (I used log_usage_stats GUC). There is a patch for this and we do use it in production for occasional troubleshooting and dependency analysis. Can't attach immediately though -- it has some extra cruft in it that needs to be cleaned up. > Instead of writing the call graphs to the postgres log, it would ne more > useful to let the statistics collector keep track of the call graphs, to > allow easier access than having to parse through the log file. Indeed. Something like a pg_stat_user_function_details view would be very useful. Something along the lines of: Column | Type | --------------+--------+funcid | oid |parentfuncid | oid | <-- newschemaname | name |funcname | name |calls | bigint |total_time | bigint |self_time | bigint | And then rewrite pg_stat_user_functions by aggregating the detailed view. That'd make the individual pg_stat_get_function* functions a bit slower, but that is probably a non-issue - at least not if the pg_stat_user_functions view is rewritten to use a SRF. regards, Martin
On Mon, Jan 16, 2012 at 2:23 PM, Martin Pihlak <martin.pihlak@gmail.com> wrote: > My approach was to add parent oid to the per-backend function stats > structure - PgStat_BackendFunctionEntry. Also, I changed the hash key > for that structure to (oid, parent) pair. This means that within the > backend the function usage is always tracked with the context of > calling function. This has the nice property that you get the per-parent > usage stats as well. Also the additional lists for parent tracking are > avoided. > > During pgstat_report_stat() the call graph (with stats) is output > to logs and the statistics uploaded to collector -- with the parent oid > removed. Since you only care about the parentfuncid in one level, it looks like you will only be able to get a total call graph of all possible function calls, and not each unique call graph per transaction. If you have two separate transactions: T1: a->b, b->c T2: b->d You would have two unique call graphs {a->b, b->c} and {b->d}. The global call graph, where you only keep track of all unique parentfuncid->funcid pairs, would be {a->b, b->c, b->d}, which lacks the information on what different call graphs are actually being executed per transaction. Also, why remove the parent oid when uploading the statistics to the collector? It would be nice to have the statistics for each function per parent, to see where you have a bottleneck which might only be occurring in a function when called from a specific parent. Even more fine-grained would be to have the statistics per unique call-graph, i.e. the entire tree of functions called in the transactions. > There is a patch for this and we do use it in production for occasional > troubleshooting and dependency analysis. Can't attach immediately > though -- it has some extra cruft in it that needs to be cleaned up. I would highly appreciate a patch, don't worry about cleaning up, I can do that, unless it's some code you can't share for other reasons. > Indeed. Something like a pg_stat_user_function_details view would be > very useful. Something along the lines of: > > Column | Type | > --------------+--------+ > funcid | oid | > parentfuncid | oid | <-- new > schemaname | name | > funcname | name | > calls | bigint | > total_time | bigint | > self_time | bigint | funcid->parentfuncid might be sufficient for performance optimizations, but to automatically generate directional graphs of all unique call graphs in run-time, you would need all the unique pairs of funcid->parentfuncid as a singel column, probably a sorted array of oids[][], example: [[1,2],[1,3],[2,4],[2,5]] if the call craph would be {1->2, 1->3, 2->4, 2->5}. > > And then rewrite pg_stat_user_functions by aggregating the detailed > view. That'd make the individual pg_stat_get_function* functions a > bit slower, but that is probably a non-issue - at least not if the > pg_stat_user_functions view is rewritten to use a SRF. > > regards, > Martin
On 01/17/2012 11:13 AM, Joel Jacobson wrote: > Since you only care about the parentfuncid in one level, it looks like > you will only be able to get a total call graph of all possible > function calls, and not each unique call graph per transaction. True, for my purposes (function dependencies and performance analysis) the global graph is sufficient. > Also, why remove the parent oid when uploading the statistics to the collector? > It would be nice to have the statistics for each function per parent, > to see where you have a bottleneck which might only be occurring in a > function when called from a specific parent. I guess I was just lazy at the time I wrote it. But it shouldn't be too much of an effort to store the global call graph in statistics collector. Unique call graphs would be somewhat more complicated I guess. >> There is a patch for this and we do use it in production for occasional >> troubleshooting and dependency analysis. Can't attach immediately >> though -- it has some extra cruft in it that needs to be cleaned up. > > I would highly appreciate a patch, don't worry about cleaning up, I > can do that, unless it's some code you can't share for other reasons. > Patch attached. It was developed against 9.1, but also applies to HEAD but gives some fuzz and offsets. It adds 2 GUC variables: log_function_calls and log_usage_stats. The first just output function statistics to log (with no parent info). With log_usage_stats enabled, it also outputs detailed function usage plus relation usage. So you basically get output such as: # select * from pgq.get_consumer_info(); LOG: duration: 11.153 ms statement: select * from pgq.get_consumer_info(); LOG: function call: pgq.get_consumer_info(0) calls=1 time=9726 self_time=536 LOG: USAGE STATISTICS DETAIL: ! object access stats: ! F 1892464226 0 pgq.get_consumer_info(0) calls=1 time=9726 self_time=536 ! F 1892464228 1892464226 pgq.get_consumer_info(2) calls=1 time=9190 self_time=9190 ! r 167558000 pgq.queue: blks_read=28 blks_hit=28 ! r 167557988 pgq.consumer: blks_read=56 blks_hit=56 ! r 167558021 pgq.subscription: blks_read=54 blks_hit=50 ! r 167558050 pgq.tick: blks_read=103 blks_hit=102 ! i 1892464189 pgq.queue_pkey: scans=1 tup_ret=37 tup_fetch=37 blks_read=2 blks_hit=2 ! i 167557995 pgq.consumer_pkey: scans=56 tup_ret=56 tup_fetch=56 blks_read=57 blks_hit=56 ! i 1892464195 pgq.subscription_pkey: scans=37 tup_ret=156 tup_fetch=56 blks_read=127 blks_hit=123 ! i 167558058 pgq.tick_pkey: scans=112 tup_ret=103 tup_fetch=103 blks_read=367 blks_hit=366 > funcid->parentfuncid might be sufficient for performance > optimizations, but to automatically generate directional graphs of all > unique call graphs in run-time, you would need all the unique pairs of > funcid->parentfuncid as a singel column, probably a sorted array of > oids[][], example: [[1,2],[1,3],[2,4],[2,5]] if the call craph would > be {1->2, 1->3, 2->4, 2->5}. > Hmm, array would probably work, although I wonder if there is a more elegant solution ... regards, Martin