Обсуждение: 2 json explain plans for the same query/plan - why does one have constants while the other has parameter markers?

Поиск
Список
Период
Сортировка
We are currently on 13.  We are capturing the explain plans for query executions taking 1 second or longer and storing the json files.  We are most of the way through implementing a home grown solution to generate a consistent hash value for a query plan, so we can find queries with multiple plans.  I've attached 2 query plans that we've captured that differ in a seemingly strange way.  (All executions are from the same exact code path.)  One of the plans has parameter markers in the predicates in the values for "Recheck Cond" and "Index Cond", while the other does not.  

Any insight into why we are seeing parameter markers in the body of the query plan?

Examples of the parameter markers:
                "Recheck Cond": "((destinationid = $1) AND (contactid IS NOT NULL) AND (status = $2))",
                    "Index Cond": "((destinationid = $1) AND (contactid IS NOT NULL) AND (status = $2))",

What we normally see:
                "Recheck Cond": "((destinationid = 67) AND (contactid IS NOT NULL) AND (status = 1))",
                    "Index Cond": "((destinationid = 67) AND (contactid IS NOT NULL) AND (status = 1))",

The full query text:
SELECT /* ISNULL:pc_message.FrozenSetID:, KeyTable:pc_message; */ qRoot.ID col0, qRoot.CreationTime col1
FROM pc_message qRoot
WHERE qRoot.DestinationID = $1 AND qRoot.Status = $2 AND qRoot.contactID IS NOT NULL AND qRoot.FrozenSetID IS NULL AND qRoot.SendOrder IN
     (
        SELECT MIN (qRoot0.SendOrder) col0
        FROM pc_message qRoot0
        WHERE qRoot0.DestinationID = $3 AND qRoot0.contactID = qRoot.contactID)
ORDER BY col1 ASC, col0 ASC LIMIT 100000

Thanks,
Jerry
Вложения
Jerry Brenner <jbrenner@guidewire.com> writes:
> We are currently on 13.  We are capturing the explain plans for query
> executions taking 1 second or longer and storing the json files.  We are
> most of the way through implementing a home grown solution to generate a
> consistent hash value for a query plan, so we can find queries with
> multiple plans.  I've attached 2 query plans that we've captured that
> differ in a seemingly strange way.  (All executions are from the same exact
> code path.)  One of the plans has parameter markers in the predicates in
> the values for "Recheck Cond" and "Index Cond", while the other does not.
> Any insight into why we are seeing parameter markers in the body of the
> query plan?

The one with parameter markers is a "generic" plan for a parameterized
query.  When you get a plan without parameter markers for the same
input query, that's a "custom" plan in which concrete values of the
parameters have been substituted, possibly allowing const-simplification
and more accurate rowcount estimates.  The backend will generally try
custom plans a few times and then try a generic plan to see if that's
meaningfully slower -- if not, replanning each time is deemed to be
wasteful.

            regards, tom lane



Thanks for the quick response! That was very helpful!
 My impression is that almost all of the plans being captured are "custom", but now I know that I need to look closer.  We also store the execution times, so we can look at the execution order for queries that are executed often enough to seem like they should stay in the cache.  The addition of the new timestamp columns in pg_stat_statements in 17 will also help us get a better sense of how long the query had been in the cache.

On Fri, Dec 8, 2023 at 4:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jerry Brenner <jbrenner@guidewire.com> writes:
> We are currently on 13.  We are capturing the explain plans for query
> executions taking 1 second or longer and storing the json files.  We are
> most of the way through implementing a home grown solution to generate a
> consistent hash value for a query plan, so we can find queries with
> multiple plans.  I've attached 2 query plans that we've captured that
> differ in a seemingly strange way.  (All executions are from the same exact
> code path.)  One of the plans has parameter markers in the predicates in
> the values for "Recheck Cond" and "Index Cond", while the other does not.
> Any insight into why we are seeing parameter markers in the body of the
> query plan?

The one with parameter markers is a "generic" plan for a parameterized
query.  When you get a plan without parameter markers for the same
input query, that's a "custom" plan in which concrete values of the
parameters have been substituted, possibly allowing const-simplification
and more accurate rowcount estimates.  The backend will generally try
custom plans a few times and then try a generic plan to see if that's
meaningfully slower -- if not, replanning each time is deemed to be
wasteful.

                        regards, tom lane

Can you consider adding an attribute to the explain plan json in a future release (to plan?) to denote if the plan is a "custom" vs "generic" plan?  The use of $N variables for both parameter markers and InitPlan and SubPlan makes it harder to programmatically determine the type of plan (and in our case tell if 2 plans only differ by "custom" vs "generic").

We use numeric constants in our queries in a small number of cases where we know that there's no potential PII, there's a small number of values and that there's a high probability that the data is skewed.  pc_message contains messages to be sent to external systems and hence is a volatile table and the data in the DestinationID column can be highly skewed. In theory, could using a constant instead of a bind variable for this predicate help the optimizer?  

Thanks,
Jerry

On Fri, Dec 8, 2023 at 5:04 PM Jerry Brenner <jbrenner@guidewire.com> wrote:
Thanks for the quick response! That was very helpful!
 My impression is that almost all of the plans being captured are "custom", but now I know that I need to look closer.  We also store the execution times, so we can look at the execution order for queries that are executed often enough to seem like they should stay in the cache.  The addition of the new timestamp columns in pg_stat_statements in 17 will also help us get a better sense of how long the query had been in the cache.

On Fri, Dec 8, 2023 at 4:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jerry Brenner <jbrenner@guidewire.com> writes:
> We are currently on 13.  We are capturing the explain plans for query
> executions taking 1 second or longer and storing the json files.  We are
> most of the way through implementing a home grown solution to generate a
> consistent hash value for a query plan, so we can find queries with
> multiple plans.  I've attached 2 query plans that we've captured that
> differ in a seemingly strange way.  (All executions are from the same exact
> code path.)  One of the plans has parameter markers in the predicates in
> the values for "Recheck Cond" and "Index Cond", while the other does not.
> Any insight into why we are seeing parameter markers in the body of the
> query plan?

The one with parameter markers is a "generic" plan for a parameterized
query.  When you get a plan without parameter markers for the same
input query, that's a "custom" plan in which concrete values of the
parameters have been substituted, possibly allowing const-simplification
and more accurate rowcount estimates.  The backend will generally try
custom plans a few times and then try a generic plan to see if that's
meaningfully slower -- if not, replanning each time is deemed to be
wasteful.

                        regards, tom lane