Обсуждение: optimizing common subqueries

Поиск
Список
Период
Сортировка

optimizing common subqueries

От
Kevin Murphy
Дата:
My understanding is that PG does not eliminate common subqueries within
complex queries.  I just wanted to confirm this.

I also tried wrapping subqueries in STABLE or IMMUTABLE functions
returning SETOF, but from the PG 8.0.3 log, I see that the function is
invoked redundantly anyway.  I know that disk buffering has the effect
of optimizing this sort of thing to some extent, but I was wondering if
I can do better.

Any other possibilities?

The code below shows what I am trying to do.  I created functions to
make it easy for you to see where the common subqueries occur (note that
one of the functions invokes the other function, which increases the
redundancy.)

CREATE FUNCTION pmids_by_mention(text) RETURNS SETOF integer AS $$
  SELECT pmid
  FROM normalized_genes
  WHERE symbol IN (
    SELECT DISTINCT symbol
    FROM normalized_genes
    WHERE mention = $1
  );
$$ LANGUAGE SQL STRICT IMMUTABLE;

CREATE FUNCTION common_pmids(text,text) RETURNS SETOF integer AS $$
  SELECT * FROM pmids_by_mention($1)
  INTERSECT
  SELECT * FROM pmids_by_mention($2)
$$ LANGUAGE SQL STRICT IMMUTABLE;

SELECT PMID, COUNT(*) AS total
FROM (
  SELECT pmid FROM pmids_by_mention('mycn') as pmid
    WHERE pmid IN (
      SELECT * FROM common_pmids('mycn','trka')
    )
  UNION ALL
  SELECT pmid FROM pmids_by_mention('trka') as pmid
    WHERE pmid IN (
      SELECT * FROM common_pmids('mycn','trka')
    )
) AS subq
GROUP BY pmid
ORDER BY total desc;

I doubt anybody cares, but ... I am doing an article lookup by genes
which are mentioned in articles (pmid), where each gene may be referred
to indirectly via an alias (mention).  Each gene symbol has many
aliases/mentions.  (Unfortunately, it is also possible but rare for an
alias/mention to map to more than one gene symbol).  The query logic is
as follows.  For each mention/alias supplied by the user, find all
articles connected to the gene (or, rarely, genes), even indirectly.
Take the intersection of these sets to find the articles/pmid's
containing all the specified genes.  Unfortunately, in order to rank the
articles by relevance, it is further necessary to come up with a result
set containing one row for each qualifying gene mention in the set of
matching articles.  There can be any number of search terms (up to some
limit), so the actual query has to be built  dynamically on the fly by
the application.

Thanks,
Kevin Murphy


Re: optimizing common subqueries

От
"John D. Burger"
Дата:
As I understand it, Postgres's query planner considers only trees of
joins - I don't know what the technical implications are of using DAG
plans, other than the obvious blowup in planning space.

I was recently in a similar situation, where a script essentially
needed to do a self-join on the result of a complex query.  The script
uses a temp table to store the results of the first query, and then
does a second query using the temp table - effectively, I have done
common-subexpression reduction by hand.  This repeated fragment of your
example:

      SELECT * FROM common_pmids('mycn','trka')

might be a candidate for such treatment.

- John Burger
   MITRE


Re: optimizing common subqueries

От
"Jim C. Nasby"
Дата:
You can put the complex query into a subquery in the from clause and
PostgreSQL will normally do a good job with that. See one of the other
replies in this thread.

On Wed, Oct 05, 2005 at 11:23:20AM -0400, John D. Burger wrote:
> As I understand it, Postgres's query planner considers only trees of
> joins - I don't know what the technical implications are of using DAG
> plans, other than the obvious blowup in planning space.
>
> I was recently in a similar situation, where a script essentially
> needed to do a self-join on the result of a complex query.  The script
> uses a temp table to store the results of the first query, and then
> does a second query using the temp table - effectively, I have done
> common-subexpression reduction by hand.  This repeated fragment of your
> example:
>
>      SELECT * FROM common_pmids('mycn','trka')
>
> might be a candidate for such treatment.
>
> - John Burger
>   MITRE
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>               http://www.postgresql.org/docs/faq
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461