Обсуждение: What type of index do I need for this JOIN?
I'm doing a JOIN which is very slow: JOIN t ON t.f1 LIKE (q.f1 || '%') t1 has an INDEX on (f1, f2) which I thought would help for this. But Postgres seems to still use a (very slow) Nested Loop. What type of index would be appropriate for this? (My goal is to join on a substring starting from the first character)
On 08/14/2013 06:05 AM, Robert James wrote: > I'm doing a JOIN which is very slow: > > JOIN t ON t.f1 LIKE (q.f1 || '%') > > t1 has an INDEX on (f1, f2) which I thought would help for this. But > Postgres seems to still use a (very slow) Nested Loop. What type of > index would be appropriate for this? You'll need a text_pattern_ops index. http://www.postgresql.org/docs/current/static/indexes-opclass.html http://www.postgresql.org/docs/current/static/indexes-types.html "The optimizer can also use a B-tree index for queries involving the pattern matching operators LIKE and ~ if the pattern is a constant and is anchored to the beginning of the string — for example, col LIKE 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your database does not use the C locale you will need to create the index with a special operator class to support indexing of pattern-matching queries; see Section 11.9 below." -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 8/13/13, Craig Ringer <craig@2ndquadrant.com> wrote: > On 08/14/2013 06:05 AM, Robert James wrote: >> I'm doing a JOIN which is very slow: >> >> JOIN t ON t.f1 LIKE (q.f1 || '%') >> >> t1 has an INDEX on (f1, f2) which I thought would help for this. But >> Postgres seems to still use a (very slow) Nested Loop. What type of >> index would be appropriate for this? > > You'll need a text_pattern_ops index. > > "The optimizer can also use a B-tree index for queries involving the > pattern matching operators LIKE and ~ if the pattern is a constant and > is anchored to the beginning of the string — for example, col LIKE > 'foo%' or col ~ '^foo', but not col LIKE '%bar'. However, if your > database does not use the C locale you will need to create the index > with a special operator class to support indexing of pattern-matching > queries; see Section 11.9 below." I'm confused: What's the difference between col LIKE 'foo%' and col LIKE f1 || '%' ? Either way, it's anchored to the beginning of the string. And, being that there is a difference (ie the pattern needs to be constant): How will text_pattern_ops help? The only difference I see is that text_patterns_ops is locale sensitive, needed for locale other than C.
On 8/13/2013 8:34 PM, Robert James wrote: > I'm confused: What's the difference between > col LIKE 'foo%' > and > col LIKE f1 || '%' > ? > Either way, it's anchored to the beginning of the string. the first one can be analyzed by the planner, the latter would be much harder to pre-figure. -- john r pierce 37N 122W somewhere on the middle of the left coast
Robert James <srobertjames@gmail.com> wrote: > I'm confused: What's the difference between > col LIKE 'foo%' > and > col LIKE f1 || '%' > ? The planner knows that 'foo%' doesn't start with a wildcard. > Either way, it's anchored to the beginning of the string. Not necessarily. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8/14/13, Kevin Grittner <kgrittn@ymail.com> wrote: > Robert James <srobertjames@gmail.com> wrote: > >> I'm confused: What's the difference between >> col LIKE 'foo%' >> and >> col LIKE f1 || '%' >> ? > > The planner knows that 'foo%' doesn't start with a wildcard. > >> Either way, it's anchored to the beginning of the string. > > Not necessarily. I see, yes, f1 might include a wildcard. Is there a way to escape f1 so that wildcards aren't interpreted? That's anyway the behavior I need, of course. And will that help the planner? What type of index will help the planner here?
Robert James <srobertjames@gmail.com> wrote: > On 8/14/13, Kevin Grittner <kgrittn@ymail.com> wrote: >> Robert James <srobertjames@gmail.com> wrote: >> >>> I'm confused: What's the difference between >>> col LIKE 'foo%' >>> and >>> col LIKE f1 || '%' >>> ? >> >> The planner knows that 'foo%' doesn't start with a wildcard. >> >>> Either way, it's anchored to the beginning of the string. >> >> Not necessarily. > > I see, yes, f1 might include a wildcard. Exactly. > Is there a way to escape f1 so that wildcards aren't interpreted? > That's anyway the behavior I need, of course. And will that help > the planner? I don't think there is any way to get that to work in general, and in particular you are trying to have the right-hand side of the LIKE treated as having escapes; it would be a lot to ask of the planner to somehow recognize that part of the result of the (concatenation) expression should be treated as escaped and part not. > What type of index will help the planner here? Well, with the query as you have it, you might get a trigram index to be of some help, but I think you might want to give up on LIKE or regular expressions. Perhaps you could do a range test directly, rather than wrangling the wildcards: (col >= f1 AND col <= (f1 || 'zzz')) You would probably want to write a function to calculate that ending value; I'm just trying to give a rough suggestion here. The idea is to avoid the danger of wildcards in the f1 values by constructing the range without scanning for special characters and basing the test on that. You could *also* do the LIKE test if desired. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company