Обсуждение: LIKE, leading percent, bind parameters and indexes

Поиск
Список
Период
Сортировка

LIKE, leading percent, bind parameters and indexes

От
"Rodrigo Hjort"
Дата:
PG-Hackers,<br /><br />I got the following picture:<br /><br />detran=# \d sa_dut.tb_usuario<br />                   
Table"sa_dut.tb_usuario"<br />         Column          |            Type             | Modifiers<br
/>-------------------------+-----------------------------+-----------<br /> numprocesso             |
bigint                     | not null<br /> nome                    | character varying(44)       |<br
/> nomemae                | character varying(44)       |<br /> datanascimento          | date                        |
<br/>Indexes:<br />   "tb_usuario_pkey" PRIMARY KEY, btree (numprocesso)<br />   "ix_usuario_11" btree (nome
varchar_pattern_ops,nomemae varchar_pattern_ops)<br />   "ix_usuario_13" btree (datanascimento, nome
varchar_pattern_ops)<br /><br />As I do not use C locale, I created indexes based on "varchar_pattern_ops".<br />The
issueI'm having is based on the following queries:<br /><br />select * from TB_USUARIO where nome like 'TATIANA
CRISTINAG%'; <br />select * from TB_USUARIO where nome like '%TATIANA CRISTINA G%';<br /><br />For some reasons, I'm
notusing text-search engines, like TSearch2, but only the LIKE operator.<br />Here are the query plans involved:<br
/><br/><br /> detran=# explain analyze select count(*) as x0_0_ from sa_dut.TB_PROCESSO processo0_, sa_dut.TB_USUARIO
usuario1_where (usuario1_.NOME like 'TATIANA CRISTINA G%'  and processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO);<br /><br
/>QUERY PLAN<br
/>--------------------------------------------------------------------------------------------------------------------------------------------------------<br
/> Aggregate (cost=11.94..11.95 rows=1 width=0) (actual time= 143.970..143.972 rows=1 loops=1)<br />  ->  Nested
Loop (cost=0.00..11.94 rows=1 width=0) (actual time=143.935..143.949 rows=1 loops=1)<br />        ->  Index Scan
usingix_usuario_11 on tb_usuario usuario1_ (cost=0.00..6.01 rows=1 width=8) (actual time=93.884..93.889 rows=1
loops=1)<br/>              Index Cond: (((nome)::text ~>=~ 'TATIANA CRISTINA G'::character varying) AND
((nome)::text~<~ 'TATIANA CRISTINA H'::character varying))<br />               Filter: ((nome)::text ~~ 'TATIANA
CRISTINAG%'::text)<br />        ->  Index Scan using tb_processo_pkey on tb_processo processo0_  (cost=0.00..5.91
rows=1width=8) (actual time=50.041..50.044 rows=1 loops=1) <br />              Index Cond: (processo0_.numprocesso =
"outer".numprocesso)<br/> Total runtime: 144.176 ms<br /><br />detran=# explain analyze select count(*) as x0_0_ from
sa_dut.TB_PROCESSOprocesso0_, sa_dut.TB_USUARIO usuario1_ where <br />(usuario1_.NOME like '%TATIANA CRISTINA G%'  and
processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO);<br/><br />QUERY PLAN<br
/>-----------------------------------------------------------------------------------------------------------------------------------------------------
<br/> Aggregate  (cost=67534.55..67534.56 rows=1 width=0) (actual time=8101.957..8101.959 rows=1 loops=1)<br />  -> 
NestedLoop  (cost=0.00..67534.55 rows=1 width=0) (actual time=5404.106..8101.923 rows=1 loops=1)<br />        -> 
SeqScan on tb_usuario usuario1_  (cost= 0.00..67528.62 rows=1 width=8) (actual time=5404.056..8101.862 rows=1
loops=1)<br/>              Filter: ((nome)::text ~~ '%TATIANA CRISTINA G%'::text)<br />        ->  Index Scan using
tb_processo_pkeyon tb_processo<br /> processo0_  (cost=0.00..5.91 rows=1 width=8) (actual time=0.034..0.037 rows=1
loops=1)<br/>              Index Cond: (processo0_.numprocesso = "outer".numprocesso)<br /> Total runtime: 8102.105
ms<br/><br /><br />We use Java, and recently we made an effort in order to avoid the leading '%' on LIKE expressions.
<br/>The problem is that it wasn't solved, and then I made the following Java code to verify it.<br /><br />What
happensis that only the "004" block uses the index! The "002" code, which also has no leading percent, does a
sequentialscan. The difference between them is that "002" uses bind parameters. <br /><br />Is it concerned to the JDBC
Driveror PostgreSQL itself? What could be done in order to fix it?<br />I could use static parameters, but then the
querieswould have to be reparsed each time on the backend, missing cache advantages. <br /><br
/>****************************************************************************************************<br/>package
db;<br/><br />import java.sql.Connection;<br />import java.sql.DriverManager;<br />import java.sql.PreparedStatement
;<br/>import java.sql.ResultSet;<br />import java.sql.SQLException;<br /><br />public class SelectLike {<br /><br />  
publicSelectLike() {<br />       long qtd = 0L, inicio = 0L, tempo[] = {0,0,0,0};<br /><br />       try {<br
/>          Class.forName("org.postgresql.Driver");<br />       } catch (ClassNotFoundException e) {<br />          
e.printStackTrace();<br/>       }<br /><br />       Connection con = null;<br />       String dbURL =
"jdbc:postgresql://10.15.61.6/database";<br />       try {<br />           con = DriverManager.getConnection(dbURL,
"user","password");<br /><br />           String sql = "select count(*) as x0_0_ from<br />sa_dut.TB_PROCESSO
processo0_,sa_dut.TB_USUARIO usuario1_ where <br />(usuario1_.NOME like ? and
processo0_.NUMPROCESSO=usuario1_.NUMPROCESSO)";<br/>           String nome = "TATIANA CRISTINA G";<br /><br
/>          PreparedStatement ps = null;<br />           ResultSet rs = null; <br /><br />           //001 - '%NAME%'
binded<br/>           if (ps != null) ps.close();<br />           ps = con.prepareStatement(sql);<br />          
ps.setString(1,"%" + nome + "%");<br />           inicio = System.currentTimeMillis();<br />           rs =
ps.executeQuery();<br/>           rs.next();<br />           qtd = rs.getLong(1);<br />           rs.close();<br
/>          tempo[0] = System.currentTimeMillis() - inicio;<br /><br />            //002 - 'NAME%' binded<br
/>          if (ps != null) ps.close();<br />           ps = con.prepareStatement(sql);<br />           ps.setString(1,
nome+ "%");<br />           inicio = System.currentTimeMillis ();<br />           rs = ps.executeQuery();<br
/>          rs.next();<br />           qtd = rs.getLong(1);<br />           rs.close();<br />           tempo[1] =
System.currentTimeMillis()- inicio;<br /><br />           //003 - '%NAME%' static <br />           if (ps != null)
ps.close();<br/>           String sql1 = sql.replaceFirst("\\?", "'%" + nome + "%'");<br />           ps =
con.prepareStatement(sql1);<br/>           inicio = System.currentTimeMillis ();<br />           rs =
ps.executeQuery();<br/>           rs.next();<br />           qtd = rs.getLong(1);<br />           rs.close();<br
/>          tempo[2] = System.currentTimeMillis() - inicio;<br /><br />           //004 - 'NAME%' static <br
/>          if (ps != null) ps.close();<br />           String sql2 = sql.replaceFirst("\\?", "'" + nome + "%'");<br
/>          ps = con.prepareStatement(sql2);<br />           inicio = System.currentTimeMillis ();<br />           rs =
ps.executeQuery();<br/>           rs.next();<br />           qtd = rs.getLong(1);<br />           rs.close();<br
/>          tempo[3] = System.currentTimeMillis() - inicio;<br /><br />           ps.close();<br />           
con.close();<br/>       } catch (SQLException e) {<br />           e.printStackTrace();<br />       }<br /><br />      
System.out.println("QTD:" + qtd + "\n\n");<br />       for (int ii = 0; ii < tempo.length; ii++)<br />          
System.out.println(ii+ ": " + tempo[ii]);<br />   }<br /><br />   public static void main(String[] args) {<br />      
newSelectLike();<br />   }<br /><br />}<br
/>****************************************************************************************************<br /><br />--
<br/>Regards,<br /><br />Rodrigo Hjort<br /><a href="http://icewall.org/~hjort">http://icewall.org/~hjort</a><br /><br
/>

Re: LIKE, leading percent, bind parameters and indexes

От
Tom Lane
Дата:
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
> What happens is that only the "004" block uses the index! The "002" code,
> which also has no leading percent, does a sequential scan. The difference
> between them is that "002" uses bind parameters.

Yeah.  The LIKE index optimization depends on seeing a constant LIKE
pattern at plan time --- otherwise the planner doesn't know what
indexscan parameters to generate.  So a bound-parameter query loses.

Ideas for improving this situation are welcome ... it's not an easy
problem ...
        regards, tom lane


Re: LIKE, leading percent, bind parameters and indexes

От
"Qingqing Zhou"
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> Yeah.  The LIKE index optimization depends on seeing a constant LIKE
> pattern at plan time --- otherwise the planner doesn't know what
> indexscan parameters to generate.  So a bound-parameter query loses.
>

AFAICS the problem is not restricted to LIKE, we can easily find a lot of
similar problems caused by the actual parameters. For example, SeqScan vs.
IndexScan vs. BitmapIndexScan for a range query. So an improvement is
definitely needed.

> Ideas for improving this situation are welcome ... it's not an easy
> problem ...
>
IMHO basically we have two ways to get better plan: one is to have a set of
alternative plans for prepare queries. This will add some cost but PREPARE
is supposed to do only once against a lot of EXECUTE. But still, the biggest
problem is that number of plans is not controllable.

Another way is to generate a plan on the fly. What we do is to let some
REPLAN nodes sit on top of some critical plan node: at the execution, we
will compare the actual numbers we get and the estimated number we have
(mabye "rows"?), once we find that a re-plan efforts might be deserved, we
will get a new plan on the fly. In this way, I think a not-too-big patch
will do. I remember there is a paper talking about this somewhere but not
remember clearly. -- This method can handle the range query problem above,
but not for LIKE. So we may have to kludge some code to handle LIKE
especially :-(.

Regards,
Qingqing






Re: LIKE, leading percent, bind parameters and indexes

От
"Zeugswetter Andreas DCP SD"
Дата:
> AFAICS the problem is not restricted to LIKE, we can easily find a lot
of
> similar problems caused by the actual parameters. For example, SeqScan
vs.
> IndexScan vs. BitmapIndexScan for a range query. So an improvement is
> definitely needed.

> Another way is to generate a plan on the fly. What we do is to let
some
> REPLAN nodes sit on top of some critical plan node: at the execution,
we
> will compare the actual numbers we get and the estimated number we
have

Since we are deciding this on histogram data, it seems we could "store"
the ranges (and exception values) where this plan is not good, and
replan in
case the new value does not fit.

This would also imply, that we postpone (part of the) planning until we
get the
first values, when the node cost largly depends on the supplied value.

Andreas


Re: LIKE, leading percent, bind parameters and indexes

От
"Rodrigo Hjort"
Дата:
I'm not used to the PG Internals. But let me see if I understood that.

The LIKE operator, when applied on a static string and it is not preceded by '%', causes the planner to search for some indexes in the table in order to make a index scan. Otherwise, i.e. using leading '%' on static text or bound paremeter, makes the planner always do a sequential scan. Is that the scenario?

--
Rodrigo Hjort
http://icewall.org/~hjort


2006/5/23, Tom Lane <tgl@sss.pgh.pa.us>:
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:
> What happens is that only the "004" block uses the index! The "002" code,
> which also has no leading percent, does a sequential scan. The difference
> between them is that "002" uses bind parameters.

Yeah.  The LIKE index optimization depends on seeing a constant LIKE
pattern at plan time --- otherwise the planner doesn't know what
indexscan parameters to generate.  So a bound-parameter query loses.

Ideas for improving this situation are welcome ... it's not an easy
problem ...

                        regards, tom lane

Re: LIKE, leading percent, bind parameters and indexes

От
Andrew Sullivan
Дата:
On Thu, May 25, 2006 at 02:18:10PM -0300, Rodrigo Hjort wrote:
> make a index scan. Otherwise, i.e. using leading '%' on static text or bound
> paremeter, makes the planner always do a sequential scan. Is that the
> scenario?

I think more exactly, the planner can't possibly know how to plan an
indexscan with a leading '%', because it has nowhere to start.

Think of it this way: if you go to the public library, and say, "I
want a book.  I can't remember its name exactly, but it starts with
'daytime'," you can find it by going to the title index and browsing
for things that start that way.  If you go to the public library, and
say, "There's this book I want, but I can't remember the title.  It's
red," you're going to have a lot of books to look through.  Maybe all
of them.

If it were important enough -- say you left a $10,000 cheque inside
-- you might just start looking.  Maybe you'll get lucky, and hit it.  

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.    --J.D. Baldwin


Re: LIKE, leading percent, bind parameters and indexes

От
Dave Cramer
Дата:
These are two confusing issues.

One is the use of a leading percent sign.

What Tom pointed out was with a bound parameter the planner can't  
make any assumptions about indexes.

Leading percent signs can be made to use indexes by creating a  
functional index on the column which reverses the order of the  
column, then using the same function in the select

Dave
On 25-May-06, at 1:46 PM, Andrew Sullivan wrote:

> On Thu, May 25, 2006 at 02:18:10PM -0300, Rodrigo Hjort wrote:
>> make a index scan. Otherwise, i.e. using leading '%' on static  
>> text or bound
>> paremeter, makes the planner always do a sequential scan. Is that the
>> scenario?
>
> I think more exactly, the planner can't possibly know how to plan an
> indexscan with a leading '%', because it has nowhere to start.
>
> Think of it this way: if you go to the public library, and say, "I
> want a book.  I can't remember its name exactly, but it starts with
> 'daytime'," you can find it by going to the title index and browsing
> for things that start that way.  If you go to the public library, and
> say, "There's this book I want, but I can't remember the title.  It's
> red," you're going to have a lot of books to look through.  Maybe all
> of them.
>
> If it were important enough -- say you left a $10,000 cheque inside
> -- you might just start looking.  Maybe you'll get lucky, and hit  
> it.  
>
> A
>
> -- 
> Andrew Sullivan  | ajs@crankycanuck.ca
> I remember when computers were frustrating because they *did*  
> exactly what
> you told them to.  That actually seems sort of quaint now.
>         --J.D. Baldwin
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
>



Re: LIKE, leading percent, bind parameters and indexes

От
"Rodrigo Hjort"
Дата:
I think more exactly, the planner can't possibly know how to plan an
indexscan with a leading '%', because it has nowhere to start.

The fact is that index scan is performed on LIKE expression on a string not preceded by '%', except when bound parameter is used.

select * from table where field like 'THE NAME%'; -- index scan
select * from table where field like '%THE NAME%'; -- seq scan
select * from table where field like :bind_param; -- seq scan (always)

Regards,

Rodrigo Hjort
http://icewall.org/~hjort

Re: LIKE, leading percent, bind parameters and indexes

От
Greg Stark
Дата:
"Rodrigo Hjort" <rodrigo.hjort@gmail.com> writes:

> > I think more exactly, the planner can't possibly know how to plan an
> > indexscan with a leading '%', because it has nowhere to start.
> 
> The fact is that index scan is performed on LIKE expression on a string not
> preceded by '%', except when bound parameter is used.
> 
> select * from table where field like 'THE NAME%'; -- index scan
> select * from table where field like '%THE NAME%'; -- seq scan
> select * from table where field like :bind_param; -- seq scan (always)

Just for reference I found that both Oracle and MSSQL (back when last I used
it, many years ago) did use an index scan for the following case:

select * from table where field like :bind_param || '%'

At the time this seemed perfectly logical but now that I have more experience
it seems hard to justify. There's no principled reason to think this is any
more likely than a plain :bind_param to be an indexable scan. 

However in practice this worked great. I rarely if ever put % characters into
the bind parameter and the index scan was exactly what I, as a user, expected.

Even if there's resistance to having this form be treated as indexable there
is certainly a use case for something like this. If not this then something
like

WHERE escape(:bind_param)||'%'

but that would be pretty hard to recognize, certainly much harder than a
simple :bind_param || '%'.


-- 
greg



Re: LIKE, leading percent, bind parameters and indexes

От
"Jim C. Nasby"
Дата:
On Thu, May 25, 2006 at 08:41:17PM -0300, Rodrigo Hjort wrote:
> >
> >I think more exactly, the planner can't possibly know how to plan an
> >indexscan with a leading '%', because it has nowhere to start.
> >
> 
> The fact is that index scan is performed on LIKE expression on a string not
> preceded by '%', except when bound parameter is used.
> 
> select * from table where field like 'THE NAME%'; -- index scan
> select * from table where field like '%THE NAME%'; -- seq scan
> select * from table where field like :bind_param; -- seq scan (always)

Since I'm somewhat doubtful of coming up with a generic means for
dealing with plan changes based on different bound parameter values any
time soon...

How difficult would it be to make LIKE check the value of the bound
parameter for a starting % and use that information to decide on a query
plan? IMHO this is worth making into a special case in the planner,
because it's very easy to detect and makes a tremendous difference in
the query plan/performance.

Also, might a bitmap scan be a win for the %string case? Presumably it's
much faster to find matching rows via an index and then go back into the
heap for them; unless you're matching a heck of a lot of rows.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: LIKE, leading percent, bind parameters and indexes

От
Martijn van Oosterhout
Дата:
On Fri, May 26, 2006 at 11:38:41AM -0500, Jim C. Nasby wrote:
> > select * from table where field like 'THE NAME%'; -- index scan
> > select * from table where field like '%THE NAME%'; -- seq scan
> > select * from table where field like :bind_param; -- seq scan (always)
>
> How difficult would it be to make LIKE check the value of the bound
> parameter for a starting % and use that information to decide on a query
> plan? IMHO this is worth making into a special case in the planner,
> because it's very easy to detect and makes a tremendous difference in
> the query plan/performance.

Planning doesn't work that way. Like is just a function invokation, the
planner doesn't ask or tell it where it is in the plan. And it's not
the function that determines how the query is planned.

> Also, might a bitmap scan be a win for the %string case? Presumably it's
> much faster to find matching rows via an index and then go back into the
> heap for them; unless you're matching a heck of a lot of rows.

This is an interesting thought. Currently, AFAICS, the bitmap-scan code
only considers operators that are indexable, just like for narmal index
scans. However, in this case the query could scan the entire index,
apply the LIKE to each one and produce a bitmap of possible matches.
Then do a bitmap scan over the table to check the results.

Not just LIKE could use this, but any function marked STABLE. You'd
have to weigh up the cost of scanning the *entire* index (because we
don't have any actual restriction clauses) against avoiding a full
table scan.

Actually, if you're going to scan the whole index, maybe you can use
the recent changes that allow VACUUM to scan the index sequentially,
rather than by index order. Surely a sequential disk scan over the
index to create the bitmap would a big win.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: LIKE, leading percent, bind parameters and indexes

От
"Mark Woodward"
Дата:
> On Thu, May 25, 2006 at 08:41:17PM -0300, Rodrigo Hjort wrote:
>> >
>> >I think more exactly, the planner can't possibly know how to plan an
>> >indexscan with a leading '%', because it has nowhere to start.
>> >
>>
>> The fact is that index scan is performed on LIKE expression on a string
>> not
>> preceded by '%', except when bound parameter is used.
>>
>> select * from table where field like 'THE NAME%'; -- index scan
>> select * from table where field like '%THE NAME%'; -- seq scan
>> select * from table where field like :bind_param; -- seq scan (always)
>
> Since I'm somewhat doubtful of coming up with a generic means for
> dealing with plan changes based on different bound parameter values any
> time soon...
>
> How difficult would it be to make LIKE check the value of the bound
> parameter for a starting % and use that information to decide on a query
> plan? IMHO this is worth making into a special case in the planner,
> because it's very easy to detect and makes a tremendous difference in
> the query plan/performance.
>

My solution is a function in one of my libraries called "strrev()" which
returns the reverse of a string. I make a function index of a
strrev(field). Then, just search where strrev('%the name') like
strrev(field);




Re: LIKE, leading percent, bind parameters and indexes

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Fri, May 26, 2006 at 11:38:41AM -0500, Jim C. Nasby wrote:
>> Also, might a bitmap scan be a win for the %string case? Presumably it's
>> much faster to find matching rows via an index and then go back into the
>> heap for them; unless you're matching a heck of a lot of rows.

> This is an interesting thought. Currently, AFAICS, the bitmap-scan code
> only considers operators that are indexable, just like for narmal index
> scans. However, in this case the query could scan the entire index,
> apply the LIKE to each one and produce a bitmap of possible matches.
> Then do a bitmap scan over the table to check the results.

> Not just LIKE could use this, but any function marked STABLE. You'd
> have to weigh up the cost of scanning the *entire* index (because we
> don't have any actual restriction clauses) against avoiding a full
> table scan.

I've been thinking about this.  The general case is that you could have
some "auxiliary conditions" (arbitrary nonvolatile expressions using
only columns present in the index) as well as the regular index
qualification conditions (possibly zero of these).  AFAICS it wouldn't
matter if the indexscan is bitmap or regular.  It seems fairly doable,
and would have some nice side effects --- for example, the ancient
bugaboo that "foo IS NULL" isn't an indexable operator would be
partially assuaged.  But there are a couple of gotchas:

* It doesn't work for indexes that store "compressed" keys instead of
the original column value; which lets out GiST and GIN, at least with
some opclasses.  I'd be inclined to just implement it for btree and
maybe hash, rather than bothering with checking opclasses.  (I don't
think we have any official way for a GiST/GIN opclass to show whether
it does any key compression, anyhow.)

* Up to now, the only functions directly invoked by an index AM were
members of index opclasses; and since opclasses can only be defined by
superusers, there was at least some basis for trusting the functions
to behave sanely.  But if an index AM is going to invoke arbitrary
user-defined expressions then more care is needed.  What's particularly
bothering me is the notion of executing arbitrary functions while
holding a buffer lock on an index page.  If the arbitrary functions go
off and scan other tables (or even the same table) I think it wouldn't
be too hard to get into deadlock situations, especially across multiple
backends.  And deadlocks on LWLocks are really nasty: there's no
deadlock detection, no timeout, and no way out short of SIGQUIT/SIGKILL.
That would make it a security hole, even if the conditions needed to
trigger it are so bizarre they'd never arise in normal usage.

Given that btree now works page-at-a-time in all cases, we could imagine
fixing this by postponing checks of auxiliary conditions until after
we release the buffer lock.  This would require making copies of all
index tuples that pass the regular index quals as we scan the page
(needing at most BLCKSZ workspace), releasing the buffer lock, and then
applying the auxiliary conditions to filter out tuples we don't want to
return.  The extra data-copying is annoying but probably still beats a
trip to the heap.  It might be best to just copy the whole page out
of shared buffers and into a local page, release the lock immediately,
and then go on with checking both indexquals and auxiliary conditions
in one pass.  This would be more data-copying but the improvement in
locality of access to the shared buffer might repay that.

I don't recall the locking rules for hash in any detail, but probably
something similar would work for hash, assuming anyone even wants to
bother with it.

Comments?
        regards, tom lane


Re: LIKE, leading percent, bind parameters and indexes

От
Martijn van Oosterhout
Дата:
On Sat, May 27, 2006 at 10:57:05AM -0400, Tom Lane wrote:
> * Up to now, the only functions directly invoked by an index AM were
> members of index opclasses; and since opclasses can only be defined by
> superusers, there was at least some basis for trusting the functions
> to behave sanely.  But if an index AM is going to invoke arbitrary
> user-defined expressions then more care is needed.  What's particularly
> bothering me is the notion of executing arbitrary functions while
> holding a buffer lock on an index page.

Actually, for a first pass I was considering doing it within the
nodeIndexScan.c/nodeBitmapScan.c and not within the AM at all. But I
just remembered, the index interface has no way to return the actual
values in the index, so you can't do that :(

So other than being careful with locking, you don't see any objections?
How about the suggestion of using a sequential index scan like the
recent changes to VACUUM in the case that there are no regular index
quals?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: LIKE, leading percent, bind parameters and indexes

От
Tom Lane
Дата:
Martijn van Oosterhout <kleptog@svana.org> writes:
> How about the suggestion of using a sequential index scan like the
> recent changes to VACUUM in the case that there are no regular index
> quals?

Nonstarter (hint: the solution we found for VACUUM assumes there can
be only one).
        regards, tom lane


Re: LIKE, leading percent, bind parameters and indexes

От
Heikki Linnakangas
Дата:
On Sat, 27 May 2006, Martijn van Oosterhout wrote:

> On Sat, May 27, 2006 at 10:57:05AM -0400, Tom Lane wrote:
>> * Up to now, the only functions directly invoked by an index AM were
>> members of index opclasses; and since opclasses can only be defined by
>> superusers, there was at least some basis for trusting the functions
>> to behave sanely.  But if an index AM is going to invoke arbitrary
>> user-defined expressions then more care is needed.  What's particularly
>> bothering me is the notion of executing arbitrary functions while
>> holding a buffer lock on an index page.
>
> Actually, for a first pass I was considering doing it within the
> nodeIndexScan.c/nodeBitmapScan.c and not within the AM at all. But I
> just remembered, the index interface has no way to return the actual
> values in the index, so you can't do that :(

This discussion reminds me of the idea to do index-only scans, 
returning tuples directly from an index without hitting the heap 
at all. MVCC is the main problem there, but it would be nice that 
whatever you come up with here would be usable if we ever implement 
index-only scans.

I don't know the planner internals, so this might not make any sense at 
all, but how about having separate index scan and fetch nodes. Index scan 
would return index tuples and fetch would get the corresponding heap 
tuples. You could then have whatever you want between them, perhaps 
deferring the fetch step until just before returning the rows to the 
client.

- Heikki


Re: LIKE, leading percent, bind parameters and indexes

От
Martijn van Oosterhout
Дата:
On Sun, May 28, 2006 at 10:43:18PM +0300, Heikki Linnakangas wrote:
> I don't know the planner internals, so this might not make any sense at
> all, but how about having separate index scan and fetch nodes. Index scan
> would return index tuples and fetch would get the corresponding heap
> tuples. You could then have whatever you want between them, perhaps
> deferring the fetch step until just before returning the rows to the
> client.

That's kinda what a bitmap scan does. Although, we never fetch tuples
unless you're going to use the result in some way...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: LIKE, leading percent, bind parameters and indexes

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On Sat, 27 May 2006, Martijn van Oosterhout wrote:
>> Actually, for a first pass I was considering doing it within the
>> nodeIndexScan.c/nodeBitmapScan.c and not within the AM at all. But I
>> just remembered, the index interface has no way to return the actual
>> values in the index, so you can't do that :(

> This discussion reminds me of the idea to do index-only scans, 
> returning tuples directly from an index without hitting the heap 
> at all. MVCC is the main problem there, but it would be nice that 
> whatever you come up with here would be usable if we ever implement 
> index-only scans.

Given my worries about needing to copy the index tuples anyway, maybe
the right way to approach this is to add a separate AM entry point
that's like amgettuple except it hands you back whole index tuples and
not just the heap TID part.  This would only be implemented by those
AMs that store the unmodified original index tuple (ie, not GiST/GIN).
Then the filtering on auxiliary conditions can be done once in the
executor code as envisioned by Martijn, and we'd also have the AM support
in place to do generalized separate index and heap scans.

I recall some discussion of using something like this to implement
joining before visiting the heap, in situations where all the join
keys are available in an index but there are too many rows for nestloop
index joining to be sufficient.  You'd pull the join keys from the
index, run merge or hash join, and then visit the heap only for the
candidate join rows.  It hasn't got done yet but it seemed like a
potentially good idea at the time.  [ pokes around... ]  The original
discussion was Red Hat private, apparently, but I mentioned it here:
http://archives.postgresql.org/pgsql-hackers/2004-05/msg00944.php
Some of that is probably superseded now by bitmap indexscans, but
not all of it.
        regards, tom lane


Re: LIKE, leading percent, bind parameters and indexes

От
Martijn van Oosterhout
Дата:
On Sat, May 27, 2006 at 11:52:40AM -0400, Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
> > How about the suggestion of using a sequential index scan like the
> > recent changes to VACUUM in the case that there are no regular index
> > quals?
>
> Nonstarter (hint: the solution we found for VACUUM assumes there can
> be only one).

Bummer, I was envisioning allowing index AMs to have another access
method, the unordered sequential scan. Just like we consider a random
access of a real table to be more expensive than a seq scan, and index
scan that seeks a lot would be more expensive that a sequential scan.
So if you know you're going to scan most of an index, scanning
sequentially would be cheaper...

Ah well,

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.