Обсуждение: 'LIKE' enhancement suggestion

Поиск
Список
Период
Сортировка

'LIKE' enhancement suggestion

От
JB
Дата:
I have a 50MB +- table with about 70,000 records which on which I was
doing LIKE selects. It was taking approx 20 secs to complete the search.
The table is something like...

CREATE TABLE info ( lastname char(50), street_name char(50), ...(etc omitted)
);

CREATE INDEX nx_info1 ON info (lastname);
CREATE INDEX nx_info2 ON info (street_name);

on which I was doing...

SELECT * FROM info WHERE street_name LIKE 'MAIN%';

...this would take about 20 secs to complete. Because the wildness only
happens at the end of the search string, I changed the query to...

SELECT * FROM info WHERE substring( street_name from 1 to 4 ) = 'MAIN';

...this takes under 2 secs. I wrote a piece of code in python to do this
automatically for me but it seems to me that the parser/optimizer could
take a look at this case and re-write the query with the '=' instead of
the 'LIKE'. I've looked through the 'C' code to see where this could
happen but it is too thick for me to sort out with my schedule, so I
thought I'd make the suggestion here.

cheers
jim
-- 
If everything is coming your way then you're in the wrong lane.


Re: [HACKERS] 'LIKE' enhancement suggestion

От
Tom Lane
Дата:
JB <jimbag@kw.igs.net> writes:
> SELECT * FROM info WHERE street_name LIKE 'MAIN%';

> ...this would take about 20 secs to complete. Because the wildness only
> happens at the end of the search string, I changed the query to...

> SELECT * FROM info WHERE substring( street_name from 1 to 4 ) = 'MAIN';

> ...this takes under 2 secs.

This makes no sense to me at all.  The latter query should be far
slower, because AFAIK there is no optimization for it, whereas there is
an optimization for "foo LIKE 'bar%'".

What version are you running, and what plan does EXPLAIN show for
each of these queries?
        regards, tom lane


Re: [HACKERS] 'LIKE' enhancement suggestion

От
JB
Дата:
I'm running 6.5.2 on RH6.1, 128mb ram, 27gb, P350. I don't understand
this either so please excuse my ignorance. I looked up EXPLAIN and
here's what came out...

---[snip]---
#!/bin/sh
psql -c "EXPLAIN SELECT * FROM info WHERE substring(stname from 1 for 4)
= 'MAIN';"
time psql -c "SELECT * FROM info WHERE substring(stname from 1 for 4) =
'MAIN';"

psql -c "EXPLAIN SELECT * FROM info WHERE stname LIKE 'MAIN%';"
time psql -c "SELECT * FROM info WHERE stname LIKE 'MAIN%';"
---[snip]---

outputs...


Seq Scan on info  (cost=3829.93 rows=15454 width=420)

0.01user 0.01system 0:00.72elapsed 2%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (198major+25minor)pagefaults 0swaps


Index Scan using nx_info1 on info  (cost=1531.12 rows=30 width=420)

0.01user 0.01system 0:00.64elapsed 3%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (198major+25minor)pagefaults 0swaps


Obviously the numbers don't support me. I'm quite confused. I was told
that the engine didn't use indexes with 'LIKE' by someone equally
informed as I, and thus the 'substring' change. This worked remarkably
faster so I assumed it to be true. Apparently it is not. There must be
something with the bigger system that I need to look into (mem usage,
etc). My apologies for chewing up bandwidth. 

jb


Tom Lane wrote:
> 
> JB <jimbag@kw.igs.net> writes:
> > SELECT * FROM info WHERE street_name LIKE 'MAIN%';
> 
> > ...this would take about 20 secs to complete. Because the wildness only
> > happens at the end of the search string, I changed the query to...
> 
> > SELECT * FROM info WHERE substring( street_name from 1 to 4 ) = 'MAIN';
> 
> > ...this takes under 2 secs.
> 
> This makes no sense to me at all.  The latter query should be far
> slower, because AFAIK there is no optimization for it, whereas there is
> an optimization for "foo LIKE 'bar%'".
> 
> What version are you running, and what plan does EXPLAIN show for
> each of these queries?
> 
>                         regards, tom lane
> 
> ************

-- 
If everything is coming your way then you're in the wrong lane.


Re: [HACKERS] 'LIKE' enhancement suggestion

От
Tom Lane
Дата:
JB <jimbag@kw.igs.net> writes:
> My apologies for chewing up bandwidth. 

Not at all!  Just because I don't understand it does not mean
you haven't found an effect worth looking into ;-)

> I'm running 6.5.2 on RH6.1, 128mb ram, 27gb, P350.

OK, cool.  We've had a couple of weird-looking questions that turned
out to be from people running ancient releases, so "what version" is
something we all routinely ask now.

> ---[snip]---
> #!/bin/sh
> psql -c "EXPLAIN SELECT * FROM info WHERE substring(stname from 1 for 4)
> = 'MAIN';"
> time psql -c "SELECT * FROM info WHERE substring(stname from 1 for 4) =
> 'MAIN';"

> psql -c "EXPLAIN SELECT * FROM info WHERE stname LIKE 'MAIN%';"
> time psql -c "SELECT * FROM info WHERE stname LIKE 'MAIN%';"
> ---[snip]---

> outputs...

> Seq Scan on info  (cost=3829.93 rows=15454 width=420)

> 0.01user 0.01system 0:00.72elapsed 2%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+0outputs (198major+25minor)pagefaults 0swaps

> Index Scan using nx_info1 on info  (cost=1531.12 rows=30 width=420)

> 0.01user 0.01system 0:00.64elapsed 3%CPU (0avgtext+0avgdata
> 0maxresident)k
> 0inputs+0outputs (198major+25minor)pagefaults 0swaps


> Obviously the numbers don't support me. I'm quite confused.

"time psql" doesn't really tell you anything much, since the CPU
numbers it cites only cover the psql front end, not the backend
database server.  You can put some faith in the "elapsed time"
values, but only if your machine is otherwise idle.  In this case
you have readings 0.72 and 0.64, which are IMHO too close to call;
you'd need to make a longer-running test case to have much confidence
in the results.

But you said before that you saw 20 sec vs. 2 sec, which is surely
a significant difference (barring major load variations from other
programs on your machine); can you duplicate that?

> I was told that the engine didn't use indexes with 'LIKE' by someone
> equally informed as I, and thus the 'substring' change.

Postgres does use an index for "foo LIKE 'bar%'" if it can.  6.5
is not very bright about this when you have USE_LOCALE enabled,
but 7.0 is smarter.

> There must be something with the bigger system that I need to
> look into (mem usage, etc).

It's worth looking into.  Feel free to contact me off-list if you
want to probe further.
        regards, tom lane


Re: [HACKERS] 'LIKE' enhancement suggestion

От
The Hermit Hacker
Дата:
On Tue, 7 Mar 2000, JB wrote:

> Obviously the numbers don't support me. I'm quite confused. I was told
> that the engine didn't use indexes with 'LIKE' by someone equally
> informed as I
Bruce made mods at least a release, if not more, back that allowed
a LIKE to use an index *if and only if* the only "varying" part was the
tail end ...