Обсуждение: using text search

Поиск
Список
Период
Сортировка

using text search

От
Rafał Pietrak
Дата:
... or not (I'm not quite sure)

Hello,

I have the following tables:

CREATE TABLE orders (info text, ....);
CREATE TABLE keywords (phrase text, .....);

And I need to find all the ORDERS rows, which conain a PHRASE present in
the info column ... like so:
SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase
|| '%');

... only this does not work, since:
ERROR:  argument of JOIN/ON must be type boolean, not type text

is this possible in SQL? Or may be this a job for "ts_something()"
(havent' learned to use them, yet)???

-R


Re: using text search

От
Alfonso Afonso
Дата:
Hi Rafal

Maybe you should think or consider to have normalized database to do this kind of select.

I mean that if the keyword is a key you could add a column to table orders and have it ordered and indexed without
havingto reevaluate the query every time. 

About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lot
ofresources, but my approximation to your problem would be doing direct select instead of join select (in fact it is
almostthe same, but more readable on this cases), like: 

select o.* from orders as o, keywords as k
where o.info like ('%' || k.phrase || '%')

Hope help you.

Bye for now

El 22/04/2013, a las 14:15, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:

> ... or not (I'm not quite sure)
>
> Hello,
>
> I have the following tables:
>
> CREATE TABLE orders (info text, ....);
> CREATE TABLE keywords (phrase text, .....);
>
> And I need to find all the ORDERS rows, which conain a PHRASE present in the info column ... like so:
> SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase || '%');
>
> ... only this does not work, since:
> ERROR:  argument of JOIN/ON must be type boolean, not type text
>
> is this possible in SQL? Or may be this a job for "ts_something()" (havent' learned to use them, yet)???
>
> -R
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Alfonso Afonso
(personal)







Re: using text search

От
Rafał Pietrak
Дата:
Hi,

W dniu 04/22/2013 05:57 PM, Alfonso Afonso pisze:
> Hi Rafal
>
> Maybe you should think or consider to have normalized database to do this kind of select.
>
> I mean that if the keyword is a key you could add a column to table orders and have it ordered and indexed without
havingto reevaluate the query every time. 
I wouldn't like to do that for the following reasons:
1. postgres is actually a "slave" database, where data is copied for www
publication from "root" database. the copying is achieved by means of
files produced by an export program, over which I don't have any
control; not to mention, that I don't have any control over the
layour/design/access to the "root" database. Under such circumstances
I'd rather keep postgres schema as close to the file format, not to the
"things" I do with the data afterwords - just in case "they" choose to
change the format.
2. The whole lot is reinitialized by night, so normalization would have
to be rerun as frequently.
3. The actual text-in-text search is not going to be very frequent -
like once a week, or after every import; I can materialize a VIEW for
that purpose, but it would save me an ocasional second query-run, when I
spot errors in results of the first. Not much of a gain, but I keep that
in mind for the future.

>
> About your question, if you are searching a text inside another text you could obtain bizarre results and consume a
lotof resources, but my approximation to your problem 
Yes. Now I can see the "lot of resources": the query (like you suggest
below) currently runs for 5 minutes and haven't ended, yet. And this is
really bad, since my "volumes" are the following:
1. currently KEYWORDS table contains just 4 (four) rows.
2. currently ORDERS contains c.a. 1mln records.
3. currently SELECT * from ORDERS where info ~~ ('%' ||
'some-test-pattern' || '%'); complets in 1sec.
4. let's (for the purpose of this emial clearity) assing XX*PAT*XX to: "
info ~~ ('%' || 'some-test-pattern' || '%')", for different PATTERNS
5. currently SELECT * from ORDERS where (XX*PAT1*XX) or (XX*PAT2*XX);
executes in 1.7sec.

So putting 'some-test-pattern' into a table makes things "nonelinearly"
worse.

I can provide EXPLAIN ANALYSE of both cases If that would help, but that
will need some time, since the "two table" variant haven't finished in
5min, and I killed it before knowing how long it takes to complete.


-R

> would be doing direct select instead of join select (in fact it is almost the same, but more readable on this cases),
like:
>
> select o.* from orders as o, keywords as k
> where o.info like ('%' || k.phrase || '%')
>
> Hope help you.
>
> Bye for now
>
> El 22/04/2013, a las 14:15, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:
>
>> ... or not (I'm not quite sure)
>>
>> Hello,
>>
>> I have the following tables:
>>
>> CREATE TABLE orders (info text, ....);
>> CREATE TABLE keywords (phrase text, .....);
>>
>> And I need to find all the ORDERS rows, which conain a PHRASE present in the info column ... like so:
>> SELECT o.* from orders o join keywords k on (o.info ~~ '%' || k.phrase || '%');
>>
>> ... only this does not work, since:
>> ERROR:  argument of JOIN/ON must be type boolean, not type text
>>
>> is this possible in SQL? Or may be this a job for "ts_something()" (havent' learned to use them, yet)???
>>
>> -R
>>
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general
> Alfonso Afonso
> (personal)
>
>
>
>
>
>
>



Re: using text search

От
Alfonso Afonso
Дата:
And using substring function? Do you have the same results?

I understand your arguments about normalization but, as you know, BNF is the best SQL Developer friend ;)

I used to avoid text fields and search/filter them (when it is possible), mainly because this consumption issues.... if the query has been on execution for more than 5 minutes, the virtual table that postgresql is preparing in memory should be huge.

Bye for now and good luck :)

El 22/04/2013, a las 18:12, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:

About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lot of resources, but my approximation to your problem


Alfonso Afonso
(personal)





Re: using text search

От
Alfonso Afonso
Дата:
I forgot to say that the function is "position ( txtseach in txtcomplete)" :)

Bye

El 22/04/2013, a las 19:36, Alfonso Afonso <aafonsoc@gmail.com> escribió:

And using substring function? Do you have the same results?

I understand your arguments about normalization but, as you know, BNF is the best SQL Developer friend ;)

I used to avoid text fields and search/filter them (when it is possible), mainly because this consumption issues.... if the query has been on execution for more than 5 minutes, the virtual table that postgresql is preparing in memory should be huge.

Bye for now and good luck :)

El 22/04/2013, a las 18:12, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:

About your question, if you are searching a text inside another text you could obtain bizarre results and consume a lot of resources, but my approximation to your problem


Alfonso Afonso
(personal)






Alfonso Afonso
(personal)





Re: using text search

От
Rafał Pietrak
Дата:
W dniu 04/22/2013 08:43 PM, Alfonso Afonso pisze:
I forgot to say that the function is "position ( txtseach in txtcomplete)" :)

Bye


Alfonso, thenx

But if I may: How can I use that function? In a context of my problem?

then again. At the edge of desperation, I'm thinking of writing a function, that will fetch all the KEYWORDS in one query, then cook explicit WHERE clause by string operations, and then EXECUTE it. With (currently) four keywords, I'd expect such function to return results within 5 seconds at most.

but I'd expect that there should be a way to "tell this" to postgresql SQL directly. Isn't it?


-R


Re: using text search

От
Alfonso Afonso
Дата:
Hi Rafal

This function returns the position where the substring is found, so you could do a query with clause
position(table1.fieldin table2.field) 

The 0 result is not found and maybe, in your case, is faster the use of internal text functions instead of like
comparison...hope helps. 

Bye

El 23/04/2013, a las 11:24, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:

> W dniu 04/22/2013 08:43 PM, Alfonso Afonso pisze:
>> I forgot to say that the function is "position ( txtseach in txtcomplete)" :)
>>
>> Bye
>
>
> Alfonso, thenx
>
> But if I may: How can I use that function? In a context of my problem?
>
> then again. At the edge of desperation, I'm thinking of writing a function, that will fetch all the KEYWORDS in one
query,then cook explicit WHERE clause by string operations, and then EXECUTE it. With (currently) four keywords, I'd
expectsuch function to return results within 5 seconds at most. 
>
> but I'd expect that there should be a way to "tell this" to postgresql SQL directly. Isn't it?
>
>
> -R
>
>

Alfonso Afonso
(personal)







Re: using text search

От
Rafał Pietrak
Дата:
Ha! Got it!

for enybody whois interested:

with tst(regexp) as (SELECT '(' || array_to_string(array_agg(phrase),
'|') || ')' from KEYWORDS) select o.* from ORDERS o, tst t where o.info
~ t.regexp;

execution time: 6400ms. (keywords=4, orders=1mln)

BTW: does anybody know if there is an index, that could improve the
performence of the above regexp?

-R


W dniu 04/23/2013 02:30 PM, Alfonso Afonso pisze:
> Hi Rafal
>
> This function returns the position where the substring is found, so you could do a query with clause
position(table1.fieldin table2.field) 
>
> The 0 result is not found and maybe, in your case, is faster the use of internal text functions instead of like
comparison...hope helps. 
>
> Bye
>
> El 23/04/2013, a las 11:24, Rafał Pietrak <rafal@zorro.isa-geek.com> escribió:
>
>> W dniu 04/22/2013 08:43 PM, Alfonso Afonso pisze:
>>> I forgot to say that the function is "position ( txtseach in txtcomplete)" :)
>>>
>>> Bye
>>
>> Alfonso, thenx
>>
>> But if I may: How can I use that function? In a context of my problem?
>>
>> then again. At the edge of desperation, I'm thinking of writing a function, that will fetch all the KEYWORDS in one
query,then cook explicit WHERE clause by string operations, and then EXECUTE it. With (currently) four keywords, I'd
expectsuch function to return results within 5 seconds at most. 
>>
>> but I'd expect that there should be a way to "tell this" to postgresql SQL directly. Isn't it?
>>
>>
>> -R
>>
>>
> Alfonso Afonso
> (personal)
>
>
>
>
>
>
>