Обсуждение: Text search

Поиск
Список
Период
Сортировка

Text search

От
Chris Roffler
Дата:

I have a text column in a table. We store XML in this column. Now I want to search for tags and values

Example data:

<bank>
 
<name>Citi Bank</name>
  .....
  .....
/
<bank>

I would like to run the following query:

select * from xxxx where to_tsvector('english',xml_column) @@ to_tsquery('<name>Citi Bank</name>')

This works fine but it also works for any tag as long as the name 'Citi Bank' is present.

How do I have to setup my search in order for this to work so I get an exact match for the tag and value ?

Re: Text search

От
Richard Huxton
Дата:
On 16/03/10 10:29, Chris Roffler wrote:
> I have a text column in a table. We store XML in this column. Now I want to
> search for tags and values

> select * from xxxx where to_tsvector('english',xml_column) @@
> to_tsquery('<name>Citi
> Bank</name>')
>
> This works fine but it also works for any tag as long as the name 'Citi
> Bank' is present.
>
> How do I have to setup my search in order for this to work so I get an exact
> match for the tag and value ?

Would you not be better off with an XML data-type if that's how you want
to treat the data?
http://www.postgresql.org/docs/8.4/static/datatype-xml.html
http://www.postgresql.org/docs/8.4/static/functions-xml.html
http://www.postgresql.org/docs/8.4/static/xml2.html

--
   Richard Huxton
   Archonet Ltd

Re: Text search

От
Chris Roffler
Дата:
Richard

I tried all that and you can see it on this thread, there are some limitations on indexs on xpath work 



On Tue, Mar 16, 2010 at 2:21 PM, Richard Huxton <dev@archonet.com> wrote:
On 16/03/10 10:29, Chris Roffler wrote:
I have a text column in a table. We store XML in this column. Now I want to
search for tags and values

select * from xxxx where to_tsvector('english',xml_column) @@
to_tsquery('<name>Citi
Bank</name>')

This works fine but it also works for any tag as long as the name 'Citi
Bank' is present.

How do I have to setup my search in order for this to work so I get an exact
match for the tag and value ?

Would you not be better off with an XML data-type if that's how you want to treat the data?
http://www.postgresql.org/docs/8.4/static/datatype-xml.html
http://www.postgresql.org/docs/8.4/static/functions-xml.html
http://www.postgresql.org/docs/8.4/static/xml2.html


--
 Richard Huxton
 Archonet Ltd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Text search

От
Richard Huxton
Дата:
On 16/03/10 12:36, Chris Roffler wrote:
> Richard
>
> I tried all that and you can see it on this thread, there are some
> limitations on indexs on xpath work
>
> http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php
> <http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php>

OK - I've read through your other thread and I think you either:
1. Don't want to use a relational database for this (use an XML database
instead)
2. Want to make the data at least slightly relational.

You're trying to index multiple fragments of a value. In your case it
seems to be <name> tags within an XML fragment containing a list of
<attributes>. That's exactly equivalent to trying to get an index search
for '%ABC%' on text.

There's nothing you can do without exposing the structure of your value
to the database.

You could write a custom parser for tsearch so it picked out only the
relevant pieces of XML. That's probably more work than you want though.

You could run an xslt transform over the xml fragments and extract what
you want and then use tsearch to index that, I suppose. Similarly, you
might be able to do the same via xslt and xquery.

Finally, and to my mind most sensibly, if you want to search attributes,
then store attributes. Parse out your XML and have an "attributes" table
(id, name, value, last_changed, changed_by). That's not brilliant
because every value will just be text, but at least each attribute is
its own row.

--
   Richard Huxton
   Archonet Ltd

Re: Text search

От
Richard Huxton
Дата:
On 16/03/10 13:49, Richard Huxton wrote:
> You could run an xslt transform over the xml fragments and extract what
> you want and then use tsearch to index that, I suppose. Similarly, you
> might be able to do the same via xslt and xquery.

Actually, if it's only attribute names you're interested in you could do
it with xpath

Something like (untested):

ALTER TABLE time_series ADD attr_names text;

UPDATE time_series SET attr_names = array_to_string(
     xpath('*/Attribute/Name/text()', external_attributes)
     ,' '
);

CREATE INDEX fti_attr_names ON time_series USING gin(
   to_tsvector('simple', attr_names)
);

SELECT * FROM time_series WHERE
   to_tsvector('simple', attr_names)
   @@
   to_tsquery('simple', 'attribute22');

I'd probably just store the tsvector rather than text unless the text is
of some use in itself.

If you plan to do anything with the attributes it'd still be better to
split them out into their own table though.

--
   Richard Huxton
   Archonet Ltd

Re: Text search

От
Chris Roffler
Дата:
Richard

thanks for the pointers .... unfortunately its not just attribute names.

Here is what I am thinking of doing;

In a first step I run a query 

SELECT id  FROM time_series WHERE
 to_tsvector(xml_string)
 @@
 to_tsquery( anystring );

 then I load the actual xml string into memory for each id found and use xpath to search the document in memory. This will at least use my text index on the first hit.

Thanks
Chris 



On Tue, Mar 16, 2010 at 4:16 PM, Richard Huxton <dev@archonet.com> wrote:
On 16/03/10 13:49, Richard Huxton wrote:
You could run an xslt transform over the xml fragments and extract what
you want and then use tsearch to index that, I suppose. Similarly, you
might be able to do the same via xslt and xquery.

Actually, if it's only attribute names you're interested in you could do it with xpath

Something like (untested):

ALTER TABLE time_series ADD attr_names text;

UPDATE time_series SET attr_names = array_to_string(
   xpath('*/Attribute/Name/text()', external_attributes)
   ,' '
);

CREATE INDEX fti_attr_names ON time_series USING gin(
 to_tsvector('simple', attr_names)
);

SELECT * FROM time_series WHERE
 to_tsvector('simple', attr_names)
 @@
 to_tsquery('simple', 'attribute22');

I'd probably just store the tsvector rather than text unless the text is of some use in itself.

If you plan to do anything with the attributes it'd still be better to split them out into their own table though.


--
 Richard Huxton
 Archonet Ltd

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general