Обсуждение: Text search
I have a text column in a table. We store XML in this column. Now I want to search for tags and values
Example data:
<bank>
<name>Citi Bank</name>
.....
.....
/<bank>
I would like to run the following query:
select * from xxxx where to_tsvector('english',xml_column) @@ to_tsquery('<name>Citi Bank</name>
')
This works fine but it also works for any tag as long as the name 'Citi Bank' is present.
How do I have to setup my search in order for this to work so I get an exact match for the tag and value ?
On 16/03/10 10:29, Chris Roffler wrote: > I have a text column in a table. We store XML in this column. Now I want to > search for tags and values > select * from xxxx where to_tsvector('english',xml_column) @@ > to_tsquery('<name>Citi > Bank</name>') > > This works fine but it also works for any tag as long as the name 'Citi > Bank' is present. > > How do I have to setup my search in order for this to work so I get an exact > match for the tag and value ? Would you not be better off with an XML data-type if that's how you want to treat the data? http://www.postgresql.org/docs/8.4/static/datatype-xml.html http://www.postgresql.org/docs/8.4/static/functions-xml.html http://www.postgresql.org/docs/8.4/static/xml2.html -- Richard Huxton Archonet Ltd
Richard
I tried all that and you can see it on this thread, there are some limitations on indexs on xpath work
On Tue, Mar 16, 2010 at 2:21 PM, Richard Huxton <dev@archonet.com> wrote:
On 16/03/10 10:29, Chris Roffler wrote:I have a text column in a table. We store XML in this column. Now I want to
search for tags and valuesWould you not be better off with an XML data-type if that's how you want to treat the data?select * from xxxx where to_tsvector('english',xml_column) @@
to_tsquery('<name>Citi
Bank</name>')
This works fine but it also works for any tag as long as the name 'Citi
Bank' is present.
How do I have to setup my search in order for this to work so I get an exact
match for the tag and value ?
http://www.postgresql.org/docs/8.4/static/datatype-xml.html
http://www.postgresql.org/docs/8.4/static/functions-xml.html
http://www.postgresql.org/docs/8.4/static/xml2.html
--
Richard Huxton
Archonet Ltd
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 16/03/10 12:36, Chris Roffler wrote: > Richard > > I tried all that and you can see it on this thread, there are some > limitations on indexs on xpath work > > http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php > <http://archives.postgresql.org/pgsql-general/2010-03/msg00270.php> OK - I've read through your other thread and I think you either: 1. Don't want to use a relational database for this (use an XML database instead) 2. Want to make the data at least slightly relational. You're trying to index multiple fragments of a value. In your case it seems to be <name> tags within an XML fragment containing a list of <attributes>. That's exactly equivalent to trying to get an index search for '%ABC%' on text. There's nothing you can do without exposing the structure of your value to the database. You could write a custom parser for tsearch so it picked out only the relevant pieces of XML. That's probably more work than you want though. You could run an xslt transform over the xml fragments and extract what you want and then use tsearch to index that, I suppose. Similarly, you might be able to do the same via xslt and xquery. Finally, and to my mind most sensibly, if you want to search attributes, then store attributes. Parse out your XML and have an "attributes" table (id, name, value, last_changed, changed_by). That's not brilliant because every value will just be text, but at least each attribute is its own row. -- Richard Huxton Archonet Ltd
On 16/03/10 13:49, Richard Huxton wrote: > You could run an xslt transform over the xml fragments and extract what > you want and then use tsearch to index that, I suppose. Similarly, you > might be able to do the same via xslt and xquery. Actually, if it's only attribute names you're interested in you could do it with xpath Something like (untested): ALTER TABLE time_series ADD attr_names text; UPDATE time_series SET attr_names = array_to_string( xpath('*/Attribute/Name/text()', external_attributes) ,' ' ); CREATE INDEX fti_attr_names ON time_series USING gin( to_tsvector('simple', attr_names) ); SELECT * FROM time_series WHERE to_tsvector('simple', attr_names) @@ to_tsquery('simple', 'attribute22'); I'd probably just store the tsvector rather than text unless the text is of some use in itself. If you plan to do anything with the attributes it'd still be better to split them out into their own table though. -- Richard Huxton Archonet Ltd
Richard
thanks for the pointers .... unfortunately its not just attribute names.
Here is what I am thinking of doing;
In a first step I run a query
SELECT id FROM time_series WHERE
to_tsvector(xml_string)
@@
to_tsquery( anystring );
to_tsvector(xml_string)
@@
to_tsquery( anystring );
then I load the actual xml string into memory for each id found and use xpath to search the document in memory. This will at least use my text index on the first hit.
Thanks
Chris
On Tue, Mar 16, 2010 at 4:16 PM, Richard Huxton <dev@archonet.com> wrote:
On 16/03/10 13:49, Richard Huxton wrote:Actually, if it's only attribute names you're interested in you could do it with xpathYou could run an xslt transform over the xml fragments and extract what
you want and then use tsearch to index that, I suppose. Similarly, you
might be able to do the same via xslt and xquery.
Something like (untested):
ALTER TABLE time_series ADD attr_names text;
UPDATE time_series SET attr_names = array_to_string(
xpath('*/Attribute/Name/text()', external_attributes)
,' '
);
CREATE INDEX fti_attr_names ON time_series USING gin(
to_tsvector('simple', attr_names)
);
SELECT * FROM time_series WHERE
to_tsvector('simple', attr_names)
@@
to_tsquery('simple', 'attribute22');
I'd probably just store the tsvector rather than text unless the text is of some use in itself.
If you plan to do anything with the attributes it'd still be better to split them out into their own table though.
--
Richard Huxton
Archonet Ltd
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general