Обсуждение: XML Index again

Поиск
Список
Период
Сортировка

XML Index again

От
Chris Roffler
Дата:
I still have some problems with my xml index

CREATE INDEX xml_index
  ON time_series
  USING btree
  ((
  (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text));

When I run the following query the index is not used :

select id from time_series where  
array_upper(
(xpath('/AttributeList/Attributes/Attribute[Name="Attribute122021"]', external_attributes))
, 1) > 0

Any Idea on how to configure the index ?

Thanks
Chris



Re: XML Index again

От
Alban Hertroys
Дата:
On 7 Mar 2010, at 11:02, Chris Roffler wrote:

> I still have some problems with my xml index
>
> CREATE INDEX xml_index
>   ON time_series
>   USING btree
>   ((
>   (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text));
>
> When I run the following query the index is not used :
>
> select id from time_series where
> array_upper(
> (xpath('/AttributeList/Attributes/Attribute[Name="Attribute122021"]', external_attributes))
> , 1) > 0
>
> Any Idea on how to configure the index ?

There are a couple of cases where Postgres won't use your index, but in this case it's quite clearly because you're
askingfor (quite) a different expression than the one you indexed. 

You seem to want to test for the existence of nodes with a specific name, maybe this is what you're looking for?:

SELECT id FROM time_series t1 WHERE EXISTS (
    SELECT 1
      FROM time_series t2
     WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text =
('Attribute122021',external_attributes) 
       AND t2.id = t1.id
);

It's just a guess at what you're trying to do, so I may very well have gotten it wrong. The important part is that you
needto use the expression you indexed in your where clause, or the database has no idea you mean something similar as
towhat you indexed. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b9389db296924445911763!



Re: XML Index again

От
Chris
Дата:
Alban

thanks for your replay. Yes I am looking for node exists ...

I'll give it a roll.



>There are a couple of cases where Postgres won't use your index, but in this case it's quite clearly because you're
askingfor (quite) a different expression than the one you indexed. 
>
>You seem to want to test for the existence of nodes with a specific name, maybe this is what you're looking for?:
>
>SELECT id FROM time_series t1 WHERE EXISTS (
>    SELECT 1
>      FROM time_series t2
>     WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text =
('Attribute122021',external_attributes) 
>       AND t2.id = t1.id
>);
>
>It's just a guess at what you're trying to do, so I may very well have gotten it wrong. The important part is that you
needto use the expression you indexed in your where clause, or the database has no idea you mean something similar as
towhat you indexed. 
>
>Alban Hertroys
>
>--
>If you can't see the forest for the trees,
>cut the trees and you'll see there is no forest.
>
>
>!DSPAM:737,4b9389db296924445911763!
>
>
>
>--
>Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general


Re: XML Index again

От
Chris Roffler
Дата:
Alban

Thanks for your help, your suggestion worked.

I need another xpath expression to find any Attribute with  Name ="xxxx" under the Attributes node. (not just if it is in a specific position)
see   query below.
How do I create an index for this xpath expression ?

Thanks
Chris 


 SELECT * FROM time_series 
        WHERE array_upper((xpath('/AttributeList/Attributes/Attribute[Name="xxxxx"]',   external_attributes)),1) > 0  


On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
On 7 Mar 2010, at 11:02, Chris Roffler wrote:

> I still have some problems with my xml index
>
> CREATE INDEX xml_index
>   ON time_series
>   USING btree
>   ((
>   (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text));
>
> When I run the following query the index is not used :
>
> select id from time_series where
> array_upper(
> (xpath('/AttributeList/Attributes/Attribute[Name="Attribute122021"]', external_attributes))
> , 1) > 0
>
> Any Idea on how to configure the index ?

There are a couple of cases where Postgres won't use your index, but in this case it's quite clearly because you're asking for (quite) a different expression than the one you indexed.

You seem to want to test for the existence of nodes with a specific name, maybe this is what you're looking for?:

SELECT id FROM time_series t1 WHERE EXISTS (
       SELECT 1
         FROM time_series t2
        WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text = ('Attribute122021', external_attributes)
          AND t2.id = t1.id
);

It's just a guess at what you're trying to do, so I may very well have gotten it wrong. The important part is that you need to use the expression you indexed in your where clause, or the database has no idea you mean something similar as to what you indexed.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:1034,4b9389d6296921789322580!



Re: XML Index again

От
Alban Hertroys
Дата:
On 8 Mar 2010, at 11:39, Chris Roffler wrote:

> Alban
>
> Thanks for your help, your suggestion worked.
>
> I need another xpath expression to find any Attribute with  Name ="xxxx" under the Attributes node. (not just if it
isin a specific position) 
> see   query below.

Your previous query wasn't about attributes in any specific position - it returned documents that contained more than
zeroattributes matching a given name. What are you trying to do this time? 

> How do I create an index for this xpath expression ?

You don't need to create another index (although one w/o the external_attributes column would probably be more
convenient);the index you have contains those names already. 

Just make sure you use the same expression you used to create the index to match the part in your xml and compare it to
thetext you're looking for. 

If you want to use indexes on your xml, then you'll need to stop putting the variable parts of your queries inside your
xpathexpressions - you make them unindexable that way. So move those [Name='xxxx']'s out of your xpath expressions.
Insteadhave the expressions result in the names so that you can compare them to the names stored in your index(es). 

It won't be as fast as looking for those names using xpath in an xml document, as every attribute name is a candidate
forcomparison now, but at least it's indexable. 

Alternatively you could try to build an index from the names contained in each xml document. Something like:

CREATE INDEX xml_attribute_names
    ON time_series
 USING btree (xpath('/AttributeList/Attributes/Attribute/Name/text()'));

This stores the array of all attribute names in an index. You can query for existence of specific attributes using the
ANYoperator on the resulting array (see archives and docs for how to do that). 

I believe (I've never needed to use arrays) the syntax is:

SELECT * FROM time_series WHERE 'xxxx' = ANY (xpath('/AttributeList/Attributes/Attribute/text()'));

It'll probably be faster than the previous solution, but use more disk space and memory.

>  SELECT * FROM time_series
>         WHERE array_upper((xpath('/AttributeList/Attributes/Attribute[Name="xxxxx"]',   external_attributes)),1) > 0

>
>
> On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
...
> You seem to want to test for the existence of nodes with a specific name, maybe this is what you're looking for?:
>
> SELECT id FROM time_series t1 WHERE EXISTS (
>        SELECT 1
>          FROM time_series t2
>         WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text =
('Attribute122021',external_attributes) 
>           AND t2.id = t1.id
> );

For clarity, if you would have an index on just that xpath expression - without the external_attributes column - this
querywould look simpler: 

SELECT id FROM time_series t1 WHERE EXISTS (
       SELECT 1
         FROM time_series t2
        WHERE xpath('/AttributeList/Attributes/Attribute/Name/text()')[1]::text = 'Attribute122021'
          AND t2.id = t1.id
);

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b94df40296929445119198!



Re: XML Index again

От
Alban Hertroys
Дата:
On 8 Mar 2010, at 11:39, Chris Roffler wrote:

> Alban
>
> Thanks for your help, your suggestion worked.
>
> I need another xpath expression to find any Attribute with  Name ="xxxx" under the Attributes node. (not just if it
isin a specific position) 
> see   query below.
> How do I create an index for this xpath expression ?

Come to think of it... You seem to be storing an attribute/value list in the database in XML. Is there any reason not
tostore the attributes and their values as a normal table? Your approach seems a bit over-complicated and you seem to
havesome trouble getting your own mind around it. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4b94e083296929244758856!



Re: XML Index again

От
Chris Roffler
Дата:
Alban

thanks for your response. I understand what you are saying .....

>Your previous query wasn't about attributes in any specific position - it returned documents that contained >more than zero attributes matching a given name. What are you trying to do this time?

And that is exactly my problem, only if the first attribute in the Attributes is a match the query returns successful.
Please see my example below 

Thanks
Chris


CREATE INDEX xml_index
  ON time_series
  USING btree
  ((
  (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text));


SELECT id FROM time_series t1 WHERE EXISTS (
        SELECT 1
          FROM time_series t2
         WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text = 'xxxxx'
           AND t2.id = t1.id
 );

When I run the select it uses the index. 

The first query I run is with 'xxxxx' = 'Attribute102021' this works like a champ.
The second query is run with 'xxxx' = 'Attribute202021' and returns null. ( this is the second Attribute )


---- XML example 

<AttributeList>
    <Attributes>
        <Attribute>
            <Name>Attribute102021</Name>
            <Value>111111111</Value>
            <LastChanged>2010-03-05T05:44:36.796-05:00</LastChanged>
            <ChangedBy>Chris</ChangedBy>
        </Attribute>
        <Attribute>
            <Name>Attribute202021</Name>
            <Value>222222222</Value>
            <LastChanged>2010-03-05T05:44:36.796-05:00</LastChanged>
            <ChangedBy>Maya</ChangedBy>
        </Attribute>
    </Attributes>
</AttributeList>

On Mon, Mar 8, 2010 at 1:27 PM, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
On 8 Mar 2010, at 11:39, Chris Roffler wrote:

> Alban
>
> Thanks for your help, your suggestion worked.
>
> I need another xpath expression to find any Attribute with  Name ="xxxx" under the Attributes node. (not just if it is in a specific position)
> see   query below.

Your previous query wasn't about attributes in any specific position - it returned documents that contained more than zero attributes matching a given name. What are you trying to do this time?

> How do I create an index for this xpath expression ?

You don't need to create another index (although one w/o the external_attributes column would probably be more convenient); the index you have contains those names already.

Just make sure you use the same expression you used to create the index to match the part in your xml and compare it to the text you're looking for.

If you want to use indexes on your xml, then you'll need to stop putting the variable parts of your queries inside your xpath expressions - you make them unindexable that way. So move those [Name='xxxx']'s out of your xpath expressions. Instead have the expressions result in the names so that you can compare them to the names stored in your index(es).

It won't be as fast as looking for those names using xpath in an xml document, as every attribute name is a candidate for comparison now, but at least it's indexable.

Alternatively you could try to build an index from the names contained in each xml document. Something like:

CREATE INDEX xml_attribute_names
   ON time_series
 USING btree (xpath('/AttributeList/Attributes/Attribute/Name/text()'));

This stores the array of all attribute names in an index. You can query for existence of specific attributes using the ANY operator on the resulting array (see archives and docs for how to do that).

I believe (I've never needed to use arrays) the syntax is:

SELECT * FROM time_series WHERE 'xxxx' = ANY (xpath('/AttributeList/Attributes/Attribute/text()'));

It'll probably be faster than the previous solution, but use more disk space and memory.

>  SELECT * FROM time_series
>         WHERE array_upper((xpath('/AttributeList/Attributes/Attribute[Name="xxxxx"]',   external_attributes)),1) > 0
>
>
> On Sun, Mar 7, 2010 at 1:11 PM, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
...
> You seem to want to test for the existence of nodes with a specific name, maybe this is what you're looking for?:
>
> SELECT id FROM time_series t1 WHERE EXISTS (
>        SELECT 1
>          FROM time_series t2
>         WHERE (xpath('/AttributeList/Attributes/Attribute/Name/text()', external_attributes))[1]::text = ('Attribute122021', external_attributes)
>           AND t2.id = t1.id
> );

For clarity, if you would have an index on just that xpath expression - without the external_attributes column - this query would look simpler:

SELECT id FROM time_series t1 WHERE EXISTS (
      SELECT 1
        FROM time_series t2
       WHERE xpath('/AttributeList/Attributes/Attribute/Name/text()')[1]::text = 'Attribute122021'
         AND t2.id = t1.id
);

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:1034,4b94df38296921956520267!



Re: XML Index again

От
Alban Hertroys
Дата:
On 8 Mar 2010, at 13:23, Chris Roffler wrote:

> Alban
>
> thanks for your response. I understand what you are saying .....
>
> >Your previous query wasn't about attributes in any specific position - it returned documents that contained >more
thanzero attributes matching a given name. What are you trying to do this time? 
>
> And that is exactly my problem, only if the first attribute in the Attributes is a match the query returns
successful.
> Please see my example below

Ah yes it would, as you only index the first attribute. I guess you'll have to give indexing the list of attributes a
tryas I outlined in a previous message. 

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


!DSPAM:737,4b94ef13296926894712957!



Re: XML Index again

От
Chris Roffler
Дата:
Yup you are right .... however I am trying to run benchmarks with the two solutions.  

The xml solution will give us more flexibility in the future , just in case we do not have attribute/value lists :)


On Mon, Mar 8, 2010 at 1:33 PM, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote:
On 8 Mar 2010, at 11:39, Chris Roffler wrote:

> Alban
>
> Thanks for your help, your suggestion worked.
>
> I need another xpath expression to find any Attribute with  Name ="xxxx" under the Attributes node. (not just if it is in a specific position)
> see   query below.
> How do I create an index for this xpath expression ?

Come to think of it... You seem to be storing an attribute/value list in the database in XML. Is there any reason not to store the attributes and their values as a normal table? Your approach seems a bit over-complicated and you seem to have some trouble getting your own mind around it.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:1034,4b94e081296926114815748!