Обсуждение: what do i need to know about array index?

Поиск
Список
Период
Сортировка

what do i need to know about array index?

От
Prometheus Prometheus
Дата:
hi

i have a table like this
CREATE TABLE test(
  id bigint,
  tag boolean[]
) WITH ( OIDS=FALSE );

this is a parent table of some partition tables
one table contains 1mio entries
well in the future it should - currently only filled with test data

the array is used to 'tag' a row
which means every column of the array has some meaning which can be turned on and off
the tag's topic is created by a user and using some php scripts
the row's are tagged (turned on and off)

now i want to query e.g. table test_1 which has 1 mio entries
one problem i have to mention is a NULL value
since an entry can have 3 values t,f,null but my application expects only 2 values i modified my queries so that
tag[n]=true equals NOT (tag[n]=false OR tag[n] IS NULL)
the problem was that complex queries just ignored NULL values instead of counting as false - which makes sense


to my index problem:
e.g. a query
select id from test_1 where NOT (tag[4]=false OR tag[4] IS NULL);

doesnt use the index
create index idx_test_1 on test(( NOT (tag[4]=false OR tag[4] IS NULL) ));

since my index can contain combinations of many tag[] columns it can look like this
create index idx_test_2 on test_1(( (  ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND  ( NOT (tag[2]=false OR tag[2] IS NULL) ) )  AND NOT (  ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND  ( NOT (tag[3]=false OR tag[3] IS NULL) ) ) ));

which also doesnt get used by the select
SELECT id FROM test WHERE (  ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND  ( NOT (tag[2]=false OR tag[2] IS NULL) ) )  AND NOT (  ( NOT (tag[1]=false OR tag[1] IS NULL) ) AND  ( NOT (tag[3]=false OR tag[3] IS NULL) ) );

vacuum analyze after create index has been done and im using 8.4

o) is there some obvious bug in my usage so the index can't be used? what im missing here?
o) how does the planer know that a seq scan is faster? since im not sure if its just a feature which is not available (like is null index in 8.2) or im doing something wrong im a bit stuck here
o) and for those who have a great idea or suggestion for alternative solutions, please tell me

thx


Your E-mail and More On-the-Go. Get Windows Live Hotmail Free. Sign up now.

Re: what do i need to know about array index?

От
Craig Ringer
Дата:
On 20/07/10 18:27, Prometheus Prometheus wrote:

What's with the pseudonym?

> to my index problem:
> e.g. a query
> select id from test_1 where NOT (tag[4]=false OR tag[4] IS NULL);
>
> doesnt use the index
> create index idx_test_1 on test(( NOT (tag[4]=false OR tag[4] IS NULL) ));

You should be using:

   tag[4] IS DISTINCT FROM 't';

if you want to index on 'true' vs 'false or null'. Your current
expression is buggy for null tag values, as can be seen by evaluating it
step-by-step.

  tag[4]=false  OR tag[4] IS NULL
  NULL  = false OR NULL   IS NULL
  NULL          OR TRUE
  NULL

Remember, "NULL = false" resolves to NULL, and "NULL OR TRUE" is also NULL.

> since my index can contain combinations of many tag[] columns it can
> look like this
> create index idx_test_2 on test_1(( (  ( NOT (tag[1]=false OR tag[1] IS
> NULL) ) AND  ( NOT (tag[2]=false OR tag[2] IS NULL) ) )  AND NOT (  (
> NOT (tag[1]=false OR tag[1] IS NULL) ) AND  ( NOT (tag[3]=false OR
> tag[3] IS NULL) ) ) ));

It'll help make it easier to maintain and more comprehensible if you
wrap that test up in an IMMUTABLE function. 'LANGUAGE SQL' is convenient
for this.

> which also doesnt get used by the select
> SELECT id FROM test WHERE (  ( NOT (tag[1]=false OR tag[1] IS NULL) )
> AND  ( NOT (tag[2]=false OR tag[2] IS NULL) ) )  AND NOT (  ( NOT
> (tag[1]=false OR tag[1] IS NULL) ) AND  ( NOT (tag[3]=false OR tag[3] IS
> NULL) ) );

and use the IMMUTABLE function in your tests.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: what do i need to know about array index?

От
Prometheus Prometheus
Дата:
> Date: Wed, 21 Jul 2010 12:38:55 +0800
> From: craig@postnewspapers.com.au
> To: prometheus__0@hotmail.com
> CC: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] what do i need to know about array index?
>
> On 20/07/10 18:27, Prometheus Prometheus wrote:
>
> What's with the pseudonym?
nothing special, just an account from my youth ^^
>
> > to my index problem:
> > e.g. a query
> > select id from test_1 where NOT (tag[4]=false OR tag[4] IS NULL);
> >
> > doesnt use the index
> > create index idx_test_1 on test(( NOT (tag[4]=false OR tag[4] IS NULL) ));
>
> You should be using:
>
> tag[4] IS DISTINCT FROM 't';

nice, i didnt know this one
and it works, the index is used

to give others a hint on this
i created 2 indexes where the first one didnt seem to work
the second worked
to know if the first one is ok i disable sequential scan's and queried the db again
now the index was used
so it seems the planner thinks a seqscan works better which may change in future as more rows are added/changed

set enable_seqscan=off;
explain select * from test where tag[4] IS DISTINCT FROM true;

>
> if you want to index on 'true' vs 'false or null'. Your current
> expression is buggy for null tag values, as can be seen by evaluating it
> step-by-step.
>
> tag[4]=false OR tag[4] IS NULL
> NULL = false OR NULL IS NULL
> NULL OR TRUE
> NULL
>
> Remember, "NULL = false" resolves to NULL, and "NULL OR TRUE" is also NULL.

head -> wall
thx for the hint

>
> > since my index can contain combinations of many tag[] columns it can
> > look like this
> > create index idx_test_2 on test_1(( ( ( NOT (tag[1]=false OR tag[1] IS
> > NULL) ) AND ( NOT (tag[2]=false OR tag[2] IS NULL) ) ) AND NOT ( (
> > NOT (tag[1]=false OR tag[1] IS NULL) ) AND ( NOT (tag[3]=false OR
> > tag[3] IS NULL) ) ) ));
>
> It'll help make it easier to maintain and more comprehensible if you
> wrap that test up in an IMMUTABLE function. 'LANGUAGE SQL' is convenient
> for this.

hmm, i see what you mean
ill have to think about how i can get this into my design
anyway thx again

>
> > which also doesnt get used by the select
> > SELECT id FROM test WHERE ( ( NOT (tag[1]=false OR tag[1] IS NULL) )
> > AND ( NOT (tag[2]=false OR tag[2] IS NULL) ) ) AND NOT ( ( NOT
> > (tag[1]=false OR tag[1] IS NULL) ) AND ( NOT (tag[3]=false OR tag[3] IS
> > NULL) ) );
>
> and use the IMMUTABLE function in your tests.

kind regards

>
> --
> Craig Ringer
>
> Tech-related writing: http://soapyfrogs.blogspot.com/


Hotmail: Free, trusted and rich email service. Get it now.

Re: what do i need to know about array index?

От
Craig Ringer
Дата:
On 21/07/10 15:08, Prometheus Prometheus wrote:

> head -> wall

That's about how I feel about SQL NULLs in general.

They seem like a great idea. A way of representing "unknown" or
"undefined" in a generic, consistent manner.

Or is that "a definite value that means empty or absent" ? Depends on
who you ask, which is the first problem.

Then you get to combining them with anything else in any way. Comparing
row-types or arrays with null entries. Uniqueness. Aggregation/counting.
Conceptual mismatches with the null concept in other languages, where
null/None/nil/undef means "the certain value of nothing or emptiness"
rather than "could be anything or nothing, I dunno" as it does in SQL.

SQL NULL's contradictions often remind me of early formulations of set
theory and their ... issues.

   http://en.wikipedia.org/wiki/Russell%27s_paradox

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/