Обсуждение: obtaining ARRAY position for a given match

Поиск
Список
Период
Сортировка

obtaining ARRAY position for a given match

От
Pedro Doria Meunier
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I'm trying to get the array position for a given match as thus:

This gets me the record for a given match:
SELECT *  FROM garmin_units WHERE 'L' = ANY (protocol_tag);

Ok. so far so good...
But what about getting the array position at which 'L' is stored?

Searching the Postgresql docs gives me no answer... :(
Am I on a wild goose chase?

Any insight highly appreciated ;)

BR,
Pedro Doria Meunier.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAksFP6IACgkQ2FH5GXCfxAuasgCgu/d68fkg16r1OF/2QSLnmwhW
gjYAniyQ1Mn/72323NSznxgakF4dn98k
=tWbI
-----END PGP SIGNATURE-----


Re: obtaining ARRAY position for a given match

От
Scott Bailey
Дата:
Pedro Doria Meunier wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> I'm trying to get the array position for a given match as thus:
>
> This gets me the record for a given match:
> SELECT *  FROM garmin_units WHERE 'L' = ANY (protocol_tag);
>
> Ok. so far so good...
> But what about getting the array position at which 'L' is stored?
>
> Searching the Postgresql docs gives me no answer... :(
> Am I on a wild goose chase?
>
> Any insight highly appreciated ;)
>
> BR,
> Pedro Doria Meunier.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAksFP6IACgkQ2FH5GXCfxAuasgCgu/d68fkg16r1OF/2QSLnmwhW
> gjYAniyQ1Mn/72323NSznxgakF4dn98k
> =tWbI
> -----END PGP SIGNATURE-----

I wrote this a while ago. If you are on 8.4 use unnest instead. And if
you are searching thru really big arrays, use plpgsql so you can
terminate when you find a match.

CREATE OR REPLACE FUNCTION idx(text[], text)
RETURNS int AS
$$
SELECT MIN(CASE WHEN $1[i] = $2 THEN i
ELSE NULL END)::int
FROM generate_series(array_lower($1, 1),
array_upper($1, 1)) i;
$$
LANGUAGE 'sql' IMMUTABLE STRICT;

Re: obtaining ARRAY position for a given match

От
Pavel Stehule
Дата:
2009/11/19 Scott Bailey <artacus@comcast.net>:
> Pedro Doria Meunier wrote:
>>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Hi,
>>
>> I'm trying to get the array position for a given match as thus:
>>
>> This gets me the record for a given match:
>> SELECT *  FROM garmin_units WHERE 'L' = ANY (protocol_tag);
>>
>> Ok. so far so good...
>> But what about getting the array position at which 'L' is stored?
>>
>> Searching the Postgresql docs gives me no answer... :(
>> Am I on a wild goose chase?
>>
>> Any insight highly appreciated ;)
>>
>> BR,
>> Pedro Doria Meunier.
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.9 (GNU/Linux)
>> Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/
>>
>> iEYEARECAAYFAksFP6IACgkQ2FH5GXCfxAuasgCgu/d68fkg16r1OF/2QSLnmwhW
>> gjYAniyQ1Mn/72323NSznxgakF4dn98k
>> =tWbI
>> -----END PGP SIGNATURE-----
>
> I wrote this a while ago. If you are on 8.4 use unnest instead. And if you
> are searching thru really big arrays, use plpgsql so you can terminate when
> you find a match.
>
> CREATE OR REPLACE FUNCTION idx(text[], text)
> RETURNS int AS
> $$
> SELECT MIN(CASE WHEN $1[i] = $2 THEN i
> ELSE NULL END)::int
> FROM generate_series(array_lower($1, 1),
> array_upper($1, 1)) i;
> $$
> LANGUAGE 'sql' IMMUTABLE STRICT;
>

Hello

it should be little bit more effective:

CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS $$
SELECT i
   FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)
  WHERE $1[i] = $2
  UNION ALL
  SELECT 0  -- return 0 as not found
  LIMIT 1; -- stop after first match
$$ LANGUAGE sql;

Regards
Pavel Stehule

> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: obtaining ARRAY position for a given match

От
Sam Mason
Дата:
On Thu, Nov 19, 2009 at 05:24:33PM +0100, Pavel Stehule wrote:
> it should be little bit more effective:

I'm not sure if it will be much more; when you put a set returning
function into a FROM clause PG will always run the function to
completion---as far as I know, but I've only got 8.3 for testing at the
moment.  I'm also not sure why you want to return zero when you don't
find the element.  The code also exploits an implementation artifact of
PG that the zero (i.e. the RHS of your UNION ALL) will be "after" the
real index.

This raises a small and interesting optimization for PG, when it does
the plan it could notice that a UNION ALL followed by a LIMIT won't need
to return all rows and hence it may be better to run the "quicker" one
first.  Or would this end up breaking more code than it helps?

> CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
> RETURNS int AS $$
> SELECT i
>    FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)

Quality typo :)                  ^^^

>   WHERE $1[i] = $2
>   UNION ALL
>   SELECT 0  -- return 0 as not found
>   LIMIT 1; -- stop after first match
> $$ LANGUAGE sql;

I'd do something like:

  CREATE OR REPLACE FUNCTION firstidx(anyarray, anyelement)
      RETURNS int AS $$
    SELECT i FROM (
      SELECT generate_series(array_lower($1,1),array_upper($1,1))) g(i)
    WHERE $1[i] = $2
    LIMIT 1;
  $$ LANGUAGE sql IMMUTABLE;

You can replace the call to array_upper with some large number to check
either function's behavior with large arrays.

--
  Sam  http://samason.me.uk/

Re: obtaining ARRAY position for a given match

От
Scott Bailey
Дата:
>    FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)

Pavel,

Don't get me wrong, I enjoy coding, but I think you've taken it too far
here ;)

Yes, definitely more effective for large arrays. Thanks. Would probably
be a good snippet for the wiki.

Scott



Re: obtaining ARRAY position for a given match

От
Merlin Moncure
Дата:
On Thu, Nov 19, 2009 at 12:15 PM, Scott Bailey <artacus@comcast.net> wrote:
>>   FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)
>
> Pavel,
>
> Don't get me wrong, I enjoy coding, but I think you've taken it too far here
> ;)
>
> Yes, definitely more effective for large arrays. Thanks. Would probably be a
> good snippet for the wiki.

this was actually a pretty typical solution to dealing with arrays
until we got 'unnest()'.  See information_schema._pg_expand_array for
example.

I need this functionality quite often...maybe we could use a version
of unnest that works like that  (returns idx, elem)?  It would be a
small efficiency win over generate_series based approaches.

merlin

Re: obtaining ARRAY position for a given match

От
Scott Bailey
Дата:
Sam Mason wrote:
> On Thu, Nov 19, 2009 at 05:24:33PM +0100, Pavel Stehule wrote:
>> it should be little bit more effective:
>
> I'm not sure if it will be much more; when you put a set returning
> function into a FROM clause PG will always run the function to
> completion---as far as I know, but I've only got 8.3 for testing at the
> moment.  I'm also not sure why you want to return zero when you don't
> find the element.  The code also exploits an implementation artifact of
> PG that the zero (i.e. the RHS of your UNION ALL) will be "after" the
> real index.
>
> This raises a small and interesting optimization for PG, when it does
> the plan it could notice that a UNION ALL followed by a LIMIT won't need
> to return all rows and hence it may be better to run the "quicker" one
> first.  Or would this end up breaking more code than it helps?
>
>> CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
>> RETURNS int AS $$
>> SELECT i
>>    FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)
>
> Quality typo :)                  ^^^
>
>>   WHERE $1[i] = $2
>>   UNION ALL
>>   SELECT 0  -- return 0 as not found
>>   LIMIT 1; -- stop after first match
>> $$ LANGUAGE sql;
>
> I'd do something like:
>
>   CREATE OR REPLACE FUNCTION firstidx(anyarray, anyelement)
>       RETURNS int AS $$
>     SELECT i FROM (
>       SELECT generate_series(array_lower($1,1),array_upper($1,1))) g(i)
>     WHERE $1[i] = $2
>     LIMIT 1;
>   $$ LANGUAGE sql IMMUTABLE;
>
> You can replace the call to array_upper with some large number to check
> either function's behavior with large arrays.

I agree that it should return null when the item is not found. So I
tested both and Sam is correct. His function performs the same whether
there are 500 elements or 50,000.

We had an idx() function in the _int contrib module. I wonder if it
would be useful to write this in C now that _int is deprecated?

Scott



Re: obtaining ARRAY position for a given match

От
Sam Mason
Дата:
On Thu, Nov 19, 2009 at 09:46:42AM -0800, Scott Bailey wrote:
> We had an idx() function in the _int contrib module. I wonder if it
> would be useful to write this in C now that _int is deprecated?

Is "idx" really the best name for this? there could be multiple
occurrences of a value in an array (i.e. it's not a set) and hence why I
used "firstidx" for the function name.  If it's replacing an existing
function, then compatibility is a good reason.

--
  Sam  http://samason.me.uk/

Re: obtaining ARRAY position for a given match

От
Scott Bailey
Дата:
> this was actually a pretty typical solution to dealing with arrays
> until we got 'unnest()'.  See information_schema._pg_expand_array for
> example.

Oh I know. I was just having a laugh at the array_lover function. Now
that I think about it, we could replace array_agg() with array_orgy()
and unnest() with oh_crap_her_husband_is_home()... I'll probably get in
trouble for that.

Scott

Re: obtaining ARRAY position for a given match

От
Sam Mason
Дата:
On Thu, Nov 19, 2009 at 12:43:38PM -0500, Merlin Moncure wrote:
> we could use a version
> of unnest that works like that  (returns idx, elem)?  It would be a
> small efficiency win over generate_series based approaches.

What would "idx" look like for multidimensional arrays?  I think PG
needs a sensible way of dealing with these before you attempt to modify
unnest.

--
  Sam  http://samason.me.uk/

Re: obtaining ARRAY position for a given match

От
Pavel Stehule
Дата:
2009/11/19 Sam Mason <sam@samason.me.uk>:
> On Thu, Nov 19, 2009 at 05:24:33PM +0100, Pavel Stehule wrote:
>> it should be little bit more effective:
>
> I'm not sure if it will be much more; when you put a set returning
> function into a FROM clause PG will always run the function to
> completion---as far as I know, but I've only got 8.3 for testing at the
> moment.

yes, but generate_series is very cheap, this is protection before not
necessary string equation.

 I'm also not sure why you want to return zero when you don't
> find the element.  The code also exploits an implementation artifact of
> PG that the zero (i.e. the RHS of your UNION ALL) will be "after" the
> real index.
>

this is only convention. Somebody like 0 or -1 as result. Somebody can
live with NULL.

> This raises a small and interesting optimization for PG, when it does
> the plan it could notice that a UNION ALL followed by a LIMIT won't need
> to return all rows and hence it may be better to run the "quicker" one
> first.  Or would this end up breaking more code than it helps?
>
>> CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
>> RETURNS int AS $$
>> SELECT i
>>    FROM generate_series(array_lover($1,1),array_upper($1,1)) g(i)
>
> Quality typo :)                  ^^^
>
>>   WHERE $1[i] = $2
>>   UNION ALL
>>   SELECT 0  -- return 0 as not found
>>   LIMIT 1; -- stop after first match
>> $$ LANGUAGE sql;
>
> I'd do something like:
>
>  CREATE OR REPLACE FUNCTION firstidx(anyarray, anyelement)
>      RETURNS int AS $$
>    SELECT i FROM (
>      SELECT generate_series(array_lower($1,1),array_upper($1,1))) g(i)
>    WHERE $1[i] = $2
>    LIMIT 1;
>  $$ LANGUAGE sql IMMUTABLE;
>
> You can replace the call to array_upper with some large number to check
> either function's behavior with large arrays.
>

your code is very very exactly same as my code. First - there are
flattening stage. So if you don't use offset 0, then subselect is
transformed to select. I am not sure, if offset 0 should help here -
it have to do a materialisation (5ms for 10000 items) more. This
function is relative fast:

postgres=# select idx(array(select generate_series(1,10000)),10000);
  idx
-------
 10000
(1 row)

Time: 40.070 ms

maybe - I cannot test it - there could be code

CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
RETURNS int AS $$
SELECT i
   FROM (SELECT generate_subscripts($1) as i, unnest($1) as v) s
  WHERE v = $2
  LIMIT 1;
$$ LANGUAGE sql;

but I am sure so C code should be faster

Regards
Pavel Stehule

> --
>  Sam  http://samason.me.uk/
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: obtaining ARRAY position for a given match

От
Scott Bailey
Дата:
Sam Mason wrote:
> On Thu, Nov 19, 2009 at 09:46:42AM -0800, Scott Bailey wrote:
>> We had an idx() function in the _int contrib module. I wonder if it
>> would be useful to write this in C now that _int is deprecated?
>
> Is "idx" really the best name for this? there could be multiple
> occurrences of a value in an array (i.e. it's not a set) and hence why I
> used "firstidx" for the function name.  If it's replacing an existing
> function, then compatibility is a good reason.

Well I used idx() because there was already a idx(int[], int) function
with the _int contrib module. Obviously "index" is out of the question.
In other languages, it is assumed you are looking for the first index.
Some allow you to specify an offset to begin searching at. And some
provide another function to get the last index of element.

Here is what other languages are using for similar concept.
PHP - array_search()
Python - index()
Ruby - index()/rindex()
Java - binarySearch()
JavaScript/ActionScript - indexOf()
MySQL - find_in_set()

Scott



Re: obtaining ARRAY position for a given match

От
Sam Mason
Дата:
On Thu, Nov 19, 2009 at 10:47:02AM -0800, Scott Bailey wrote:
> Sam Mason wrote:
> >Is "idx" really the best name for this?
>
> Well I used idx() because there was already a idx(int[], int) function
> with the _int contrib module.

I don't remember ever using that before, hence my question.

> In other languages, it is assumed you are looking for the first index.

Huh, they seem to don't they, even my old stalwart of pedantry, Haskell,
follows form here.  Not sure why I'd never noticed before, "idx" is
looking more and more sensible!

--
  Sam  http://samason.me.uk/

Re: obtaining ARRAY position for a given match

От
Pavel Stehule
Дата:
>
> CREATE OR REPLACE FUNCTION idx(anyarray, anyelement)
> RETURNS int AS $$
> SELECT i
>   FROM (SELECT generate_subscripts($1) as i, unnest($1) as v) s
>  WHERE v = $2
>  LIMIT 1;
> $$ LANGUAGE sql;
>

there is bug

correct is

create or replace function idx(anyarray, anyelement)
returns int as $$
select i
   from (select generate_subscripts($1,1) i, unnest($1) v) s
  where v = $2
  limit 1
$$ language sql;

and it is 5% faster than older version

on integer array on my eeepc

Regards
Pavel

Re: obtaining ARRAY position for a given match

От
Pedro Doria Meunier
Дата:
Regarding this thread...
I've been away for a while...
But thank you to all who replied! :)

BR,
Pedro

On 11/19/2009 07:03 PM, Sam Mason wrote:
> On Thu, Nov 19, 2009 at 10:47:02AM -0800, Scott Bailey wrote:
>
>> Sam Mason wrote:
>>
>>> Is "idx" really the best name for this?
>>>
>> Well I used idx() because there was already a idx(int[], int) function
>> with the _int contrib module.
>>
> I don't remember ever using that before, hence my question.
>
>
>> In other languages, it is assumed you are looking for the first index.
>>
> Huh, they seem to don't they, even my old stalwart of pedantry, Haskell,
> follows form here.  Not sure why I'd never noticed before, "idx" is
> looking more and more sensible!
>
>