Обсуждение: Functional Index Question

Поиск
Список
Период
Сортировка

Functional Index Question

От
"James B. Byrne"
Дата:
I am considering the utility value of creating a functional index on a name
field.  To minimize the number of invalid searches caused by spacing errors
and mis-matched lettercase I am contemplating doing something like this:

CREATE UNIQUE INDEX idxUF_table_column ON table
  (lower(trim(both ' ' from(regexp_replace(<column>, /( ){2,}/g," " )))))

What I intend this to do is to squeeze out excess whitespace, strip off
leading and trailing blanks, and then force the whole thing to lowercase.

Is this idea worth pursuing and, if it is, is my regexp correct for the
purpose intended?

Not a regexp guy.

--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3


Re: Functional Index Question

От
hubert depesz lubaczewski
Дата:
On Wed, Mar 12, 2008 at 11:46:12AM -0400, James B. Byrne wrote:
> CREATE UNIQUE INDEX idxUF_table_column ON table
>   (lower(trim(both ' ' from(regexp_replace(<column>, /( ){2,}/g," " )))))
> What I intend this to do is to squeeze out excess whitespace, strip off
> leading and trailing blanks, and then force the whole thing to lowercase.
> Is this idea worth pursuing and, if it is, is my regexp correct for the
> purpose intended?

if you'd try the query you would see instantly:
# select lower(trim(both ' ' from(regexp_replace(' depeSz   hub ', /( ){2,}/g," " ))));
ERROR:  syntax error at or near "/"
LINE 1: ...im(both ' ' from(regexp_replace(' depeSz   hub ', /( ){2,}/g...

correct way:
select lower(trim(both ' ' from(regexp_replace(' depeSz   hub ', E'\\s+', ' ', 'g' ))));

now. i would suggest *not* to use this as base for index.

make a wrapper function istead:

create function cleaned(text) returns text as $BODY$
select lower(trim(both ' ' from(regexp_replace($1, E'\\s+', ' ', 'g' ))));
$BODY$ language sql immutable;

now you can simply:
create unique index xxx on table ( cleaned(<column>) );
plus your sql's will look saner.
instead of:
select * from table where lower(trim(both ' ' from(regexp_replace(<column>, E'\\s+', ' ', 'g' )))) = lower(trim(both '
'from(regexp_replace('some_string', E'\\s+', ' ', 'g' )))); 
you will have:
select * from table where cleaned(<field>) = cleaned('some_string');

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Functional Index Question

От
"James B. Byrne"
Дата:
On Wed, March 12, 2008 17:02, hubert depesz lubaczewski wrote:
> correct way:
> select lower(trim(both ' ' from(regexp_replace(' depeSz   hub ', E'\\s+', ' ',
> 'g' ))));
>
> now. i would suggest *not* to use this as base for index.
>
> make a wrapper function instead:
>
> create function cleaned(text) returns text as $BODY$
> select lower(trim(both ' ' from(regexp_replace($1, E'\\s+', ' ', 'g' ))));
> $BODY$ language sql immutable;
>
> now you can simply:
> create unique index xxx on table ( cleaned(<column>) );
> plus your sql's will look saner.
> instead of:
> select * from table where lower(trim(both ' ' from(regexp_replace(<column>,
> E'\\s+', ' ', 'g' )))) = lower(trim(both ' '
> from(regexp_replace('some_string', E'\\s+', ' ', 'g' ))));
> you will have:
> select * from table where cleaned(<field>) = cleaned('some_string');
>

Thank you very much.  I really appreciate this.

To return to my first question.  Is doing this sort of thing considered good a
practice?

I am prototyping in Rails a replacement system for one implemented with a
CODASYL shallow network DBSM.  In Rails the paradigm seems to be to keep this
kind of thing out of the database and place it in the application code. With
the existing system we had no choice in the matter but I am considering moving
theses sorts of purification routines into the DBMS because I feel that is
where they really belong.  However, the prevailing sentiment of the community
surrounding Rails seems to hold otherwise so I wonder if this is really the
right thing to do.

Regards,


--
***          E-Mail is NOT a SECURE channel          ***
James B. Byrne                mailto:ByrneJB@Harte-Lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive              vox: +1 905 561 1241
Hamilton, Ontario             fax: +1 905 561 0757
Canada  L8E 3C3


Re: Functional Index Question

От
hubert depesz lubaczewski
Дата:
On Thu, Mar 13, 2008 at 09:04:28AM -0400, James B. Byrne wrote:
> To return to my first question.  Is doing this sort of thing considered good a
> practice?

for me - yes. keeping it in database is the best and safest option.
but you might get "issues" with orms.

> where they really belong.  However, the prevailing sentiment of the community
> surrounding Rails seems to hold otherwise so I wonder if this is really the
> right thing to do.

every (or nearly every) community around web-frameworks loves orms, and
thus doesn't like "too smart" databases.

depesz
--
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)

Re: Functional Index Question

От
Craig Ringer
Дата:
James B. Byrne wrote:

> I am considering moving
> theses sorts of purification routines into the DBMS because I feel that is
> where they really belong.  However, the prevailing sentiment of the community
> surrounding Rails seems to hold otherwise so I wonder if this is really the
> right thing to do.
>
They also like to use simple, dumb, unsafe databases that're really fast
for simple queries.

Many web apps consider the database their private stomping ground - a
more covenient and faster way of storing tabular (but not necessarily
even truly relational) data than flat files. They often seem to shun
powerful SQL and the use of data structures that can also be understood
and used by other applications.

I suspect some of this grew out of the primitive OSS databases available
when a lot of the big open source web apps got going, and it's thinking
that's become rather prevalent in the community.

I could understand it if database portability was a goal, because then
you really do often want to keep things as dumb as possible. I guess the
ones that aren't seeking database portability must be trying to still
work on MySQL 3 & 4 , which means "use as few database features as
possible".

To me it seems obvious that such routines should be in the database, but
I guess that's partly because of the sort of work I'm doing. Other apps
can then use the in-DB routines, rather than having to reimplement them
(read: fewer things to test and debug) . They can be cached intellgently
by the DB server and taken account of in query planning. The DB can help
intelligently manage concurrency. Db<->client round trips may be
reduced. For simple immutable functions you can rely on them being
available for any database you'd actually care to use.

Then again, that kind of thinking is part of why I'm not much of a fan
of SQL-abstracted web app frameworks.

--
Craig Ringer

functions, replication and portability was: Functional Index Question

От
Ivan Sergio Borgonovo
Дата:
On Fri, 14 Mar 2008 02:26:06 +0900
Craig Ringer <craig@postnewspapers.com.au> wrote:

> To me it seems obvious that such routines should be in the
> database, but I guess that's partly because of the sort of work I'm
> doing. Other apps can then use the in-DB routines, rather than

To me too but...

> Then again, that kind of thinking is part of why I'm not much of a
> fan of SQL-abstracted web app frameworks.

Maybe because there is not too many choices to chose from.
At least this is what is driving me crazy... I really would like to
build up portable, agnostic apps... but then I've to renounce to
toooooo many features even the one that rely on standards and not on
peculiarities of implementation... then...

...but oooh that makes everything non portable... and after reading a
bit pgpool docs it seems non-scalable (am I missing something?).

But still what other viable choices could I make to keep my data
coherent and sane without doing tons of bookkeeping?


--
Ivan Sergio Borgonovo
http://www.webthatworks.it