Обсуждение: Functional Index Question
I am considering the utility value of creating a functional index on a name field. To minimize the number of invalid searches caused by spacing errors and mis-matched lettercase I am contemplating doing something like this: CREATE UNIQUE INDEX idxUF_table_column ON table (lower(trim(both ' ' from(regexp_replace(<column>, /( ){2,}/g," " ))))) What I intend this to do is to squeeze out excess whitespace, strip off leading and trailing blanks, and then force the whole thing to lowercase. Is this idea worth pursuing and, if it is, is my regexp correct for the purpose intended? Not a regexp guy. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On Wed, Mar 12, 2008 at 11:46:12AM -0400, James B. Byrne wrote: > CREATE UNIQUE INDEX idxUF_table_column ON table > (lower(trim(both ' ' from(regexp_replace(<column>, /( ){2,}/g," " ))))) > What I intend this to do is to squeeze out excess whitespace, strip off > leading and trailing blanks, and then force the whole thing to lowercase. > Is this idea worth pursuing and, if it is, is my regexp correct for the > purpose intended? if you'd try the query you would see instantly: # select lower(trim(both ' ' from(regexp_replace(' depeSz hub ', /( ){2,}/g," " )))); ERROR: syntax error at or near "/" LINE 1: ...im(both ' ' from(regexp_replace(' depeSz hub ', /( ){2,}/g... correct way: select lower(trim(both ' ' from(regexp_replace(' depeSz hub ', E'\\s+', ' ', 'g' )))); now. i would suggest *not* to use this as base for index. make a wrapper function istead: create function cleaned(text) returns text as $BODY$ select lower(trim(both ' ' from(regexp_replace($1, E'\\s+', ' ', 'g' )))); $BODY$ language sql immutable; now you can simply: create unique index xxx on table ( cleaned(<column>) ); plus your sql's will look saner. instead of: select * from table where lower(trim(both ' ' from(regexp_replace(<column>, E'\\s+', ' ', 'g' )))) = lower(trim(both ' 'from(regexp_replace('some_string', E'\\s+', ' ', 'g' )))); you will have: select * from table where cleaned(<field>) = cleaned('some_string'); depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
On Wed, March 12, 2008 17:02, hubert depesz lubaczewski wrote: > correct way: > select lower(trim(both ' ' from(regexp_replace(' depeSz hub ', E'\\s+', ' ', > 'g' )))); > > now. i would suggest *not* to use this as base for index. > > make a wrapper function instead: > > create function cleaned(text) returns text as $BODY$ > select lower(trim(both ' ' from(regexp_replace($1, E'\\s+', ' ', 'g' )))); > $BODY$ language sql immutable; > > now you can simply: > create unique index xxx on table ( cleaned(<column>) ); > plus your sql's will look saner. > instead of: > select * from table where lower(trim(both ' ' from(regexp_replace(<column>, > E'\\s+', ' ', 'g' )))) = lower(trim(both ' ' > from(regexp_replace('some_string', E'\\s+', ' ', 'g' )))); > you will have: > select * from table where cleaned(<field>) = cleaned('some_string'); > Thank you very much. I really appreciate this. To return to my first question. Is doing this sort of thing considered good a practice? I am prototyping in Rails a replacement system for one implemented with a CODASYL shallow network DBSM. In Rails the paradigm seems to be to keep this kind of thing out of the database and place it in the application code. With the existing system we had no choice in the matter but I am considering moving theses sorts of purification routines into the DBMS because I feel that is where they really belong. However, the prevailing sentiment of the community surrounding Rails seems to hold otherwise so I wonder if this is really the right thing to do. Regards, -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On Thu, Mar 13, 2008 at 09:04:28AM -0400, James B. Byrne wrote: > To return to my first question. Is doing this sort of thing considered good a > practice? for me - yes. keeping it in database is the best and safest option. but you might get "issues" with orms. > where they really belong. However, the prevailing sentiment of the community > surrounding Rails seems to hold otherwise so I wonder if this is really the > right thing to do. every (or nearly every) community around web-frameworks loves orms, and thus doesn't like "too smart" databases. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
James B. Byrne wrote: > I am considering moving > theses sorts of purification routines into the DBMS because I feel that is > where they really belong. However, the prevailing sentiment of the community > surrounding Rails seems to hold otherwise so I wonder if this is really the > right thing to do. > They also like to use simple, dumb, unsafe databases that're really fast for simple queries. Many web apps consider the database their private stomping ground - a more covenient and faster way of storing tabular (but not necessarily even truly relational) data than flat files. They often seem to shun powerful SQL and the use of data structures that can also be understood and used by other applications. I suspect some of this grew out of the primitive OSS databases available when a lot of the big open source web apps got going, and it's thinking that's become rather prevalent in the community. I could understand it if database portability was a goal, because then you really do often want to keep things as dumb as possible. I guess the ones that aren't seeking database portability must be trying to still work on MySQL 3 & 4 , which means "use as few database features as possible". To me it seems obvious that such routines should be in the database, but I guess that's partly because of the sort of work I'm doing. Other apps can then use the in-DB routines, rather than having to reimplement them (read: fewer things to test and debug) . They can be cached intellgently by the DB server and taken account of in query planning. The DB can help intelligently manage concurrency. Db<->client round trips may be reduced. For simple immutable functions you can rely on them being available for any database you'd actually care to use. Then again, that kind of thinking is part of why I'm not much of a fan of SQL-abstracted web app frameworks. -- Craig Ringer
functions, replication and portability was: Functional Index Question
От
Ivan Sergio Borgonovo
Дата:
On Fri, 14 Mar 2008 02:26:06 +0900 Craig Ringer <craig@postnewspapers.com.au> wrote: > To me it seems obvious that such routines should be in the > database, but I guess that's partly because of the sort of work I'm > doing. Other apps can then use the in-DB routines, rather than To me too but... > Then again, that kind of thinking is part of why I'm not much of a > fan of SQL-abstracted web app frameworks. Maybe because there is not too many choices to chose from. At least this is what is driving me crazy... I really would like to build up portable, agnostic apps... but then I've to renounce to toooooo many features even the one that rely on standards and not on peculiarities of implementation... then... ...but oooh that makes everything non portable... and after reading a bit pgpool docs it seems non-scalable (am I missing something?). But still what other viable choices could I make to keep my data coherent and sane without doing tons of bookkeeping? -- Ivan Sergio Borgonovo http://www.webthatworks.it