Обсуждение: Best Fit SQL query statement
All, Could anyone help me in writing Best Fit SQL statement. Suppose we have table t1 with coloumn t1 (text) with following rows. 98456 98457 9845 9846 984 985 98 99 and if I query on 98456 the result must be 98456, However if I query on 98455 the result must be 9845 and If I query 9849 the result must be 984 Regards, Kiran
On Mon, Aug 06, 2007 at 01:57:07AM -0700, Kiran wrote: > Could anyone help me in writing Best Fit SQL statement. > Suppose we have table t1 with coloumn t1 (text) with following rows. > 98456 > 98457 > 9845 > 9846 > 984 > 985 > 98 > 99 > and if I query on 98456 the result must be 98456, > However if I query on 98455 the result must be 9845 > and If I query 9849 the result must be 984 select t1.t1 from t1 where '98456' like t1.t1||'%' order by length(t1.t1) desc limit 1; should be ok. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
Hi Depesz, I was curious about your solution for Best Fit since I had mine working in a function with a loop: ... FOR v_len IN REVERSE v_max..v_min LOOP v_prefix := substring(v_destino, 1, v_len); SELECT * INTO v_result FROM numeracion WHERE prefijo = v_prefix; IF FOUND THEN RETURN :v_result; END IF; END LOOP; ... Found your query is shorter and clearer, problem is I couldn't have it use an index. Thought it was a locale issue but adding a 2nd index with varchar_pattern_ops made no difference. In result, it turned out to be too slow in comparison to the function. Am I missing something? --- DDL --- rd=# show lc_collate;lc_collate -------------en_US.UTF-8 (1 row) rd=# show client_encoding;client_encoding -----------------SQL_ASCII (1 row) rd=# show server_encoding;server_encoding -----------------SQL_ASCII (1 row) rd=# \d numeracion Table "public.numeracion" Column | Type | Modifiers -------------+-----------------------------+---------------cod_oper | integer |servicio | text | not nullmodalidad | text | not nulllocalidad | text | not nullindicativo | text | not nullbloque | text | not nullresolucion | text |fecha | date | not nullprefijo | text | not nulllargo | integer |fecha_carga | timestamp without time zone | defaultnow() Indexes: "pk_numeracion" PRIMARY KEY, btree (prefijo) "idx_numeracion_prefijo" btree (prefijo varchar_pattern_ops) Foreign-key constraints: "fk_numeracion_operadores_cod_oper" FOREIGN KEY (cod_oper) REFERENCES operadores(cod_oper) rd=# set enable_seqscan = off; SET rd=# explain select prefijo rd-# FROM numeracion rd-# WHERE '3514269565' LIKE prefijo || '%' rd-# ORDER BY LENGTH(prefijo) DESC rd-# LIMIT 1; QUERY PLAN ---------------------------------------------------------------------------- Limit (cost=100001077.54..100001077.54 rows=1 width=89) -> Sort (cost=100001077.54..100001077.91 rows=151 width=89) Sort Key: length(prefijo) -> Seq Scan on numeracion (cost=100000000.00..100001072.07 rows=151 width=89) Filter: ('3514269565'::text ~~ (prefijo || '%'::text)) Why I am getting these monstrous costs? Table had been vacuumed full just before running the explain plan. It has ~31k rows. Any hindsight will be greatly appreciated. Regards, Fernando. -----Mensaje original----- De: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] En nombre de hubert depesz lubaczewski Enviado el: Viernes, 10 de Agosto de 2007 05:00 Para: Kiran CC: pgsql-sql@postgresql.org Asunto: Re: [SQL] Best Fit SQL query statement On Mon, Aug 06, 2007 at 01:57:07AM -0700, Kiran wrote: > Could anyone help me in writing Best Fit SQL statement. > Suppose we have table t1 with coloumn t1 (text) with following rows. > 98456 > 98457 > 9845 > 9846 > 984 > 985 > 98 > 99 > and if I query on 98456 the result must be 98456, > However if I query on 98455 the result must be 9845 > and If I query 9849 the result must be 984 select t1.t1 from t1 where '98456' like t1.t1||'%' order by length(t1.t1) desc limit 1; should be ok. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV) ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
On Fri, Aug 10, 2007 at 04:40:34PM -0300, Fernando Hevia wrote: > Found your query is shorter and clearer, problem is I couldn't have it use > an index. Thought it was a locale issue but adding a 2nd index with > varchar_pattern_ops made no difference. > In result, it turned out to be too slow in comparison to the function. Am I > missing something? > rd=# explain select prefijo > rd-# FROM numeracion > rd-# WHERE '3514269565' LIKE prefijo || '%' > rd-# ORDER BY LENGTH(prefijo) DESC > rd-# LIMIT 1; unfortunatelly this query will be hard to optimize. i guess that functional approach will be the fastest, but you can try with something like this: select prefijo from numeracion where prefijo in ( select substr('3514269565',1,i) from generate_series(1, length('3514269565')) i ) order by length(prefijo) desc LIMIT 1; it should be faster then the previous approach, but it will most probably not be as fast as function. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
On 8/10/07, hubert depesz lubaczewski <depesz@depesz.com> wrote: > unfortunatelly this query will be hard to optimize. Uh, how about SELECT MAX(t1) FROM t1WHERE '9849' LIKE t1 || '%';
On Fri, Aug 10, 2007 at 08:13:46PM -0500, Rodrigo De León wrote: > On 8/10/07, hubert depesz lubaczewski <depesz@depesz.com> wrote: > > unfortunatelly this query will be hard to optimize. > > Uh, how about > > SELECT MAX(t1) > FROM t1 > WHERE '9849' LIKE t1 || '%'; it will not help much as the main burden is the where clause, which is not indexable. depesz -- quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :) http://www.depesz.com/ - blog dla ciebie (i moje CV)
De: hubert depesz lubaczewski [mailto:depesz@depesz.com] >>On Fri, Aug 10, 2007 at 04:40:34PM -0300, Fernando Hevia wrote: >> Found your query is shorter and clearer, problem is I couldn't have it use >> an index. Thought it was a locale issue but adding a 2nd index with >> varchar_pattern_ops made no difference. >> In result, it turned out to be too slow in comparison to the function. Am I >> missing something? >> rd=# explain select prefijo >> rd-# FROM numeracion >> rd-# WHERE '3514269565' LIKE prefijo || '%' >> rd-# ORDER BY LENGTH(prefijo) DESC >> rd-# LIMIT 1; > unfortunatelly this query will be hard to optimize. > i guess that functional approach will be the fastest, but you can try > with something like this: > > select prefijo > from numeracion > where prefijo in ( > select substr('3514269565',1,i) > from generate_series(1, length('3514269565')) i > ) > order by length(prefijo) desc LIMIT 1; > >it should be faster then the previous approach, but it will most >probably not be as fast as function. Actually, I find this variant nearly as fast as the function. The generate_series can be limited to known minimum and maximum prefix lengths in order to speed up the query a bit more. Works quite well. Cheers, Fernando.