Обсуждение: ilike multi-byte pattern cache
The attached patch implements a one-value pattern cache for the multi-byte encoding case for ILIKE. This reduces calls to lower() by (50% -1) in the common case where the pattern is a constant. My own testing and Guillaume Smet's show that this cuts roughly in half the performance penalty we inflicted by using lower() in that case. Is this sufficiently low risk to sneak into 8.3? cheers andrew Index: src/backend/utils/adt/like.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/utils/adt/like.c,v retrieving revision 1.71 diff -c -r1.71 like.c *** src/backend/utils/adt/like.c 22 Sep 2007 03:58:34 -0000 1.71 --- src/backend/utils/adt/like.c 22 Sep 2007 12:16:23 -0000 *************** *** 139,144 **** --- 139,149 ---- *p; int slen, plen; + static char patcache[512], lpatcache[512]; + static int patcachelen = 0, lpatcachelen = 0; + + p = VARDATA_ANY(pat); + plen = VARSIZE_ANY_EXHDR(pat); /* For efficiency reasons, in the single byte case we don't call * lower() on the pattern and text, but instead call to_lower on each *************** *** 147,156 **** if (pg_database_encoding_max_length() > 1) { /* lower's result is never packed, so OK to use old macros here */ - pat = DatumGetTextP(DirectFunctionCall1(lower, PointerGetDatum(pat))); - p = VARDATA(pat); - plen = (VARSIZE(pat) - VARHDRSZ); str = DatumGetTextP(DirectFunctionCall1(lower, PointerGetDatum(str))); s = VARDATA(str); slen = (VARSIZE(str) - VARHDRSZ); --- 152,192 ---- if (pg_database_encoding_max_length() > 1) { + if (plen > 0 && plen == patcachelen && strncmp(p,patcache,plen) == 0) + { + p = lpatcache; + plen = lpatcachelen; + } + else + { + char *lp; + int lplen; + + pat = DatumGetTextP(DirectFunctionCall1(lower, + PointerGetDatum(pat))); + + /* lower's result is never packed, so OK to use old macros here */ + lp = VARDATA(pat); + lplen = (VARSIZE(pat) - VARHDRSZ); + + if (plen < 512 && lplen < 512) + { + patcachelen = plen; + lpatcachelen = lplen; + memcpy(patcache,p,plen); + memcpy(lpatcache,lp,lplen); + } + else + { + patcachelen = 0; + lpatcachelen = 0; + } + + p = lp; + plen = lplen; + } + /* lower's result is never packed, so OK to use old macros here */ str = DatumGetTextP(DirectFunctionCall1(lower, PointerGetDatum(str))); s = VARDATA(str); slen = (VARSIZE(str) - VARHDRSZ); *************** *** 161,168 **** } else { - p = VARDATA_ANY(pat); - plen = VARSIZE_ANY_EXHDR(pat); s = VARDATA_ANY(str); slen = VARSIZE_ANY_EXHDR(str); return SB_IMatchText(s, slen, p, plen); --- 197,202 ----
Andrew Dunstan <andrew@dunslane.net> writes: > The attached patch implements a one-value pattern cache for the > multi-byte encoding case for ILIKE. This reduces calls to lower() by > (50% -1) in the common case where the pattern is a constant. My own > testing and Guillaume Smet's show that this cuts roughly in half the > performance penalty we inflicted by using lower() in that case. > Is this sufficiently low risk to sneak into 8.3? This seems awfully ugly ... and considering that you don't get to avoid lower() on the data side, it seems pretty dubious that it buys very much percentagewise. It would also be a net loss for non-constant patterns, which are by no means unheard of --- or even two constant patterns used in the same query. We've lived with this in 8.2 without much complaint. I think we can let it go until we think of a better solution. To my mind this is all tied up in the problem of handling locales in a better fashion --- a lot of the inefficiency of lower() is due to having a poor impedance match to the libc locale-related functions, and might be eliminated if we had locale support with better APIs. regards, tom lane
Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > >> The attached patch implements a one-value pattern cache for the >> multi-byte encoding case for ILIKE. This reduces calls to lower() by >> (50% -1) in the common case where the pattern is a constant. My own >> testing and Guillaume Smet's show that this cuts roughly in half the >> performance penalty we inflicted by using lower() in that case. >> > > >> Is this sufficiently low risk to sneak into 8.3? >> > > This seems awfully ugly ... and considering that you don't get to > avoid lower() on the data side, it seems pretty dubious that it > buys very much percentagewise. It would also be a net loss for > non-constant patterns, which are by no means unheard of --- or even > two constant patterns used in the same query. > The cost of using lower() is demonstrably high. Even on a very small pattern the speedup is easily measurable. The cost of the patch is effectively 1 call to strcmp() per call and 2 calls to memcpy() per cache miss, which should be quite cheap. > We've lived with this in 8.2 without much complaint. I think we can > let it go until we think of a better solution. To my mind this is > all tied up in the problem of handling locales in a better fashion > --- a lot of the inefficiency of lower() is due to having a poor > impedance match to the libc locale-related functions, and might be > eliminated if we had locale support with better APIs. > > > Well, we have a complaint now :-( This was aimed at being some temporary relief rather than a long term fix. I guess Guillaume can use this as a patch if he wants. I agree that we need better locale APIs. cheers andrew
Andrew Dunstan <andrew@dunslane.net> wrote: > The attached patch implements a one-value pattern cache for the > multi-byte encoding case for ILIKE. This reduces calls to lower() by > (50% -1) in the common case where the pattern is a constant. My own > testing and Guillaume Smet's show that this cuts roughly in half the > performance penalty we inflicted by using lower() in that case. It might be a better solution to create the new text type 'lower_text' and replace 'text ILIKE text' to 'text ILIKE lower_text'. The converter function 'lower' is marked as immutable, planner can evaluate it just one time in planning if the right-hand side is a constant expression. Here is a pseudo-code for the above. CREATE TYPE lower_text (INPUT = lower, ... ); CREATE CAST (text AS lower_text) WITH FUNCTION lower(text) AS IMPLICIT; CREATE OPERATOR ILIKE ( LEFTARG = text, RIGHTARG = lower_text, ); Regards, --- ITAGAKI Takahiro NTT Open Source Software Center