Thread: Add pg_strtoupper and pg_strtolower functions

Add pg_strtoupper and pg_strtolower functions

From
Bharath Rupireddy
Date:
Hi,

I noticed that pg_toupper and pg_tolower, which each convert a single
character, are being called in loops to convert entire
null-terminated strings. The cost of calling these character-based
conversion functions (even though small) can be avoided if we have two
new functions, pg_strtoupper and pg_strtolower.

Attaching a patch with these two new functions and their usage in most
of the possible places in the code.
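
For illustration only (this is not the attached patch): a minimal sketch of what string-level helpers of this kind could look like. The names sketch_strtolower and sketch_strtoupper, the in-place design, and the ASCII-only stand-in converters are all assumptions made for the example.

```c
#include <stddef.h>

/* Stand-in single-character converters; ASCII-only by assumption. */
static unsigned char
sketch_tolower(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}

static unsigned char
sketch_toupper(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch -= 'a' - 'A';
    return ch;
}

/*
 * Hypothetical string-level helpers: convert a null-terminated string in
 * place, one byte at a time, and return the same pointer for convenience.
 */
char *
sketch_strtolower(char *str)
{
    for (char *p = str; *p; p++)
        *p = (char) sketch_tolower((unsigned char) *p);
    return str;
}

char *
sketch_strtoupper(char *str)
{
    for (char *p = str; *p; p++)
        *p = (char) sketch_toupper((unsigned char) *p);
    return str;
}
```

Because the conversion is byte-for-byte and in place, the output always has the same length as the input, which only holds when every character maps to a replacement of the same byte length.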

Thoughts?

Regards,
Bharath Rupireddy.

Attachments

Re: Add pg_strtoupper and pg_strtolower functions

From
Ashutosh Bapat
Date:
On Mon, May 2, 2022 at 6:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> I noticed that pg_toupper and pg_tolower, which each convert a single
> character, are being called in loops to convert entire
> null-terminated strings. The cost of calling these character-based
> conversion functions (even though small) can be avoided if we have two
> new functions, pg_strtoupper and pg_strtolower.

Have we measured the saving in cost? Let's say for a million character
long string?

>
> Attaching a patch with these two new functions and their usage in most
> of the possible places in the code.

Converting pg_toupper and pg_tolower to "inline" might save cost
similarly and also avoid code duplication?
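
As a rough sketch of the "inline" idea, assuming a header-style definition: the function name sketch_toupper_inline, the caller sketch_upcase_in_place, and the ASCII-only behavior are hypothetical and chosen only for the example.

```c
/*
 * Hypothetical header-style helper: "static inline" allows the compiler to
 * expand the body at each call site. ASCII-only for this sketch.
 */
static inline unsigned char
sketch_toupper_inline(unsigned char ch)
{
    if (ch >= 'a' && ch <= 'z')
        ch -= 'a' - 'A';
    return ch;
}

/* Call sites keep their existing loop; only the callee's linkage changes. */
void
sketch_upcase_in_place(char *s)
{
    for (char *p = s; *p; p++)
        *p = (char) sketch_toupper_inline((unsigned char) *p);
}
```

With this shape there is a single definition of the conversion logic, and whether the call is actually expanded inline is left to the compiler.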

-- 
Best Wishes,
Ashutosh Bapat



Re: Add pg_strtoupper and pg_strtolower functions

From
Bharath Rupireddy
Date:
On Mon, May 2, 2022 at 6:43 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, May 2, 2022 at 6:21 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > I noticed that pg_toupper and pg_tolower, which each convert a single
> > character, are being called in loops to convert entire
> > null-terminated strings. The cost of calling these character-based
> > conversion functions (even though small) can be avoided if we have two
> > new functions, pg_strtoupper and pg_strtolower.
>
> Have we measured the saving in cost? Let's say for a million character
> long string?

I didn't spend time figuring out use cases that hit all the affected
code areas. Even if I did, the function call cost savings might not be
impressive most of the time, and the argument about saving function
call cost would then become pointless.

> > Attaching a patch with these two new functions and their usage in most
> > of the possible places in the code.
>
> Converting pg_toupper and pg_tolower to "inline" might save cost
> similarly and also avoid code duplication?

I think most modern compilers inline small functions. But inlining
isn't always good, as it increases the size of the code. With the
proposed helper functions, the code looks cleaner (at least IMO;
others may have different opinions).

Regards,
Bharath Rupireddy.



Re: Add pg_strtoupper and pg_strtolower functions

From
Alvaro Herrera
Date:
On 2022-May-02, Bharath Rupireddy wrote:

> Hi,
> 
> I noticed that pg_toupper and pg_tolower, which each convert a single
> character, are being called in loops to convert entire
> null-terminated strings. The cost of calling these character-based
> conversion functions (even though small) can be avoided if we have two
> new functions, pg_strtoupper and pg_strtolower.

Currently, pg_toupper/pg_tolower are used in very limited situations.
Are they really always safe enough to run in arbitrary situations,
enough to create this new layer on top of them?  Reading the comment on
pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I
worry that we might create security holes, either now or in future
callsites that use these new functions.

Consider that in the Turkish locale an I (a single-byte ASCII
character) lowercases to a dotless ı (two bytes in UTF-8).  So
overwriting the input string is not a great solution.
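
To make the length problem concrete, here is a small sketch assuming UTF-8 and ASCII-only input. The helper names tr_lower_len and tr_lower_copy are hypothetical; they are not PostgreSQL functions, and the mapping shown covers only the I to dotless ı case plus plain ASCII letters.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+0131 LATIN SMALL LETTER DOTLESS I: two bytes. */
static const char DOTLESS_I[] = "\xC4\xB1";

/*
 * Hypothetical helper: bytes needed (excluding the terminator) to hold the
 * Turkish-style lowercase form of an ASCII-only input in UTF-8.
 */
size_t
tr_lower_len(const char *src)
{
    size_t n = 0;

    for (; *src; src++)
        n += (*src == 'I') ? 2 : 1;
    return n;
}

/*
 * Hypothetical helper: write the lowercased form into dst, which must have
 * room for tr_lower_len(src) + 1 bytes. Only ASCII input is handled.
 */
void
tr_lower_copy(char *dst, const char *src)
{
    for (; *src; src++)
    {
        if (*src == 'I')
        {
            memcpy(dst, DOTLESS_I, 2);  /* output grows by one byte here */
            dst += 2;
        }
        else if (*src >= 'A' && *src <= 'Z')
            *dst++ = (char) (*src + ('a' - 'A'));
        else
            *dst++ = *src;
    }
    *dst = '\0';
}
```

The result can be longer than the input, so a correct conversion has to write into a separately sized buffer rather than overwrite the source string.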

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)



Re: Add pg_strtoupper and pg_strtolower functions

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Currently, pg_toupper/pg_tolower are used in very limited situations.
> Are they really always safe enough to run in arbitrary situations,
> enough to create this new layer on top of them?

They are not, and we should absolutely not be encouraging additional uses
of them.  The existing multi-character str_toupper/str_tolower functions
should be used instead.  (Perhaps those should be relocated to someplace
more prominent?)

> Reading the comment on
> pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I
> worry that we might create security holes, either now or in future
> callsites that use these new functions.

I doubt that they are security holes, but they do give unexpected
answers in some locales.

            regards, tom lane