Re: [POC] verifying UTF-8 using SIMD instructions

From John Naylor
Subject Re: [POC] verifying UTF-8 using SIMD instructions
Date
Msg-id CAFBsxsHqsgKc60+2u5FpRQMCcmkzemtMK0avm7fmuDzL-R0KPw@mail.gmail.com
In response to Re: [POC] verifying UTF-8 using SIMD instructions  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers


On Tue, Feb 9, 2021 at 4:22 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> On 09/02/2021 22:08, John Naylor wrote:
> > Maybe there's a smarter way to check for zeros in C. Or maybe be more
> > careful about cache -- running memchr() on the whole input first might
> > not be the best thing to do.
>
> The usual trick is the haszero() macro here:
> https://graphics.stanford.edu/~seander/bithacks.html#ZeroInWord. That's
> how memchr() is typically implemented, too.

Thanks for that. Checking with that macro on each loop iteration gives a small boost:

v1, but using memcpy()

 mixed | ascii
-------+-------
   601 |   129

with haszero()

 mixed | ascii
-------+-------
   583 |   105

remove zero-byte check:

 mixed | ascii
-------+-------
   588 |    93
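
For readers following along, here is a minimal sketch of what a per-iteration haszero() check could look like. It is not the code from the patch: the 64-bit widening of the macro and the helper name contains_zero_byte() are assumptions made for illustration, and memcpy() is used only to load each chunk without alignment problems.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Zero-in-word test from the bithacks page, widened to 64 bits here
 * (an assumption; the page shows the 32-bit form). The result is
 * nonzero iff at least one byte of v is zero.
 */
#define haszero(v) \
	(((v) - UINT64_C(0x0101010101010101)) & ~(v) & UINT64_C(0x8080808080808080))

/*
 * Hypothetical helper: returns true if any of the first len bytes of s
 * is a zero byte.
 */
static bool
contains_zero_byte(const unsigned char *s, size_t len)
{
	/* Check eight bytes per iteration. */
	while (len >= sizeof(uint64_t))
	{
		uint64_t	chunk;

		/* memcpy() avoids unaligned access and aliasing issues. */
		memcpy(&chunk, s, sizeof(chunk));
		if (haszero(chunk))
			return true;

		s += sizeof(chunk);
		len -= sizeof(chunk);
	}

	/* Check the remaining tail one byte at a time. */
	while (len > 0)
	{
		if (*s == 0)
			return true;
		s++;
		len--;
	}

	return false;
}
```

In a validation loop the same test would presumably be applied to the chunk that has already been loaded for the other checks, so the zero check costs only a few extra ALU operations per iteration rather than a separate pass over the input.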

--
John Naylor
EDB: http://www.enterprisedb.com
