RE: Popcount optimization using AVX512

Поиск
Список
Период
Сортировка
От Amonson, Paul D
Тема RE: Popcount optimization using AVX512
Дата
Msg-id BL1PR11MB53042C623F17F17469B2C053DC4B2@BL1PR11MB5304.namprd11.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Popcount optimization using AVX512  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Ответы Re: Popcount optimization using AVX512  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Álvaro,

All feedback is now completed. I added the additional checks for the new APIs and a separate check for the header to
autoconf.

About the double check for AVX 512 I added a large comment explaining why both are needed. There are cases where the
CPUZMM# registers are not exposed by the OS or hypervisor even if the CPU supports AVX512.
 

The big change is adding all old and new build support to meson. I am new to meson/ninja so please review carefully.

Thanks,
Paul

-----Original Message-----
From: Alvaro Herrera <alvherre@alvh.no-ip.org> 
Sent: Wednesday, February 7, 2024 2:13 AM
To: Amonson, Paul D <paul.d.amonson@intel.com>
Cc: Shankaran, Akash <akash.shankaran@intel.com>; Nathan Bossart <nathandbossart@gmail.com>; Noah Misch
<noah@leadboat.com>;Tom Lane <tgl@sss.pgh.pa.us>; Matthias van de Meent <boekewurm+postgres@gmail.com>;
pgsql-hackers@lists.postgresql.org
Subject: Re: Popcount optimization using AVX512

Hello,

This looks quite reasonable.  On my machine, I get the compiler test to pass so I get a "yes" in configure; but of
coursemy CPU doesn't support the instructions so I get the slow variant.  So here's the patch again with some minor
artifactsfixed.
 

I have the following review notes:

1. we use __get_cpuid_count and __cpuidex by relying on macros HAVE__GET_CPUID and HAVE__CPUID respectively; but those
macrosare (in the current Postgres source) only used and tested for __get_cpuid and __cpuid respectively.  So unless
there'ssome reason to be certain that __get_cpuid_count is always present when __get_cpuid is present, and that
__cpuidexis present when __cpuid is present, I think we need to add new configure tests and new HAVE_ macros for
these.

2. we rely on <immintrin.h> being present with no AC_CHECK_HEADER() test.  We currently don't use this header anywhere,
soI suppose we need a test for this one as well.  (Also, I suppose if we don't have immintrin.h we can skip the rest of
it?)

3. We do the __get_cpuid_count/__cpuidex test and we also do a xgetbv test.  The comment there claims that this is to
checkthe results for consistency.  But ... how would we know that the results are ever inconsistent?  As far as I
understand,if they were, we would silently become slower.  Is this really what we want?  I'm confused about this
coding. Maybe we do need both tests to succeed?  In that case, just reword the comment.
 

I think if both tests are each considered reliable on its own, then we could either choose one of them and stick with
it,ignoring the other; or we could use one as primary and then in a USE_ASSERT_CHECKING block verify that the other
matchesand throw a WARNING if not (but what would that tell us?).  Or something like that ... not sure.
 

4. It needs meson support, which I suppose consists of copying the
c-compiler.m4 test into meson.build, mimicking what the tests for CRC instructions do.


I started a CI run with this patch applied,
https://cirrus-ci.com/build/4912499619790848
but because Meson support is missing, the compile failed
immediately:

[10:08:48.825] ccache cc -Isrc/port/libpgport_srv.a.p -Isrc/include -I../src/include -Isrc/include/utils
-fdiagnostics-color=always-pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv
-fexcess-precision=standard-D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels
-Wmissing-format-attribute-Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security
-Wdeclaration-after-statement-Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ
src/port/libpgport_srv.a.p/pg_bitutils.c.o-MF src/port/libpgport_srv.a.p/pg_bitutils.c.o.d -o
src/port/libpgport_srv.a.p/pg_bitutils.c.o-c ../src/port/pg_bitutils.c [10:08:48.825] ../src/port/pg_bitutils.c: In
function‘pg_popcount512_fast’:
 
[10:08:48.825] ../src/port/pg_bitutils.c:270:11: warning: AVX512F vector return without AVX512F enabled changes the ABI
[-Wpsabi]
[10:08:48.825]   270 |  __m512i  accumulator = _mm512_setzero_si512();
[10:08:48.825]       |           ^~~~~~~~~~~
[10:08:48.825] In file included from /usr/lib/gcc/x86_64-linux-gnu/10/include/immintrin.h:55,
[10:08:48.825]                  from ../src/port/pg_bitutils.c:22:
[10:08:48.825] /usr/lib/gcc/x86_64-linux-gnu/10/include/avx512fintrin.h:339:1: error: inlining failed in call to
‘always_inline’‘_mm512_setzero_si512’: target specific option mismatch
 
[10:08:48.825]   339 | _mm512_setzero_si512 (void)
[10:08:48.825]       | ^~~~~~~~~~~~~~~~~~~~
[10:08:48.825] ../src/port/pg_bitutils.c:270:25: note: called from here
[10:08:48.825]   270 |  __m512i  accumulator = _mm512_setzero_si512();
[10:08:48.825]       |                         ^~~~~~~~~~~~~~~~~~~~~~


Thanks

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nikita Malakhov
Дата:
Сообщение: Re: POC: Extension for adding distributed tracing - pg_tracing
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Popcount optimization using AVX512