Обсуждение: allow frontend use of the backend's core hashing functions

Поиск
Список
Период
Сортировка

allow frontend use of the backend's core hashing functions

От
Robert Haas
Дата:
Late last year, I did some work to make it possible to use simplehash
in frontend code.[1] However, a hash table is not much good unless one
also has some hash functions that one can use to hash the keys that
need to be inserted into that hash table. I initially thought that
solving this problem was going to be pretty annoying, but when I
looked at it again today I found what I think is a pretty simple way
to adapt things so that the hashing routines we use in the backend are
easily available to frontend code.

Here are some patches for that. It may make sense to combine some of
these in terms of actually getting this committed, but I have
separated them here so that it is, hopefully, easy to see what I did
and why I did it. There are three basic problems which are solved by
the three preparatory patches:

0001 - hashfn.c has a couple of routines that depend on bitmapsets,
and bitmapset.c is currently backend-only. Fix by moving this code
near related code in bitmapset.c.

0002 - some of the prototypes for functions in hashfn.c are in
hsearch.h, mixed with the dynahash stuff, and others are in
hashutils.c. Fix by making hashutils.h the one true header for
hashfn.c.

0003 - Some of hashfn.c's routines return Datum, but that's
backend-only. Fix by renaming the functions and changing the return
types to uint32 and uint64, and add static inline wrappers with the
old names that convert to Datum. Just changing the return types of the
existing functions seemed like it would've required a lot more code
churn, and also seems like it could cause silent breakage in the
future.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@mail.gmail.com

Вложения

Re: allow frontend use of the backend's core hashing functions

От
Suraj Kharage
Дата:
Hi,

I have spent some time reviewing the patches and overall it looks good to me.

However, I have few cosmetic review comments for 0003 patch as below;

1:
+++ b/src/backend/utils/hash/hashfn.c
@@ -16,15 +16,14 @@
  *  It is expected that every bit of a hash function's 32-bit result is
  *  as random as every other; failure to ensure this is likely to lead
  *  to poor performance of hash tables.  In most cases a hash
- *  function should use hash_any() or its variant hash_uint32().
+ *  function should use hash_bytes() or its variant hash_bytes_uint32(),
+ *  or the wrappers hash_any() and hash_any_uint32 defined in hashfn.h.

Here, indicated function name should be hash_uint32.

2: I can see renamed functions are declared twice in hashutils.c. I think duplicate declarations after #endif can be removed,

+extern uint32 hash_bytes(const unsigned char *k, int keylen);
+extern uint64 hash_bytes_extended(const unsigned char *k,
+  int keylen, uint64 seed);
+extern uint32 hash_bytes_uint32(uint32 k);
+extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);
+
+#ifndef FRONTEND
..
Wrapper functions
..
+#endif
+
+extern uint32 hash_bytes(const unsigned char *k, int keylen);
+extern uint64 hash_bytes_extended(const unsigned char *k,
+  int keylen, uint64 seed);
+extern uint32 hash_bytes_uint32(uint32 k);
+extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);


3: The first line of the commit message has one typo.
defiend => defined.

On Fri, Feb 7, 2020 at 11:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
Late last year, I did some work to make it possible to use simplehash
in frontend code.[1] However, a hash table is not much good unless one
also has some hash functions that one can use to hash the keys that
need to be inserted into that hash table. I initially thought that
solving this problem was going to be pretty annoying, but when I
looked at it again today I found what I think is a pretty simple way
to adapt things so that the hashing routines we use in the backend are
easily available to frontend code.

Here are some patches for that. It may make sense to combine some of
these in terms of actually getting this committed, but I have
separated them here so that it is, hopefully, easy to see what I did
and why I did it. There are three basic problems which are solved by
the three preparatory patches:

0001 - hashfn.c has a couple of routines that depend on bitmapsets,
and bitmapset.c is currently backend-only. Fix by moving this code
near related code in bitmapset.c.

0002 - some of the prototypes for functions in hashfn.c are in
hsearch.h, mixed with the dynahash stuff, and others are in
hashutils.c. Fix by making hashutils.h the one true header for
hashfn.c.

0003 - Some of hashfn.c's routines return Datum, but that's
backend-only. Fix by renaming the functions and changing the return
types to uint32 and uint64, and add static inline wrappers with the
old names that convert to Datum. Just changing the return types of the
existing functions seemed like it would've required a lot more code
churn, and also seems like it could cause silent breakage in the
future.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/CA+Tgmob8oyh02NrZW=xCScB+5GyJ-jVowE3+TWTUmPF=FsGWTA@mail.gmail.com


--
--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: allow frontend use of the backend's core hashing functions

От
Mark Dilger
Дата:

> On Feb 13, 2020, at 3:44 AM, Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> I have spent some time reviewing the patches and overall it looks good to me.
>
> However, I have few cosmetic review comments for 0003 patch as below;
>
> 1:
> +++ b/src/backend/utils/hash/hashfn.c
> @@ -16,15 +16,14 @@
>   *     It is expected that every bit of a hash function's 32-bit result is
>   *     as random as every other; failure to ensure this is likely to lead
>   *     to poor performance of hash tables.  In most cases a hash
> - *     function should use hash_any() or its variant hash_uint32().
> + *     function should use hash_bytes() or its variant hash_bytes_uint32(),
> + *     or the wrappers hash_any() and hash_any_uint32 defined in hashfn.h.
>
> Here, indicated function name should be hash_uint32.

+1

> 2: I can see renamed functions are declared twice in hashutils.c. I think duplicate declarations after #endif can be
removed,
>
> +extern uint32 hash_bytes(const unsigned char *k, int keylen);
> +extern uint64 hash_bytes_extended(const unsigned char *k,
> +     int keylen, uint64 seed);
> +extern uint32 hash_bytes_uint32(uint32 k);
> +extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);
> +
> +#ifndef FRONTEND
> ..
> Wrapper functions
> ..
> +#endif
> +
> +extern uint32 hash_bytes(const unsigned char *k, int keylen);
> +extern uint64 hash_bytes_extended(const unsigned char *k,
> +     int keylen, uint64 seed);
> +extern uint32 hash_bytes_uint32(uint32 k);
> +extern uint64 hash_bytes_uint32_extended(uint32 k, uint64 seed);

+1

> 3: The first line of the commit message has one typo.
> defiend => defined.

+1

I have made these changes and rebased Robert’s patches but otherwise changed nothing.  Here they are:



—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Вложения

Re: allow frontend use of the backend's core hashing functions

От
Robert Haas
Дата:
On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I have made these changes and rebased Robert’s patches but otherwise changed nothing.  Here they are:

Thanks. Anyone else have comments? I think this is pretty
straightforward and unobjectionable work so I'm inclined to press
forward with committing it fairly soon, but if someone feels
otherwise, please speak up.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: allow frontend use of the backend's core hashing functions

От
David Fetter
Дата:
On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> > I have made these changes and rebased Robert’s patches but
> > otherwise changed nothing.  Here they are:
> 
> Thanks. Anyone else have comments? I think this is pretty
> straightforward and unobjectionable work so I'm inclined to press
> forward with committing it fairly soon, but if someone feels
> otherwise, please speak up.

One question. It might be possible to make these functions faster
using compiler intrinsics. Would those still be available to front-end
code?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: allow frontend use of the backend's core hashing functions

От
Mark Dilger
Дата:

> On Feb 14, 2020, at 8:15 AM, David Fetter <david@fetter.org> wrote:
>
> On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
>> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
>> <mark.dilger@enterprisedb.com> wrote:
>>> I have made these changes and rebased Robert’s patches but
>>> otherwise changed nothing.  Here they are:
>>
>> Thanks. Anyone else have comments? I think this is pretty
>> straightforward and unobjectionable work so I'm inclined to press
>> forward with committing it fairly soon, but if someone feels
>> otherwise, please speak up.
>
> One question. It might be possible to make these functions faster
> using compiler intrinsics. Would those still be available to front-end
> code?

Do you have a specific proposal that would preserve on-disk compatibility?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: allow frontend use of the backend's core hashing functions

От
Robert Haas
Дата:
On Fri, Feb 14, 2020 at 11:15 AM David Fetter <david@fetter.org> wrote:
> One question. It might be possible to make these functions faster
> using compiler intrinsics. Would those still be available to front-end
> code?

Why not? We use the same compiler for frontend code as we do for
backend code. Possibly you might need some more header-file
rejiggering someplace, but I don't know of anything specific or see
any reason why it would be particularly painful if we had to do
something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: allow frontend use of the backend's core hashing functions

От
David Fetter
Дата:
On Fri, Feb 14, 2020 at 08:16:47AM -0800, Mark Dilger wrote:
> > On Feb 14, 2020, at 8:15 AM, David Fetter <david@fetter.org> wrote:
> > 
> > On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
> >> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
> >> <mark.dilger@enterprisedb.com> wrote:
> >>> I have made these changes and rebased Robert’s patches but
> >>> otherwise changed nothing.  Here they are:
> >> 
> >> Thanks. Anyone else have comments? I think this is pretty
> >> straightforward and unobjectionable work so I'm inclined to press
> >> forward with committing it fairly soon, but if someone feels
> >> otherwise, please speak up.
> > 
> > One question. It might be possible to make these functions faster
> > using compiler intrinsics. Would those still be available to front-end
> > code?
> 
> Do you have a specific proposal that would preserve on-disk compatibility?

I hadn't planned on changing the representation, just cutting
instructions out of the calculation of same.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: allow frontend use of the backend's core hashing functions

От
Mark Dilger
Дата:

> On Feb 14, 2020, at 8:29 AM, David Fetter <david@fetter.org> wrote:
>
> On Fri, Feb 14, 2020 at 08:16:47AM -0800, Mark Dilger wrote:
>>> On Feb 14, 2020, at 8:15 AM, David Fetter <david@fetter.org> wrote:
>>>
>>> On Fri, Feb 14, 2020 at 10:33:04AM -0500, Robert Haas wrote:
>>>> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
>>>> <mark.dilger@enterprisedb.com> wrote:
>>>>> I have made these changes and rebased Robert’s patches but
>>>>> otherwise changed nothing.  Here they are:
>>>>
>>>> Thanks. Anyone else have comments? I think this is pretty
>>>> straightforward and unobjectionable work so I'm inclined to press
>>>> forward with committing it fairly soon, but if someone feels
>>>> otherwise, please speak up.
>>>
>>> One question. It might be possible to make these functions faster
>>> using compiler intrinsics. Would those still be available to front-end
>>> code?
>>
>> Do you have a specific proposal that would preserve on-disk compatibility?
>
> I hadn't planned on changing the representation, just cutting
> instructions out of the calculation of same.

Ok, I misunderstood.

I thought the question was about using compiler intrinsics to implement an alltogether different hashing algorithm than
theone currently in use and whether exposing the hash function to frontend code would lock down the algorithm in a way
thatwould make it harder to change in the future.  That lead me to the question of whether we had sufficient
flexibilityto entertain changing the hashing algorithm anyway. 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: allow frontend use of the backend's core hashing functions

От
Robert Haas
Дата:
On Fri, Feb 14, 2020 at 9:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 13, 2020 at 11:26 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
> > I have made these changes and rebased Robert’s patches but otherwise changed nothing.  Here they are:
>
> Thanks. Anyone else have comments? I think this is pretty
> straightforward and unobjectionable work so I'm inclined to press
> forward with committing it fairly soon, but if someone feels
> otherwise, please speak up.

I've committed 0001 through 0003 as revised by Mark in accordance with
the comments from Suraj. Here's the last patch again with a tweak to
try not to break the Windows build, per some off-list advice I
received on how not to break the Windows build. Barring complaints
from the buildfarm or otherwise, I'll commit this one too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

Re: allow frontend use of the backend's core hashing functions

От
Robert Haas
Дата:
On Mon, Feb 24, 2020 at 5:32 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I've committed 0001 through 0003 as revised by Mark in accordance with
> the comments from Suraj. Here's the last patch again with a tweak to
> try not to break the Windows build, per some off-list advice I
> received on how not to break the Windows build. Barring complaints
> from the buildfarm or otherwise, I'll commit this one too.

Done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company