Re: Change GUC hashtable to use simplehash?

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: Change GUC hashtable to use simplehash?
Дата
Msg-id CANWCAZaswYfm7_jnuXua1R8BU6kdSPhe9geCdvGVwojoGvfoMg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Change GUC hashtable to use simplehash?  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Change GUC hashtable to use simplehash?  (John Naylor <johncnaylorls@gmail.com>)
Список pgsql-hackers
On Sun, Dec 10, 2023 at 2:18 AM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Sat, 2023-12-09 at 18:52 +0700, John Naylor wrote:
> > > I tested using the new hash function APIs for my search path cache,
> > > and
> > > there's a significant speedup for cases not benefiting from
> > > a86c61c9ee.
> > > It's enough that we almost don't need a86c61c9ee. So a definite +1
> > > to
> > > the new APIs.

Interesting, thanks for testing! SearchPathCache is a better starting
point than dynahash for removing strlen calls anyway -- it's more
localized, uses simplehash, and we can test it with at-hand tests.

> > Do you have a new test?
>
> Still using the same basic test here:
>
> https://www.postgresql.org/message-id/04c8592dbd694e4114a3ed87139a7a04e4363030.camel%40j-davis.com
>
> What I did is:
>
>    a. add your v5 patches
>    b. disable optimization in a86c61c9ee
>    c. add attached patch to use new hash APIs

Of course, the CF bot doesn't know this, so it crashed and burned
before I had a chance to check how v6 did. I'm attaching v7 which just
improves commit messages for reviewing, and gets rid of git whitespace
errors.

My local branch of master is still at 457428d9e99b6 from Dec 4. That's
before both a86c61c9ee (Optimize SearchPathCache by saving the last
entry.) and 867dd2dc87 (Cache opaque handle for GUC option to avoid
repeasted lookups.). My plan was to keep testing against Dec. 4, but
like you I'm not sure if there is a better GUC test to do now.

> I got a slowdown between (a) and (b), and then (c) closed the gap about
> halfway. It started to get close to test noise at that point -- I could
> get some better numbers out of it if it's helpful.

We can also try (c) with using the "chunked" interface. Also note your
patch may no longer apply on top of v6 or v7.

> Also, what I'm doing in the attached path is using part of the key as
> the seed. Is that a good idea or should the seed be zero or come from
> somewhere else?

I think whether to use part of the key as a seed is a judgment call.
See this part in resowner.c:

/*
 * Most resource kinds store a pointer in 'value', and pointers are unique
 * all on their own.  But some resources store plain integers (Files and
 * Buffers as of this writing), so we want to incorporate the 'kind' in
 * the hash too, otherwise those resources will collide a lot.  But
 * because there are only a few resource kinds like that - and only a few
 * resource kinds to begin with - we don't need to work too hard to mix
 * 'kind' into the hash.  Just add it with hash_combine(), it perturbs the
 * result enough for our purposes.
 */
#if SIZEOF_DATUM == 8
    return hash_combine64(murmurhash64((uint64) value), (uint64) kind);

Given these comments, I'd feel free to use the "kind" as the seed if I
were writing this with fasthash.

The caller-provided seed can probably be zero unless we have a good
reason to, like the above, but with the incremental interface there is
an issue:

hs->hash = seed ^ (len * UINT64CONST(0x880355f21e6d1965));

Passing length 0 will wipe out the internal seed here, and that can't be good.

1) We could by convention pass "1" as the length for strings. That
could be a macro like

#define FH_UNKNOWN_LENGTH 1

...and maybe Assert(len != 0 || seed != 0)

Or 2) we could detect zero and force it to be one, but it's best if
the compiler can always constant-fold that branch. Future work may
invalidate that assumption.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Junwang Zhao
Дата:
Сообщение: Re: Make COPY format extendable: Extract COPY TO format implementations
Следующее
От: Alexander Lakhin
Дата:
Сообщение: Re: postgres_fdw test timeouts