Discussion: pgsql: Clean up jsonb code.

pgsql: Clean up jsonb code.

From
Heikki Linnakangas
Date:
Clean up jsonb code.

The main target of this cleanup is the convertJsonb() function, but I also
touched a lot of other things that I spotted in the process.

The new convertToJsonb() function uses an output buffer that's resized on
demand, so the code to estimate the size of a JsonbValue is removed.
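
As an illustration of why a buffer that grows on demand makes an up-front size estimate unnecessary, here is a minimal sketch in plain C. It is not the code from this commit: the GrowBuf type and growbuf_append() are hypothetical names, and it uses malloc/realloc rather than PostgreSQL's memory management.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; not the actual PostgreSQL structure. */
typedef struct GrowBuf
{
	char	   *data;			/* allocated storage, or NULL while empty */
	size_t		len;			/* bytes currently in use */
	size_t		cap;			/* bytes currently allocated */
} GrowBuf;

/*
 * Append n bytes to the buffer, doubling the capacity until they fit.
 * Returns 0 on success, -1 if the buffer cannot be enlarged.
 */
static int
growbuf_append(GrowBuf *buf, const void *src, size_t n)
{
	if (n > buf->cap - buf->len)
	{
		size_t		newcap = buf->cap ? buf->cap : 64;
		char	   *newdata;

		while (newcap - buf->len < n)
		{
			if (newcap > SIZE_MAX / 2)
				return -1;		/* doubling would overflow size_t */
			newcap *= 2;
		}
		newdata = realloc(buf->data, newcap);
		if (newdata == NULL)
			return -1;
		buf->data = newdata;
		buf->cap = newcap;
	}
	memcpy(buf->data + buf->len, src, n);
	buf->len += n;
	return 0;
}
```

The caller starts from a zero-initialized GrowBuf and simply appends each serialized piece; nothing needs to walk the value first to compute its total size.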

The on-disk format was not changed, even though I refactored the structs
used to handle it. The term "superheader" is replaced with "container".

The jsonb_exists_any and jsonb_exists_all functions no longer sort the input
array. That was a premature optimization, the idea being that if there are
duplicates in the input array, you only need to check them once. Also,
sorting the array saves some effort in the binary search used to find a key
within an object. But there were drawbacks too: the sorting and
deduplicating obviously isn't free, and in the typical case there are no
duplicates to remove, and the gain in the binary search was minimal. Remove
all that, which makes the code simpler too.

This includes a bug-fix; the total length of the elements in a jsonb array
or object mustn't exceed 2^28. That is now checked.
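
A check of this kind could look roughly like the sketch below. It is only an illustration: SKETCH_MAX_TOTAL_LEN and sum_lengths_checked() are hypothetical names, the limit is taken from the 2^28 figure in this message, and the real code reports the failure through PostgreSQL's error machinery rather than a return code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed limit, taken from the 2^28 figure in the commit message. */
#define SKETCH_MAX_TOTAL_LEN ((uint32_t) 1 << 28)

/*
 * Add up the element lengths of one array or object.
 * Returns 0 and stores the sum in *total_out, or returns -1 as soon as
 * the running total would exceed SKETCH_MAX_TOTAL_LEN.
 */
static int
sum_lengths_checked(const uint32_t *lens, size_t n, uint32_t *total_out)
{
	uint32_t	total = 0;

	for (size_t i = 0; i < n; i++)
	{
		/* Compare against the remaining headroom so the addition cannot wrap. */
		if (lens[i] > SKETCH_MAX_TOTAL_LEN - total)
			return -1;
		total += lens[i];
	}
	*total_out = total;
	return 0;
}
```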

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/364ddc3e5cbd01c93a39896b5260509129a9883e

Modified Files
--------------
src/backend/utils/adt/jsonb.c      |   16 +-
src/backend/utils/adt/jsonb_gin.c  |    4 +-
src/backend/utils/adt/jsonb_op.c   |  106 ++--
src/backend/utils/adt/jsonb_util.c |  929 +++++++++++++-----------------------
src/backend/utils/adt/jsonfuncs.c  |   64 +--
src/include/utils/jsonb.h          |  213 +++++----
6 files changed, 556 insertions(+), 776 deletions(-)


Re: pgsql: Clean up jsonb code.

From
Peter Geoghegan
Date:
Thanks for cleaning this up.

On Wed, May 7, 2014 at 1:18 PM, Heikki Linnakangas
<heikki.linnakangas@iki.fi> wrote:
> The jsonb_exists_any and jsonb_exists_all functions no longer sort the input
> array. That was a premature optimization, the idea being that if there are
> duplicates in the input array, you only need to check them once. Also,
> sorting the array saves some effort in the binary search used to find a key
> within an object. But there were drawbacks too: the sorting and
> deduplicating obviously isn't free, and in the typical case there are no
> duplicates to remove, and the gain in the binary search was minimal. Remove
> all that, which makes the code simpler too.

This is not the reason why the code did that. De-duplication was not
the point at all. findJsonbValueFromSuperHeader()'s lowbound argument
previously served to establish a low bound for searching when
searching for multiple keys (so the second and subsequent
user-supplied key could skip much of the object). In the case of
jsonb_exists_any(), say, if you only have a reasonable expectation
that about 1 key exists, and that happens to be the last key that the
user passed to the text[] argument (to the existence/? operator), then
n - 1 calls to what is now findJsonbValueFromContainer() (which now
does not accept a lowbound) are wasted.  That's elem_count - 1
top-level binary searches of the entire jsonb. Or elem_count such
calls rather than 1 call (plus 1 sort of the supplied array) in the
common case where jsonb_exists_any() will return false.
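
To make the lowbound idea concrete, here is a rough sketch in plain C. It is not the PostgreSQL code: find_key_from() and exists_any_sorted() are hypothetical names, keys are plain C strings ordered by strcmp() (an assumption made for brevity, not necessarily jsonb's own key ordering), and the lookup array is assumed to be sorted already.

```c
#include <stddef.h>
#include <string.h>

/*
 * Binary search for target in keys[*lowbound .. nkeys - 1], which must be
 * sorted by strcmp(). On return, *lowbound has been advanced past every
 * key known to sort before target, so a later search for an equal or
 * larger target can skip that prefix. Returns 1 if found, 0 otherwise.
 */
static int
find_key_from(const char *const *keys, size_t nkeys,
			  const char *target, size_t *lowbound)
{
	size_t		lo = *lowbound;
	size_t		hi = nkeys;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;
		int			cmp = strcmp(keys[mid], target);

		if (cmp == 0)
		{
			*lowbound = mid;	/* a repeated target is still findable */
			return 1;
		}
		if (cmp < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	*lowbound = lo;				/* everything before lo sorts below target */
	return 0;
}

/*
 * Return 1 if any of the lookups is present in keys. The lookups must be
 * sorted ascending by strcmp(); each search then starts where the
 * previous one left off instead of at the beginning of keys.
 */
static int
exists_any_sorted(const char *const *keys, size_t nkeys,
				  const char *const *lookups, size_t nlookups)
{
	size_t		lowbound = 0;

	for (size_t i = 0; i < nlookups; i++)
	{
		if (find_key_from(keys, nkeys, lookups[i], &lowbound))
			return 1;
	}
	return 0;
}
```

Without the carried-over lowbound, every lookup is a fresh binary search over all nkeys entries; with it, the searched range can only shrink as the sorted lookups advance.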

Granted, that might not be that bad right now, given that it's only
ever (say) elem_count or elem_count - 1 wasted binary searches through
the *top* level, but that might not always be true. And even today,
sorting a presumably much smaller user-passed lookup array once has to
be cheaper than searching through the entire jsonb perhaps elem_count
times per call.

--
Peter Geoghegan