Re: [PATCH] Clarify the comments about varlena header encoding

Поиск
Список
Период
Сортировка
От John Naylor
Тема Re: [PATCH] Clarify the comments about varlena header encoding
Дата
Msg-id CAFBsxsGkYq_atJ7i157cWG-YE2EPV2Bwq60Vw1LHJ7vjx+gFgg@mail.gmail.com
обсуждение исходный текст
Ответ на [PATCH] Clarify the comments about varlena header encoding  (Aleksander Alekseev <aleksander@timescale.com>)
Ответы Re: [PATCH] Clarify the comments about varlena header encoding  (Aleksander Alekseev <aleksander@timescale.com>)
Список pgsql-hackers
On Thu, Aug 18, 2022 at 1:06 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi hackers,
>
> I noticed that the comments regarding bit layouts for varlena headers
> in postgres.h are somewhat misleading. For instance, when reading:

I agree it's confusing, but I don't think this patch is the right direction.

> ```
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> ```
>
> ... one can assume this is a 00xxxxxx byte followed by another 4 bytes
> (which is wrong). Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

- * 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
+ * 00xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)

Maybe "00xxxxxx 4-byte length word (aligned)," is more clear about
what is aligned. Also, adding all those xxx's obscures the point that
we only need to examine one byte to figure out what to do next.

> ```
> 10000000 1-byte length word, unaligned, TOAST pointer
> ```
>
> This is misleading too. The comments above this line say that `struct
> varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16,
> plus 1 byte equals 17, right? However the documentation [1] claims the
> result should be 18:

The patch has:

+ * In the third case the va_tag field (see varattrib_1b_e) is used to discern
+ * the specific type and length of the pointer datum. On disk the "xxx" bits
+ * currently always store sizeof(varatt_external) + 2.

...so not sure where 17 came from.

- * 10000000 1-byte length word, unaligned, TOAST pointer
+ * 10000000 xxxxxxxx, TOAST pointer (struct varatt_external)

This implies that the header is two bytes, which is not accurate. That
next byte is a type tag:

/* TOAST pointers are a subset of varattrib_1b with an identifying tag byte */
typedef struct
{
uint8 va_header; /* Always 0x80 or 0x01 */
uint8 va_tag; /* Type of datum */
char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-specific data */
} varattrib_1b_e;

...and does not always represent the on-disk length:

/*
 * Type tag for the various sorts of "TOAST pointer" datums.  The peculiar
 * value for VARTAG_ONDISK comes from a requirement for on-disk compatibility
 * with a previous notion that the tag field was the pointer datum's length.
 */
typedef enum vartag_external
{
VARTAG_INDIRECT = 1,
VARTAG_EXPANDED_RO = 2,
VARTAG_EXPANDED_RW = 3,
VARTAG_ONDISK = 18
} vartag_external;

And I don't think the new comments referring to "third case", "first
two cases", etc make it easier to follow.
-- 
John Naylor
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nikita Glukhov
Дата:
Сообщение: Re: SQL/JSON features for v15
Следующее
От: Noah Misch
Дата:
Сообщение: Re: static libpq (and other libraries) overwritten on aix