Thread: 64 bit TID?


64 bit TID?

From
Chris Cleveland
Date:
All,

I'm considering a new design for a specialized table am. It would simplify the design if TIDs grew forever and I didn't have to implement TID reuse logic.

The current 48 bit TID is big, but I can see extreme situations where it might not be quite big enough. If every row that gets updated needs a TID, and something like an IoT app is updating huge numbers of rows per second using multiple connections in parallel, there might be a problem. This is especially true if each connection requests a batch of TIDs and then doesn't use all of them.
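A back-of-the-envelope check of the concern above, as a minimal sketch. It assumes that every one of the 2^48 TID values is usable and that TIDs are never reused. The allocation rates are purely hypothetical (none of them come from this thread), and the function name is illustrative only.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_until_exhaustion(tid_bits, tids_per_second):
    """Years until a never-reused TID space of the given width runs out."""
    return (2 ** tid_bits) / tids_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical sustained allocation rates, summed over all connections.
    for rate in (100_000, 1_000_000, 10_000_000):
        y48 = years_until_exhaustion(48, rate)
        y64 = years_until_exhaustion(64, rate)
        print(f"{rate:>12,} TIDs/s: 48-bit lasts {y48:,.1f} years, "
              f"64-bit lasts {y64:,.0f} years")
```

Under these assumptions, a 48-bit space consumed at one million TIDs per second lasts roughly nine years. Batches that are requested but only partly used shorten that in proportion to the waste.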

Are there any plans in the works to widen the TID?

I saw some notes on this in the Zedstore project, but there hasn't been much activity in that project for almost a year.

Chris

--
Chris Cleveland
312-339-2677 mobile

Re: 64 bit TID?

From
Matthias van de Meent
Date:
On Mon, 13 Sept 2021 at 17:50, Chris Cleveland
<ccleveland@dieselpoint.com> wrote:
>
> All,
>
> I'm considering a new design for a specialized table am. It would simplify the design if TIDs grew forever and I didn't have to implement TID reuse logic.

TID reuse logic also helps clean up index tuples for deleted table
tuples. I would suggest implementing TID reuse logic, if only to
prevent indexes from growing indefinitely (or TID limits being
reached, whichever comes first).

> The current 48 bit TID is big, but I can see extreme situations where it might not be quite big enough. If every row that gets updated needs a TID, and something like an IoT app is updating huge numbers of rows per second using multiple connections in parallel, there might be a problem.

If your table contains such a large number of tuples (or tuple
versions), you might want to partition your table(s), as that allows
the system to move some bits of tuple identification to the relation
identifier.
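To make the partitioning argument concrete, here is a small sketch of the arithmetic. It assumes each partition has its own independent 48-bit TID space. The `global_row_id` helper is hypothetical: Postgres does not build such a combined identifier, and it is shown only to illustrate where the extra bits come from.

```python
import math


def effective_id_bits(tid_bits, partitions):
    """Bits of row identity available across all partitions combined."""
    return tid_bits + math.log2(partitions)


def global_row_id(partition_index, local_tid, tid_bits=48):
    """Hypothetical combined identifier: partition number above the local TID."""
    if not 0 <= local_tid < 2 ** tid_bits:
        raise ValueError("local TID does not fit in tid_bits")
    return (partition_index << tid_bits) | local_tid


if __name__ == "__main__":
    for parts in (1, 16, 1024):
        print(f"{parts:>5} partitions -> {effective_id_bits(48, parts):.0f} bits")
```

In other words, every doubling of the partition count behaves like one additional identifier bit, without any change to the TID format itself.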

> This is especially true if each connection requests a batch of TIDs and then doesn't use all of them.

For the HeapAM this is never the case; TIDs cannot be allocated
without being used (although some may end up being used for
rolled-back, and thus dead, tuples).

> Are there any plans in the works to widen the TID?

This was recently discussed here [0] as well, but to the best of my
knowledge no material proposal to update the APIs has been suggested
as of yet.

Kind regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/flat/0bbeb784050503036344e1f08513f13b2083244b.camel%40j-davis.com



Re: 64 bit TID?

From
Chris Cleveland
Date:
> > Are there any plans in the works to widen the TID?
>
> This was recently discussed here [0] as well, but to the best of my
> knowledge no material proposal to update the APIs has been suggested
> as of yet.
>
> [0] https://www.postgresql.org/message-id/flat/0bbeb784050503036344e1f08513f13b2083244b.camel%40j-davis.com

Wow, thank you, that is some thread. It discusses the issues
thoroughly. As I see it, there are three options:

1. Make it possible to use the unused 5 bits in the existing TID
scheme. The advantages: we get the full 48 bits, and it may not take a
lot of work, and it makes Jeff Davis' work with Columnar easier.

2. Go to a flat 64-bit logical TID. The advantages: certain types of
table AMs work better, including Columnar and LSM tree-based AMs
(which I'm currently working on).

3. Go to a variable-length TID. The advantages: you can stuff any kind
of payload into the TID, which would make clustered tables and certain
fancy indexes easier, but would be far more work.
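For readers unfamiliar with the layout behind option 1, the following is a rough sketch rather than Postgres code. It models a heap TID as a 32-bit block number plus a 16-bit offset and estimates how many offset bits the heap can actually use. The page header size (24 bytes), aligned tuple header size (24 bytes), and line pointer size (4 bytes) are assumptions about the default heap page layout, and all function names are illustrative.

```python
BLOCK_BITS = 32
OFFSET_BITS = 16


def pack_tid(block, offset):
    """Pack (block, offset) into one 48-bit integer, block in the high bits."""
    if not 0 <= block < 2 ** BLOCK_BITS:
        raise ValueError("block number out of range")
    if not 0 <= offset < 2 ** OFFSET_BITS:
        raise ValueError("offset out of range")
    return (block << OFFSET_BITS) | offset


def unpack_tid(value):
    """Inverse of pack_tid: return (block, offset)."""
    return value >> OFFSET_BITS, value & (2 ** OFFSET_BITS - 1)


def max_heap_tuples_per_page(block_size, page_header=24,
                             tuple_header=24, line_pointer=4):
    """Upper bound on heap tuples per page under the assumed sizes."""
    return (block_size - page_header) // (tuple_header + line_pointer)


def spare_offset_bits(block_size):
    """Offset bits never needed to address a heap tuple on one page."""
    return OFFSET_BITS - max_heap_tuples_per_page(block_size).bit_length()


if __name__ == "__main__":
    for size in (8192, 32768):
        print(size, max_heap_tuples_per_page(size), spare_offset_bits(size))
```

Under these assumptions a 32 kB block leaves 5 offset bits unused, which appears to be the figure option 1 refers to; an 8 kB block leaves even more. A flat 64-bit logical TID (option 2) would drop the block/offset split entirely and treat the value as an opaque integer.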

I would contribute patches myself, but I'm not *yet* skilled enough in
the ways of Postgres to do so.

Questions:

Would widening the existing ItemPointer to 64 bits now preclude a
variable-length TID in the future? Or make it more difficult?

How much work would it take?

Since the thread ended in May, has the group reached any kind of
consensus on the issue?
-- 
Chris Cleveland
312-339-2677 mobile



Re: 64 bit TID?

From
Peter Geoghegan
Date:
On Mon, Sep 13, 2021 at 3:30 PM Chris Cleveland
<ccleveland@dieselpoint.com> wrote:
> Wow, thank you, that is some thread. It discusses the issues
> thoroughly.

If somebody wants to make TIDs (or some generalized TID-like thing
that tableam knows about) into logical identifiers, then they must
also answer the question: identifiers of what?

TIDs from Postgres heapam identify a physical version, or perhaps a
HOT chain -- which is not how TIDs work in other DB systems that use a
heap structure. This is the only reason why we can mostly think of
indexes as data structures that don't need to be involved in
concurrency control. Postgres index access methods don't usually need
to know anything about locks that protect the logical structure of the
database.

The option of just creating a new distinct TID (for the same logical
row) buys us the ability to keep index access methods rather separate
from everything else -- which helps with extensibility. No logical
locks are required in Postgres. Complicated designs that bleed into
other parts of the system (designs like ARIES/KVL and ARIES/IM) are
unnecessary.

> Questions:
>
> Would widening the existing ItemPointer to 64 bits now preclude a
> variable-length TID in the future? Or make it more difficult?
>
> How much work would it take?

If it was just a matter of changing the data structure then I think it
would be far easier.

-- 
Peter Geoghegan



Re: 64 bit TID?

From
Peter Geoghegan
Date:
On Mon, Sep 13, 2021 at 5:36 PM Peter Geoghegan <pg@bowt.ie> wrote:
> If somebody wants to make TIDs (or some generalized TID-like thing
> that tableam knows about) into logical identifiers, then they must
> also answer the question: identifiers of what?
>
> TIDs from Postgres heapam identify a physical version, or perhaps a
> HOT chain -- which is not how TIDs work in other DB systems that use a
> heap structure. This is the only reason why we can mostly think of
> indexes as data structures that don't need to be involved in
> concurrency control. Postgres index access methods don't usually need
> to know anything about locks that protect the logical structure of the
> database.

The 1993 paper "Options in Physical Database Design" gives a useful
overview of the challenges here, especially for an extensible system
like Postgres relative to a system with a traditional design
implementing classic ARIES.

I think that you need an ACM membership to get a copy. The relevant
section starts out like this:

"""
Item Representation
-------------------

Physical representation types for abstract data types
is only slowly gaining research attention for object-
oriented database systems but will likely become a
very important tuning option. Examples include sets
represented as bit maps, arrays, or lists and matrices
represented densely or sparsely, by row or by column
or as tiles, e.g. [MaV93]. The goal is to bring physical
data independence to object-oriented and scientific
databases and their applications.

Physical pointers, references, or object identifiers to
represent relationships support "navigation" through a
database, which is very good for single-instance
retrievals and often improves set matching, but also
creates a new type of updates, structural updates,
which may increase the complexity of concurrency
control and recovery [CSL90, ChK84, RoR85,
ShC90].
"""

This seems to be a fundamental trade-off that is tied inextricably to
the design of many other things.

That doesn't stop anybody from creating a column store using the
tableam. But it does mean that they will need to be very careful about
defining what exact "logical vs physical vs physiological" tradeoff
they've chosen. It's rather subtle stuff.

-- 
Peter Geoghegan