Thread: Partitioning update-heavy queue with hash partitions vs partial indexes

Partitioning update-heavy queue with hash partitions vs partial indexes

From
Dorian Hoxha
Date:
Hi list,

I have a queue table with the schema:

```
create table queue(
    id bigserial primary key,
    view_time timestamp with time zone not null
);
create index queue_view_time ON queue(view_time ASC);
```

The most concurrent operation is:
```
UPDATE queue SET view_time = view_time + INTERVAL '60 seconds'
WHERE id = (
    SELECT id FROM queue
    WHERE view_time <= now() at time zone 'utc'
    ORDER BY view_time ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
);
```
As you can imagine, as concurrency increases this query has to read and skip over a lot of locked and dead index entries, which burns a lot of CPU time.

I'm assuming 10K+ queries/second will run the update above and actually claim a row.
You may wonder how I'd sustain 10K connections, but since the queries are fast I can raise the connection limit, use a connection pooler, use auto-commit, and so on.
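For a worker to actually get the claimed row back, the statement would carry a RETURNING clause (the clause is my addition for illustration; the claim logic is unchanged):

```
UPDATE queue SET view_time = view_time + INTERVAL '60 seconds'
WHERE id = (
    SELECT id FROM queue
    WHERE view_time <= now() at time zone 'utc'
    ORDER BY view_time ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, view_time;
```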

--------------

Since most of the overhead is in the `queue_view_time` index, I thought of partitioning just that index with partial indexes and then probing the indexes in random order. With 2 partitions it would look like this:

```
create index queue_view_time_0 ON queue(view_time ASC) WHERE id%2=0;
create index queue_view_time_1 ON queue(view_time ASC) WHERE id%2=1;
```
I would add `where id%2=0` (or `=1`) to the inner SELECT above and try the partitions in random order until I either get a row or have searched all of them, as sketched below.
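A single claim attempt would then look roughly like this (the remainder is picked at random on the client side; on an empty result the client retries with the other remainder before giving up):

```
UPDATE queue SET view_time = view_time + INTERVAL '60 seconds'
WHERE id = (
    SELECT id FROM queue
    WHERE view_time <= now() at time zone 'utc'
      AND id % 2 = 0   -- same expression as the queue_view_time_0 predicate
    ORDER BY view_time ASC
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, view_time;
```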

----------------

> Do Not Use Partial Indexes as a Substitute for Partitioning
> While a search in this larger index might have to descend through a couple more tree levels than a search in a smaller index, that's almost certainly going to be cheaper than the planner effort needed to select the appropriate one of the partial indexes. The core of the problem is that the system does not understand the relationship among the partial indexes, and will laboriously test each one to see if it's applicable to the current query.

Would this be true in my case too? 

Is it faster for the planner to select the correct partition (hash partitioning on the `id` column) than the correct partial index as in my case? I don't think I'll need more than ~32 partitions/partial indexes even in an extreme scenario.
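Concretely, the hash-partitioned layout I have in mind would look roughly like this (only 2 partitions shown; the partition names are placeholders):

```
create table queue(
    id bigserial,
    view_time timestamp with time zone not null,
    primary key (id)
) partition by hash (id);

create table queue_p0 partition of queue for values with (modulus 2, remainder 0);
create table queue_p1 partition of queue for values with (modulus 2, remainder 1);

-- Partitioned index: one physical view_time index is created per partition.
create index queue_view_time ON queue(view_time ASC);
```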

Regards,
Dorian

Re: Partitioning update-heavy queue with hash partitions vs partial indexes

From
David Rowley
Date:
On Thu, 10 Aug 2023 at 20:36, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
> > Do Not Use Partial Indexes as a Substitute for Partitioning
> > While a search in this larger index might have to descend through a couple more tree levels than a search in a smaller index, that's almost certainly going to be cheaper than the planner effort needed to select the appropriate one of the partial indexes. The core of the problem is that the system does not understand the relationship among the partial indexes, and will laboriously test each one to see if it's applicable to the current query.
>
> Would this be true in my case too?

Yes.  The process of determining which partial indexes are valid for
the given query must consider each index one at a time and validate
the index's WHERE clause against the query's WHERE clause to see if it
can be used.  There is no shortcut that, seeing you have a series of
partial indexes with WHERE id % 10 = N, just picks one index without
checking all of them.

> Is it faster for the planner to select the correct partition (hash partitioning on the `id` column) than the correct partial index as in my case? I don't think I'll need more than ~32 partitions/partial indexes even in an extreme scenario.

I mean, test it and find out, but probably, yes, the partition pruning
code for hash partitioning is an O(1) operation and is very fast.
Once the given Constants have been hashed, finding the partition is
just a single divide operation away.
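For instance, with `queue` hash-partitioned on `id` (the partition and index names below are only illustrative), an equality lookup on the partition key is pruned to a single partition at plan time:

```
EXPLAIN SELECT * FROM queue WHERE id = 12345;
-- Expected plan shape: only the one matching partition is scanned, e.g.
--   Index Scan using queue_p1_pkey on queue_p1
--     Index Cond: (id = 12345)
```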

David



Re: Partitioning update-heavy queue with hash partitions vs partial indexes

From
burcinyazici@gmail.com
Date:
hi,
Consider adding id%10 as a new column? There would be a one-time cost, but after creating an index on it, update performance should be satisfactory.
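A rough sketch of what I mean (the name shard_id and the use of a generated column are just one way to do it; generated columns need PostgreSQL 12+):

```
-- One-time table rewrite; the generated column keeps id % 10 maintained
-- automatically from then on.
ALTER TABLE queue
    ADD COLUMN shard_id smallint GENERATED ALWAYS AS ((id % 10)::smallint) STORED;

-- Index on the new column; combining it with view_time is discussed in the follow-up.
CREATE INDEX queue_shard_id ON queue (shard_id);
```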

Burcin 📱




Re: Partitioning update-heavy queue with hash partitions vs partial indexes

From
Dorian Hoxha
Date:

@Burcin
I'm not too fond of the new column because it introduces a new index that needs to be maintained.
I could change the index on `view_time` to `(shard_id, view_time)`, but I'm afraid that might increase the time to traverse the index (because every `shard_id` value within a single query would be the same).
It might also increase the index size, although recent versions have index deduplication for duplicate values.
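That is, something like this (shard_id being the hypothetical column from the previous message):

```
-- Composite index replacing queue_view_time: a query for one shard would
-- filter on shard_id = K and still read entries in view_time order.
create index queue_shard_view_time ON queue(shard_id, view_time ASC);
```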

Thank you

