Thread: Clustered index to preserve data locality in a multitenant application?


Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
Hello,

We are developing a multitenant application which is currently based on MySQL, but we're thinking of migrating to PostgreSQL.

We rely on clustered indexes to preserve data locality for each tenant. Primary keys start with the tenant ID. This way, rows belonging to the same tenant are stored next to each other. Because all requests hit only one tenant, this is a great performance improvement.

PostgreSQL doesn't have clustered indexes — I'm aware of the CLUSTER command but it's a one-time operation — and I'm wondering if this can be a problem or not.

Let's say we have a table containing data for 10,000 tenants and 10,000 rows per tenant, for a total of 100,000,000 rows. Let's say each 8 KB block contains ~10 rows. And let's say we want to compute the sum of an integer column for all rows belonging to a given tenant ID.

In MySQL/InnoDB, rows are stored in the leaf nodes of a B-tree. To compute the sum, MySQL has to read at least 1,000 blocks (each block containing ~10 rows). I deliberately neglect the cost of walking the B-tree intermediate nodes.

By comparison, PostgreSQL has to read at least 10,000 blocks (each block contains ~10 rows, but typically only one of them will match the tenant ID; the others belong to other tenants).
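To make the scenario concrete, here is a minimal sketch of the table and the aggregate under discussion (all names are illustrative, not taken from the actual application):

```sql
-- Hypothetical schema for the scenario above.
CREATE TABLE tenant_data (
    tenant_id integer NOT NULL,
    row_id    integer NOT NULL,
    amount    integer NOT NULL,
    PRIMARY KEY (tenant_id, row_id)
);

-- The aggregate in question. With ~10 rows per 8 KB block, InnoDB reads
-- ~1,000 blocks (rows are clustered by primary key), while a PostgreSQL
-- heap with rows placed in insertion order may touch up to ~10,000 blocks.
SELECT sum(amount)
FROM tenant_data
WHERE tenant_id = 42;
```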

A few questions:

- Am I missing something?
- Am I overestimating the benefit of a clustered index in our case, and the cost of not having one in PostgreSQL?
- Is there another technical solution to this problem?

Thanks,

Nicolas Grilly
Garden / Vocation City - Web Recruitment Software
Managing Partner
+33 6 03 00 25 34

Re: Clustered index to preserve data locality in a multitenant application?

From: Vick Khera
Date:
On Tue, Aug 30, 2016 at 7:10 AM, Nicolas Grilly
<nicolas@vocationcity.com> wrote:
> Let's say we have a table containing data for 10,000 tenants and 10,000 rows
> per tenant, for a total of 100,000,000 rows. Let's say each 8 KB block
> contains ~10 rows. Let's say we want to compute the sum of an integer column
> for all rows belonging to a given tenant ID.

I'll assume you have an index on the tenant ID. In that case, your
queries will be pretty fast.

On some instances, we have multi-column indexes starting with the
tenant ID, and those are used very effectively as well.

I never worry about data locality.

Depending on your data distribution, you may want to consider table
partitions based on the tenant id. I personally never bother with
that, but split based on some other key in the data.
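A sketch of partitioning by ranges of tenants, using the declarative syntax that, as of this thread, was still in development (all names and range boundaries are illustrative):

```sql
-- Declarative partitioning (PostgreSQL 10+ syntax; at the time of this
-- thread it was not yet released, and inheritance-based partitioning
-- would have been needed instead).
CREATE TABLE tenant_data (
    tenant_id integer NOT NULL,
    row_id    integer NOT NULL,
    amount    integer NOT NULL
) PARTITION BY RANGE (tenant_id);

-- One partition per range of tenants keeps each tenant's rows in a small
-- set of files, restoring locality without a clustered index.
CREATE TABLE tenant_data_0000 PARTITION OF tenant_data
    FOR VALUES FROM (0) TO (1000);
CREATE TABLE tenant_data_1000 PARTITION OF tenant_data
    FOR VALUES FROM (1000) TO (2000);
```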


Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Tue, Aug 30, 2016 at 7:26 PM, Vick Khera <vivek@khera.org> wrote:
I'll assume you have an index on the tenant ID. In that case, your
queries will be pretty fast.

On some instances, we have multi-column indexes starting with the
tenant ID, and those are used very effectively as well.

I never worry about data locality.

Yes, we have an index starting with the tenant ID, and the query uses the index.

But I'm still worried about PostgreSQL having to fetch 10 times more pages from the disk than MySQL. If each 8K page contains approximately 10 rows, fetching one page in MySQL will return 10 "useful" rows belonging to the tenant. By comparison, fetching one page in PostgreSQL will probably return only 1 "useful" row belonging to the tenant. In terms of IO, it's a big difference.
 
Depending on your data distribution, you may want to consider table
partitions based on the tenant id. I personally never bother with
that, but split based on some other key in the data.

You're right. I don't really need a clustered index (like InnoDB's). What I need is to store rows belonging to the same tenant close to each other. Partitioning can help with that, but the lack of declarative partitioning makes it cumbersome (I've read this is being worked on).

Re: Clustered index to preserve data locality in a multitenant application?

From: Kenneth Marshall
Date:
On Tue, Aug 30, 2016 at 07:51:58PM +0200, Nicolas Grilly wrote:
> On Tue, Aug 30, 2016 at 7:26 PM, Vick Khera <vivek@khera.org> wrote:
>
> > I'll assume you have an index on the tenant ID. In that case, your
> > queries will be pretty fast.
> >
> > On some instances, we have multi-column indexes starting with the
> > tenant ID, and those are used very effectively as well.
> >
> > I never worry about data locality.
> >
>
> Yes, we have an index starting with the tenant ID, and the query uses the
> index.
>
> But I'm still worried about PostgreSQL having to fetch 10 times more pages
> from the disk than MySQL. If each 8K page contains approximately 10 rows,
> fetching one page in MySQL will return 10 "useful" rows belonging to the
> tenant. By comparison, fetching one page in PostgreSQL will probably return
> only 1 "useful" row belonging to the tenant. In terms of IO, it's a big
> difference.
>
>
> > Depending on your data distribution, you may want to consider table
> > partitions based on the tenant id. I personally never bother with
> > that, but split based on some other key in the data.
> >
>
> You're right. I don't really need a clustered index (like InnoDB's). What I
> need is to store rows belonging to the same tenant close to each other.
> Partitioning can help with that, but the lack of declarative partitioning
> makes it cumbersome (I've read this is being worked on).

Hi,

We have been using the extension pg_repack to keep a table groomed into
cluster order. With an appropriate FILLFACTOR to keep updates on the same
page, it works well. The issue is that it needs space to rebuild the new
index/table. If you have that, it works well.
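For illustration, the kind of invocation meant here might look as follows (database, table, and column names are assumptions, not from Ken's setup):

```shell
# pg_repack's --order-by performs an online, mostly non-blocking
# equivalent of CLUSTER, rewriting the table sorted by the given columns.
pg_repack --dbname=mydb --table=tenant_data --order-by=tenant_id

# Lowering the fillfactor leaves free space on each page so updated rows
# can stay near their original location between repacks, e.g.:
#   ALTER TABLE tenant_data SET (fillfactor = 80);
```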

Regards,
Ken


Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall <ktm@rice.edu> wrote:
We have been using the extension pg_repack to keep a table groomed into
cluster order. With an appropriate FILLFACTOR to keep updates on the same
page, it works well. The issue is that it needs space to rebuild the new
index/table. If you have that, it works well.

Interesting!
Do you run pg_repack on a regular schedule using something like cron, or does it run automatically in the background?
Is it compatible with PostgreSQL 9.6?

Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
Eduardo Morras wrote:
 
Check BRIN indexes, they are "designed for handling very large tables in
which certain columns have some natural correlation with their physical
location within the table", I think they fit your needs.

Yes, a BRIN index on the tenant ID would be very useful if the rows in the heap were naturally sorted by the tenant ID, but they are not. They are naturally sorted by their order of insertion, which is completely unrelated. The first step in solving this is to find a way to keep rows belonging to the same tenant close to each other. The second step could be to use a BRIN index.
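Once the heap is kept roughly ordered by tenant (for instance via periodic repacking), the BRIN index of the second step would be very compact. A sketch, with illustrative names and an illustrative `pages_per_range` setting:

```sql
-- Only worthwhile once heap order correlates with tenant_id (e.g. after
-- repacking the table ordered by tenant_id). A BRIN index stores only
-- the min/max tenant_id per range of pages, so it stays tiny even for
-- 100M rows.
CREATE INDEX tenant_data_tenant_brin ON tenant_data
    USING brin (tenant_id)
    WITH (pages_per_range = 32);
```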

Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
Mike Sofen wrote: 
For Nicolas’s situation, that would require 10,000 partitions – not very useful, and each partition would be very small.

This is exactly my conclusion about using partitions in my situation.
 
In Postgres, as you mentioned, clustering is a “one time” operation but only in the sense that after you add more rows, you’ll need to re-cluster the table.  Depending on the activity model for that table, that may be feasible/ok.  For example, if you load it via regular batch scripts, then the clustering could be done after those loads.  If you add rows only rarely but then do lots of updates, then the clustering would work great.  If this is an active real time data table, then clustering would not be viable.

The application is very interactive and new rows are inserted all the time in my use case.

Thanks for your time,

Nicolas

Re: Clustered index to preserve data locality in a multitenant application?

From: Kenneth Marshall
Date:
On Wed, Aug 31, 2016 at 05:23:50PM +0200, Nicolas Grilly wrote:
> On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall <ktm@rice.edu> wrote:
>
> > We have been using the extension pg_repack to keep a table groomed into
> > cluster order. With an appropriate FILLFACTOR to keep updates on the same
> > page, it works well. The issue is that it needs space to rebuild the new
> > index/table. If you have that, it works well.
> >
>
> Interesting!
> Do you run pg_repack on a regular schedule using something like cron, or
> does it run automatically in the background?
> Is it compatible with PostgreSQL 9.6?

Hi,

We just run it via cron. In our case, we run it once a day, but depending on
your churn, it could be run once a week or more.

Regards,
Ken


Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Wed, Aug 31, 2016 at 6:05 PM, Kenneth Marshall <ktm@rice.edu> wrote:
We just run it via cron. In our case, we run it once a day, but depending on
your churn, it could be run once a week or more.

Could you provide some numbers: what is the size of the tables or tables that are repacked? how long does it take? is there a performance impact during the repack?

Re: Clustered index to preserve data locality in a multitenant application?

From: Kenneth Marshall
Date:
On Wed, Aug 31, 2016 at 06:06:54PM +0200, Nicolas Grilly wrote:
> On Wed, Aug 31, 2016 at 6:05 PM, Kenneth Marshall <ktm@rice.edu> wrote:
>
> > We just run it via cron. In our case, we run it once a day, but depending
> > on
> > your churn, it could be run once a week or more.
> >
>
> Could you provide some numbers: what is the size of the tables or tables
> that are repacked? how long does it take? is there a performance impact
> during the repack?
Hi,

The key is that there must be enough I/O capacity to handle your routine
DB access needs plus the additional usage of the repack operation. Running
the repack during a slow period does not tangibly impact DB performance and
completes in a reasonable period of time. Your best bet would be to set
up a test environment. By way of example, our repack can take over 5 hours
when run during peak usage, and less than 30 minutes outside of peak when
there is excess I/O capacity. You should be able to use your I/O monitoring
system to get an idea of your I/O headroom.

Regards,
Ken


Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall <ktm@rice.edu> wrote:
We have been using the extension pg_repack to keep a table groomed into
cluster order. With an appropriate FILLFACTOR to keep updates on the same
page, it works well. The issue is that it needs space to rebuild the new
index/table. If you have that, it works well.

It looks like Instagram has been using pg_reorg (the ancestor of pg_repack) to keep all likes from the same user contiguous on disk, in order to minimize disk seeks.


This is very similar to what I'm trying to achieve.

The article is 3 years old. I'd be curious to know if they still do that.

Nicolas

Re: Clustered index to preserve data locality in a multitenant application?

From: Ben Chobot
Date:
On Aug 31, 2016, at 2:55 PM, Nicolas Grilly <nicolas@gardentechno.com> wrote:

It looks like Instagram has been using pg_reorg (the ancestor of pg_repack) to keep all likes from the same user contiguous on disk, in order to minimize disk seeks.


This is very similar to what I'm trying to achieve.

The article is 3 years old. I'd be curious to know if they still do that.

If what they did 3 years ago is similar to what you are trying to do today, who cares what they are doing today? (Besides using pg_repack instead of pg_reorg, of course.)

For what it's worth, we use pg_repack on a regular basis and it works exactly as advertised.

Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Thu, Sep 1, 2016 at 12:05 AM, Ben Chobot <bench@silentmedia.com> wrote:
If what they did 3 years ago is similar to what you are trying to do today, who cares what they are doing today? (Besides using pg_repack instead of pg_reorg, of course.)

I'm curious because, in the meantime, Instagram could have stopped doing this because 1) it didn't work as intended or 2) they found a better solution.
 
For what it's worth, we use pg_repack on a regular basis and it works exactly as advertised.

Thanks.

Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall <ktm@rice.edu> wrote:
We have been using the extension pg_repack to keep a table groomed into
cluster order. With an appropriate FILLFACTOR to keep updates on the same
page, it works well. The issue is that it needs space to rebuild the new
index/table. If you have that, it works well.

In DB2, it seems possible to define a "clustering index" that determines how rows are physically ordered in the "table space" (the heap).

The documentation says: "When a table has a clustering index, an INSERT statement causes DB2 to insert the records as nearly as possible in the order of their index values."

It looks like a kind of "continuous CLUSTER/pg_repack". Is there something similar available or planned for PostgreSQL?


Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Thu, Sep 1, 2016 at 12:31 AM, Nicolas Grilly <nicolas@gardentechno.com> wrote:
In DB2, it seems possible to define a "clustering index" that determines how rows are physically ordered in the "table space" (the heap).

The documentation says: "When a table has a clustering index, an INSERT statement causes DB2 to insert the records as nearly as possible in the order of their index values."

It looks like a kind of "continuous CLUSTER/pg_repack". Is there something similar available or planned for PostgreSQL?

I forgot the links to DB2 documentation about clustering index:


Re: Clustered index to preserve data locality in a multitenant application?

From: Igor Neyman
Date:

From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Nicolas Grilly
Sent: Wednesday, August 31, 2016 6:32 PM
To: Kenneth Marshall <ktm@rice.edu>
Cc: Vick Khera <vivek@khera.org>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: [GENERAL] Clustered index to preserve data locality in a multitenant application?

On Tue, Aug 30, 2016 at 8:17 PM, Kenneth Marshall <ktm@rice.edu> wrote:

We have been using the extension pg_repack to keep a table groomed into
cluster order. With an appropriate FILLFACTOR to keep updates on the same
page, it works well. The issue is that it needs space to rebuild the new
index/table. If you have that, it works well.

In DB2, it seems possible to define a "clustering index" that determines how rows are physically ordered in the "table space" (the heap).

The documentation says: "When a table has a clustering index, an INSERT statement causes DB2 to insert the records as nearly as possible in the order of their index values."


It looks like a kind of "continuous CLUSTER/pg_repack". Is there something similar available or planned for PostgreSQL?

Don't know about plans to implement clustered indexes in PostgreSQL.

Not sure if this was mentioned, but MS SQL Server has clustered indexes, where the heap row is stored at the leaf level of the index.

Oracle also has a similar feature: the IOT, or Index Organized Table.

It seems to me (maybe I'm wrong) that a clustered index (with the heap row stored in the index leaf) would be much harder to implement in PostgreSQL because of the way MVCC is implemented: multiple row versions are stored in the table itself (Oracle, by contrast, keeps the table "clean" and stores old row versions in the UNDO tablespace/segment).

Regards,

Igor Neyman

Re: Clustered index to preserve data locality in a multitenant application?

From: Eduardo Morras
Date:
On Wed, 31 Aug 2016 17:33:18 +0200
Nicolas Grilly <nicolas@vocationcity.com> wrote:

> Eduardo Morras wrote:
>
>
> > Check BRIN indexes, they are "designed for handling very large
> > tables in which certain columns have some natural correlation with
> > their physical location within the table", I think they fit your
> > needs.
>
>
> Yes, a BRIN index on the tenant ID would be very useful if the rows
> in the heap were naturally sorted by the tenant ID, but they are not.
> They are naturally sorted by their order of insertion, which is
> completely unrelated. The first step in solving this is to find a way
> to keep rows belonging to the same tenant close to each other. The
> second step could be to use a BRIN index.

Then you can create multi-column partial indexes:

CREATE INDEX CONCURRENTLY tenant_01_idx
    ON big_tenant_table (the_columns_with_data_you_need, tenant_id)
    WHERE tenant_id = 1;
CREATE INDEX CONCURRENTLY tenant_02_idx
    ON big_tenant_table (the_columns_with_data_you_need, tenant_id)
    WHERE tenant_id = 2;

This way each index holds the data for a single tenant, is updated only when that tenant's data is updated, and has its own files, so you can reindex it to clean out its contents and debloat:

REINDEX INDEX tenant_01_idx;

Or, if that makes too many indexes, group tenants into ranges:

CREATE INDEX CONCURRENTLY tenant_group_01_idx
    ON big_tenant_table (the_columns_with_data_you_need, tenant_id)
    WHERE tenant_id <= 300;
CREATE INDEX CONCURRENTLY tenant_group_02_idx
    ON big_tenant_table (the_columns_with_data_you_need, tenant_id)
    WHERE tenant_id > 300 AND tenant_id <= 600;

---   ---
Eduardo Morras <emorrasg@yahoo.es>


Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Thu, Sep 1, 2016 at 3:08 PM, Igor Neyman <ineyman@perceptron.com> wrote:

Don’t know about plans to implement clustered indexes in PostgreSQL.


It was discussed on the mailing list in the past. 

I found an interesting thread dated from 2012 about integrating pg_reorg (the ancestor of pg_repack) in PostgreSQL core:


There is also an item titled "Automatically maintain clustering on a table" in the TODO list:

 

Not sure if this was mentioned, but MS SQL Server has clustered indexes, where the heap row is stored at the leaf level of the index.

Oracle also has a similar feature: the IOT, or Index Organized Table.

It seems to me (maybe I'm wrong) that a clustered index (with the heap row stored in the index leaf) would be much harder to implement in PostgreSQL because of the way MVCC is implemented: multiple row versions are stored in the table itself (Oracle, by contrast, keeps the table "clean" and stores old row versions in the UNDO tablespace/segment).


DB2, like PostgreSQL, stores rows in a heap, not in the leaves of a B-tree. But it's possible to define a "clustering" key for a table. When it is defined, DB2 tries to keep the rows in the heap ordered according to the clustering key. If DB2 can't find space on the page where the row should go, it looks at a few pages before and after and puts the row there; if it still can't find space, it puts the row at the end. There is also a feature called "multidimensional clustering" which is even more sophisticated, and a REORG command, which would be the equivalent of a non-blocking CLUSTER in PostgreSQL.

I think DB2's approach is interesting because it shows that maintaining spatial coherency is possible with a heap, without having to store rows in a B-tree (like InnoDB does).

Re: Clustered index to preserve data locality in a multitenant application?

From: Nicolas Grilly
Date:
On Tue, Aug 30, 2016 at 1:10 PM, Nicolas Grilly <nicolas@vocationcity.com> wrote:
We are developing a multitenant application which is currently based on MySQL, but we're thinking of migrating to PostgreSQL.

We rely on clustered indexes to preserve data locality for each tenant. Primary keys start with the tenant ID. This way, rows belonging to the same tenant are stored next to each other. Because all requests hit only one tenant, this is a great performance improvement.

PostgreSQL doesn't have clustered indexes — I'm aware of the CLUSTER command but it's a one-time operation — and I'm wondering if this can be a problem or not.

Let's say we have a table containing data for 10,000 tenants and 10,000 rows per tenant, for a total of 100,000,000 rows. Let's say each 8 KB block contains ~10 rows. And let's say we want to compute the sum of an integer column for all rows belonging to a given tenant ID.

In MySQL/InnoDB, rows are stored in the leaf nodes of a B-tree. To compute the sum, MySQL has to read at least 1,000 blocks (each block containing ~10 rows). I deliberately neglect the cost of walking the B-tree intermediate nodes.

By comparison, PostgreSQL has to read at least 10,000 blocks (each block containing ~10 rows, but most of the time, only one row will match the tenant ID, other rows belonging to other tenants).

A few questions:

- Am I missing something?
- Am I overestimating the benefit of a clustered index in our case, and the cost of not having one in PostgreSQL?
- Is there another technical solution to this problem?


After a few days thinking and researching about this, I've come to the following conclusion:

1) Trying to preserve data locality in a multi-tenant application is a legitimate requirement (because it can significantly improve performance).
2) But index organized tables are not the right answer.
3) Partitioning by tenant_id, sorting heap rows by tenant_id, and/or creating covering indexes (for index-only scans) are better solutions.

I found at least two examples of well-known companies using PostgreSQL at scale for a multi-tenant application, and taking specific steps to preserve data locality:

- Heap shards its data by customer_id (all rows in a logical shard belong to the same customer — except for small customers, but it's easy to make their queries fast anyway) [1].
- Instagram uses pg_reorg (the ancestor of pg_repack) to keep likes created by the same user contiguous on disk [2].

At first, I thought that index organized tables (aka clustered indexes) were the solution, and that missing them in PostgreSQL could be an issue, but I was wrong. They are not the right tool for the job.

Index organized tables are a powerful tool for an application that needs very fast lookups and range scans on the primary key. But they also have significant drawbacks [3]:

- Lookups on secondary keys are slower (because they need one more indirection).
- The index is bigger (because rows are stored directly in the index, instead of a heap).
- InnoDB can only use the primary key as clustering key, which is very restrictive (for example, in PostgreSQL, most recently inserted/updated rows are naturally clustered together, independently of the chosen primary key).

So, PostgreSQL uses heap organized tables instead of index organized tables, and this is a good thing, at least for the kind of queries my application needs. But, at scale, I still need to find a way to preserve data locality for each tenant.

The solutions I've identified are:

- Partition by tenant_id as suggested by Thomas Kellerer earlier in this thread. Declarative partitioning will make this easier in a future version of PostgreSQL. It's also possible to "shard" using Citus Data (like Heap or CloudFlare).
- Periodically sort rows by tenant_id in the heap, using something like pg_repack, as suggested by Kenneth Marshall and Ben Chobot.
- Create covering indexes, which let PostgreSQL do index-only scans (exactly like an index organized table), as suggested by Eduardo Morras and engineers at Heap [4].
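As a sketch of the covering-index option, applied to the sum-per-tenant example from the start of the thread (names are illustrative):

```sql
-- A "covering" index contains every column the query touches, so an
-- index-only scan never visits the heap, and a tenant's entries are
-- contiguous in the index regardless of heap order. (PostgreSQL 11+
-- also offers INCLUDE for non-key payload columns.)
CREATE INDEX tenant_data_tenant_amount_idx
    ON tenant_data (tenant_id, amount);

-- Served by an index-only scan when the visibility map is up to date:
SELECT sum(amount) FROM tenant_data WHERE tenant_id = 42;
```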

It looks like I can move forward with our migration from MySQL to PostgreSQL, without worrying about the lack of clustered indexes, because there are better solutions to keep tenant data contiguous!

Thanks for all your comments.

Nicolas Grilly


Re: Clustered index to preserve data locality in a multitenant application?

From: Jim Nasby
Date:
On 9/6/16 11:21 AM, Nicolas Grilly wrote:
> It looks like I can move forward with our migration from MySQL to
> PostgreSQL, without worrying about the lack of clustered indexes,
> because there are better solutions to keep tenant data contiguous!

First rule of performance tuning: don't. :) There are lots of areas
where Postgres can be expected to perform better than MySQL, so without
testing your app you really don't know how it's going to fare.

There's also another option: use a logical replication system (such as
pg_logical, BDR, londiste or Slony) to maintain at least one replica.
You can take that replica down to perform maintenance (such as a
database-wide CLUSTER) as needed, and let replication catch up once you
bring it back online. That, combined with scripted failover makes a lot
of database maintenance items far easier, at the cost of having to
maintain the replication. Depending on your needs, a major benefit to
this method is it makes major version upgrades very simple: you just
stand up a new replica on the new version and then failover to it. If
anything goes wrong, you can fail back to the old version without losing
any data.
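The replica-side maintenance described above would boil down to something like this (table and index names are illustrative):

```sql
-- Run on the replica while it is out of rotation. CLUSTER takes an
-- ACCESS EXCLUSIVE lock and rewrites the table in index order, which is
-- why it is done on a standby rather than on the live primary.
CLUSTER tenant_data USING tenant_data_pkey;

-- Refresh planner statistics after the rewrite.
ANALYZE tenant_data;
```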
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461