Обсуждение: Distributed Table Partitioning

Поиск
Список
Период
Сортировка

Distributed Table Partitioning

От
Leonardo M. Ramé
Дата:
I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.


Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877


Re: Distributed Table Partitioning

От
Alvaro Aguayo Garcia-Rada
Дата:
Hi. I think pgpool-II can do that job for you. It's a middleware, so you can use it without even changing your app
code(butyour postgres configuration). It suppoerts many clustering functions, including replication, failover, and a
lotmore; it also supports partitioning. so that may be suitable for you. Check the tutorial, it even has some examples:
http://www.pgpool.net/docs/latest/tutorial-en.html

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 954183248
Website: www.ocs.pe

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.


Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: Distributed Table Partitioning

От
Melvin Davidson
Дата:


On Sat, Mar 12, 2016 at 8:33 PM, Alvaro Aguayo Garcia-Rada <aaguayo@opensysperu.com> wrote:
Hi. I think pgpool-II can do that job for you. It's a middleware, so you can use it without even changing your app code(but your postgres configuration). It suppoerts many clustering functions, including replication, failover, and a lot more; it also supports partitioning. so that may be suitable for you. Check the tutorial, it even has some examples: http://www.pgpool.net/docs/latest/tutorial-en.html

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 954183248
Website: www.ocs.pe

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.


Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Why don't you just make use of tablespaces and partition the child tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?
You will need a trigger and tg function to handle inserts

eg: {note: below is psuedo code}
      child {master} (SSD) NO ROWS33
      child1 (SSD) CONSTRAINT timestamp > {specified time}
      tg_insert_child1 on insert execute tgf_split_data
      child2 (SATA) CONSTRAINT timestamp <= {specified time}
      tg_insert_child2 on insert execute tgf_split_data
     
     tgf_split_data()
       if timestamp > {specified time}
         insert into child1
       else
         insert into child2
       endif
--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Distributed Table Partitioning

От
Melvin Davidson
Дата:
oops! Better example
eg: {note: below is psuedo code}
      child {master} (SSD) NO ROWS33
      tg_insert_child before insert execute tgf_split_data
      child1 (SSD) CONSTRAINT timestamp > {specified time}
      child2 (SATA) CONSTRAINT timestamp <= {specified time}
     
     tgf_split_data()
       if timestamp > {specified time}
         insert into child1
       else
         insert into child2
       endif

On Sat, Mar 12, 2016 at 9:19 PM, Melvin Davidson <melvin6925@gmail.com> wrote:


On Sat, Mar 12, 2016 at 8:33 PM, Alvaro Aguayo Garcia-Rada <aaguayo@opensysperu.com> wrote:
Hi. I think pgpool-II can do that job for you. It's a middleware, so you can use it without even changing your app code(but your postgres configuration). It suppoerts many clustering functions, including replication, failover, and a lot more; it also supports partitioning. so that may be suitable for you. Check the tutorial, it even has some examples: http://www.pgpool.net/docs/latest/tutorial-en.html

Alvaro Aguayo
Jefe de Operaciones
Open Comb Systems E.I.R.L.

Oficina: (+51-1) 3377813 | RPM: #034252 / (+51) 995540103  | RPC: (+51) 954183248
Website: www.ocs.pe

----- Original Message -----
From: "Leonardo M. Ramé" <l.rame@griensu.com>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Saturday, 12 March, 2016 8:25:01 PM
Subject: [GENERAL] Distributed Table Partitioning

I have this problem: a Master table containing records with a timestamp
column registering creation date-time, and one Detail table containing
info related to the Master table.

As time went by, those tables grew enormously, and I can't afford
expanding my SSD VPS. So I'm thinking about storing only NEW data into
it, and move OLD data to a cheaper SATA VPS.

The goal is using the SSD server as "main", and the other (or others?)
as "child", so queries still go to the main server, it somehow detects
which records must be fetched from it and what from the child servers,
then return the "composed" dataset to the caller.

I think this is called Distributed Horizontal Table Partitioning.


Is there a way to do this without changing my application code?.

Regards,
--
Leonardo M. Ramé
Medical IT - Griensu S.A.
Av. Colón 636 - Piso 8 Of. A
X5000EPT -- Córdoba
Tel.: +54(351)4246924 +54(351)4247788 +54(351)4247979 int. 19
Cel.: +54 9 (011) 40871877


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Why don't you just make use of tablespaces and partition the child tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?
You will need a trigger and tg function to handle inserts

eg: {note: below is psuedo code}
      child {master} (SSD) NO ROWS33
      child1 (SSD) CONSTRAINT timestamp > {specified time}
      tg_insert_child1 on insert execute tgf_split_data
      child2 (SATA) CONSTRAINT timestamp <= {specified time}
      tg_insert_child2 on insert execute tgf_split_data
     
     tgf_split_data()
       if timestamp > {specified time}
         insert into child1
       else
         insert into child2
       endif
--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.




--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Distributed Table Partitioning

От
"Peter J. Holzer"
Дата:
On 2016-03-12 21:19:11 -0500, Melvin Davidson wrote:
>     ----- Original Message -----
>     From: "Leonardo M. Ramé" <l.rame@griensu.com>
>     To: "PostgreSql-general" <pgsql-general@postgresql.org>
>     Sent: Saturday, 12 March, 2016 8:25:01 PM
>     Subject: [GENERAL] Distributed Table Partitioning
>
>     I have this problem: a Master table containing records with a timestamp
>     column registering creation date-time, and one Detail table containing
>     info related to the Master table.
>
>     As time went by, those tables grew enormously, and I can't afford
>     expanding my SSD VPS. So I'm thinking about storing only NEW data into
>     it, and move OLD data to a cheaper SATA VPS.
[...]
> Why don't you just make use of tablespaces and partition the child tablespaces
> so that the newer parttion is on the SSD and the older one is on SATA?

Since he mentioned virtual private servers (VPS) the reason might be
that his hoster offers VPS with SSDs (of various sizes) and VPS with
rotating hard disks (of various sizes), but not VPS with both. So he
can't rent a VPS with a relatively small SSD and a larger hard disk.

That might be a reason to look for an alternate hoster, but if he's
otherwise happy, switching to an unknown provider might be considered
too large a risk.

        hp

--
   _  | Peter J. Holzer    | I want to forget all about both belts and
|_|_) |                    | suspenders; instead, I want to buy pants
| |   | hjp@hjp.at         | that actually fit.
__/   | http://www.hjp.at/ |   -- http://noncombatant.org/

Вложения

Re: Distributed Table Partitioning

От
Leonardo M. Ramé
Дата:
El 13/03/16 a las 10:04, Peter J. Holzer escribió:
> On 2016-03-12 21:19:11 -0500, Melvin Davidson wrote:
>>      ----- Original Message -----
>>      From: "Leonardo M. Ramé" <l.rame@griensu.com>
>>      To: "PostgreSql-general" <pgsql-general@postgresql.org>
>>      Sent: Saturday, 12 March, 2016 8:25:01 PM
>>      Subject: [GENERAL] Distributed Table Partitioning
>>
>>      I have this problem: a Master table containing records with a timestamp
>>      column registering creation date-time, and one Detail table containing
>>      info related to the Master table.
>>
>>      As time went by, those tables grew enormously, and I can't afford
>>      expanding my SSD VPS. So I'm thinking about storing only NEW data into
>>      it, and move OLD data to a cheaper SATA VPS.
> [...]
>> Why don't you just make use of tablespaces and partition the child tablespaces
>> so that the newer parttion is on the SSD and the older one is on SATA?
>
> Since he mentioned virtual private servers (VPS) the reason might be
> that his hoster offers VPS with SSDs (of various sizes) and VPS with
> rotating hard disks (of various sizes), but not VPS with both. So he
> can't rent a VPS with a relatively small SSD and a larger hard disk.
>
> That might be a reason to look for an alternate hoster, but if he's
> otherwise happy, switching to an unknown provider might be considered
> too large a risk.
>
>          hp
>

Yes, Peter is right, I must store one table in one VPS and the other in
a 2nd VPS.

I'm thinking of partitioning the table, on local and the remote using a
Foreign Data Wrapper, what do you think about this approach?.


Re: Distributed Table Partitioning

От
Melvin Davidson
Дата:
I guess that would work if you used a view to combine both tables, but you would still need a BEFORE trigger to make sure DML goes to the appropriate table.
Another solution might be to use dblink if you need to access the SATA table on the 2nd VPS,

On Sun, Mar 13, 2016 at 11:37 AM, Leonardo M. Ramé <l.rame@griensu.com> wrote:
El 13/03/16 a las 10:04, Peter J. Holzer escribió:
On 2016-03-12 21:19:11 -0500, Melvin Davidson wrote:
     ----- Original Message -----
     From: "Leonardo M. Ramé" <l.rame@griensu.com>
     To: "PostgreSql-general" <pgsql-general@postgresql.org>
     Sent: Saturday, 12 March, 2016 8:25:01 PM
     Subject: [GENERAL] Distributed Table Partitioning

     I have this problem: a Master table containing records with a timestamp
     column registering creation date-time, and one Detail table containing
     info related to the Master table.

     As time went by, those tables grew enormously, and I can't afford
     expanding my SSD VPS. So I'm thinking about storing only NEW data into
     it, and move OLD data to a cheaper SATA VPS.
[...]
Why don't you just make use of tablespaces and partition the child tablespaces
so that the newer parttion is on the SSD and the older one is on SATA?

Since he mentioned virtual private servers (VPS) the reason might be
that his hoster offers VPS with SSDs (of various sizes) and VPS with
rotating hard disks (of various sizes), but not VPS with both. So he
can't rent a VPS with a relatively small SSD and a larger hard disk.

That might be a reason to look for an alternate hoster, but if he's
otherwise happy, switching to an unknown provider might be considered
too large a risk.

         hp


Yes, Peter is right, I must store one table in one VPS and the other in a 2nd VPS.

I'm thinking of partitioning the table, on local and the remote using a Foreign Data Wrapper, what do you think about this approach?.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.