Discussion: Using Postgres to store high-volume streams of sensor readings
Hello all!

I would like to ask some advice about the following problem (related to the Dehems project: http://www.dehems.eu/ ):

* there are some clients (the clients are in fact households);
* each device has a number of sensors (about 10), and not all the clients have the same sensors; also sensors might appear and disappear dynamically (the sensors are appliances);
* for each device and each sensor a reading is produced about every 6 seconds (the values could be power consumptions);
* I would like to store the following data: (client, sensor, timestamp, value);
* the usual queries are:
    * for a given client (and sensor), and time interval, I need the min, max, and avg of the values;
    * for a given time interval (and sensor), I need the min, max, and avg of the values;
    * other statistics.

Currently I'm benchmarking the following storage solutions for this:

* Hypertable (http://www.hypertable.org/) -- which has a good insert rate (about 250k inserts / s), but a slow read rate (about 150k reads / s); (the aggregates are computed manually, as Hypertable supports no queries other than scanning; in fact min and max are easy, being the first / last key in the ordered set, but avg must be done by a sequential scan;)
* BerkeleyDB -- quite an OK insert rate (about 50k inserts / s), but a fabulous read rate (about 2M reads / s); (the same issue with the aggregates;)
* Postgres -- which behaves quite poorly (see below)...
* MySQL -- next to be tested.

So what can I do / how could I optimize the use of Postgres for this usage?
(I'm aware that there could be optimizations for this problem (like computing the aggregates in memory and storing only these aggregates every 10 minutes, or at some other interval), but I want to have the full history (for data mining tasks, for example).)

I would also like to present how the Postgres benchmark was implemented:

* test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500Gb SCSI, RAID 0, 8Gb RAM, 2 x Intel Xeon 5160 x 2 cores (3GHz, 4Mb L2);
* Postgres version: 8.3.3;
* database schema:

> create table sds_benchmark_data (
>     client int4 not null,
>     sensor int4 not null,
>     timestamp int4 not null,
>     value int4
> );
>
> alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>
> create index sds_benchmark_data__client_sensor__index on sds_benchmark_data (client, sensor);

* postgresql.conf (the default values; I will list only what has been changed):

> max_connections = 20
> shared_buffers = 24MB
> work_mem = 64MB
> maintenance_work_mem = 256MB
> fsync = off
> checkpoint_segments = 32
> effective_cache_size = 1024MB

* inserts are done like this:
    * generated 100 million readings using the following rule:
        * client is randomly chosen between 0 and 10 thousand;
        * sensor is randomly chosen between 0 and 10;
        * the timestamp is always increasing by one;
    * the inserts are done in batches of 500 thousand (I've also tried 5, 25, 50 and 100 thousand without big impact);
    * the batch inserts are done through COPY sds_benchmark_data FROM STDIN via libpq (using UNIX (local) sockets).

What have I observed / tried:

* I've tested without the primary key and the index, and the results were the best for inserts (600k inserts / s), but the reads worked extremely slowly (due to the lack of indexing);
* with only the index (or only the primary key) the insert rate is good at the start (for the first 2 million readings), but then drops to about 200 inserts / s.

So could someone point out where I'm wrong, or what I can do to optimize Postgres for this particular task?

Thanks for your help,
Ciprian Dorin Craciun.

P.S.: I want to publish the benchmark results after they are done, and I want to squeeze as much power out of Postgres as possible.
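For concreteness, the two "usual queries" described above are plain SQL aggregates over the (client, sensor, timestamp, value) table; a sketch against the schema shown in the post (the client/sensor ids and interval bounds are placeholder values):

```sql
-- min/max/avg for a given client and sensor over a time interval
select min(value), max(value), avg(value)
  from sds_benchmark_data
 where client = 42 and sensor = 3
   and timestamp between 1000000 and 2000000;

-- min/max/avg for a given sensor over a time interval, across all clients
select min(value), max(value), avg(value)
  from sds_benchmark_data
 where sensor = 3
   and timestamp between 1000000 and 2000000;
```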
On Fri, Nov 21, 2008 at 02:50:45PM +0200, Ciprian Dorin Craciun wrote:
> [the full problem description, quoted in the original message, snipped]

For this case I would use RRDTool: http://oss.oetiker.ch/rrdtool/

Regards,
Gerhard
On Fri, Nov 21, 2008 at 2:55 PM, Gerhard Heift <ml-postgresql-20081012-3518@gheift.de> wrote:
> [...]
> For this case I would use RRDTool: http://oss.oetiker.ch/rrdtool/

Hi Gerhard, I know about RRDTool, but it has some limitations:

* I must know in advance the number of sensors;
* I must create a file for each client (and what if I have 10 thousand clients?);
* I have a limited amount of history;
* (I'm not sure about this one, but I think) I must insert each data point by executing a command;
* and also I cannot replicate (distribute) it easily.

Or have you used RRDTool in a context similar to mine? Do you have some benchmarks?

Ciprian.
you'll have to provide us with some sort of test-case to get some answers, please. (set of scripts, queries, etc).
On Fri, Nov 21, 2008 at 3:18 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> you'll have to provide us with some sort of test-case to get some answers,
> please. (set of scripts, queries, etc).

Below is the content of my original post. Inside I mention exactly the way the benchmark was conducted. In short, the data is inserted using COPY sds_benchmark_data FROM STDIN, in batches of 500 thousand data points. I'll also paste the important part here:

> * Postgres version: 8.3.3;
>
> * database schema:
>> create table sds_benchmark_data (
>>     client int4 not null,
>>     sensor int4 not null,
>>     timestamp int4 not null,
>>     value int4
>> );
>>
>> alter table sds_benchmark_data add primary key (client, sensor, timestamp);
>>
>> create index sds_benchmark_data__client_sensor__index on sds_benchmark_data (client, sensor);
>
> * postgresql.conf (the default values; I will list only what has been changed):
>> max_connections = 20
>> shared_buffers = 24MB
>> work_mem = 64MB
>> maintenance_work_mem = 256MB
>> fsync = off
>> checkpoint_segments = 32
>> effective_cache_size = 1024MB
>
> * inserts are done like this:
>     * generated 100 million readings using the following rule:
>         * client is randomly chosen between 0 and 10 thousand;
>         * sensor is randomly chosen between 0 and 10;
>         * the timestamp is always increasing by one;
>     * the inserts are done in batches of 500 thousand (I've also tried 5, 25, 50 and 100 thousand without big impact);
>     * the batch inserts are done through COPY sds_benchmark_data FROM STDIN through libpq (by using UNIX (local) sockets);

Unfortunately I don't know what more information to give...

Thanks,
Ciprian Dorin Craciun.

On Fri, Nov 21, 2008 at 2:50 PM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
> [the original post, quoted in full, snipped]
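The data generation rule above (random client and sensor, monotonically increasing timestamp, tab-separated COPY text format) amounts to something like the following sketch. The value range is an assumption, since the post doesn't specify the readings' distribution; feeding the result to libpq's COPY ... FROM STDIN is left out:

```python
import random

def generate_copy_batch(start_ts, batch_size=500_000, clients=10_000, sensors=10):
    """Produce one COPY-text batch: tab-separated (client, sensor,
    timestamp, value) lines with an ever-increasing timestamp, matching
    the benchmark's generation rule."""
    lines = []
    for i in range(batch_size):
        client = random.randrange(clients)   # 0 .. 10 thousand
        sensor = random.randrange(sensors)   # 0 .. 10
        ts = start_ts + i                    # timestamp always increases by one
        value = random.randrange(1000)       # stand-in power reading (assumed range)
        lines.append(f"{client}\t{sensor}\t{ts}\t{value}\n")
    return "".join(lines)

batch = generate_copy_batch(0, batch_size=3)
# each line looks like e.g. "4821\t7\t0\t133\n"
```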
see, I am afraid of the part where it says "randomly", because you probably used random(), which isn't the fastest thing on earth :)
Ciprian Dorin Craciun wrote:
[............]
> So what can I do / how could I optimize the use of Postgres for this usage?

Hello, here you have some comments that will probably help you to get more from this test machine......

>> * test machine: Linux (Ubuntu 8.04 x64), IBM x3750, 2 x 500Gb,
>> SCSI, RAID 0, 8Gb RAM, 2 x Intel Xeon 5160 x 2 core (3Ghz, 4Mb L2);
>> * Postgres version: 8.3.3;

(RAID 0 is never a good thing with databases if you don't have another redundant system that can be used to restore your data, or if you want to minimize your downtime.)

Putting the database transaction logs ($PGDATA/pg_xlog) on their own dedicated disk resource will probably increase write performance.

>> create index sds_benchmark_data__client_sensor__index on sds_benchmark_data (client, sensor);

You don't need this index if the primary key is (client, sensor, timestamp).

>> shared_buffers = 24MB

I would increase this to 25% of your RAM: 2GB on the test machine (if it is a dedicated postgres server). It will help the read rate. You will probably have to increase kernel.shmmax and kernel.shmall in /etc/sysctl.conf (linux).

>> fsync = off

Do you have the results with this on?

>> checkpoint_segments = 32

I would increase this to 128-256 if you work with large write loads (several GB of data). $PGDATA/pg_xlog will use some extra disk if you change this value.

>> effective_cache_size = 1024MB

50% of your RAM: 4GB on the test machine (if it is a dedicated postgres server). It will probably help the read rate.

In addition, I would try to change these parameters also:

wal_buffers = 64
random_page_cost = 2.0

In general, faster and more disks in a RAID 1+0 / 0+1 will help write performance. autovacuum should be on.

regards.
--
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/
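Pulled together, these suggestions amount to roughly the following postgresql.conf fragment for the 8GB test machine (the values are Rafael's suggestions, not measured optima):

```
shared_buffers = 2GB          # ~25% of RAM; needs kernel.shmmax/shmall raised
effective_cache_size = 4GB    # ~50% of RAM
checkpoint_segments = 256     # for large write loads; uses extra pg_xlog disk
wal_buffers = 64              # in 8.3 a bare integer means 8kB pages, i.e. 512kB
random_page_cost = 2.0
autovacuum = on
```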
On Fri, Nov 21, 2008 at 02:50:45PM +0200, Ciprian Dorin Craciun wrote: > Currently I'm benchmarking the following storage solutions for this: > * Hypertable (http://www.hypertable.org/) -- which has good insert > rate (about 250k inserts / s), but slow read rate (about 150k reads / > s); (the aggregates are manually computed, as Hypertable does not > support other queries except scanning (in fact min, and max are easy > beeing the first / last key in the ordered set, but avg must be done > by sequential scan);) > * BerkeleyDB -- quite Ok insert rate (about 50k inserts / s), but > fabulos read rate (about 2M reads / s); (the same issue with > aggregates;) > * Postgres -- which behaves quite poorly (see below)... > * MySQL -- next to be tested; It's not quite what you're asking for; but have you checked out any of the databases that have resulted from the StreamSQL research? The Borealis engine[1] looks like the most recent development, but I'm not sure how you are with academic style code. I've never used it before, but it sounds as though it was designed for exactly your sort of problem. Sam [1] http://www.cs.brown.edu/research/borealis/public/
On Fri, Nov 21, 2008 at 3:29 PM, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
> see, I am afraid of the part where it says "randomly", because you probably
> used random(), which isn't the fastest thing on earth :)

I can assure you this is not the problem... The other storage engines work quite well, and also the generation speed is somewhere around 30 million records / second, which is 100 times greater than the speed achieved by the fastest store I've tested so far...

Ciprian.
You might want to look into how OpenNMS uses RRDTool. It is able to handle a huge number of nodes by queuing inserts into the RRDs and using JRobin.
I'm not sure if it is a great solution for what you are looking for, but I've found its performance scales quite well. I'm getting well over 500 updates per second using JRobin and an NFS disk. I'm sure I could do better, but my application is limited by Hibernate. Each of my files stores an MRTG's worth of data and keeps the average and max of four points in 136k.
Here is an iostat -dmx 1 of the rrd update:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvda 7.00 33.00 122.00 2093.00 0.48 8.34 8.16 150.70 69.92 0.45 100.00
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvda 493.00 26.00 248.00 540.00 2.84 2.21 13.13 43.70 55.42 1.26 99.60
My big problem with RRD is not being able to query it the way you can query a database, and its tendency not to keep exact values. Oh, and it being pretty cryptic.
Sorry to clutter up the list with RRD stuff. I just thought it might be pertinent to Ciprian.
On Fri, Nov 21, 2008 at 8:03 AM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
> [duplicate quote of the earlier RRDTool exchange snipped]
"Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes: > In short the data is inserted by using COPY sds_benchmark_data > from STDIN, in batches of 500 thousand data points. Not sure if it applies to your real use-case, but if you can try doing the COPY from a local file instead of across the network link, it might go faster. Also, as already noted, drop the redundant index. regards, tom lane
Thanks for your info! Please see below...

On Fri, Nov 21, 2008 at 4:14 PM, Rafael Martinez <r.m.guerrero@usit.uio.no> wrote:
> Putting the database transaction logs ($PGDATA/pg_xlog) on its own
> dedicated disk resource will probably increase write performance.

Unfortunately this is a test machine shared with other projects, and I can't change (for now) the disk setup... When I have a dedicated machine I'll do this... For now, no.

> You don't need this index if the primary key is (client, sensor, timestamp).

Both the primary key and the index are listed here because I switched between them while testing... (Almost the same behaviour with only the primary key, or with only the index...)

> [shared_buffers:] I would increase this to 25% of your RAM. 2GB in the test
> machine (if it is a dedicated postgres server). It will help read-rate.

Modified it.

> [fsync = off:] Do you have the results with this on?

It doesn't help at all... I guess the problem is with the index building...

> [checkpoint_segments:] I would increase this to 128-256 if you work with
> large write loads (several GB of data).

Updated it to 256.

> [effective_cache_size:] 50% of your RAM. 4GB in the test machine (if it is
> a dedicated postgres server). It will probably help read-rate.

Updated it to 4096MB.

> In addition, I would try to change these parameters also:
>
> wal_buffers = 64
> random_page_cost = 2.0

Currently wal_buffers is 64kB; I've set it to 64MB??? random_page_cost was 4.0; I decreased it to 2.0???

> In general, faster and more disks in a RAID 1+0 / 0+1 will help write
> performance. autovacuum should be on.

So after the updates the results were better, but still under 1k inserts / second... Thanks again for your info!
On Fri, Nov 21, 2008 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes:
>> In short the data is inserted by using COPY sds_benchmark_data
>> from STDIN, in batches of 500 thousand data points.
>
> Not sure if it applies to your real use-case, but if you can try doing
> the COPY from a local file instead of across the network link, it
> might go faster. Also, as already noted, drop the redundant index.
>
> regards, tom lane

Hi! It won't be that difficult to use a local file (right now I'm using the same computer), but will it really make a difference? (I mean, have you seen such issues?)

Thanks,
Ciprian Craciun.
"Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes: > On Fri, Nov 21, 2008 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Not sure if it applies to your real use-case, but if you can try doing >> the COPY from a local file instead of across the network link, it >> might go faster. Also, as already noted, drop the redundant index. > It won't be that difficult to use a local file (now I'm using the > same computer), but will it really make a difference? Yes. I'm not sure how much, but there is nontrivial protocol overhead. regards, tom lane
On Fri, Nov 21, 2008 at 7:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> writes: >> On Fri, Nov 21, 2008 at 6:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Not sure if it applies to your real use-case, but if you can try doing >>> the COPY from a local file instead of across the network link, it >>> might go faster. Also, as already noted, drop the redundant index. > >> It won't be that difficult to use a local file (now I'm using the >> same computer), but will it really make a difference? > > Yes. I'm not sure how much, but there is nontrivial protocol overhead. > > regards, tom lane > Ok, I have tried it, and no improvements... (There is also the drawback that I must run the inserts as the superuser...) Ciprian Craciun.
On Fri, 21 Nov 2008, Tom Lane wrote:

> Not sure if it applies to your real use-case, but if you can try doing
> the COPY from a local file instead of across the network link, it
> might go faster.

The fact that the inserts are reported as fast initially but slow as the table and index size grow means it's probably a disk bottleneck rather than anything related to the client itself. If the network overhead were the issue, I wouldn't expect it to start fast like that. Ditto for concerns about the random function being slow. Either of those might speed up the initial, fast period a bit, but they're not going to impact the later slowdown.

Ciprian, the main interesting piece of data to collect here is a snapshot of a few sample lines from the output of "vmstat 1" during the initial, fast loading section versus the slower period. I think what's happening to you is that maintaining the index blocks on disk is turning into increasingly random I/O as the size of the table grows, and your disks just can't keep up with that. What I'd expect is that initially the waiting-for-I/O "wa" figure will be low, but it will creep up constantly, and at some point spike up hard after the working set of data operated on exceeds memory.

The fact that PostgreSQL performs badly here compared to the more lightweight databases you've used isn't that much of a surprise. There's a fair amount of overhead for the write-ahead log and the MVCC implementation in the database, and your application is suffering from all that robustness overhead but not really gaining much of a benefit from it. The main things that help in this sort of situation are increases in shared_buffers and checkpoint_segments, so that more database information is stored in RAM for longer rather than being pushed to disk too quickly, but what Rafael suggested already got you most of the possible improvement here. You might get an extra bit of boost by adjusting the index FILLFACTOR upwards (default is 90; if you're never updating you could try 100). I doubt that will be anything but a minor incremental improvement though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
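The FILLFACTOR suggestion could be applied like this (a sketch, assuming the default primary-key index name for the table from the original post):

```sql
-- fillfactor is an index storage parameter; the new value only applies
-- to pages (re)built afterwards, hence the REINDEX.
alter index sds_benchmark_data_pkey set (fillfactor = 100);
reindex index sds_benchmark_data_pkey;
```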
On Fri, Nov 21, 2008 at 7:42 PM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
> [...]
> Ok, I have tried it, and no improvements... (There is also the
> drawback that I must run the inserts as the superuser...)

Thinking about it some more, the protocol overhead is not important... because if I don't use indexes, I obtain 600k inserts / second... (So the test was useless... :) But I learn from my mistakes...)

Ciprian.
On Fri, Nov 21, 2008 at 7:45 PM, Greg Smith <gsmith@gregsmith.com> wrote: > On Fri, 21 Nov 2008, Tom Lane wrote: > >> Not sure if it applies to your real use-case, but if you can try doing >> the COPY from a local file instead of across the network link, it >> might go faster. > > The fact that the inserts are reported as fast initially but slow as the > table and index size grow means it's probably a disk bottleneck rather than > anything related to the client itself. If the network overhead was the > issue, I wouldn't expect it to start fast like that. Ditto for concerns > about the random function being slow. Either of those might speed up the > initial, fast period a bit, but they're not going to impact the later > slowdown. > > Ciprian, the main interesting piece of data to collect here is a snapshot of > a few samples lines from the output from "vmstat 1" during the initial, fast > loading section versus the slower period. I think what's happening to you > is that maintaining the index blocks on the disk is turning into > increasingly random I/O as the size of the table grows, and your disks just > can't keep up with that. What I'd expect is that initially the waiting for > I/O "wa" figure will be low, but it will creep up constantly and at some > point spike up hard after the working set of data operated on exceeds > memory. Ok. Silly question: how do I "vmstat 1"??? The problem is indeed the indexes... So If we analyze the insert patterns: client id's randomly distributed and sensor id's the same, and the index is created ontop of these two, it means that (probabilistically) speaking after 100 thousand inserts (10 thousand clients and 10 sensors), all the index pages would be dirty... Indeed I could prolongue the flush by using bigger and bigger memory, but this doesn't help for 100 million records... > The fact that PostgreSQL performs badly here compared to the more > lightweight databases you've used isn't that much of a surprise. 
There's a > fair amount of overhead for the write-ahead log and the MVCC implementation > in the database, and your application is suffering from all that robustness > overhead but not really gaining much of a benefit from it. The main things > that help in this sort of situation are increases in shared_buffers and > checkpoint_segments, so that more database information is stored in RAM for > longer rather than being pushed to disk too quickly, but what Rafael > suggested already got you most of the possible improvement here. You might > get an extra bit of boost by adjusting the index FILLFACTOR upwards (default > is 90, if you're never updating you could try 100). I doubt that will be > anything but a minor incremental improvement though. About the fillfactor: on the contrary, I think I should set it lower (as the index fills very quickly)... I've set it to 10% and it behaves a little better than with the default... I'll run a full 100 million test to see where it breaks... > -- > * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD Thanks, Ciprian Craciun.
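(Greg's "vmstat 1" suggestion above just means running `vmstat 1` in a shell while the load is running and watching the "wa" column, the percentage of CPU time spent waiting on I/O. The same figure can be derived from the aggregate "cpu" line in /proc/stat; below is a minimal Python sketch of that, given two samples taken one second apart. The field layout follows the Linux proc(5) man page; the helper names are my own.)

```python
# Sketch: compute the "wa" (I/O wait) percentage that "vmstat 1" reports,
# from two samples of the aggregate "cpu" line in /proc/stat.
# Field order (per Linux proc(5)): user nice system idle iowait irq softirq ...

def parse_cpu_line(line):
    return [int(x) for x in line.split()[1:]]

def iowait_percent(sample_a, sample_b):
    a, b = parse_cpu_line(sample_a), parse_cpu_line(sample_b)
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    iowait = deltas[4]          # 5th field is iowait
    return 100.0 * iowait / total if total else 0.0

if __name__ == "__main__":
    # Two synthetic samples: of the 100 ticks elapsed, 40 were spent in iowait.
    s1 = "cpu  100 0 50 800 10 0 0 0"
    s2 = "cpu  130 0 70 810 50 0 0 0"
    print(iowait_percent(s1, s2))  # 40.0
```

In practice one would read `/proc/stat` twice with a one-second sleep in between, but `vmstat 1` already does exactly that.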
On Fri, 21 Nov 2008, Sam Mason wrote: > It's not quite what you're asking for; but have you checked out any > of the databases that have resulted from the StreamSQL research? A streaming database approach is in fact ideally suited to handling this particular problem. Looking at the original request here: > * for a given client (and sensor), and time interval, I need the min, > max, and avg of the values; > * for a given time interval (and sensor), I need min, max, and avg of > the values; The most efficient way possible to compute these queries is to buffer the full interval worth of data in memory as the values are being inserted, compute these aggregates once the time window for the interval has ended, then write a summarized version of the data. Doing that sort of thing and then operating on the aggregated data, rather than maintaining a bulky index covering every single row, is exactly the sort of thing a good streaming database would handle for you. I can't comment on the current state of Borealis. But as the original focus of the streaming database research that spawned Truviso where I work was accelerating data capture from sensor networks, I know this general style of approach is quite beneficial here. For example, http://telegraph.cs.berkeley.edu/telegraphcq/v2.1/ is an open-source implementation from that research that's integrated with an older version of PostgreSQL. If you look at the "windowed aggregates" example there, it shows what a streaming query similar to the requirements here would look like: an average and other statistics produced on a per-interval basis. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
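(The buffering scheme Greg describes, accumulating one interval's worth of readings in memory and emitting min/max/avg summaries when the window closes, can be sketched in a few lines of Python. The class and method names below are illustrative, not any streaming engine's actual API.)

```python
# Sketch of the windowed-aggregation idea: buffer readings per
# (client, sensor) for one time window, then emit (min, max, avg)
# summary rows when the window closes.
from collections import defaultdict

class WindowAggregator:
    def __init__(self, window=600):          # window length in seconds
        self.window = window
        self.current = None                  # start timestamp of open window
        self.values = defaultdict(list)

    def insert(self, client, sensor, timestamp, value):
        start = timestamp - timestamp % self.window
        flushed = []
        if self.current is not None and start != self.current:
            flushed = self.flush()           # window closed: emit summaries
        self.current = start
        self.values[(client, sensor)].append(value)
        return flushed

    def flush(self):
        out = []
        for (client, sensor), vals in sorted(self.values.items()):
            out.append((client, sensor, self.current,
                        min(vals), max(vals), sum(vals) / len(vals)))
        self.values.clear()
        return out

agg = WindowAggregator(window=60)
agg.insert(1, 7, 0, 10)
agg.insert(1, 7, 6, 20)
rows = agg.insert(1, 7, 61, 5)   # crosses into the next minute -> flush
print(rows)                       # [(1, 7, 0, 10, 20, 15.0)]
```

Only the summary rows would then be written to the database, which is what keeps the per-row index maintenance cost out of the hot path.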
On Fri, Nov 21, 2008 at 8:41 PM, Greg Smith <gsmith@gregsmith.com> wrote: > On Fri, 21 Nov 2008, Sam Mason wrote: > >> It's not quite what you're asking for; but have you checked out any >> of the databases that have resulted from the StreamSQL research? > > A streaming database approach is in fact ideally suited to handling this > particular problem. Looking at the original request here: > >> * for a given client (and sensor), and time interval, I need the min, max, >> and avg of the values; >> * for a given time interval (and sensor), I need min, max, and avg of the >> values; > > The most efficient way possible to compute these queries is to buffer the > full interval worth of data in memory as the values are being inserted, > compute these aggregates once the time window for the interval has ended, > then write a summarized version of the data. Doing that sort of thing and > then operating on the aggregated data, rather than maintaining a bulky index > covering every single row, is exactly the sort of thing a good streaming > database would handle for you. > > I can't comment on the current state of Borealis. But as the original focus > of the streaming database research that spawned Truviso where I work was > accelerating data capture from sensor networks, I know this general style of > approach is quite beneficial here. For example, > http://telegraph.cs.berkeley.edu/telegraphcq/v2.1/ is an open-source > implementation from that research that's integrated with an older version of > PostgreSQL. If you look at the "windowed aggregates" example there, it > shows what a streaming query similar to the requirements here would look > like: an average and other statistics produced on a per-interval basis. 
> > -- > * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general Thank you for your pointers, I'll check them out... I'm aware of the fact that the problem could be solved in other more efficient ways (for example as you've said, by aggregating the data in memory and flushing only the aggregates), but as I've said in the beginning, I also want to test different storage systems in such a situation... Ciprian Craciun.
On Fri, Nov 21, 2008 at 9:50 AM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
Currently I'm benchmarking the following storage solutions for this:
* Hypertable (http://www.hypertable.org/) -- which has good insert
rate (about 250k inserts / s), but slow read rate (about 150k reads /
s); (the aggregates are manually computed, as Hypertable does not
support other queries except scanning (in fact min and max are easy,
being the first / last key in the ordered set, but avg must be done
by sequential scan);)
* BerkeleyDB -- quite OK insert rate (about 50k inserts / s), but
fabulous read rate (about 2M reads / s); (the same issue with
aggregates;)
* Postgres -- which behaves quite poorly (see below)...
* MySQL -- next to be tested;
I think it'll also be interesting to see how SQLite 3 performs in this scenario. Any plans?
regards
diego
On Fri, Nov 21, 2008 at 10:26 PM, Diego Schulz <dschulz@gmail.com> wrote: > > > On Fri, Nov 21, 2008 at 9:50 AM, Ciprian Dorin Craciun > <ciprian.craciun@gmail.com> wrote: >> >> Currently I'm benchmarking the following storage solutions for this: >> * Hypertable (http://www.hypertable.org/) -- which has good insert >> rate (about 250k inserts / s), but slow read rate (about 150k reads / >> s); (the aggregates are manually computed, as Hypertable does not >> support other queries except scanning (in fact min and max are easy, >> being the first / last key in the ordered set, but avg must be done >> by sequential scan);) >> * BerkeleyDB -- quite OK insert rate (about 50k inserts / s), but >> fabulous read rate (about 2M reads / s); (the same issue with >> aggregates;) >> * Postgres -- which behaves quite poorly (see below)... >> * MySQL -- next to be tested; > > I think it'll also be interesting to see how SQLite 3 performs in this > scenario. Any plans? > > regards > > diego I would try it if I knew that it could handle the load... Do you have some info about this? Any pointers about the configuration issues? Ciprian.
On 21 Lis, 13:50, ciprian.crac...@gmail.com ("Ciprian Dorin Craciun") wrote: > Hello all! > > I would like to ask some advice about the following problem > (related to the Dehems project: http://www.dehems.eu/): > * there are some clients; (the clients are in fact households;) > * each device has a number of sensors (about 10), and not all the > clients have the same sensor; also sensors might appear and disappear > dynamically; (the sensors are appliances;) > * for each device and each sensor a reading is produced (at about > 6 seconds); (the values could be power consumptions;) > * I would like to store the following data: (client, sensor, > timestamp, value); > * the usual queries are: > * for a given client (and sensor), and time interval, I need > the min, max, and avg of the values; > * for a given time interval (and sensor), I need min, max, and > avg of the values; > * other statistics; How many devices do you expect? As I understand it, the expected insert rate is more or less: no. of devices * no. of sensors (about 10), every 6 seconds. Let's assume you have 100 devices; that means 1000 inserts per 6 s ≈ 166 inserts per second.
> * inserts are done like this: > * generated 100 million readings by using the following rule: > * client is randomly chosen between 0 and 10 thousand; > * sensor is randomly chosen between 0 and 10; > * the timestamp is always increasing by one; > * the insert is done in batches of 500 thousand inserts (I've > also tried 5, 25, 50 and 100 thousand without big impact); > * the batch inserts are done through COPY sds_benchmark_data > FROM STDIN through libpq (by using UNIX (local) sockets); > What have I observed / tried: > * I've tested without the primary key and the index, and the > results were the best for inserts (600k inserts / s), but the > readings worked extremely slowly (due to the lack of indexing); > * with only the index (or only the primary key) the insert rate is > good at start (for the first 2 million readings), but then drops to > about 200 inserts / s; Try periodically executing REINDEX on your index, and ANALYZE on your table. To be honest, it should not influence inserts, but it will influence selects. Michal Szymanski http://blog.szymanskich.net
On Fri, Nov 21, 2008 at 3:12 PM, Michal Szymanski <dyrex@poczta.onet.pl> wrote: > On 21 Lis, 13:50, ciprian.crac...@gmail.com ("Ciprian Dorin Craciun") > wrote: >> Hello all! >> >> I would like to ask some advice about the following problem >> (related to the Dehems project: http://www.dehems.eu/): >> * there are some clients; (the clients are in fact households;) >> * each device has a number of sensors (about 10), and not all the >> clients have the same sensor; also sensors might appear and disappear >> dynamically; (the sensors are appliances;) >> * for each device and each sensor a reading is produced (at about >> 6 seconds); (the values could be power consumptions;) >> * I would like to store the following data: (client, sensor, >> timestamp, value); >> * the usual queries are: >> * for a given client (and sensor), and time interval, I need >> the min, max, and avg of the values; >> * for a given time interval (and sensor), I need min, max, and >> avg of the values; >> * other statistics; > > How many devices do you expect? > As I understand it, the expected insert rate is more or less: > no. of devices * no. of sensors (about 10), > every 6 seconds. Let's assume you have 100 devices; that means 1000 > inserts per 6 s ≈ 166 inserts per second. Yes, the figures are like this: * average number of raw inserts / second (without any optimization or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10) / 6 seconds = 166 thousand inserts / second... * if I use sharding this number would drop linearly with the number of Postgres instances... so let's say I use about 10 thousand users / Postgres instance => 16 thousand inserts / second... (a figure which I wasn't able to reach in my Postgres benchmarks...) Either way, I would expect at least 2-3 thousand inserts per second...
>> * inserts are done like this: >> * generated 100 million readings by using the following rule: >> * client is randomly chosen between 0 and 10 thousand; >> * sensor is randomly chosen between 0 and 10; >> * the timestamp is always increasing by one; >> * the insert is done in batches of 500 thousand inserts (I've >> also tried 5, 25, 50 and 100 thousand without big impact); >> * the batch inserts are done through COPY sds_benchmark_data >> FROM STDIN through libpq (by using UNIX (local) sockets); > >> What have I observed / tried: >> * I've tested without the primary key and the index, and the >> results were the best for inserts (600k inserts / s), but the >> readings worked extremely slowly (due to the lack of indexing); >> * with only the index (or only the primary key) the insert rate is >> good at start (for the first 2 million readings), but then drops to >> about 200 inserts / s; > > > Try periodically executing REINDEX on your index, and ANALYZE on > your table. To be honest, it should not influence inserts, but it will > influence selects. I'll keep this in mind when I reach the select part... For the moment I'm struggling with inserts... (Actually I've kind of given up...) > Michal Szymanski > http://blog.szymanskich.net > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general Thanks, Ciprian Craciun.
Fwd: Using Postgres to store high volume streams of sensor readings
From: "Ciprian Dorin Craciun"
(I'm adding the discussion also to the Postgres list.) On Fri, Nov 21, 2008 at 11:19 PM, Dann Corbit <DCorbit@connx.com> wrote: > What is the schema for your table? > If you are using copy rather than insert, 1K rows/sec for PostgreSQL seems very bad unless the table is extremely wide. The schema is posted at the beginning of the thread. But in short it is a table with 4 columns: client, sensor, timestamp and value, all being int4 (integer). There is only one (compound) index on the client and sensor... I guess the problem is from the index... > Memory mapped database systems may be the answer to your need for speed. > If you have a single inserting process, you can try FastDB, but unless you use a 64 bit operating system and compiler, you will be limited to 2 GB file size. FastDB is single writer, multiple reader model. See: > http://www.garret.ru/databases.html > > Here is output from the fastdb test program testperf, when compiled in 64 bit mode (the table is ultra-simple with only a string key and a string value, with also a btree and a hashed index on key): > Elapsed time for inserting 1000000 record: 8 seconds > Commit time: 1 > Elapsed time for 1000000 hash searches: 1 seconds > Elapsed time for 1000000 index searches: 4 seconds > Elapsed time for 10 sequential search through 1000000 records: 2 seconds > Elapsed time for search with sorting 1000000 records: 3 seconds > Elapsed time for deleting all 1000000 records: 0 seconds > > Here is a bigger set so you can get an idea about scaling: > > Elapsed time for inserting 10000000 record: 123 seconds > Commit time: 13 > Elapsed time for 10000000 hash searches: 10 seconds > Elapsed time for 10000000 index searches: 82 seconds > Elapsed time for 10 sequential search through 10000000 records: 8 seconds > Elapsed time for search with sorting 10000000 records: 41 seconds > Elapsed time for deleting all 10000000 records: 4 seconds > > If you have a huge database, then FastDB may be problematic because you need free
memory equal to the size of your database. > E.g. a 100 GB database needs 100 GB memory to operate at full speed. In 4 GB allotments, at $10-$50/GB, 100 GB costs between $1000 and $5000. Unfortunately the database will (eventually) be too large to store all of it in memory... For the moment, I don't think I'll be able to try FastDB... I'll put it on my reminder list... > MonetDB is worth a try, but I had trouble getting it to work properly on 64 bit Windows: > http://monetdb.cwi.nl/ I've heard of MonetDB -- it's from the same family as Hypertable... Maybe I'll give it a try after I finish with SQLite... Ciprian Craciun.
Ciprian Dorin Craciun wrote: > > I would try it if I would know that it could handle the load... Do > you have some info about this? Any pointers about the configuration > issues? > > Ciprian. > Apart from the configure options at build time you should read - http://www.sqlite.org/pragma.html It was a few versions ago so may be changed by now, but I reckon it was the temp_store setting - which is described as temp tables and indexes but is (or was) also used for large query and sorting needs. Setting this to memory did make a difference for some queries. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
On Sat, Nov 22, 2008 at 8:04 PM, Shane Ambler <pgsql@sheeky.biz> wrote: > Ciprian Dorin Craciun wrote: > >> >> I would try it if I knew that it could handle the load... Do >> you have some info about this? Any pointers about the configuration >> issues? >> >> Ciprian. >> > > > Apart from the configure options at build time you should read - > http://www.sqlite.org/pragma.html > > It was a few versions ago so may be changed by now, but I reckon it > was the temp_store setting - which is described as temp tables and > indexes but is (or was) also used for large query and sorting needs. > Setting this to memory did make a difference for some queries. > > > -- > > Shane Ambler > pgSQL (at) Sheeky (dot) Biz > > Get Sheeky @ http://Sheeky.Biz Hello all! (This email is now about SQLite3, but it also relates to Postgres as a comparison.) I've tested SQLite3 as well, and it has the same behavior as Postgres... Meaning at the beginning it goes really nicely at 20k inserts / s, drops to about 10k inserts / s, but after a few million records the HDD led starts to blink non-stop, and then it drops to under 1k... I've used exactly the same schema as for Postgres, and the following pragmas: * page_size = 8192; * fullsync = 0; * synchronous = off; * journal_mode = off; (this has a 10 fold impact... from 1k inserts at the beginning to 10 or 20k...) * cache_size = 65536; (this is in pages, and it results in 512MB, but I don't see the memory being used during inserts...) * auto_vacuum = none; * analyze at the end of the inserts; So I would conclude that relational stores will not make it for this use case... I'll rerun the tests tomorrow and post a comparison between SQLite and Postgres. Ciprian Craciun.
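(For reference, the pragma setup and batched inserts described above look roughly like this with Python's standard sqlite3 module. This is a minimal sketch: an in-memory database and a tiny batch, where the real benchmark used an on-disk file and 100 million rows.)

```python
# Sketch of the SQLite test setup: same schema, the pragmas from the list
# above, and batched inserts inside one transaction via executemany.
import sqlite3

conn = sqlite3.connect(":memory:")
for pragma in ("page_size = 8192", "synchronous = OFF",
               "journal_mode = OFF", "cache_size = 65536"):
    conn.execute("PRAGMA " + pragma)

conn.execute("""
    create table sds_benchmark_data (
        client int not null, sensor int not null,
        timestamp int not null, value int,
        primary key (client, sensor, timestamp))""")

# One batch of synthetic readings; timestamps are strictly increasing.
batch = [(c % 100, c % 10, c, c * 2) for c in range(5000)]
with conn:                                   # one transaction per batch
    conn.executemany(
        "insert into sds_benchmark_data values (?, ?, ?, ?)", batch)

(n,) = conn.execute("select count(*) from sds_benchmark_data").fetchone()
print(n)  # 5000
```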
On Sat, Nov 22, 2008 at 2:37 PM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote: > > Hello all! SNIP > So I would conclude that relational stores will not make it for > this use case... I was wondering if you guys are having to do all individual inserts or if you can batch some number together into a transaction. Being able to put > 1 into a single transaction is a huge win for pgsql.
On Sat, Nov 22, 2008 at 11:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Sat, Nov 22, 2008 at 2:37 PM, Ciprian Dorin Craciun > <ciprian.craciun@gmail.com> wrote: >> >> Hello all! > SNIP >> So I would conclude that relational stores will not make it for >> this use case... > > I was wondering you guys are having to do all individual inserts or if > you can batch some number together into a transaction. Being able to > put > 1 into a single transaction is a huge win for pgsql. I'm aware of the performance issues between 1 insert vs x batched inserts in one operation / transaction. That is why in the case of Postgres I am using COPY <table> FROM STDIN, and using 5k batches... (I've tried even 10k, 15k, 25k, 50k, 500k, 1m inserts / batch and no improvement...) Ciprian Craciun.
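(For illustration, building one such COPY batch amounts to rendering tab-separated rows, one per line, into a text buffer. A minimal Python sketch follows; actually feeding the buffer to the server, e.g. via psycopg2's cursor.copy_expert, is assumed and not shown here.)

```python
# Sketch: render a batch of readings into the tab-separated, one-row-per-line
# text that COPY sds_benchmark_data FROM STDIN consumes.
import io

def copy_buffer(rows):
    buf = io.StringIO()
    for client, sensor, timestamp, value in rows:
        buf.write(f"{client}\t{sensor}\t{timestamp}\t{value}\n")
    buf.seek(0)
    return buf

batch = [(42, 3, 1227300000, 117), (42, 4, 1227300000, 9)]
print(copy_buffer(batch).read(), end="")
```

The point of the exercise is that the per-row protocol and parsing cost of COPY is tiny; the slowdown observed in the thread comes from index maintenance, not from how the batch is shipped.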
On Sat, Nov 22, 2008 at 4:54 PM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote: > On Sat, Nov 22, 2008 at 11:51 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: >> On Sat, Nov 22, 2008 at 2:37 PM, Ciprian Dorin Craciun >> <ciprian.craciun@gmail.com> wrote: >>> >>> Hello all! >> SNIP >>> So I would conclude that relational stores will not make it for >>> this use case... >> >> I was wondering you guys are having to do all individual inserts or if >> you can batch some number together into a transaction. Being able to >> put > 1 into a single transaction is a huge win for pgsql. > > I'm aware of the performance issues between 1 insert vs x batched > inserts in one operation / transaction. That is why in the case of > Postgres I am using COPY <table> FROM STDIN, and using 5k batches... > (I've tried even 10k, 15k, 25k, 50k, 500k, 1m inserts / batch and no > improvement...) I've had exactly the same experience with Postgres during an attempt to use it as a store for large-scale incoming streams of data at a rate very comparable to what you're looking at (~100k/sec). We eventually just ended up rolling our own solution. -- - David T. Wilson david.t.wilson@gmail.com
Ciprian Dorin Craciun wrote: > I've tested also Sqlite3 and it has the same behavior as > Postgres... Meaning at beginning it goes really nice 20k inserts, > drops to about 10k inserts, but after a few million records, the HDD > led starts to blink non-stop, and then it drops to unde 1k.... The problem is, most likely, on updating the indexes. Heap inserts should always take more or less the same time, but index insertion requires walking down the index struct for each insert, and the path to walk gets larger the more data you have. Postgres does not have bulk index insert, which could perhaps get you a huge performance improvement. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
> On 21 Lis, 13:50, ciprian.crac...@gmail.com ("Ciprian Dorin Craciun") > wrote: > > What have I observed / tried: > > * I've tested without the primary key and the index, and the > > results were the best for inserts (600k inserts / s), but the > > readings, worked extremly slow (due to the lack of indexing); > > * with only the index (or only the primary key) the insert rate is > > good at start (for the first 2 million readings), but then drops to > > about 200 inserts / s; I didn't read the thread so I don't know if this was suggested already: bulk index creation is a lot faster than retail index inserts. Maybe one thing you could try is to have an unindexed table to do the inserts, and a separate table that you periodically truncate, refill with the contents from the other table, then create index. Two main problems: 1. querying during the truncate/refill/reindex process (you can solve it by having a second table that you "rename in place"); 2. the query table is almost always out of date. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
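(Alvaro's staging-table idea can be sketched as follows. This uses Python's stdlib sqlite3 purely to keep the sketch runnable; the statement sequence is the same one would issue against Postgres, and the table names are illustrative.)

```python
# Sketch of the bulk-reindex pattern: inserts land in an unindexed staging
# table; periodically the query table is rebuilt from it, indexed in one
# pass, and swapped in by rename so readers keep a queryable table.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table staging (client, sensor, timestamp, value)")
db.execute("create table readings (client, sensor, timestamp, value)")

# Fast path: unindexed bulk inserts into the staging table.
db.executemany("insert into staging values (?, ?, ?, ?)",
               [(1, 1, t, t * 10) for t in range(1000)])

with db:  # rebuild + swap; queries see the old 'readings' until the rename
    db.execute("create table readings_new as select * from staging")
    db.execute("create index readings_new_idx "
               "on readings_new (client, sensor, timestamp)")
    db.execute("drop table readings")
    db.execute("alter table readings_new rename to readings")

(n,) = db.execute("select count(*) from readings").fetchone()
print(n)  # 1000
```

As Alvaro notes, the trade-off is that the indexed table lags behind the staging table by one rebuild cycle.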
Alvaro Herrera <alvherre@commandprompt.com> writes: > The problem is, most likely, on updating the indexes. Heap inserts > should always take more or less the same time, but index insertion > requires walking down the index struct for each insert, and the path to > walk gets larger the more data you have. It's worse than that: his test case inserts randomly ordered keys, which means that there's no locality of access during the index updates. Once the indexes get bigger than RAM, update speed goes into the toilet, because the working set of index pages that need to be touched also is bigger than RAM. That effect is going to be present in *any* standard-design database, not just Postgres. It's possible that performance in a real-world situation would be better, if the incoming data stream isn't so random; but it's hard to tell about that with the given facts. One possibly useful trick is to partition the data by timestamp with partition sizes chosen so that the indexes don't get out of hand. But the partition management might be enough of a PITA to negate any win. regards, tom lane
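(Tom's locality argument can be illustrated with a toy model: treat the index as fixed-size pages of consecutive keys and count how many distinct pages one batch of inserts touches. The page size and key space below are made-up numbers, not Postgres internals.)

```python
# Toy illustration: randomly ordered keys touch almost every leaf "page"
# of the index per batch, while sequentially ordered keys touch only a few.
import random

KEYS_PER_PAGE = 100
KEY_SPACE = 100_000          # e.g. 10k clients x 10 sensors

def pages_touched(keys):
    return len({k // KEYS_PER_PAGE for k in keys})

rng = random.Random(42)
random_batch = [rng.randrange(KEY_SPACE) for _ in range(5000)]
sequential_batch = list(range(5000))

print(pages_touched(random_batch))      # close to all 1000 pages
print(pages_touched(sequential_batch))  # exactly 50 pages
```

Once that working set of pages exceeds RAM, every batch forces random reads and writes across the whole index, which matches the "wa" spike and the drop to ~200 inserts / s reported earlier in the thread.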
Since you always need the timestamp in your selects, have you tried indexing only the timestamp field? Your selects would be slower, but since client and sensor don't have that many distinct values compared to the number of rows you are inserting, maybe the difference in selects would not be that huge.
On Sat, Nov 22, 2008 at 5:54 PM, Scara Maccai <m_lists@yahoo.it> wrote: > Since you always need the timestamp in your selects, have you tried indexing only the timestamp field? > Your selects would be slower, but since client and sensor don't have that many distinct values compared to the number of rows you are inserting maybe the difference in selects would not be that huge. Even better might be partitioning on the timestamp. IF all access is in a certain timestamp range it's usually a big win, especially because he can move to a new table every hour / day / week or whatever and merge the old one into a big "old data" table.
On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> The problem is, most likely, on updating the indexes. Heap inserts >> should always take more or less the same time, but index insertion >> requires walking down the index struct for each insert, and the path to >> walk gets larger the more data you have. > > It's worse than that: his test case inserts randomly ordered keys, which > means that there's no locality of access during the index updates. Once > the indexes get bigger than RAM, update speed goes into the toilet, > because the working set of index pages that need to be touched also > is bigger than RAM. That effect is going to be present in *any* > standard-design database, not just Postgres. > > It's possible that performance in a real-world situation would be > better, if the incoming data stream isn't so random; but it's > hard to tell about that with the given facts. > > One possibly useful trick is to partition the data by timestamp with > partition sizes chosen so that the indexes don't get out of hand. > But the partition management might be enough of a PITA to negate > any win. > > regards, tom lane Thanks for your feedback! This is just as I supposed, but I didn't have the Postgres experience to be certain. I'll include your conclusion in my report. Ciprian Craciun.
On Sun, Nov 23, 2008 at 12:26 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Ciprian Dorin Craciun wrote: > >> I've tested also SQLite3 and it has the same behavior as >> Postgres... Meaning at the beginning it goes really nicely at 20k inserts, >> drops to about 10k inserts, but after a few million records, the HDD >> led starts to blink non-stop, and then it drops to under 1k... > > The problem is, most likely, on updating the indexes. Heap inserts > should always take more or less the same time, but index insertion > requires walking down the index struct for each insert, and the path to > walk gets larger the more data you have. > > Postgres does not have bulk index insert, which could perhaps get you a > huge performance improvement. > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > The PostgreSQL Company - Command Prompt, Inc. I don't think the index depth is a problem. For example, in the case of BerkeleyDB with BTree storage, the tree height is 3 after 100m inserts... So this is not the problem. I think the problem is that after a certain amount of data, periodically the entire index is touched, and in this case the HDD becomes a bottleneck... (Demonstrated by the vmstat 1 output I've put in a previous email.) Ciprian.
On Sun, Nov 23, 2008 at 12:32 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > >> On 21 Lis, 13:50, ciprian.crac...@gmail.com ("Ciprian Dorin Craciun") >> wrote: > >> > What have I observed / tried: >> > * I've tested without the primary key and the index, and the >> > results were the best for inserts (600k inserts / s), but the >> > readings worked extremely slowly (due to the lack of indexing); >> > * with only the index (or only the primary key) the insert rate is >> > good at start (for the first 2 million readings), but then drops to >> > about 200 inserts / s; > > I didn't read the thread so I don't know if this was suggested already: > bulk index creation is a lot faster than retail index inserts. Maybe > one thing you could try is to have an unindexed table to do the inserts, > and a separate table that you periodically truncate, refill with the > contents from the other table, then create index. Two main problems: 1. > querying during the truncate/refill/reindex process (you can solve it by > having a second table that you "rename in place"); 2. the query table is > almost always out of date. > > -- > Alvaro Herrera http://www.CommandPrompt.com/ > The PostgreSQL Company - Command Prompt, Inc. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general The concerns you have listed are very important to me... I will use the database not only for archival and offline analysis, but also for realtime queries (like what is the power consumption in the last minute)... Of course I could use Postgres only for archival like you've said, and some other solution for realtime queries, but this adds complexity to the application... Thanks, Ciprian Craciun.
On Sun, Nov 23, 2008 at 3:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Sat, Nov 22, 2008 at 5:54 PM, Scara Maccai <m_lists@yahoo.it> wrote: >> Since you always need the timestamp in your selects, have you tried indexing only the timestamp field? >> Your selects would be slower, but since client and sensor don't have that many distinct values compared to the number of rows you are inserting maybe the difference in selects would not be that huge. > > Even better might be partitioning on the timestamp. IF all access is > in a certain timestamp range it's usually a big win, especially > because he can move to a new table every hour / day / week or whatever > and merge the old one into a big "old data" table. > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general Yes, it would speed up the inserts tremendously... I've tested it and the insert speed is somewhere at 200k->100k. But unfortunately the query speed is not good at all, because most queries are for a specific client (and sensor) in a given time range... Ciprian Craciun.
* Ciprian Dorin Craciun (ciprian.craciun@gmail.com) wrote: > > Even better might be partitioning on the timestamp. IF all access is > > in a certain timestamp range it's usually a big win, especially > > because he can move to a new table every hour / day / week or whatever > > and merge the old one into a big "old data" table. > > Yes, If i would speed the inserts tremendously... I've tested it > and the insert speed is somewhere at 200k->100k. > > But unfortunately the query speed is not good at all because most > queries are for a specific client (and sensor) in a given time > range... Have you set up your partitions correctly (eg, with appropriate CHECK constraints and with constraint_exclusion turned on)? Also, you'd want to keep your indexes on the individual partitions, of course.. That should improve query time quite a bit since it should only be hitting the partitions where the data might be. Stephen
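(A sketch of generating the per-partition DDL Stephen describes: with inheritance-based partitioning as in Postgres 8.3, each child table carries a CHECK constraint on the timestamp range so that, with constraint_exclusion turned on, the planner skips irrelevant partitions. The table/index names and the day-sized ranges below are illustrative.)

```python
# Sketch: emit per-day child-table DDL with a CHECK constraint on the
# integer timestamp, plus the per-partition index on (client, sensor,
# timestamp) so range queries per client stay indexed.
DAY = 86400

def partition_ddl(day_index):
    lo, hi = day_index * DAY, (day_index + 1) * DAY
    name = f"sds_benchmark_data_d{day_index}"
    return (
        f"create table {name} (\n"
        f"    check (timestamp >= {lo} and timestamp < {hi})\n"
        f") inherits (sds_benchmark_data);\n"
        f"create index {name}__idx on {name} (client, sensor, timestamp);"
    )

print(partition_ddl(0).splitlines()[1].strip())
# check (timestamp >= 0 and timestamp < 86400)
```

Inserts would be routed to the child matching the reading's timestamp (by the application or a trigger), so only the current partition's index is hot and fits in RAM.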
> But unfortunately the query speed is not good at all > because most > queries are for a specific client (and sensor) in a given > time > range... Maybe I'm wrong, I don't know a lot about these things; but wouldn't defining the index as (timestamp, clientid, sensor) instead of (clientid, sensor, timestamp) give you more "locality of access" during index maintenance? I think it would make more sense too, since you are not going to query the db without the timestamp, but maybe you want to query it without the clientid or probe (to get aggregates for example). Plus: could you give us some numbers about the difference in performance of the selects between the index defined as (timestamp) and defined as (clientid, sensor, timestamp)?
While most of my experience is with Oracle/Informix, I would also recommend: a) partitioning on the DB level. Put partitions on separate hard disks, have the system be at least dual core, and have the disks attached via a SCSI controller (not IDE) for parallel performance. b) partitioning on the application level (that is, having the insert code dynamically figure out which DB and which table to go to; this complicates the application for inserts as well as for reports). c) maybe there is a chance to remove the index (if all you are doing is inserts) -- and then recreate it later? e) I did not see the type of index, but if the values of at least some of the indexed fields repeat a lot, Oracle has what's called a 'bitmap index'. Postgresql might have something similar, where that type of index is optimized for the fact that values are the same for the majority of the rows (it becomes much smaller, and therefore quicker to update). f) review that there are no insert triggers and constraints (either field or foreign) on those tables; if there are, validate why they are there and see if they can be removed -- the application would then need to guarantee correctness. VSP On Sun, 23 Nov 2008 08:34:57 +0200, "Ciprian Dorin Craciun" <ciprian.craciun@gmail.com> said: > On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> The problem is, most likely, on updating the indexes. Heap inserts > >> should always take more or less the same time, but index insertion > >> requires walking down the index struct for each insert, and the path to > >> walk gets larger the more data you have. > > > > It's worse than that: his test case inserts randomly ordered keys, which > > means that there's no locality of access during the index updates. Once > > the indexes get bigger than RAM, update speed goes into the toilet, > > because the working set of index pages that need to be touched also > > is bigger than RAM.
> > That effect is going to be present in *any*
> > standard-design database, not just Postgres.
> >
> > It's possible that performance in a real-world situation would be
> > better, if the incoming data stream isn't so random; but it's
> > hard to tell about that with the given facts.
> >
> > One possibly useful trick is to partition the data by timestamp with
> > partition sizes chosen so that the indexes don't get out of hand.
> > But the partition management might be enough of a PITA to negate
> > any win.
> >
> > regards, tom lane
>
> Thanks for your feedback! This is just as I supposed, but I didn't
> have the Postgres experience to be certain.
> I'll include your conclusion in my report.
>
> Ciprian Craciun.
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
V S P
toreason@fastmail.fm
On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Ciprian Dorin Craciun (ciprian.craciun@gmail.com) wrote:
>> > Even better might be partitioning on the timestamp. If all access is
>> > in a certain timestamp range it's usually a big win, especially
>> > because he can move to a new table every hour / day / week or whatever
>> > and merge the old one into a big "old data" table.
>>
>> Yes, it would speed the inserts tremendously... I've tested it
>> and the insert speed is somewhere at 200k->100k.
>>
>> But unfortunately the query speed is not good at all because most
>> queries are for a specific client (and sensor) in a given time
>> range...
>
> Have you set up your partitions correctly (eg, with appropriate CHECK
> constraints and with constraint_exclusion turned on)? Also, you'd want
> to keep your indexes on the individual partitions, of course... That
> should improve query time quite a bit since it should only be hitting
> the partitions where the data might be.
>
> Stephen

    Well, now that I've read the previous two emails better, I understand what Scott and Stephen are talking about...

    So if I understood it correctly: I should build indexes only for certain parts of the data (like the previous full hour and so on). But I see a problem: wouldn't this lead to a lot of indices being created (24 / day, ~150 / week, ...)? Another question: wouldn't the index creation impact the insertion and query speed while the indexes are being created?

    Either way, I don't think this is a very easy to implement solution...

    Ciprian Craciun.
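For what it's worth, a minimal sketch of the setup Stephen describes, using the inheritance-style partitioning available in Postgres 8.3 (the child-table names and the timestamp boundaries here are hypothetical; the columns follow the benchmark schema):

```sql
-- Parent table: holds no data itself, queries go through it.
create table sds_benchmark_data (
    client    int4 not null,
    sensor    int4 not null,
    timestamp int4 not null,
    value     int4
);

-- One child per time range. The CHECK constraint tells the planner
-- exactly which timestamps this child can contain; each child keeps
-- its own (smaller) index.
create table sds_benchmark_data_p1 (
    check (timestamp >= 0 and timestamp < 1000000)
) inherits (sds_benchmark_data);

create index sds_benchmark_data_p1__idx
    on sds_benchmark_data_p1 (client, sensor, timestamp);

-- With constraint_exclusion on, the planner skips every child whose
-- CHECK constraint rules it out of the query's timestamp range.
set constraint_exclusion = on;

select min(value), max(value), avg(value)
  from sds_benchmark_data
 where client = 42 and sensor = 3
   and timestamp between 500000 and 600000;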
    Thanks for your info! Please see my observations below.

    By the way, we are planning to also try Informix (the time series extension?)... Do you have some other tips about Informix?

    Ciprian Craciun.

On Sun, Nov 23, 2008 at 6:06 PM, V S P <toreason@fastmail.fm> wrote:
> While most of my experience is with oracle/informix,
> I would also recommend:
>
> a) partitioning on the DB level.
> Put partitions on separate hard disks, have the system be
> at least dual core, and make the disks attached via a SCSI
> controller (not IDE) for parallel performance.

    This I found out, but for now I'm not able to change the disk layout...

> b) partitioning on the application level (that is, having
> the insert code dynamically figure out what DB and what table to go to).
> This complicates the application for inserts as well as for reports.

    We wanted to do this, and in this circumstance the winner for the moment is BerkeleyDB, as it's super fast for readings. (This is what's currently called sharding, right?)

> c) maybe there is a chance to remove the index (if all you are doing
> is inserts) -- and then recreate it later?

    Not possible, as we would like to use the same database (and table) for both inserts and real-time queries... Otherwise the application would become a lot more complicated...

> e) I did not see the type of index, but if the values of at least
> some of the indexed fields repeat a lot -- Oracle had what's called
> a 'bitmap index'.
> Postgresql might have something similar, where that type of index
> is optimized for the fact that values are the same for the majority
> of the rows (it becomes much smaller, and therefore quicker to update).

    For the moment the index type is the default one (btree), and from the documentation I didn't see another one matching the current usage.
> f) review that there are no insert triggers and
> constraints (either field or foreign) on those tables.
> If there are -- validate why they are there and see if they can
> be removed -- and the application would then need to guarantee
> correctness.

    Nope, no triggers or constraints (other than not null).

> VSP
>
> On Sun, 23 Nov 2008 08:34:57 +0200, "Ciprian Dorin Craciun"
> <ciprian.craciun@gmail.com> said:
>> On Sun, Nov 23, 2008 at 1:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Alvaro Herrera <alvherre@commandprompt.com> writes:
>> >> The problem is, most likely, on updating the indexes. Heap inserts
>> >> should always take more or less the same time, but index insertion
>> >> requires walking down the index struct for each insert, and the path to
>> >> walk gets larger the more data you have.
>> >
>> > It's worse than that: his test case inserts randomly ordered keys, which
>> > means that there's no locality of access during the index updates. Once
>> > the indexes get bigger than RAM, update speed goes into the toilet,
>> > because the working set of index pages that need to be touched also
>> > is bigger than RAM. That effect is going to be present in *any*
>> > standard-design database, not just Postgres.
>> >
>> > It's possible that performance in a real-world situation would be
>> > better, if the incoming data stream isn't so random; but it's
>> > hard to tell about that with the given facts.
>> >
>> > One possibly useful trick is to partition the data by timestamp with
>> > partition sizes chosen so that the indexes don't get out of hand.
>> > But the partition management might be enough of a PITA to negate
>> > any win.
>> >
>> > regards, tom lane
>>
>> Thanks for your feedback! This is just as I supposed, but I didn't
>> have the Postgres experience to be certain.
>> I'll include your conclusion in my report.
>>
>> Ciprian Craciun.
> If you watch the speed, you'll see that the insert speed is the
> same, but the scan speed is worse (from 32k to 200).

    As I said, I don't know a lot about these things. But I would like someone to comment on this (so that maybe I will learn something!):

    1) I thought the poor insert performance was due to a lack of "locality of access" during the index updates, hence I thought that since the timestamp is always increasing, putting it as the first column in the index should give a better insert speed, but it didn't: why?

    2) I thought that given a query like:

    select * from taba where clientid=2 and sensor=4 and timestamp between 'start_t' and 'end_t'

    there shouldn't be a huge difference in speed between an index defined as (timestamp, clientid, sensorid) and another one defined as (clientid, sensor, timestamp), but I was VERY wrong: it's 1000 times worse. How is that possible?

    It's obvious I don't know how multicolumn indexes work... Can someone explain?
On Sun, Nov 23, 2008 at 10:01 AM, Ciprian Dorin Craciun <ciprian.craciun@gmail.com> wrote:
> On Sun, Nov 23, 2008 at 3:28 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> Have you set up your partitions correctly (eg, with appropriate CHECK
>> constraints and with constraint_exclusion turned on)? Also, you'd want
>> to keep your indexes on the individual partitions, of course... That
>> should improve query time quite a bit since it should only be hitting
>> the partitions where the data might be.
>
> Well, now that I've read the previous two emails better, I
> understand what Scott and Stephen are talking about...
>
> So if I understood it correctly: I should build indexes only for
> certain parts of the data (like the previous full hour and so on). But I
> see a problem: wouldn't this lead to a lot of indices being created
> (24 / day, ~150 / week, ...)?

    No, that's not exactly what I'm talking about. I'm talking about pre-creating partitions that the data will soon go into (let's say a new one every hour) with indexes in place, and having a trigger that fires on insert to put the data into the right partition.
    Once that partition is no longer being inserted into, and we aren't running a bunch of queries on it, we migrate its data to a historical partition. So, your table looks something like this all the time:

|**|^^|##|##|##|##|$$$$$$$$$$$$$$...|

    Where:
    ** is a partition we have created in advance of needing it.
    ^^ is the partition we are currently writing to.
    ## are the partitions we're still using in select queries a lot.
    $$$... is the old data stuffed into the monolithic history table.

    When it's time to switch to writing to the new partition (i.e. **), we make a new one ahead of that, and the trigger starts writing to what was a ** partition but is now the new ^^, and the ^^ becomes a ##. At the end of the day / week / whatever, we take all the old ## partitions and move their data into the $$$ and drop the ## partitions.

    Note that we only need to put data into an archive partition to keep from having hundreds or thousands of partitions. There's a limit of a few hundred partitions where things start getting slow again due to planner overhead.
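The insert-routing trigger for such a rotating scheme could look roughly like this in 8.3-era plpgsql (partition names and the boundary value are hypothetical; the function has to be recreated, or made table-driven, each time the set of live partitions rotates):

```sql
-- Route each insert into the child table covering its timestamp.
create or replace function sds_benchmark_data_route()
returns trigger as $$
begin
    if new.timestamp >= 2000000 then
        -- current ("^^") partition
        insert into sds_benchmark_data_p2 values (new.*);
    else
        -- older ("##") partition
        insert into sds_benchmark_data_p1 values (new.*);
    end if;
    -- returning null keeps the row out of the parent table.
    return null;
end;
$$ language plpgsql;

create trigger sds_benchmark_data_route_trg
    before insert on sds_benchmark_data
    for each row execute procedure sds_benchmark_data_route();
```

Note that COPY into the parent fires this trigger once per row, so some of the bulk-load advantage is lost; an alternative is to COPY directly into the current child partition.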
> Yes, the figures are like this:
> * average number of raw inserts / second (without any optimization
> or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10)
> / 6 seconds = 166 thousand inserts / second...

This is average?
166 000 * 20 bytes per record * 86400 seconds per day = 280GB / day, not counting indices.

What is the time span you want to have the data from?

Greetings
Marcin
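Marcin's estimate can be checked directly (the 20 bytes per record is his assumption; an actual Postgres tuple of four int4 columns carries more per-row header overhead, so the real on-disk volume would be larger):

```sql
-- Back-of-the-envelope raw data volume per day.
-- 166 000 records/s * 20 bytes/record * 86 400 s/day:
select 166000::int8 * 20 * 86400 as bytes_per_day;
-- 286848000000, i.e. roughly the 280 GB/day figure quoted above
```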
On Mon, Nov 24, 2008 at 3:42 AM, marcin mank <marcin.mank@gmail.com> wrote:
>> Yes, the figures are like this:
>> * average number of raw inserts / second (without any optimization
>> or previous aggregation): #clients (~ 100 thousand) * #sensors (~ 10)
>> / 6 seconds = 166 thousand inserts / second...
>
> This is average?
> 166 000 * 20 bytes per record * 86400 seconds per day = 280GB / day,
> not counting indices.
>
> What is the time span you want to have the data from?
>
> Greetings
> Marcin

    Well, I'm not sure about the archival period... Maybe a day, maybe a week... For the moment I'm just struggling with the insert speed.

    (We could also use sharding -- horizontal partitioning on different machines -- and this would reduce the load...)

    Ciprian.