Discussion: Scaling 10 million records in PostgreSQL table
Hi all,

I have 10 million records in my Postgres table. I am running the database on an Amazon EC2 medium instance. I need to access the last week of data from the table, but it takes a huge amount of time to process even a simple query, so it throws a timeout exception. The query is:

select count(*) from dealer_vehicle_details where modified_on between '2012-10-01' and '2012-10-08' and dealer_id=270001;

After a long time it responds with 1184 as the count.

What are the ways I should follow to increase the performance of this query?

Insertions also run in parallel, since rows are updated in real time daily.

What exactly could be the reason for this poor performance?
On Mon, Oct 8, 2012 at 10:26 AM, Navaneethan R <nava@gridlex.com> wrote:
> I have 10 million records in my Postgres table. I am running the database
> on an Amazon EC2 medium instance. I need to access the last week of data
> from the table, but a simple query takes a huge amount of time and throws
> a timeout exception. The query is:
>
> select count(*) from dealer_vehicle_details where modified_on between '2012-10-01' and '2012-10-08' and dealer_id=270001;
>
> [snip]

can you send explain analyze?  also table structure?

merlin
On 10/08/2012 17:26, Navaneethan R wrote:
> Hi all,

Hello,

> select count(*) from dealer_vehicle_details where modified_on between '2012-10-01' and '2012-10-08' and dealer_id=270001;

Please show us an EXPLAIN ANALYZE of the query.

> what could be the reason exactly for this lacking performance?

Missing index, wrong configuration, ...

Please also note that, generally, all that "cloud stuff" has very poor I/O performance.
On 2012-10-08 10:26, Navaneethan R wrote:
> [snip]
> select count(*) from dealer_vehicle_details where modified_on
> between '2012-10-01' and '2012-10-08' and dealer_id=270001;

What indexes do you have on your table? I'll bet none.

What does an

explain select count(*) from dealer_vehicle_details where modified_on between '2012-10-01' and '2012-10-08' and dealer_id=270001;

show?

I have a 380-million-row table with LOTS of indexing, and we perform very well. Without indexes, the query has to sequentially scan all 10 million rows. That's going to be bad on ANY database.
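For a query that filters on both dealer_id and modified_on, the usual first step is a single composite index matching both predicates. A minimal sketch (the index name is invented; the column and table names come from the query in this thread):

```sql
-- Composite index covering both predicates: the equality column
-- (dealer_id) leads, followed by the range column (modified_on).
CREATE INDEX dealer_modified_idx
    ON dealer_vehicle_details (dealer_id, modified_on);

-- Then verify that the planner actually uses it:
EXPLAIN ANALYZE
SELECT count(*)
FROM dealer_vehicle_details
WHERE modified_on BETWEEN '2012-10-01' AND '2012-10-08'
  AND dealer_id = 270001;
```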
On 10/08/2012 08:26 AM, Navaneethan R wrote:
> [snip]

What version of PostgreSQL? You can use "select version();" and note that 9.2 has index-only scans, which can result in a substantial performance boost for queries of this type.

What is the structure of your table? You can use "\d+ dealer_vehicle_details" in psql.

Have you tuned PostgreSQL in any way? If so, what?

Cheers,
Steve
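All three questions above can be answered from one psql session; a sketch (note that `\d+` is a psql meta-command, not SQL):

```sql
SELECT version();              -- server version and build details
\d+ dealer_vehicle_details     -- table structure, indexes, storage (psql only)

-- Actual plan and timings for the slow query:
EXPLAIN ANALYZE
SELECT count(*)
FROM dealer_vehicle_details
WHERE modified_on BETWEEN '2012-10-01' AND '2012-10-08'
  AND dealer_id = 270001;
```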
On 10/08/2012 11:26 PM, Navaneethan R wrote:
> I have 10 million records in my Postgres table. I am running the database
> on an Amazon EC2 medium instance.

EC2 usually means "my I/O performance is terrible", and "medium instance" means "I don't have enough RAM for caching to make up for my terrible I/O" at the database sizes you're talking about. Anything that hits most of the database is likely to perform pretty poorly on something like EC2. It might be worth considering one of the high-memory or high-I/O instances, but unfortunately they only come in "really big and really expensive".

If you already have appropriate indexes and have used `explain analyze` to verify that the query isn't doing anything slow and expensive, possibly the easiest way to improve performance is to set up async replication or log shipping to a local hot standby on real physical hardware, then run the query there.

--
Craig Ringer
On Mon, Oct 8, 2012 at 1:27 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> If you already have appropriate indexes and have used `explain analyze` to
> verify that the query isn't doing anything slow and expensive, possibly the
> easiest way to improve performance is to set up async replication or log
> shipping to a local hot standby on real physical hardware, then run the
> query there.

I've run PostgreSQL on medium instances using Elastic Block Store for the storage and had no difficulty running queries like this one on tables of comparable (and larger) size. It might not come back in 10 ms, but such queries weren't so slow that I would describe the wait as "a lot of time" either. My guess is that this is a sequential scan on a 10-million-record table with lots of bloat due to updates. Without more info about table structure and EXPLAIN ANALYZE output, we are all just guessing, though. Please read the wiki page which describes how to submit performance problems, and restate your question.
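The bloat guess can be checked directly from the statistics collector; a sketch (these view columns exist in `pg_stat_user_tables` on 8.4 and later):

```sql
-- Dead tuples left behind by the parallel updates inflate sequential scans.
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'dealer_vehicle_details';

-- If n_dead_tup is large relative to n_live_tup, vacuuming the table
-- (or tuning autovacuum) is usually the next step:
VACUUM ANALYZE dealer_vehicle_details;
```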
On Tuesday, October 9, 2012 1:40:08 AM UTC+5:30, Steve Crawford wrote:
> What version of PostgreSQL? You can use "select version();" and note
> that 9.2 has index-only scans which can result in a substantial
> performance boost for queries of this type.
>
> What is the structure of your table? You can use "\d+
> dealer_vehicle_details" in psql.
>
> Have you tuned PostgreSQL in any way? If so, what?

version():

PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit

Desc:

                                Table "public.dealer_vehicle_details"
     Column     |           Type           |                                Modifiers                                 | Storage | Description
----------------+--------------------------+--------------------------------------------------------------------------+---------+-------------
 id             | integer                  | not null default nextval('dealer_vehicle_details_new_id_seq'::regclass)  | plain   |
 vin_id         | integer                  | not null                                                                 | plain   |
 vin_details_id | integer                  |                                                                          | plain   |
 price          | integer                  |                                                                          | plain   |
 mileage        | double precision         |                                                                          | plain   |
 dealer_id      | integer                  | not null                                                                 | plain   |
 created_on     | timestamp with time zone | not null                                                                 | plain   |
 modified_on    | timestamp with time zone | not null                                                                 | plain   |
Indexes:
    "dealer_vehicle_details_pkey" PRIMARY KEY, btree (id)
    "idx_dealer_sites_id" UNIQUE, btree (id) WHERE dealer_id = 270001
    "idx_dealer_sites_id_526889" UNIQUE, btree (id) WHERE dealer_id = 526889
    "idx_dealer_sites_id_9765" UNIQUE, btree (id, vin_id) WHERE dealer_id = 9765
    "idx_dealer_sites_id_9765_all" UNIQUE, btree (id, vin_id, price, mileage, modified_on, created_on, vin_details_id) WHERE dealer_id = 9765
    "mileage_idx" btree (mileage)
    "price_idx" btree (price)
    "vehiclecre_idx" btree (created_on)
    "vehicleid_idx" btree (id)
    "vehiclemod_idx" btree (modified_on)
    "vin_details_id_idx" btree (vin_details_id)
    "vin_id_idx" btree (vin_id)
Foreign-key constraints:
    "dealer_vehicle_master_dealer_id_fkey" FOREIGN KEY (dealer_id) REFERENCES dealer_dealer_master(id) DEFERRABLE INITIALLY DEFERRED
    "dealer_vehicle_master_vehicle_id_fkey" FOREIGN KEY (vin_id) REFERENCES dealer_vehicle(id) DEFERRABLE INITIALLY DEFERRED
    "dealer_vehicle_master_vin_details_id_fkey" FOREIGN KEY (vin_details_id) REFERENCES vin_lookup_table(id) DEFERRABLE INITIALLY DEFERRED
Has OIDs: no

After creating the index for the WHERE clause "WHERE dealer_id = 270001", it is performing better. I have more dealer ids; should I do this for each dealer_id?

The insertion service also runs in parallel in the background. So, what are the important steps I should follow regularly to keep the database healthy? Since insertion is happening all the time, the table will soon reach many millions of rows. What precautions should be taken?
On Mon, Oct 8, 2012 at 1:25 PM, Navaneethan R <nava@gridlex.com> wrote:
> After creating the index for the WHERE clause "WHERE dealer_id = 270001",
> it is performing better. I have more dealer ids; should I do this for each
> dealer_id?

All you've really done is confuse the issue. Please read the wiki page on how to submit performance questions and actually follow the directions. Show us the table structure when the query is performing poorly ALONG WITH explain analyze output, so we can see how the query is being handled by the db.

Adding indexes for just one particular value isn't likely a great solution unless there's a reason why that value is special or performance for that value needs to be particularly good. It is far better to get at the root problem of performance issues on that table, whether it is table bloat, insufficient indexes, invalid statistics, or something else.
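Of the root causes listed, stale statistics are the cheapest to rule out; a sketch:

```sql
-- Refresh planner statistics for the table, then compare the planner's
-- estimated row counts against the actual counts in the EXPLAIN ANALYZE
-- output; a large mismatch suggests the statistics were stale.
ANALYZE dealer_vehicle_details;

EXPLAIN ANALYZE
SELECT count(*)
FROM dealer_vehicle_details
WHERE modified_on BETWEEN '2012-10-01' AND '2012-10-08'
  AND dealer_id = 270001;
```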
On Mon, Oct 8, 2012 at 1:25 PM, Navaneethan R <nava@gridlex.com> wrote:
> [snip]
> After creating the index for the WHERE clause "WHERE dealer_id = 270001",
> it is performing better. I have more dealer ids; should I do this for each
> dealer_id?

You seem to have created a partial index. Normally, that's not what you want. You just want an index on the field "dealer_id", without the condition. Conditional indexes are useful when a lot of queries share the same WHERE clause entry, such as "WHERE deleted_at IS NULL" or whatnot, where most of the table has been soft-deleted.

Here's a recent blog post discussing the topic that doesn't presume a lot of familiarity with database performance, geared towards application developers writing OLTP applications, which this seems to be one of:

http://www.craigkerstiens.com/2012/10/01/understanding-postgres-performance/

--
fdr
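Concretely, the difference looks like this (a sketch; the plain index name is invented):

```sql
-- One plain index serves queries for every dealer_id value:
CREATE INDEX dealer_id_idx ON dealer_vehicle_details (dealer_id);

-- By contrast, a partial index such as
--   CREATE UNIQUE INDEX idx_dealer_sites_id
--       ON dealer_vehicle_details (id) WHERE dealer_id = 270001;
-- can only be used when the query's WHERE clause implies dealer_id = 270001,
-- so a new index would be needed for every dealer, which is exactly the
-- trap described above.
```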