Обсуждение: Large Table Performance
List, I've got a problem where I need to make a table that is going to grow by an average of 230,000 records per day. There are only 20 columns in the table, mostly char and integer. It's FK'd in two places to another table for import/export transaction id's and I have a serial primary key and an index on a date column for when I need to search (every search is done inside a date range). I thought it would be OK but after a few weeks of operation I have more than five million records in there. Some queries take more than five minutes to complete and I'm sad about that. How can I make this faster? I could munge dates into integers if their faster, I'm OK with that. What can I tweak in the configuration file to speed things up? What about some drastic schema change that more experience would have shown me? I cannot show the full schema but it's like this: -- My import/export data information table ie_data (id serial primary key, date date, [12 other columns here]) big_transaction_table(id serial primary key, import_id int w/FK, export_id int w/FK, date date, [20 other necessary transaction detail columns]) So when I say select x,y,z from big_transaction_table where date>='10/2/2005' and date<='10/4/2005' and transaction_status in (1,2,3) order by date; it takes five+ minutes. TIA for any suggestions. /djb
On 22 Oct 2005, at 01:25, Edoceo Lists wrote: > List, > I've got a problem where I need to make a table that is going to > grow by an average of 230,000 records per day. There are only 20 > columns in the table, mostly char and integer. It's FK'd in two > places to another table for import/export transaction id's and I > have a serial primary key and an index on a date column for when I > need to search (every search is done inside a date range). I > thought it would be OK but after a few weeks of operation I have > more than five million records in there. Some queries take more > than five minutes to complete and I'm sad about that. How can I > make this faster? I could munge dates into integers if their > faster, I'm OK with that. What can I tweak in the configuration > file to speed things up? What about some drastic schema change > that more experience would have shown me? I cannot show the full > schema but it's like this: > > -- My import/export data information table > ie_data (id serial primary key, date date, [12 other columns here]) > > big_transaction_table(id serial primary key, import_id int w/FK, > export_id int w/FK, date date, [20 other necessary transaction > detail columns]) > > So when I say > select x,y,z from big_transaction_table where date>='10/2/2005' and > date<='10/4/2005' and transaction_status in (1,2,3) order by date; > it takes five+ minutes. > > TIA for any suggestions. What hardware are you on? What query plans (output from explain) do your queries give you? What PG version? We do about 100,000 rows a minute (300 MB+) a day so I suspect your queries are doing full table scans or something. Of course we don't use any FKs so I suppose they could be biting you.
On Fri, Oct 21, 2005 at 05:25:22PM -0700, Edoceo Lists wrote: [summary of situation] > Some queries take more than five minutes to complete and I'm sad > about that. How can I make this faster? You might get more help on pgsql-performance, which is specifically for discussions of performance issues. > I could munge dates into integers if their faster, I'm OK with that. Let's identify the problem before thinking about possible solutions. > What can I tweak in the configuration file to speed things up? What version of PostgreSQL are you using? What operating system? What kind of hardware (CPU, disks, amount of memory, etc.)? Are you regularly vacuuming and analyzing all tables in the database? Have you tuned your configuration at all? What are the following settings? shared_buffers work_mem (8.x) or sort_mem (7.x) effective_cache_size random_page_cost You might want to read through a tuning guide like this one: http://www.powerpostgresql.com/PerfList > So when I say > select x,y,z from big_transaction_table where date>='10/2/2005' and > date<='10/4/2005' and transaction_status in (1,2,3) order by date; > it takes five+ minutes. Please post the EXPLAIN ANALYZE output (not just EXPLAIN) of this query; that'll show us what the query planner is doing and how accurate its row count estimates are. If the row counts are way off then you might see an improvement by increasing the statistics target for the date and maybe the transaction_status columns. Or you might just need to analyze the table to update its statistics, and possibly vacuum it to get rid of dead tuples. -- Michael Fuhr