Re: optimizing queries and indexes...
От | Josh Berkus |
---|---|
Тема | Re: optimizing queries and indexes... |
Дата | |
Msg-id | web-115943@davinci.ethosmedia.com обсуждение исходный текст |
Ответ на | optimizing queries and indexes... ("Robert J. Sanford, Jr." <rsanford@nolimitsystems.com>) |
Ответы |
Re: optimizing queries and indexes...
(Stephan Szabo <sszabo@megazone23.bigpanda.com>)
|
Список | pgsql-sql |
Robert, I'd advise you to buy a book, but frankly I don't know a good one on DB performance optimization. The DB optimizers I've met tend to guard their secrets closely. Suggestions, anyone? > i have several friends that are DBA's by profession and work on > oracle > and/or ms sql server. they have all told me that while there are some > general rules to follow that each database is different. Yup. They told ya right. > for example, one thing that one of my friends said is: > select X > from big_table > ,little_table > > Generally speaking, Oracle optimizes better > when the smaller/reducing tables are on the > bottom and the larger tables are on the top. > I believe SQLServer likes them in the opposite > direction. Generally speaking, in PG SQL it pays to leave the join order up to the optimizer as much as possible. Tom and Stephan have built a phenominal query optimizer, and you are much more likely to slow it down if you limit its choices. Also, for the implicit join style in Postgres the order in which you give tables is largely ignored by the optimizer. Order only matters in explicit joins. That being said, there are a few practices that can help: 1. When joining a limited subset of an exceptionally large table to several smaller tables, consider using a subselect for the large table. In some cases this will speed up query execution. Sometimes it won't. Example: 2. When doing several Inner (normal) Joins and several Outer Joins, do the inner joins first and the outer joins second, as the inner joins should limit the result set that is being matched for the outer joins. 3. All joins and where conditions, ideally, should be executed on indexed columns. 4. EXISTS and NOT EXISTS are almost always faster than IN and NOT IN for a sub-select in the WHERE clause. 5. Some Postgres-specific tips: a. VACUUM ANALYZE regularly! b. If you have indexes on tables with a large number of regular deletions, you will need to drop and re-create the indexes during periods of inactivity. This is a Postgres defect that will be fixed in ver. 7.3 or later. 6. Make sure to establish Foriegn Key constraints wherever appropriate. This will speed up joins on the constrained columns considerably, as the parser does not have to worry about unmatched rows. 7. Set Unique indexes on all unique columns. This also helps the parser. > and also: > Generally speaking indexes should be built > with column names in the order of higher > cardinality. I frequently screw this up > because it runs counter to the way you think > about building your joins. This may be true in Oracle (opinions?) but is is neither true in Postgres nor true in SQL Server (or Frontbase). If you build your indexes according to this rule you will be dissapointed in the results. The real rule is: Indexes should always be built according to the manner and order in which they will be queried. Example: If you have a database in which there is a unique set of EmployeeIDs for each Region, but not unique between Regions. Thus all joins to Employees join on both the regionID and the employeeID. Further, you force the user to select a region before s/he can look up an employee. In this case, you would want to establish your indexes as CREATE INDEX reg_emp_ix UNIQUE ( regionID, employeeID ) Create all your joins as: JOIN employees e ON (t.regionID = e.regionID AND t.employeeID = e.employeeID) And your WHERE clauses as: WHERE regionID = $region AND employeeID = $emp For all of these, order is immensely important. However, this index, while immensely useful for the above purposes, would be useless if there are more than 2 regions and you wanted to select on employeeID only for some reason. In that case, you would need to add a second index on employeeID alone, as the regionID, employeeID index would not be used in a query that filtered only by employeeID and ignored the regionID. One of my clients didn't know any of the above and established a number of indexes on tables > 3,000,000 records using columns in alphabetical order (!?!). They then had to call me, and re-ordering the index columns cut the delay in single-row queries (especially DELETE queries) by 80%. > so, i am hopeful that there is some sort of postgresql performance > faq > for queries. Somebody wanna re-organize the above with more examples? -Josh Berkus ______AGLIO DATABASE SOLUTIONS___________________________ Josh Berkus Complete information technology josh@agliodbs.com and data management solutions (415) 565-7293 for law firms, small businesses fax 621-2533 and non-profit organizations. San Francisco
Вложения
В списке pgsql-sql по дате отправления: