Re: Query results caching?
От | Ben-Nes Yonatan |
---|---|
Тема | Re: Query results caching? |
Дата | |
Msg-id | 430D8DA5.2040704@canaan.co.il обсуждение исходный текст |
Ответ на | Re: Query results caching? ("Dann Corbit" <DCorbit@connx.com>) |
Список | pgsql-general |
Dann Corbit wrote: >>-----Original Message----- >>From: Ben-Nes Yonatan [mailto:da@canaan.co.il] >>Sent: Monday, August 22, 2005 3:28 PM >>To: Jim C. Nasby; Sean Davis; Dann Corbit >>Cc: pgsql-general@postgresql.org >>Subject: Re: [GENERAL] Query results caching? >> >> >> >>>On Mon, Aug 22, 2005 at 10:13:49PM +0200, Ben-Nes Yonatan wrote: >>> >>> >>> >>>>I think that I was misunderstood, Ill make an example: >>>>Lets say that im making the following query for the first time on >>>> >>>> >the > > >>>>"motorcycles" table which got an index on the "manufacturer" field: >>>> >>>>EXPLAIN ANALYZE SELECT manufacturer FROM motorcycles WHERE >>>>manufacturer='suzuki'; >>>>... Total runtime: 3139.587 ms >>>> >>>>Now im doing the same query again and i get a much faster result >>>> >>>> >(cause > > >>>>of the "caching"): Total runtime: 332.53 ms >>>> >>>>After both of those queries I drop the index and query the table >>>> >>>> >again > > >>>>with the exact same query as before and now I receive: Total >>>> >>>> >runtime: > > >>>>216834.871 ms >>>> >>>>And for my last check I run the exact same query again (without >>>> >>>> >creating > > >>>>the INDEX back again) and I get quite similar result to my third >>>> >>>> >query: > > >>>>Total runtime: 209218.01 ms >>>> >>>> >>>>My problem is that (maybe I just dont understand something basic >>>>here...) the last 2 (also the second query but I dont care about >>>> >>>> >that) > > >>>>queries were using the "cache" that was created after the first >>>> >>>> >query > > >>>>(which had an INDEX) so none of them actually showed me what will >>>> >>>> >happen > > >>>>if a client will do such a search (without an INDEX) for the first >>>> >>>> >time. > > >>>>I want to delete that "caching" after I do the first 2 queries so my >>>>next queries will show me "real life results". >>>> >>>> >>> >>> >>Ok I tried to handle both of your replies cause I got them at 2 >> >> >seperate > > >>emails. >> >>Dann Corbit wrote: >> > These results are all what I would expect. When you delete the >> >> >index, > > >> > the query will be forced to do a table scan (to examine every >> >> >single > > >> > record in the table one by one). If the table is non-trivial it is >> > unlikely that either the OS or the database will cache the whole >> >> >thing > > >> > in memory. However, when you query a small record set, then it is >> > likely to be retained in RAM which is literally thousands of times >> > faster than disk. >> >>Didnt know that, good to know though doesnt assure me... >>What if I drop the INDEX but create a diffrent INDEX which also make >> >> >the > > >>process alot faster then without an INDEX but slower/faster then the >> >> >one > > >>before, will it wont use the former "caching"? >> >> > >You can add several indexes to a single table. >If you do some statistics on the query patterns, you can find what >indexes are needed to make the queries as fast as possible. > > > >> > If a search is to be made on a frequent basis, you should create an >> > index. >> > The query results above show you why. >> >>Obvious :) >> >> > Think about this for a minute. The real life results you want are >> > very fast results. For that reason, you should try to model the >> > customer queries as nearly as possible. If you have a canned >> > application like order entry, then the real parameterized query set >> > will probably be quite small in real life. If you are creating a >> > server for ad-hoc queries then it will be far more difficult to >> >> >model > > >> > in real life. >> > >> > What is the real purpose of the application that you are writing? >> > >> > Will users be using a pre-programmed front end, or will they be >> >> >typing > > >> > in queries free-form for whatever their heart desires? >> >>Ok ill try to describe the system as short & precise as possible (its >>also passed midnight here :)). >>Each day I receive about 4 million rows of data (products) which I >>insert into table1 (after I delete all of the previous data it had), >>along it I receive for every row about another 15 keywords which I >>insert into table2 (where as in table1 I delete all of the previous >> >> >data > > >>it had also), this process is a fact that I cant change. >> >> > >If the data arrives on a daily basis, and is not updated until the next >day, I suggest creating a lot of indexes, and cluster on the index used >most frequently. > >What exactly are the 15 keywords in the second table for? >Are they column names? >Are they categories for the first table? >Why is the second table necessary at all? > > Now clustering was unknown to me when I received this email from you... THANKS!!! I created 4 replicas of my table ordered by the diffrent order that I want to allow my users to use, yep its quite alot of GB but I dont care about it as long its working fast, and damn its flying! less then 100 ms and thats on a weak server which will be replaced soon!. > > >>Now the users of the site can search for data from table1 by typing >>whichever (and up to 4) words as they want at a text field (search >> >> >input > > >>string) and the server should display the correct results by querying >>table1 & join table2 for its keywords. >> >> > >Can you give the exact table definitions for the two tables, and also >the most likely queries you are going to receive? > >When the users type in keywords -- can these keywords be applied against >any column in the table or only against a single column or against a >small set of columns or something else? > > Well yea its working on a single column and that column got indexed so its flying. > > >>I succeded to do it quite fast but when I tried to ORDER BY my results >>its times jumped up drastically (2-3 seconds for a query... and thats >>after the caching..). >> >> > >Order by will complicate quite a bit. I have not tried it on >PostgreSQL, but if you know the result set is small, a technique is to >select into a temp table and then order by on the temp table. It works >well on other database systems (caveat: it has been a while since I >worked as a DBA and I have not worked as a DBA on PostgreSQL). > > That cant work cause it wont order all of the results by the desired column but only the returned results so when a user will want to see more results he can see results which were supposed to be displayed before his previous display. > > >>I can't allow a situation where a user will search with a keyword >> >> >which > > >>wasnt 'cached' before and because of that he will wait 15 seconds for >> >> >a > > >>result. >> >> > >You might try throwing hardware at it. A 4 CPU AMD 64 machine with >Ultra 320 striped SCSI disk array and a few gigabytes of ram will >perform admirably. > >You might want more than is possible. If you have 4 million rows, and >each row is 1K, then that is 4 GB. If your users do a query that has >not been performed yet and you have to do a table scan, then you cannot >expect some kind of sub-second response times because it won't be >physically possible on any system. > >If you know what most queries may look like or if you only have a few >character columns so that you can make an index on each of them and if >you can put a unique clustered index on the most important (frequently >used) item, then you can get the majority of your queries to run very >quickly, and only on rare occasions will the query be slow. > >If I have a 4 million row table, with long rows and big varchar columns >and I run a query on a column like this: > > SELECT * FROM inventory WHERE product LIKE '%Table%' > >It isn't going to be fast on any system with any database. >[snip] > > Well a better hardware will soon be working but anyway your idea about clustering solved my problem. Thanks alot again! :) Ben-Nes Yonatan Canaan Surfing ltd. http://www.canaan.net.il
В списке pgsql-general по дате отправления: