Re: Shared detoast Datum proposal
От | Andy Fan |
---|---|
Тема | Re: Shared detoast Datum proposal |
Дата | |
Msg-id | 875xv6k3k1.fsf@163.com обсуждение исходный текст |
Ответ на | Re: Shared detoast Datum proposal (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: Shared detoast Datum proposal
Re: Shared detoast Datum proposal |
Список | pgsql-hackers |
Hi Robert, > Andy Fan asked me off-list for some feedback about this proposal. I > have hesitated to comment on it for lack of having studied the matter > in any detail, but since I've been asked for my input, here goes: Thanks for doing this! Since we have two totally different designs between Tomas and me and I think we both understand the two sides of each design, just that the following things are really puzzled to me. - which factors we should care more. - the possiblility to improve the design-A/B for its badness. But for the point 2, we probably needs some prefer design first. for now let's highlight that there are 2 totally different method to handle this problem. Let's call: design a: detoast the datum into slot->tts_values (and manage the memory with the lifespan of *slot*). From me. design b: detost the datum into a CACHE (and manage the memory with CACHE eviction and at released the end-of-query). From Tomas. Currently I pefer design a, I'm not sure if I want desgin a because a is really good or just because design a is my own design. So I will summarize the the current state for the easier discussion. I summarize them so that you don't need to go to the long discussion, and I summarize the each point with the link to the original post since I may misunderstand some points and the original post can be used as a double check. > I agree that there's a problem here, but I suspect this is not the > right way to solve it. Let's compare this to something like the > syscache. Specifically, let's think about the syscache on > pg_class.relname. OK. > In the case of the syscache on pg_class.relname, there are two reasons > why we might repeatedly look up the same values in the cache. One is > that the code might do multiple name lookups when it really ought to > do only one. Doing multiple lookups is bad for security and bad for > performance, but we have code like that in some places and some of it > is hard to fix. The other reason is that it is likely that the user > will mention the same table names repeatedly; each time they do, they > will trigger a lookup on the same name. By caching the result of the > lookup, we can make it much faster. An important point to note here is > that the queries that the user will issue in the future are unknown to > us. In a certain sense, we cannot even know whether the same table > name will be mentioned more than once in the same query: when we reach > the first instance of the table name and look it up, we do not have > any good way of knowing whether it will be mentioned again later, say > due to a self-join. Hence, the pattern of cache lookups is > fundamentally unknowable. True. > But that's not the case in the motivating example that started this > thread. In that case, the target list includes three separate > references to the same column. We can therefore predict that if the > column value is toasted, we're going to de-TOAST it exactly three > times. True. > If, on the other hand, the column value were mentioned only > once, it would be detoasted just once. In that latter case, which is > probably quite common, this whole cache idea seems like it ought to > just waste cycles, which makes me suspect that no matter what is done > to this patch, there will always be cases where it causes significant > regressions. I agree. > In the former case, where the column reference is > repeated, it will win, but it will also hold onto the detoasted value > after the third instance of detoasting, even though there will never > be any more cache hits. True. > Here, unlike the syscache case, the cache is > thrown away at the end of the query, so future queries can't benefit. > Even if we could find a way to preserve the cache in some cases, it's > not very likely to pay off. People are much more likely to hit the > same table more than once than to hit the same values in the same > table more than once in the same session. I agree. One more things I want to highlight it "syscache" is used for metadata and *detoast cache* is used for user data. user data is more likely bigger than metadata, so cache size control (hence eviction logic run more often) is more necessary in this case. The first graph in [1] (including the quota text) highlight this idea. > Suppose we had a design where, when a value got detoasted, the > detoasted value went into the place where the toasted value had come > from. Then this problem would be handled pretty much perfectly: the > detoasted value would last until the scan advanced to the next row, > and then it would be thrown out. So there would be no query-lifespan > memory leak. I think that is same idea as design a. > Now, I don't really know how this would work, because the > TOAST pointer is coming out of a slot and we can't necessarily write > one attribute of any arbitrary slot type without a bunch of extra > overhead, but maybe there's some way. I think the key points includes: 1. Where to put the *detoast value* so that ExprEval system can find out it cheaply. The soluation in A slot->tts_values[*]. 2. How to manage the memory for holding the detoast value. The soluation in A is: - using a lazy created MemoryContext in slot. - let the detoast system detoast the datum into the designed MemoryContext. 3. How to let Expr evalution run the extra cycles when needed. The soluation in A is: EEOP_XXX_TOAST step is designed to get riding of some *run-time* *if (condition)" . I found the existing ExprEval system have many specialized steps. I will post a detailed overall design later and explain the its known badness so far. > If it could be done cheaply enough, it would gain all the benefits of > this proposal a lot more cheaply and with fewer > downsides. I agree. > Alternatively, maybe there's a way > to notice the multiple references and introduce some kind of > intermediate slot or other holding area where the detoasted value can > be stored. > > In other words, instead of computing > > big_jsonb_col->'a', big_jsonb_col->'b', big_jsonb_col->'c' > > Maybe we ought to be trying to compute this: > > big_jsonb_col=detoast(big_jsonb_col) > big_jsonb_col->'a', big_jsonb_col->'b', big_jsonb_col->'c' > > Or perhaps this: > > tmp=detoast(big_jsonb_col) > tmp->'a', tmp->'b', tmp->'c' Current the design a doesn't use 'tmp' so that the existing ExprEval engine can find out "detoast datum" easily, so it is more like: > big_jsonb_col=detoast(big_jsonb_col) > big_jsonb_col->'a', big_jsonb_col->'b', big_jsonb_col->'c' [1] https://www.postgresql.org/message-id/87ttllbd00.fsf%40163.com -- Best Regards Andy Fan
В списке pgsql-hackers по дате отправления:
Предыдущее
От: "Imseih (AWS), Sami"Дата:
Сообщение: Re: problems with "Shared Memory and Semaphores" section of docs
Следующее
От: Jeff DavisДата:
Сообщение: Re: Skip adding row-marks for non target tables when result relation is foreign table.