Variable-length FunctionCallInfoData
От | Andres Freund |
---|---|
Тема | Variable-length FunctionCallInfoData |
Дата | |
Msg-id | 20180605172952.x34m5uz6ju6enaem@alap3.anarazel.de обсуждение исходный текст |
Ответы |
RE: Variable-length FunctionCallInfoData
Re: Variable-length FunctionCallInfoData Re: Variable-length FunctionCallInfoData Re: Variable-length FunctionCallInfoData |
Список | pgsql-hackers |
Hi, While prototyping codegen improvements for JITed expression evaluation, I once more hit the issue that the FunctionCallInfoData structs are really large (936 bytes), despite arguments beyond the fourth barely every being used. I think we should fix that. What I think we should do is convert FunctionCallInfoData->{arg,argisnull} into an array of NullableDatum (new type, a struct of Datum and bool), and then use a variable length array for the arguments. In the super common case of 2 arguments that reduces the size of the array from 936 to 64 bytes. Besides the size reduction this also noticably reduces the number of cachelines accessed - before it's absolutely guaranteed that the arg and argnull arrays for the same argument aren't on the same cacheline, after it's almost guaranteed to be the case. Attached is a *PROTOTYPE* patch doing so. Note I was too lazy to fully fix up the jit code, I didn't want to do the legwork before we've some agreement on this. We also can get rid of FUNC_MAX_ARGS after this, but there's surrounding code that still relies on it. There's some added uglyness, which I hope we can polish a bit further. Right now we allocate a good number of FunctionCallInfoData struct on the stack - which doesn't quite work afterwards anymore. So the stack allocations, for the majoroity cases where the argument number is known, currently looks like: union { FunctionCallInfoData fcinfo; char *fcinfo_data[SizeForFunctionCallInfoData(0)]; } fcinfodata; FunctionCallInfo fcinfo = &fcinfodata.fcinfo; that's not pretty, but also not that bad. It's a bit unfortunate that this'll break some extensions, but I don't really see a way around that. The current approach, to me, clearly doesn't have a future. I wonder if we should add a bunch of accessor macros / inline functions that we (or extension authors) can backpatch to reduce the pain of maintaining different code paths. Besides the change here, I think we should also go much further with the conversion to NullableDatum. There's two main areas of change: I want to move the execExpr.c related code so steps return data into NullableDatums - that removes a good chunk of pointer dereferences and allocations. Secondly I think we should move TupleTableSlot to this format - the issue with nulls / datums being on separate cachelines is noticeable in profiles, but more importantly the code looks more consistent with it. As an example for the difference in memory usage, here's the memory consumption at ExecutorRun time, for TPCH's Q01: master: TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used PortalContext: 1024 total in 1 blocks; 576 free (0 chunks); 448 used: ExecutorState: 90744 total in 5 blocks; 31568 free (2 chunks); 59176 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 3488 free (0 chunks); 4704 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used Grand total: 149112 bytes in 13 blocks; 82976 free (2 chunks); 66136 used patch: TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used PortalContext: 1024 total in 1 blocks; 576 free (0 chunks); 448 used: ExecutorState: 65536 total in 4 blocks; 33536 free (6 chunks); 32000 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 5408 free (0 chunks); 2784 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used Grand total: 123904 bytes in 12 blocks; 86864 free (6 chunks); 37040 used As you can see, the ExecutorState context uses nearly half the amount of memory as before. In a lot of cases a good chunk of the benefit is going to be hidden due to memory context sizing, but I'd expect that to matter much less for more complex statements and plpgsql functions etc. Comments? Greetings, Andres Freund
Вложения
В списке pgsql-hackers по дате отправления: