Обсуждение: Patch: Optimize memory allocation in function 'bringetbitmap'
BRIN Scan: Optimize memory allocation in function 'bringetbitmap'.
We can allocate memory for some pointer before do long loop instead of allocating
memory in long loop.
Before optimizing code (warm run)
postgres=# select count(*) from lineitem where l_orderkey=1;
count
-------
6
(1 row)
Time: 456.219 ms
After optimizing code (warm run)
postgres=# select count(*) from lineitem where l_orderkey=1;
count
-------
6
(1 row)
Time: 349.219 ms
The following shows the DDL of this test case.
CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL,
L_PARTKEY INTEGER NOT NULL,
L_SUPPKEY INTEGER NOT NULL,
L_LINENUMBER INTEGER NOT NULL,
L_QUANTITY DECIMAL(15,2) NOT NULL,
L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL,
L_DISCOUNT DECIMAL(15,2) NOT NULL,
L_TAX DECIMAL(15,2) NOT NULL,
L_RETURNFLAG CHAR(1) NOT NULL,
L_LINESTATUS CHAR(1) NOT NULL,
L_SHIPDATE DATE NOT NULL,
L_COMMITDATE DATE NOT NULL,
L_RECEIPTDATE DATE NOT NULL,
L_SHIPINSTRUCT CHAR(25) NOT NULL,
L_SHIPMODE CHAR(10) NOT NULL,
L_COMMENT VARCHAR(44) NOT NULL);
copy lineitem from '/home/jinyu/mywork/dbgen/lineitem.tbl' delimiter '|';
create index brinLineitem on lineitem using brin(L_ORDERKEY) with(pages_per_range = 1);
Jinyu Zhang
Вложения
Jinyu Zhang wrote: > > BRIN Scan: Optimize memory allocation in function 'bringetbitmap'. > We can allocate memory for some pointer before do long loop instead of allocating > memory in long loop. > > Before optimizing code (warm run) > postgres=# select count(*) from lineitem where l_orderkey=1; > Time: 456.219 ms > > After optimizing code (warm run) > postgres=# select count(*) from lineitem where l_orderkey=1; > Time: 349.219 ms Hmm, did you measure this in a c-assert-enabled build? Those are slower in allocation because of the memory-clobber thingy; without that, the allocations would be a lot faster, so perhaps the gains are not so interesting. Still, it's a lot of palloc/pfree calls that do not happen with your patch, so perhaps it's still worthwhile, but please do measure it. However I wonder if it would be simpler to have the dtup structure have the pointers, so that you can pass it as NULL in the first call and then followup calls reuse the one allocated in the first call. That makes the callers a bit less ugly, I think. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Yes, I forgot disable-c-assert last test. The following show the test result when disable-c-assert. I think it's still worthwhile. *After optimize code (warm run)* postgres=# select count(*) from lineitem where l_orderkey=1;count ------- 6 (1 row) Time: 327.143 ms *Before optimizing code (warm run)* postgres=# select count(*) from lineitem where l_orderkey=1;count ------- 6 (1 row) Time: 404.323 ms >>>>However I wonder if it would be simpler to have the dtup structure have >>>>the pointers, so that you can pass it as NULL in the first call and then >>>>followup calls reuse the one allocated in the first call. Jinyu: the memory is allocated from perRangeCxt and perRangeCxt will be reset in loop, so this way don't work. Jinyu Zhang -- View this message in context: http://postgresql.nabble.com/Patch-Optimize-memory-allocation-in-function-bringetbitmap-tp5867537p5867647.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
zhangjinyu wrote: > >>>>However I wonder if it would be simpler to have the dtup structure have > >>>>the pointers, so that you can pass it as NULL in the first call and then > >>>>followup calls reuse the one allocated in the first call. > Jinyu: the memory is allocated from perRangeCxt and perRangeCxt will be > reset in loop, > so this way don't work. You're right. I think we can do better: have brin_deform_tuple accept another argument of type BrinMemTuple *, which can be NULL. If NULL, the function calls brin_new_memtuple to allocate a new one (just like current code); if not NULL, that one is used. Have brin_new_memtuple also allocate the values/allnulls/hasnulls arrays, which are now part of the BrinMemTuple struct. Then, bringetbitmap calls brin_new_memtuple once before changing to perRangeCxt, then reuse the returned inside the loop (we probably need brin_memtuple_initialize to clear out the struct at the end of each loop iteration). Other callers of brin_deform_tuple just pass NULL to get the current behavior. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Also, please make sure your patch is registered in https://commitfest.postgresql.org/7/ so that we don't forget. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Update the patch_brin_optimze_mem according to your comment.
At 2015-10-16 10:13:35, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote: >zhangjinyu wrote: > >> >>>>However I wonder if it would be simpler to have the dtup structure have >> >>>>the pointers, so that you can pass it as NULL in the first call and then >> >>>>followup calls reuse the one allocated in the first call. >> Jinyu: the memory is allocated from perRangeCxt and perRangeCxt will be >> reset in loop, >> so this way don't work. > >You're right. I think we can do better: have brin_deform_tuple accept >another argument of type BrinMemTuple *, which can be NULL. If NULL, >the function calls brin_new_memtuple to allocate a new one (just like >current code); if not NULL, that one is used. Have brin_new_memtuple >also allocate the values/allnulls/hasnulls arrays, which are now part of >the BrinMemTuple struct. Then, bringetbitmap calls brin_new_memtuple >once before changing to perRangeCxt, then reuse the returned inside the >loop (we probably need brin_memtuple_initialize to clear out the struct >at the end of each loop iteration). Other callers of brin_deform_tuple >just pass NULL to get the current behavior. > >-- >Álvaro Herrera http://www.2ndQuadrant.com/ >PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > >-- >Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
Hi, On 10/16/2015 08:00 PM, Jinyu Zhang wrote: > > Update the patch_brin_optimze_mem according to your comment. I've reviewed the last version of the patch, and I do have a few comments. I haven't done any performance testing at this point, as the machine I'm using for that is chewing on something else at the moment. 1) bringetbitmap(PG_FUNCTION_ARGS) + /* + * Estimate the approximate size of btup and allocate memory for btup. + */ + btupInitSize = bdesc->bd_tupdesc->natts * 16; + btup = palloc(btupInitSize); What is the reasoning behind this estimate? I mean, this only accounts for the values, not the NULL bitmap or BrinTuple fields. Maybe it does not really matter much, but I'd like to see some brief description in the docs, please. This also interacts with our memory allocator in a rather annoying way, because palloc() always allocates memory in 2^k chunks, but sadly the estimate ignores that. So for natts=3 we get btupInitSize=48, but internally allocate 64B (and ignore the top 16B). 2) bringetbitmap(PG_FUNCTION_ARGS) if (tup) { if(size <= btupInitSize) memcpy(btup, tup, size); else btup = brin_copy_tuple(tup,size); LockBuffer(buf, BUFFER_LOCK_UNLOCK); } Is there any particular reason why not to update btupInitSize to the size value? I mean, if the estimate is too low, then we'll call brin_copy_tuple() for all BRIN tuples, no? In other words, I think the code should do this: if (tup) { if(size <= btupInitSize) memcpy(btup, tup, size); else { btup= brin_copy_tuple(tup, size); brinInitSize = size; } LockBuffer(buf, BUFFER_LOCK_UNLOCK); } Another option might be the usual "doubling" strategy, i.e. do something like this instead: if (tup) { while (btupInitSize < size) btupInitSize *= 2; btup = repalloc(btup, btupInitSize); memcpy(btup, tup, size); LockBuffer(buf, BUFFER_LOCK_UNLOCK); } Possibly with only doing the repalloc when actually needed. 3) brin_new_memtuple(...) The changes in this function seem wrong to me. Not only you've removed the code that initialized all the bv_* attributes, you're also using palloc() so there could easily be random data. This might be OK for our code as we call brin_memtuple_initialize() right after this function, and that seems to do the init, but AFAIK brin_new_memtuple() is a part of our public API at it's in a header file. And these changes surely break the contract documented in the comment above the function: * Create a new BrinMemTuple from scratch, and initialize it to an empty * state. So I think this is wrong and needs to be reverted. It's possible that we'll immediately reset the attributes, but that's negligible cost - we expect to do brin_memtuple_initialize() many times for the tuple, so it should not make any difference. Another thing is that the three arrays (bt_values, bt_allnulls and bt_hasnulls) may be allocated as a single chunk and then sliced. char * ptr = palloc0(space for all three arrays); bt_values = ptr; bt_allnulls = ptr + (size of bt_values) bt_bt_hasnulls= ptr + (size of bt_values + bt_allnulls) which is essentially what the original code did with bv_values. This reduces palloc() overhead and fragmentation. 4) brin_memtuple_initialize(...) It's possible that the new code needs to reset more fields, but I don't really see why it should mess with dtuple->bt_columns[i].bv_values for example. 5) brin_deform_tuple(..) The comment before the function probably should explain the purpose of the new parameter (that it can either be NULL or an existing tuple). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 15, 2015 at 10:41 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > On 10/16/2015 08:00 PM, Jinyu Zhang wrote: >> Update the patch_brin_optimze_mem according to your comment. > > I've reviewed the last version of the patch, and I do have a few comments. I > haven't done any performance testing at this point, as the machine I'm using > for that is chewing on something else at the moment. I have switched the patch as returned with feedback for now based on the review of Tomas that is still waiting some input from the author. Jinyu, if you are still planning to work on that, feel free to move your patch to the next commit fest 2016-01. Thanks! -- Michael
Jinyu Zhang wrote: > > > Update the patch_brin_optimze_mem according to your comment. I have added this patch to the commitfest, which I've been intending to get in for a long time. I'll be submitting an updated patch soon. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera wrote: > Jinyu Zhang wrote: > > > Update the patch_brin_optimze_mem according to your comment. > > I have added this patch to the commitfest, which I've been intending to > get in for a long time. I'll be submitting an updated patch soon. Here it is. I addressed some of Tomas' comments, but not all (so this is mostly just a rebased on Jinyu's original submission). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Wed, Mar 1, 2017 at 11:28 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Alvaro Herrera wrote: >> Jinyu Zhang wrote: >> >> > Update the patch_brin_optimze_mem according to your comment. >> >> I have added this patch to the commitfest, which I've been intending to >> get in for a long time. I'll be submitting an updated patch soon. > > Here it is. I addressed some of Tomas' comments, but not all (so this > is mostly just a rebased on Jinyu's original submission). I think we should regard this resubmission as untimely, unless there is an argument that it amounts to a bug fix. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/4/17 12:43 AM, Robert Haas wrote: > On Wed, Mar 1, 2017 at 11:28 PM, Alvaro Herrera > <alvherre@2ndquadrant.com> wrote: >> Alvaro Herrera wrote: >>> Jinyu Zhang wrote: >>> >>>> Update the patch_brin_optimze_mem according to your comment. >>> >>> I have added this patch to the commitfest, which I've been intending to >>> get in for a long time. I'll be submitting an updated patch soon. >> >> Here it is. I addressed some of Tomas' comments, but not all (so this >> is mostly just a rebased on Jinyu's original submission). > > I think we should regard this resubmission as untimely, unless there > is an argument that it amounts to a bug fix. I agree and I'm also confused about which author this is waiting on. Is it Jinyu or Álvaro? Thanks, -- -David david@pgmasters.net
David Steele wrote: > On 3/4/17 12:43 AM, Robert Haas wrote: > > I think we should regard this resubmission as untimely, unless there > > is an argument that it amounts to a bug fix. > > I agree and I'm also confused about which author this is waiting on. Is > it Jinyu or Álvaro? I don't think Jinyu is interested in the project anymore, so it's me that this is waiting on. I intend to push this at some point before feature freeze. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra wrote: > 1) bringetbitmap(PG_FUNCTION_ARGS) > > + /* > + * Estimate the approximate size of btup and allocate memory for btup. > + */ > + btupInitSize = bdesc->bd_tupdesc->natts * 16; > + btup = palloc(btupInitSize); > > What is the reasoning behind this estimate? I mean, this only accounts for > the values, not the NULL bitmap or BrinTuple fields. Maybe it does not > really matter much, but I'd like to see some brief description in the docs, > please. > > This also interacts with our memory allocator in a rather annoying way, > because palloc() always allocates memory in 2^k chunks, but sadly the > estimate ignores that. So for natts=3 we get btupInitSize=48, but internally > allocate 64B (and ignore the top 16B). I fixed this point (and the following one, which is essentially complaining about the same thing) by changing the API of brin_copy_tuple, similarly to the changes we made to brin_deform_tuple: pass the previously allocated memory (along with its size) as arguments, so that it can be repalloc'ed if necessary, or just re-used as-is. That seemed the most effective way to solve the points you raised. Your review was extremely useful, many thanks. In a extremely simpleminded test to measure the benefit, I did this: create table t (a int, b bigint, c bigint) ; insert into t select g, g, g from generate_series(1, 10000 * 1000) g; -- generates about 800 MB of data; fits in shared_buffers create index ti on t using brin (a, b, c) with (pages_per_range = 1); -- index contains about 84 revmap pages and then measured this query: explain analyze select * from t where b between 245782 and 1256782; with and without this patch. It goes from 40ms without patch to 25ms with, which seems a decent enough improvement. (Assertions were enabled, but I disabled memory clobbering). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services