Обсуждение: BUG #10976: Two memory leaks in regcomp cleanup
The following bug has been logged on the website: Bug reference: 10976 Logged by: Arthur O'Dwyer Email address: arthur.j.odwyer@gmail.com PostgreSQL version: 9.3.0 Operating system: Ubuntu Linux Description: When MALLOC fails, pg_regcomp leaks memory in at least two places: (A) In freev(), the line freesubre(info, v, v->tree); should be freesubre(info, NULL, v->tree); as otherwise the "freed" subres will end up on v->treefree, which is leaked by the cleanst() two lines later. That is, given the precondition that there are things in v->tree that aren't in v->treechain. This precondition is invariably true if we are being called because nfatree() has run out of memory here: markst(v->tree); cleanst(info, v); /* clears v->treechain without clearing v->tree */ [...some comments...] re->re_info |= nfatree(info, v, v->tree, debug); CNOERR(); /* calls freev() */ (B) newlacon() leaks memory if REALLOC returns NULL on this line: v->lacons = (struct subre *) REALLOC(v->lacons, (v->nlacons + 1) * sizeof(struct subre)); The fix is to use the same idiom already used everywhere else REALLOC is called in this module.
arthur.j.odwyer@gmail.com writes: > When MALLOC fails, pg_regcomp leaks memory in at least two places: > (A) In freev(), the line > freesubre(info, v, v->tree); > should be > freesubre(info, NULL, v->tree); > as otherwise the "freed" subres will end up on v->treefree, which is leaked > by the cleanst() two lines later. Hmm ... what version of the code are you looking at exactly? There's no "info" argument here in Postgres. > That is, given the precondition that there are things in v->tree that aren't > in v->treechain. The problem with this proposal is that if there are subres in v->tree that *are* in the treechain, we'll possibly try to free them twice (if they're not marked INUSE), and definitely will be accessing already-freed memory when cleanst looks at them. It looks to me like there are two different operating regimes in this code: one where all the subres are still in the treechain, and one where we've marked as INUSE all the ones that are reachable from v->tree and garbage-collected the rest. The markst/cleanst steps in pg_regcomp are where the conversion is made. freev needs to work correctly either before or after that. In short, I think you're right that there's a bug here, but this is not a good fix for it. I'm not sure freev has enough info to do the right thing; we may need to rethink the data structure invariants, so that there's not two different ways to clean up. > (B) newlacon() leaks memory if REALLOC returns NULL on this line: > v->lacons = (struct subre *) REALLOC(v->lacons, > (v->nlacons + 1) * sizeof(struct subre)); > The fix is to use the same idiom already used everywhere else REALLOC is > called in this module. A temp variable, you mean. Yeah, that's a bug. regards, tom lane
On Thu, Jul 17, 2014 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > arthur.j.odwyer@gmail.com writes: >> When MALLOC fails, pg_regcomp leaks memory in at least two places: > >> (A) In freev(), the line >> freesubre(info, v, v->tree); >> should be >> freesubre(info, NULL, v->tree); >> as otherwise the "freed" subres will end up on v->treefree, which is leaked >> by the cleanst() two lines later. > > Hmm ... what version of the code are you looking at exactly? There's no > "info" argument here in Postgres. Ah, yes, we piped an "info" argument through all the places that call MALLOC/FREE, so that compiling a regex wouldn't have to touch global state. You can safely pretend I didn't write "info," there. >> That is, given the precondition that there are things in v->tree that aren't >> in v->treechain. > > The problem with this proposal is that if there are subres in v->tree > that *are* in the treechain, we'll possibly try to free them twice > (if they're not marked INUSE), and definitely will be accessing > already-freed memory when cleanst looks at them. Hmm. I think you're right --- I *think* the subres in v->tree are INUSE by definition, so double-free isn't an issue, but cleanst will definitely be looking at them after they've been freed, which is still a bug. What if you just swap the order that freev() does cleanst() and freesubre() so that the cleanst() happens first? > It looks to me like there are two different operating regimes in this > code: one where all the subres are still in the treechain, and one where > we've marked as INUSE all the ones that are reachable from v->tree and > garbage-collected the rest. The markst/cleanst steps in pg_regcomp are > where the conversion is made. freev needs to work correctly either > before or after that. > > In short, I think you're right that there's a bug here, but this is > not a good fix for it. I'm not sure freev has enough info to do the > right thing; we may need to rethink the data structure invariants, > so that there's not two different ways to clean up. Please keep me cc'ed and/or send me an email when there's an "official" patch for this leak. Thanks, Arthur
"Arthur O'Dwyer" <arthur.j.odwyer@gmail.com> writes: > On Thu, Jul 17, 2014 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The problem with this proposal is that if there are subres in v->tree >> that *are* in the treechain, we'll possibly try to free them twice >> (if they're not marked INUSE), and definitely will be accessing >> already-freed memory when cleanst looks at them. > Hmm. I think you're right --- I *think* the subres in v->tree are > INUSE by definition, so double-free isn't an issue, but cleanst will > definitely be looking at them after they've been freed, which is still > a bug. What if you just swap the order that freev() does cleanst() and > freesubre() so that the cleanst() happens first? No, the INUSE marking doesn't happen till pg_regcomp runs markst(), so that would break cleanup of failures occurring before that. There's somewhat of a narrow window for this case, since v->tree doesn't get set until parse() returns, but failures there certainly are possible. After some reflection I decided that what we need is to teach freesubre to stick things back into the treefree list if and only if treechain is non-NULL. This guarantees we can't corrupt the treechain list with a premature free of a subre, and it preserves the existing not-broken logic for cleaning up after an error occuring before we reach markst(). So it ends up being one line of code change, though I added a bunch of commentary as well: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1567e659a877d35ab4b85dafff41b2845d50990f regards, tom lane
Thanks! Your patch successfully passes my tests (the ones that found the original freev() issue =E2=80=94 the REALLOC bug was found by reading t= he code). =E2=80=93Arthur On Fri, Jul 18, 2014 at 10:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Arthur O'Dwyer" <arthur.j.odwyer@gmail.com> writes: >> On Thu, Jul 17, 2014 at 10:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> The problem with this proposal is that if there are subres in v->tree >>> that *are* in the treechain, we'll possibly try to free them twice >>> (if they're not marked INUSE), and definitely will be accessing >>> already-freed memory when cleanst looks at them. > >> Hmm. I think you're right --- I *think* the subres in v->tree are >> INUSE by definition, so double-free isn't an issue, but cleanst will >> definitely be looking at them after they've been freed, which is still >> a bug. What if you just swap the order that freev() does cleanst() and >> freesubre() so that the cleanst() happens first? > > No, the INUSE marking doesn't happen till pg_regcomp runs markst(), so > that would break cleanup of failures occurring before that. There's > somewhat of a narrow window for this case, since v->tree doesn't get > set until parse() returns, but failures there certainly are possible. > > After some reflection I decided that what we need is to teach freesubre > to stick things back into the treefree list if and only if treechain is > non-NULL. This guarantees we can't corrupt the treechain list with a > premature free of a subre, and it preserves the existing not-broken > logic for cleaning up after an error occuring before we reach markst(). > So it ends up being one line of code change, though I added a bunch of > commentary as well: > > http://git.postgresql.org/gitweb/?p=3Dpostgresql.git;a=3Dcommitdiff;h=3D1= 567e659a877d35ab4b85dafff41b2845d50990f > > regards, tom lane