Обсуждение: Copypasta in the PostgreSQL source

Поиск

Список

Период

Сортировка

Copypasta in the PostgreSQL source

От

David Fetter

Дата:

17 декабря 2018 г., 20:35:55

Folks,

Please find attached a run of a tool that looks for duplicated tokens.
I've removed some things that seem like false positives, basically all
from the stemmer part of the source, but there's still a lot.

- Is it worth trying to track these down and factor them out?
- Would it make sense to make some kind of git commit trigger that at
  least warns when a new one has been introduced?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Вложения

copypasta.txt.gz

Re: Copypasta in the PostgreSQL source

От

Tom Lane

Дата:

18 декабря 2018 г., 01:31:22

David Fetter <david@fetter.org> writes:
> Please find attached a run of a tool that looks for duplicated tokens.
> I've removed some things that seem like false positives, basically all
> from the stemmer part of the source, but there's still a lot.

I thought you were talking about problems like "that that" typos,
but on looking at the file, what this is actually complaining
about is any duplicated code segments anywhere.  I do not find
this helpful.  Refactoring to the point that dozen-line code
stanzas never appear more than once would be incredibly invasive,
likely very bad for performance, and I don't think it'd improve
readability either.

> - Would it make sense to make some kind of git commit trigger that at
>   least warns when a new one has been introduced?

Commit triggers are NOT the place for heuristics about code quality,
even if they're well-considered heuristics.

            regards, tom lane

Re: Copypasta in the PostgreSQL source

От

Alvaro Herrera

Дата:

18 декабря 2018 г., 01:35:23

On 2018-Dec-17, Tom Lane wrote:

> David Fetter <david@fetter.org> writes:
> > Please find attached a run of a tool that looks for duplicated tokens.
> > I've removed some things that seem like false positives, basically all
> > from the stemmer part of the source, but there's still a lot.
> 
> I thought you were talking about problems like "that that" typos,
> but on looking at the file, what this is actually complaining
> about is any duplicated code segments anywhere.  I do not find
> this helpful.  Refactoring to the point that dozen-line code
> stanzas never appear more than once would be incredibly invasive,
> likely very bad for performance, and I don't think it'd improve
> readability either.

Agreed.  Skimming the report, we get some very silly duplications, such
as a function definition duplicating its prototype.

I agree that this is mostly unhelpful noise and we shouldn't spend too
much time on it.

On the other hand, I'm not clear on why do we need four copies of
number_of_ones.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Copypasta in the PostgreSQL source

От

Tom Lane

Дата:

18 декабря 2018 г., 01:46:29

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On the other hand, I'm not clear on why do we need four copies of
> number_of_ones.

Yeah, it might be time to move something like that into a common
location.  Not sure where the threshold of pain is, though.

            regards, tom lane

Re: Copypasta in the PostgreSQL source

От

David Fetter

Дата:

18 декабря 2018 г., 02:36:55

On Mon, Dec 17, 2018 at 05:31:22PM -0500, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > Please find attached a run of a tool that looks for duplicated
> > tokens.  I've removed some things that seem like false positives,
> > basically all from the stemmer part of the source, but there's
> > still a lot.
> 
> I thought you were talking about problems like "that that" typos,
> but on looking at the file, what this is actually complaining about
> is any duplicated code segments anywhere.  I do not find this
> helpful.  Refactoring to the point that dozen-line code stanzas
> never appear more than once would be incredibly invasive, likely
> very bad for performance, and I don't think it'd improve readability
> either.

Is there a threshold, possibly much larger than a dozen lines, where
such refactoring would actually make sense?

> > - Would it make sense to make some kind of git commit trigger that
> > at least warns when a new one has been introduced?
> 
> Commit triggers are NOT the place for heuristics about code quality,
> even if they're well-considered heuristics.

Where's a better place for such heuristics?  I'd like to think that
there are opportunities for more automation than we currently have, on
that score.  Maybe a `make` target?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Copypasta in the PostgreSQL source

От

Andres Freund

Дата:

18 декабря 2018 г., 02:41:23

Hi,

On 2018-12-18 00:36:55 +0100, David Fetter wrote:
> On Mon, Dec 17, 2018 at 05:31:22PM -0500, Tom Lane wrote:
> > David Fetter <david@fetter.org> writes:
> > > Please find attached a run of a tool that looks for duplicated
> > > tokens.  I've removed some things that seem like false positives,
> > > basically all from the stemmer part of the source, but there's
> > > still a lot.
> > 
> > I thought you were talking about problems like "that that" typos,
> > but on looking at the file, what this is actually complaining about
> > is any duplicated code segments anywhere.  I do not find this
> > helpful.  Refactoring to the point that dozen-line code stanzas
> > never appear more than once would be incredibly invasive, likely
> > very bad for performance, and I don't think it'd improve readability
> > either.
> 
> Is there a threshold, possibly much larger than a dozen lines, where
> such refactoring would actually make sense?

Sure, sometimes. But it seems unlikely that a report based on the
technology at hand here would be useful. If none of the code has changed
in recent times, it's not that usually worthy of a lot of attention to
adapt, unless we're talking much larger pieces of code. And a fair bit
of what's pointed out in that report is well known (e.g. that ecpg
duplicates a lot of code in a horrible manner).  I think to be useful
it'd need to actually point out very similar code, that's starting to
diverge further - but that's obviously a much harder thing to achieve.

> > > - Would it make sense to make some kind of git commit trigger that
> > > at least warns when a new one has been introduced?
> > 
> > Commit triggers are NOT the place for heuristics about code quality,
> > even if they're well-considered heuristics.
> 
> Where's a better place for such heuristics?  I'd like to think that
> there are opportunities for more automation than we currently have, on
> that score.  Maybe a `make` target?

If we find useful targets, we can integrate them that way. I don't
consider this to be a worthy case though.

Greetings,

Andres Freund

Re: Copypasta in the PostgreSQL source

От

Thomas Munro

Дата:

18 декабря 2018 г., 03:13:22

On Tue, Dec 18, 2018 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > On the other hand, I'm not clear on why do we need four copies of
> > number_of_ones.
>
> Yeah, it might be time to move something like that into a common
> location.  Not sure where the threshold of pain is, though.

Yeah.  I think we need bitutils.c as discussed here:

https://www.postgresql.org/message-id/flat/CAEepm%3D3k%2B%2BYtf2LNQCvpP6m1%3DgY9zZHP_cfnn47%3DWTsoCrLCvA%40mail.gmail.com

I sort of foundered on the the difficulty of using popcnt and
bitscan/log2 instructions/builtins without putting a runtime test and
a honking great function pointer in front of them.  +1 for at least
deduplicating all these implementations as discussed in that thread.

-- 
Thomas Munro
http://www.enterprisedb.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Copypasta in the PostgreSQL source

Copypasta in the PostgreSQL source

Вложения

Re: Copypasta in the PostgreSQL source

Re: Copypasta in the PostgreSQL source

Re: Copypasta in the PostgreSQL source

Re: Copypasta in the PostgreSQL source

Re: Copypasta in the PostgreSQL source

Re: Copypasta in the PostgreSQL source