Discussion: cvs to git migration - keywords
In the previous discussions of how to migrate from cvs to git, we've all agreed we should kill the keyword expansion that we have now. I don't think, however, that we ever decided what to do with the *old* keywords. We did say we want to be able to reproduce backbranches/tags *identically* to what they are now, which indicates we need to leave the keywords in for those. That has other drawbacks, though.

The way I see it, we have two ways to do it:

1) We can migrate the repository with the keywords, and then make one big commit just after (or before, that doesn't make a difference) removing them. In this case, backbranches and tags look exactly like they do now, but it also means if you do "git diff" between old versions, the keywords will show up there.

2) We can filter out that row during the conversion, so they look like they never existed. That means that if you check out 7.4.3 or whatever from git, it will look like the keyword lines never existed. Since they're in comments it shouldn't affect functionality, but it does mean that we are *not* keeping history unmodified. The advantage is that "git diff" on and between old revisions won't include the keyword changes, of course.

#1 is most likely the easiest one.

It really comes down to which is most important - being able to get "easy to use diffs" between old revisions, or keeping history intact.

Obviously, for all *new* commits, either one of these two methods will make the diffs readable. And if they are new commits, well, they are by definition not history that needs to be kept :-)

Thoughts?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, Jul 7, 2010 at 10:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
> 1) We can migrate the repository with the keywords, and then make one big
> commit just after (or before, that doesn't make a difference) removing
> them. In this case, backbranches and tags look exactly like they do
> now, but it also means if you do "git diff" between old versions, the
> keywords will show up there.
>
> 2) We can filter out that row during the conversion, so they look like
> they never existed. That means that if you check out 7.4.3 or whatever
> from git, it will look like the keyword lines never existed. Since
> they're in comments it shouldn't affect functionality, but it does mean
> that we are *not* keeping history unmodified. The advantage is that
> "git diff" on and between old revisions won't include the keyword
> changes, of course.
>
> #1 is most likely the easiest one.

+1 for #1. Changing history and the resulting possibility of becoming one's own grandfather always makes me nervous.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company
Magnus Hagander wrote:
> In the previous discussions of how to migrate from cvs to git, we've
> all agreed we should kill the keyword expansion that we have now. I
> don't think, however, that we ever decided what to do with the *old*
> keywords. We did say we want to be able to reproduce backbranches/tags
> *identically* to what they are now, which indicates we need to leave
> the keywords in for those. That has other drawbacks, though.
>
> The way I see it, we have two ways to do it:
>
> 1) We can migrate the repository with the keywords, and then make one big
> commit just after (or before, that doesn't make a difference) removing
> them. In this case, backbranches and tags look exactly like they do
> now, but it also means if you do "git diff" between old versions, the
> keywords will show up there.
>

I don't think this would be a terrible tragedy.

Import, remove keyword lines on live branches, commit. That's what I'd do.

cheers

andrew
Dave Page <dpage@pgadmin.org> writes:
> On Wed, Jul 7, 2010 at 10:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
>> 1) We can migrate the repository with the keywords, and then make one big
>> commit just after (or before, that doesn't make a difference) removing
>> them. In this case, backbranches and tags look exactly like they do
>> now, but it also means if you do "git diff" between old versions, the
>> keywords will show up there.

> +1 for #1. Changing history and the resulting possibility of becoming
> one's own grandfather always makes me nervous.

Yeah. One concrete problem with removing the $PostgreSQL$ lines is it will affect line numbering everywhere. Yeah, it's only off-by-one, but there could still be confusion.

One point that isn't completely clear from Magnus' description is whether we should remove the $PostgreSQL$ lines from the HEAD branch only, or from the still-active back branches as well. I vote for the latter --- that is, if you pull a historical version of some file from the archives, you should see the appropriate $PostgreSQL$ line, but we won't have them in the source files for any future minor release. The reason for this is that otherwise there will be files floating around that claim to be CVS version x.y.z, but actually are different from that, because of back-patching activity after the git transition. That seems like a recipe for huge confusion in itself.

regards, tom lane
On Wed, Jul 7, 2010 at 3:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> One point that isn't completely clear from Magnus' description is
> whether we should remove the $PostgreSQL$ lines from the HEAD branch
> only, or from the still-active back branches as well. I vote for the
> latter --- that is, if you pull a historical version of some file
> from the archives, you should see the appropriate $PostgreSQL$ line,
> but we won't have them in the source files for any future minor
> release. The reason for this is that otherwise there will be files
> floating around that claim to be CVS version x.y.z, but actually are
> different from that, because of back-patching activity after the git
> transition. That seems like a recipe for huge confusion in itself.

Agreed. They should be removed from the active back branches.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise Postgres Company
On Wed, Jul 7, 2010 at 5:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
> In the previous discussions of how to migrate from cvs to git, we've
> all agreed we should kill the keyword expansion that we have now. I
> don't think, however, that we ever decided what to do with the *old*
> keywords. We did say we want to be able to reproduce backbranches/tags
> *identically* to what they are now, which indicates we need to leave
> the keywords in for those. That has other drawbacks, though.
>
> The way I see it, we have two ways to do it:
>
> 1) We can migrate the repository with the keywords, and then make one big
> commit just after (or before, that doesn't make a difference) removing
> them. In this case, backbranches and tags look exactly like they do
> now, but it also means if you do "git diff" between old versions, the
> keywords will show up there.
>
> 2) We can filter out that row during the conversion, so they look like
> they never existed. That means that if you check out 7.4.3 or whatever
> from git, it will look like the keyword lines never existed. Since
> they're in comments it shouldn't affect functionality, but it does mean
> that we are *not* keeping history unmodified. The advantage is that
> "git diff" on and between old revisions won't include the keyword
> changes, of course.
>
> #1 is most likely the easiest one.
>
> It really comes down to which is most important - being able to get
> "easy to use diffs" between old revisions, or keeping history intact.
>
> Obviously, for all *new* commits, either one of these two methods will
> make the diffs readable. And if they are new commits, well, they are
> by definition not history that needs to be kept :-)
>
> Thoughts?

So what happens right now using the existing git repository is that the $PostgreSQL$ tags are there, but they're unexpanded. They just say $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$. I'm all in favor of removing them, but it would be nice if we could avoid cluttering the old changesets with useless changes to the keyword expansions.

Maybe I'm smoking crack, though...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
* Dave Page <dpage@pgadmin.org> [100707 05:05]:
>
> +1 for #1. Changing history and the resulting possibility of becoming
> one's own grandfather always makes me nervous.

But, since we're already using CVS, our grandfather is already our granddaughter...

I'll just point out that if you "expand" the CVS keywords in the conversion, then your git will differ from every CVS branch/date/tag checkout I do...

Remember... Keywords don't *need* to be expanded...

And yes, Magnus, I found that old cvs->pg stuff, I'm trying to get that info to you today...

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
Robert Haas <robertmhaas@gmail.com> writes:
> So what happens right now using the existing git repository is that
> the $PostgreSQL$ tags are there, but they're unexpanded. They just say
> $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$.

Really? All of them? Seems like that would have taken some intentional processing somewhere.

If we could make the conversion work like that (rather than removing the whole line) it would negate my line-number-change argument, which might mean that files pulled from the repository would be "close enough" to their actual historical form that no one would mind. It's still a judgment call though. On balance I think I'd rather adopt the simple rule that historical file states in the git repository should match what you would have gotten from the cvs repository.

regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> 1) We can migrate the repository with the keywords, and then make one big
>>> commit just after (or before, that doesn't make a difference) removing
>>> them. In this case, backbranches and tags look exactly like they do
>>> now, but it also means if you do "git diff" between old versions, the
>>> keywords will show up there.

>> +1 for #1. Changing history and the resulting possibility of becoming
>> one's own grandfather always makes me nervous.

> Yeah. One concrete problem with removing the $PostgreSQL$ lines is it
> will affect line numbering everywhere. Yeah, it's only off-by-one, but
> there could still be confusion.
> [...]

If only the "$PostgreSQL$" part were removed rather than the whole line, the numbering should stay the same. Otherwise, I guess it would be challenging to automatically delete not only the "$PostgreSQL$" line but also the leading and/or trailing empty (comment) lines without messing something up.

Tim
On Wed, Jul 7, 2010 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> So what happens right now using the existing git repository is that
>> the $PostgreSQL$ tags are there, but they're unexpanded. They just say
>> $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$.
>
> Really? All of them? Seems like that would have taken some intentional
> processing somewhere.

I'm sure it did...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
Robert Haas wrote:
> So what happens right now using the existing git repository is that
> the $PostgreSQL$ tags are there, but they're unexpanded. They just say
> $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$. I'm all in
> favor of removing them, but it would be nice if we could avoid
> cluttering the old changesets with useless changes to the keyword
> expansions.
>

Personally I favor leaving the expanded keywords in what we import, so that there's an exact mapping between what's in the final CVS repo and what's in the initial git repo, and then removing them entirely. I don't see that having old keyword expansions in the historical changesets is a big deal. Nobody is going to base patches on them (I hope).

cheers

andrew
On Wed, Jul 7, 2010 at 20:31, Andrew Dunstan <andrew@dunslane.net> wrote:
> Robert Haas wrote:
>>
>> So what happens right now using the existing git repository is that
>> the $PostgreSQL$ tags are there, but they're unexpanded. They just say
>> $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$. I'm all in
>> favor of removing them, but it would be nice if we could avoid
>> cluttering the old changesets with useless changes to the keyword
>> expansions.
>>
>
> Personally I favor leaving the expanded keywords in what we import, so that
> there's an exact mapping between what's in the final CVS repo and what's in
> the initial git repo, and then removing them entirely. I don't see that
> having old keyword expansions in the historical changesets is a big deal.
> Nobody is going to base patches on them (I hope).

This is my general feeling as well. If there are outstanding patches they will need to be merged, but actually getting a conflict there would require that someone is working off their own cvs repository which expands the same tags - which would cause the conflicts today anyway. Other than that, just rebasing across a HEAD that no longer has the keywords should be a very straightforward operation.

Given that we generally *backpatch* fixes (rather than make them on backbranches and merge back into head), it shouldn't affect that at all.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On Wed, Jul 7, 2010 at 16:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dave Page <dpage@pgadmin.org> writes:
>> On Wed, Jul 7, 2010 at 10:01 AM, Magnus Hagander <magnus@hagander.net> wrote:
>>> 1) We can migrate the repository with the keywords, and then make one big
>>> commit just after (or before, that doesn't make a difference) removing
>>> them. In this case, backbranches and tags look exactly like they do
>>> now, but it also means if you do "git diff" between old versions, the
>>> keywords will show up there.
>
>> +1 for #1. Changing history and the resulting possibility of becoming
>> one's own grandfather always makes me nervous.
>
> Yeah. One concrete problem with removing the $PostgreSQL$ lines is it
> will affect line numbering everywhere. Yeah, it's only off-by-one, but
> there could still be confusion.

Uh, wouldn't that simply be dealt with by replacing them with an empty line instead of removing it?

> One point that isn't completely clear from Magnus' description is
> whether we should remove the $PostgreSQL$ lines from the HEAD branch
> only, or from the still-active back branches as well. I vote for the
> latter --- that is, if you pull a historical version of some file
> from the archives, you should see the appropriate $PostgreSQL$ line,
> but we won't have them in the source files for any future minor
> release. The reason for this is that otherwise there will be files
> floating around that claim to be CVS version x.y.z, but actually are
> different from that, because of back-patching activity after the git
> transition. That seems like a recipe for huge confusion in itself.

Yeah, clearly I didn't say that :-) My intention was for them to be removed from head and all active back-branches at the time (e.g. we don't bother with 6.x, just the platforms that are currently being used).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
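For illustration, the "replace with an empty line" idea could be sketched roughly as below. This is only a minimal sketch, not the conversion tooling anyone in the thread actually used: the function name `blank_keyword_lines` and the regular expression are assumptions, and it only looks for the $PostgreSQL$ keyword.

```python
import re

# Assumption: a "keyword line" is any line containing $PostgreSQL$ or an
# expanded $PostgreSQL: ... $ form. The real tree may need a wider pattern.
KEYWORD_RE = re.compile(r"\$PostgreSQL(?::[^$\n]*)?\$")


def blank_keyword_lines(text: str) -> str:
    """Replace every line carrying the keyword with an empty line.

    Keeping the line terminator means all following lines keep their
    original line numbers, which removing the line would not.
    """
    out = []
    for line in text.splitlines(keepends=True):
        if KEYWORD_RE.search(line):
            # Keep only the trailing newline characters of the line.
            ending = line[len(line.rstrip("\r\n")):]
            out.append(ending)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = (
        "/*\n"
        " * $PostgreSQL: pgsql/src/foo.c,v 1.2 2010/01/01 tgl Exp $\n"
        " */\n"
        "int x;\n"
    )
    print(blank_keyword_lines(sample))
```

The trade-off against deleting the line is cosmetic: a blank line is left inside the header comment, but `int x;` stays on line 4 in both versions.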
Hi,

On 07/07/2010 08:31 PM, Andrew Dunstan wrote:
> Personally I favor leaving the expanded keywords in what we import, so
> that there's an exact mapping between what's in the final CVS repo and
> what's in the initial git repo, and then removing them entirely. I don't
> see that having old keyword expansions in the historical changesets is a
> big deal. Nobody is going to base patches on them (I hope).

Sorry for being somewhat late on this discussion.

Another reason for keeping the expanded keywords in historic revisions, one that hasn't been raised so far, is that they can easily be un-expanded with a script. But it's a lot harder to do the expansion once you are on git, if you ever happen to need that info.

Of course, I'd also remove the keywords from every (active?) branch as a first commit after the import. I'd even favor removing those lines completely, just as sort of a cleanup commit. And no, that shouldn't pose any problem with outstanding patches, unless you are fiddling with the tag itself. In which case you deserve to get a conflict. ;-)

Regards

Markus Wanner
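As a rough illustration of why un-expanding is the easy direction, a script along the following lines would do it. This is a sketch under assumptions: the function name `unexpand_keywords` is made up here, and the keyword list is a guess (only $PostgreSQL$ and $Id$ are mentioned in this thread; the others are standard RCS/CVS keywords that may or may not occur in the tree).

```python
import re

# Assumption: the set of keywords to collapse. $PostgreSQL$ and $Id$ appear
# in this thread; Header, Revision, Date and Author are standard RCS keywords
# included here only as plausible candidates.
KEYWORD_RE = re.compile(
    r"\$(PostgreSQL|Id|Header|Revision|Date|Author)(?::[^$\n]*)?\$"
)


def unexpand_keywords(text: str) -> str:
    """Collapse '$Keyword: expansion $' back to the bare '$Keyword$' form."""
    # In a Python replacement string '$' is literal and \1 is the keyword name.
    return KEYWORD_RE.sub(r"$\1$", text)


if __name__ == "__main__":
    line = "/* $PostgreSQL: pgsql/src/foo.c,v 1.5 2010/07/07 10:01:00 tgl Exp $ */"
    print(unexpand_keywords(line))
```

Going the other way would need the path, revision number, date and author for each file at each commit, which is exactly the information a plain substitution does not have.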
On 7/7/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > So what happens right now using the existing git repository is that
> > the $PostgreSQL$ tags are there, but they're unexpanded. They just say
> > $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$.
>
> Really? All of them? Seems like that would have taken some intentional
> processing somewhere.

AFAIK that's what CVS actually keeps in repo, it expands keywords when writing files out.

> If we could make the conversion work like that (rather than removing the
> whole line) it would negate my line-number-change argument, which might
> mean that files pulled from the repository would be "close enough" to
> their actual historical form that no one would mind. It's still a
> judgment call though. On balance I think I'd rather adopt the simple
> rule that historical file states in the git repository should match what
> you would have gotten from the cvs repository.

I would prefer that the diffs should match what CVS gives / what got committed.

Sanity-checking by comparing a CVS checkout with a GIT checkout with unexpanded keywords can be scripted easily enough, and is a one-time affair. But humans want to review old diffs far more frequently...

+1 keeping keywords, but unexpanded.

--
marko
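A sanity check of that kind might look something like the sketch below. Everything here is an assumption made for illustration: the function names, the choice to compare only $PostgreSQL$ and $Id$, and the idea of skipping the `CVS` and `.git` metadata directories. It compares two already checked-out trees on disk and does not talk to either version control system.

```python
import os
import re

# Assumption: only these two keywords need normalizing before comparison.
KEYWORD_RE = re.compile(rb"\$(PostgreSQL|Id)(?::[^$\n]*)?\$")


def normalized(path):
    """Return the file's bytes with keywords collapsed to their bare form."""
    with open(path, "rb") as f:
        return KEYWORD_RE.sub(rb"$\1$", f.read())


def relative_files(root, skip_dirs=("CVS", ".git")):
    """Collect file paths relative to root, ignoring VCS metadata dirs."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune metadata directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def compare_checkouts(cvs_root, git_root):
    """Report files missing on either side or differing beyond keywords."""
    cvs_files = relative_files(cvs_root)
    git_files = relative_files(git_root)
    differing = sorted(
        rel
        for rel in cvs_files & git_files
        if normalized(os.path.join(cvs_root, rel))
        != normalized(os.path.join(git_root, rel))
    )
    return {
        "only_in_cvs": sorted(cvs_files - git_files),
        "only_in_git": sorted(git_files - cvs_files),
        "differing": differing,
    }
```

Run once per tag or branch head after the conversion, an empty result on all three lists would suggest the two checkouts agree apart from keyword expansion.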
Marko Kreen wrote:
> On 7/7/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> Robert Haas <robertmhaas@gmail.com> writes:
>> > So what happens right now using the existing git repository is that
>> > the $PostgreSQL$ tags are there, but they're unexpanded. They just say
>> > $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$.
>>
>> Really? All of them? Seems like that would have taken some intentional
>> processing somewhere.
>>
>
> AFAIK that's what CVS actually keeps in repo, it expands keywords
> when writing files out.
>

No. It stores the expanded keyword. Just look in the ,v files in a CVS mirror and you'll see them.

cheers

andrew
On 7/15/10, Andrew Dunstan <andrew@dunslane.net> wrote:
> Marko Kreen wrote:
> > On 7/7/10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Robert Haas <robertmhaas@gmail.com> writes:
> > > > So what happens right now using the existing git repository is that
> > > > the $PostgreSQL$ tags are there, but they're unexpanded. They just say
> > > > $PostgreSQL$ rather than $PostgreSQL: tgl blah blah$.
> > >
> > > Really? All of them? Seems like that would have taken some intentional
> > > processing somewhere.
> >
> > AFAIK that's what CVS actually keeps in repo, it expands keywords
> > when writing files out.
>
> No. It stores the expanded keyword. Just look in the ,v files in a CVS
> mirror and you'll see them.

Eh. I stand corrected - what it actually does is even more bizarre - it stores whatever is on the disk, but then expands on re-write. So:

- r1.1 contains $Id$ in the repo.
- r1.2 contains $Id: 1.1$ in the repo.

and so on...

--
marko
* Marko Kreen <markokr@gmail.com> [100715 13:49]:

> Eh. I stand corrected - what it actually does is even more
> bizarre - it stores whatever is on the disk, but then
> expands on re-write. So:
>
> - r1.1 contains $Id$ in the repo.
> - r1.2 contains $Id: 1.1$ in the repo.
>
> and so on...

It's actually slightly *worse* than that... The repository r$N contains what was in the committer's $N-1 *checked out* copy when he commits. So what's in the ,v file has *nothing* to do with reality, except that by chance it's usually $N-1, because that's what was last checked out/updated most of the time.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
* Aidan Van Dyk <aidan@highrise.ca> [100715 13:56]:
> * Marko Kreen <markokr@gmail.com> [100715 13:49]:
>
> > Eh. I stand corrected - what it actually does is even more
> > bizarre - it stores whatever is on the disk, but then
> > expands on re-write. So:
> >
> > - r1.1 contains $Id$ in the repo.
> > - r1.2 contains $Id: 1.1$ in the repo.
> >
> > and so on...
>
> It's actually slightly *worse* than that... The repository r$N contains
> what was in the commiters $N-1 *checked out* copy when he commits. So
> what's in the ,v file has *nothing* to do with reality, except by chance
> it's $n-1 because that's what was last checkout/updated most of the
> time..

And as a demo of what you can see in a project where some of my machines have -kk in .cvsrc, and others don't:

    [aidan@d1 faxd]$ grep '\$Id' faxQueueApp.c++,v | less
    /* $Id$ */
    /* $Id: faxQueueApp.c++,v 1.115 2007/09/17 19:34:41 aidan Exp $ */
    /* $Id$ */
    /* $Id: faxQueueApp.c++,v 1.112 2007/07/23 21:04:09 aidan Exp $ */
    /* $Id$ */
    /* $Id: faxQueueApp.c++,v 1.113.2.2 2007/09/07 18:39:26 aidan Exp $
    /* $Id$ */
    /* $Id$ */
    /* $Id: faxQueueApp.c++,v 1.111 2007/06/05 18:51:16 aidan Exp $ */
    /* $Id$ */

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.