Skip to main content

Home/ dvcs-vs-vcs/ Group items tagged access-rights

Rss Feed Group items tagged

Daniel Jomphe

[Kde-scm-interest] Distributed model VS accountability - 0 views

  • Distributed development needs a network of trust. For small projects, things are very simple: There's (say) one person, and that person reviews all incoming changes, and publishes (pushes) the result to a known location. Users need only trust that single point of distribution. If someone else publishes a different version of the same project, it's the user's problem to trust that someone else. For a larger project things become more involved. It starts by a single point of distribution, which only one or few people have push access to. Those people must be trustworthy. Let's call them BD. (Of course, nobody is forced to trust them.) But since it cannot be expected that they can review all incoming changes, they need other people, "lieutenants", whom they in turn trust. That is, BD will pull changes from the lieutenants without detailed review and push them out to the single point of distribution. The lieutenants themselves are possibly again backed by "major contributors" whom they trust. (At this point the fields of responsibility are likely small enough that incoming changes can be reviews in detail by the lieutenant or major contributors.)
  • At the infrastructure level, you need means that the chains of trust cannot be broken. At the lowest level, git provides a chain of trust by the SHA1 signatures. At a higher level, trust must be implemented such that each person can change (basically) only his own repository. For this reason, it is good that in your earlier proposal access rights to certain repositories were very limited. BD and lieutenants will only pull from such restricted repositories. Major contributors will review all incoming changes and push to their own trustable repositories and ask the lieutenants to propagate the changes upstream. In this context, "incoming" will mean mostly changes pushed to the publically accessible repository tree by random contributors, and the users' repositories. To come back to your example: I will trust *you*. And will blindly pull from *your* repository, no matter whose name the commit carries.
Daniel Jomphe

'Re: clarification on git, central repositories and commit access lists' - MARC - 0 views

  • I certainly agree that almost any project will want a "central" repository in the sense that you want to have one canonical default source base that people think of as the "primary" source base. But that should not be a *technical* distinction, it should be a *social* one, if you see what I mean.
  • Release management: you often want the central "development" repository to be totally separate from the release management tree. Yes, you approximate that with branches
  • Yes, you can branch in a truly centralized model too, but it's generally a "big issue"
  • ...20 more annotations...
  • the branches are globally visible
  • The other problem is the "permission from maintainers" thing: I have an ego the size of a small planet, but I'm not _always_ right, and in that kind of situation it would be a total disaster if everybody had to ask for my permission to create a branch to do some re-architecting work.
  • the "globally visible" part means that if you're not sure this makes sense, you're much less likely to begin a branch - even if it's cheap, it's still something that everybody else will see, and as such you can't really do "throwaway" development that way.
  • So you absolutely need *private* branches, that can becom "central" for the people involved in some re-architecting, even if they never ever show up in the "truly central" repository.
  • and you need permission from the maintainers of the centralized model too.
  • So it's not strictly true that there is a single "central" one, even if you ignore the stable tree (or the vendor trees). There are subsystems that end up working with each other even before they hit the central tree - but you are right that most people don't even see it. Again, it's the difference between a technical limitation, and a social rule: people use multiple trees for development, but because it's easier for everybody to have one default tree, that's obviously what most people who aren't actively developing do.
  • I'm literally talking about things like some people wanting to use the "stable" tree, and not my tree at all, or the vendor trees. And they are obviously *connected*, but it doesn't have to be a totally central notion at all.
  • There are lots of kernel subprojects that are used by developers - exactly so that if you report a bug against a particular driver or subsystem, the developer can tell you to test an experimental branch that may fix it.
  • In the KDE group, for example, there really is no reason why the people who work on one particular application should ever use the same "central" repository as the people who work on another app do. You'd have a *separate* group (that probably also maintains some central part like the kdelibs stuff) that might be in charge of *integrating* it all, and that integration/core group might be seen to outsiders as the "one central repository", but to the actual application developers, that may actually be pretty secondary, and as with the kernel, they may maintain their own trees at places like ftp.linux-mips.org - and then just ask the core people to pull from them when they are reasonably ready. See? There's really no more "one central place" any more. To the casual observer, it *looks* like one central place (since casual users would always go for the core/integration tree), but the developers themselves would know better. If you wanted to develop some bleeding edge koffice stuff, you'd use *that* tree - and it might not have been merged into the core tree yet, because it might be really buggy at the moment!
  • This is one of the big advantages of true distribution: you can have that kind of "central" tree that does integration, but it doesn't actually have to integrate the development "as it happens". In fact, it really really shouldn't. If you look at my merges, for example, when I merge big changes from somebody else who actually maintains them in a git tree, they will have often been done much earlier, and be a series of changes, and I only merge when they are "ready". So the core/central people should generally not necessarily even do any real development at all: the tree that people see as the "one tree" is really mostly just an integration thing. When the koffice/kdelibs/whatever people decide that they are ready and stable, they can tell the integration group to pull their changes. There's obviously going to be overlap between developers/integrators (hopefully a *lot* of overlap), but it doesn't have to be that way (for example, I personally do almost *only* integration, and very little serious development).
  • Yes, you want a central build-bot and commit mailing list. But you don't necessarily want just *one* central build-bot and commit mailing list. There's absolutely no reason why everybody would be interested in some random part of the tree (say, kwin), and there's no reason why the people who really only do kwin stuff should have to listen to everybody elses work. They may well want to have their *own* build-bot and commit mailing list! So making one central one is certainly not a mistake, but making *only* a central one is. Why shouldn't the groups that do specialized work have specialized test-farms? The kernel does. The NFS stuff, for example, tends to have its own test infrastructure.
  • So we do commit mailing lists from kernel.org, but (a) that doesn't mean that everything else should be done from that central site and (b) it also doesn't mean that subprojects shouldn't do their *own* commit mailing lists. In fact, there's a "gitstat" project (which tracks the kernel, but it's designed to be available for *any* git project), and you can see an example of it in action at http://tree.celinuxforum.org/gitstat
  • So centralized is not at all always good. Quite the reverse: having distributed services allows *specialized* services, and it also allows the above kind of experimental stuff that does some (fairly simple, but maybe it will expand) data-mining on the project!
  • So I do disagree, but only in the sense that there's a big difference between "a central place that people can go to" and "ONLY ONE central place". See? Distribution doesn't mean that you cannot have central places - but it means that you can have *different* central places for different things. You'd generally have one central place for "default" things (kde.org), but other central places for more specific or specialized services! And whether it's specialized by project, or by things like the above "special statistics" kind of thing, or by usage, is another matter! For example, maybe you have kde.org as the "default central place", but then some subgroup that specializes in mobility and small-memory-footprint issues might use something like kde.mobile.org as _their_ central site, and then developers would occasionally merge stuff (hopefully both ways!)
  • different sub-parts of the kernel really do use their own trees, and their own mailing lists. You, as a KDE developer, would generally never care about it, so you only _see_ the main one.
  • You don't see how those lieutenants have their own development trees, and while the kernel is fairly modular (so the different development trees seldom have to interact with each others), they *do* interact. We've had the SCSI development tree interact with the "block layer" development tree, and all you ever see is the end result in my tree, but the fact is, the development happened entirely *outside* my tree. The networking parts, for example, merge the crypto changes, and I then merge the end result of the crypto _and_ network changes. Or take the powerpc people: they actually merge their basic architecture stuff to me, but their network driver stuff goes through Jeff Garzik - and you as a user never even realize that there was another "central" tree for network driver development, because you would never use it unless you had reported a bug to Jeff, and Jeff might have sent you a patch for it, or alternatively he might have asked if you were a git user, and if so, please pull from his 'e1000e' branch.
  • The fact that anybody can create a branch without me having to know about it or care about it is a big issue to me: I think it keeps me honest. Basically, the fundamental tool we use for the kernel makes sure that if I'm not doing a good job, anybody else can show people that they do a better job, and nobody is really "inconvenienced". Compare that to some centralized model, and something like the gcc/egcs fork: the centralized model made the fork so painful that it became a huge political fight, instead of just becoming an issue of "we can do this better"!
  • you can use your old model if you want to. git doesn't *force* you to change. But trust me, once you start noticing how different groups can have their own experimental branches, and can ask people to test stuff that isn't ready for mainline yet, you'll see what the big deal is all about. Centralized _works_. It's just *inferior*.
  • you do a single commit in each submodule that is atomic to that *private* copy of that submodule (and nobody will ever see it on its own, since you'd not push it out), and then in the supermodule you make *another* commit that updates the supermodule to all the changes in each submodule. See? It's totally atomic.
  • Git actually does perform fairly well even for huge repositories (I fixed a few nasty problems with 100,000+ file repos just a week ago), so if you absolutely *have* to, you can consider the KDE repos to be just one single git repository, but that unquestionably will perform worse for some things (notably, "git annotate/blame" and friends). But what's probably worse, a single large repository will force everybody to always download the whole thing. That does not necessarily mean the whole *history* - git does support the notion of "shallow clones" that just download part of the history - but since git at a very fundamental level tracks the whole tree, it forces you to download the whole "width" of the tree, and you cannot say "I want just the kdelibs part".
1 - 3 of 3
Showing 20 items per page