Cooperation in Parallel: Lessons From Ubuntu and Debian
=========================================================

:Author: Benjamin Mako Hill
:Contact: mako@atdot.cc
:Date: Monday, 26 Nov 2007 19:00
:Affiliation: MIT / Ubuntu Project / Debian Project

.. Note:: This talk was given at Kiberpipa__ in Ljubljana, Slovenia. It
   is based on a talk delivered (in slightly different forms) at
   Linuxtag 2005 in Karlsruhe, Germany, at Libre Software Meeting in
   Dijon, France and at What The Hack near Boxtel, the Netherlands.
   More information on this talk and my other talks is available at
   http://mako.cc/

__ http://www.kiberpipa.org/

Introduction
=============================

.. Note:: SLIDE 1: Title and Two Forks Picture

Ask for hands:

* How many Debian developers here?
* Debian users?
* Ubuntu users?
* Ubuntu developers?

Overview
----------

* Big Questions and Context

  - Derivation: Why derive?
  - Forking: Benefits and Difficulties

* Case Studies: Debian and Ubuntu
* The Answer: Cooperation in Parallel: Joint work in groups working
  toward divergent ends.
* Approaches/Solutions

  - Strategic Divergence
  - Distributed Source Control
  - Problem Specific Tools
  - Social Solutions

Big Questions and Context
==========================

.. Note:: SLIDE 2: World of Debian Customizers

There are over 200 distributions derived from Debian.

Why Derive?
-------------

- The work of these communities is becoming increasingly difficult to
  recreate;
- Single projects end up being asked to serve the needs of large
  communities with diverse needs.

There are 200 different distributions because there are 200 different
needs. Some distributions may be redundant in their implementation but
they are not redundant in their needs. Derivations, in one way or
another, must exist to fit a diverse group of needs from a large
group.

The result:

- Derivation (ironically) becomes both increasingly important and
  increasingly difficult to do (or at least do right).


We're seeing it in distributions first, because distributions are
bigger and more complex, but we're seeing it in other places as well.

What Is Forking?
------------------------------

.. Note:: SLIDE 3: Fork is a Four Letter Word

* Define 4 letter word
* Define "Fork" (bifurcation in a project)

  Forks are not merely, or even primarily, technical; forks happen on
  many levels (political, code, social, all of the above).

* Examples of forks (emacs, gcc, etc.)

Difficulties of forking and derivation?
----------------------------------------

Historical view: "Forks are Bad"

From the Free Software Project Management HOWTO:

    The short version of the fork section is, don't do them. Forks
    force developers to choose one project to work with, cause nasty
    political divisions, and redundancy of work.

*In the best situations:* competition, redundancy, and tracking the
outside project in addition, all while using poor merge tools.

*In the worst (common) situations:* things get dropped on the floor.

Forking has historically been so bad that the mere threat of a fork
can be enough to keep the fork from happening.

Case Studies
==============

Debian
--------

.. Note:: SLIDE 4: Debian

Debian is, for the purpose of this discussion, *very* big:

* The most packages
* The most volunteers
* The most derivations

  - Internal
  - External

Everyone here understands Debian so I won't spend too much time on it.

Ubuntu
-------

.. Note:: SLIDE 5: Ubuntu

   *Joke:* To Scale Drawing

Ubuntu is a Debian derivation. I'm not going to spend *too* much time
explaining things. The key points for this conversation:

* Debian Derivation
* Regular and predictable releases
* An emphasis on free software that will maintain the derivability of
  the distribution.
* An emphasis on usability and a consistent desktop vision.

Derivation is significant:

* Code level changes (mostly trivial) to ~1300 packages.

Derivation is also different.

.. Note:: SLIDE 6: Ubuntu Derivation Model

(Explain process.)


Mark Shuttleworth has said, "every line of code in our delta that [we]
must maintain has a cost. It's in *our* interests to minimize this."

This means getting code into Debian or -- in whatever way -- making
sure that we don't go in different directions.

Cooperation in Parallel
========================

.. Note:: SLIDE 7: Cooperation in Parallel

This new model of cooperative work, *cooperation in parallel* (CIP),
describes joint work in groups working toward divergent ends. The
result is that groups working toward separate goals can collaborate
and contribute to each other's projects in ways that strengthen and
bolster their individual projects.

Criteria:

- Supporting parallel work among individuals or groups interspersed by
  merge sessions. During merges, users are presented with work from
  collaborators and are asked to evaluate and integrate relevant work;
- Emphasizing transparency over explicit forms of communication,
  modularity and traditional forms of awareness;
- Effectively displaying and summarizing changes between documents in
  ways that take into account previous decisions to diverge;
- Representing, reflecting, and resolving conflicts upon merges;
- Allowing collaborators to quickly and easily decide when and with
  whom to work. Toward this end, collaborators should easily be able to
  determine who appropriate collaborators are.

.. Note:: SLIDE 8: Resonant Divergence

The goal of CIP, when done right, is what I call **Resonant
Divergence**: people achieve much more than they would have before.

The trick, in resonant divergence, is to reduce the cost of
maintaining a delta. This is done in a variety of ways, some of which
we are still figuring out. These include:

Approaches/Solutions
======================

Strategic Divergence
--------------------------------

.. Note:: SLIDE 9: Strategic Divergence

Break down the problem into a set of component parts. The example in
deriving distributions can be:

1. *Selection of individual pieces of software*

   ``main``, ``universe``, ``multiverse`` -- e.g., UserLinux

2. Changes to the way that packages are installed or run (e.g., in a
   Live CD type environment or using a different installer)

   e.g., Anaconda, a Live CD -- also low impact

3. Configuration of different pieces of software

   Configuration changes can be handled differently because they can
   be organized through a configuration system framework (e.g.,
   Debconf, cfengine). CDDs (Custom Debian Distributions) take this
   approach.

4. Changes made to the actual software package (made on the level of
   changes to the package's code)

   Most invasive.

By breaking down the problem in this way, Debian derivers have been
able to approach derivation in ways that focus energy on the less
intrusive problems first. Smaller teams can limit themselves to less
intrusive types of changes to be successful.

Distributed Source Control
---------------------------

.. Note:: SLIDE 10: Distributed Version Control

   5-minute intro to distributed version control

Distributed version control aims to solve a number of problems
introduced by CVS and alluded to above by:

* Allowing people to work disconnected from each other and to sync
  with each other, in whole or in part, in an arbitrary and ad-hoc
  fashion.
* Allowing deltas to be maintained over time.

Recently, Linus Torvalds said:

    In fact, one impact BK has had is to very fundamentally make us
    (and me in particular) change how we do things. That ranges from
    the fine-grained changeset tracking to just how I ended up
    trusting sub-maintainers with much bigger things, and not having
    to work on a patch-by-patch basis any more

Distributed systems include Arch, TLA, Bazaar, Bazaar-NG, SVK, Darcs,
Monotone, BitKeeper, and others.

While Ubuntu uses this heavily to maintain its changes -- and will use
it more in the future -- this is even more useful for small projects.

Distributed version control allows people to maintain deltas over
time.

Merge Tools
-------------

.. Note:: SLIDE 11: Merge Tools

- Merging still has a high cost
- Merge modes address this

Problem Specific Tools
-----------------------

.. Note:: SLIDE 12: Problem Specific Tools

Because there are a number of problems associated with branching a
distribution (e.g., different patch system, upstream vs. non-upstream,
etc.), Canonical is building a front-end to Arch/VCS specifically
designed for distributions.

I've built my own system for documents that solves the particular
problems of document management.

Social Solutions
------------------

.. Note:: SLIDE 13: Social Solutions

"Technical Solution to a Social Problem" -- unknown

Things we've run into so far:

* Keeping changelog entries
* Working in "the right way" with projects and trying to work on their
  terms.
* Maintainer field issues (giving credit but not giving too much
  credit).
* Maintaining a good and open relationship with the project
* Constructive engagement

This is the hard part and this is where a derivation is made or
broken. It is where Ubuntu has suffered most.

Applicability
==============

.. Note:: SLIDE 14: Applicability

While distributions and other large projects are being forced to
confront this idea of balancing the benefits of forking and
collaboration first, any project of any size can harness this power to
make a better distribution right away.

Clearly, the amount of code and the number of people are on a
different scale. Clearly, the solutions that projects of radically
different sizes embrace will be different.

I believe that in the next decade, the free software community is
going to see a shift toward a development methodology where forking is
not bad. Through this shift and through many other developments in the
community, free software will be faster, better, and ultimately
successful on a scale we can only imagine now.

The way this will happen will be different in different projects.


Conclusions
=============

On pragmatic grounds, Free Software succeeds because it harnesses the
power of collaboration toward software production in a very deep and
meaningful way. By allowing people to share while diverging, free
software will gain a benefit that proprietary competitors *can't*
emulate.