sustainability

while i was at code4lib 2018, i was a little distracted by some good discussions about software sustainability going on in the samvera slack. so i took the opportunity to propose a breakout session on the topic (notes here). based on those discussions, i think there are a few areas where we need to improve our processes.

first of all, i’m not talking about the things we’re already doing, like code review, high test coverage, automatic builds with Travis, etc. those are all great things, and there may be more tools we need to add there. but i’m really talking about the human processes where we decide what code to write, when to make releases, what work is important, and what can wait.

above all, we need to agree on a long-term plan. the samvera governance working group has made some recommendations that involve, amongst other things, hiring a technical coordinator, having a community roadmap, and making sure that all of our software has product owners who can engage in planning and release management. i wholeheartedly support those recommendations, and encourage you to review them and provide feedback ASAP (by 2017-03-07).

i think that a roadmap will have many benefits, but the most important one is giving guidance on what is most important to work on now, and what can be deferred. part of developing a roadmap will be agreeing on the scope and direction of each component. i think this has been sorely lacking in the samvera community so far, and has led to people working at cross purposes. to date, we have worked by rough consensus, with implicit community values to guide us. this has produced a lot of good software. but it’s also produced confusion and frustration as the direction of some projects has drifted without explicit agreement on what the direction was. being more intentional about this will make development more predictable, and give our discussions about what to do next a framework for deciding what is a good approach and what is in scope.

in my opinion, the biggest challenge over the last few years has been the development of several large features in long-running branches. these include reversing collection membership, sipity workflow, collection extensions, and valkyrie support. these megabranches have been disruptive to ongoing incremental development, complicated release management, and created headaches for code review. having huge contributions on long-lived branches also creates pressure to merge them, even if there is ambivalence, to avoid the entire branch becoming unmergeable and losing the contribution altogether.

the key approach, and one we normally follow, is to break up contributions into smaller pull requests which can be reviewed and merged individually. the thing all of the above megabranches have in common is that they led to a lot of breakage until they were complete, so it’s hard to see how they could be broken up. there have been a couple of suggestions, such as using a feature flipper to disable the new functionality by default, but allow those who want to try it out to enable it. i’m skeptical that this will always work well, and truly isolate the breakage. but it may work for some cases.
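
to make the feature flipper idea concrete, here is a minimal sketch in python. it is not how any samvera component actually implements this: the `FeatureFlags` class, the `add_to_collection` function, and the `reversed_collection_membership` flag name are all hypothetical, made up for illustration. the point is only that the new code path ships in the main branch but stays off unless someone opts in.

```python
# Hypothetical feature-flag registry. Every name in this block is an
# assumption made for illustration, not an existing library or API.
class FeatureFlags:
    def __init__(self, defaults):
        # defaults maps flag name -> bool; new features default to False
        self._defaults = dict(defaults)
        self._overrides = {}

    def enable(self, name):
        self._require_known(name)
        self._overrides[name] = True

    def disable(self, name):
        self._require_known(name)
        self._overrides[name] = False

    def enabled(self, name):
        self._require_known(name)
        return self._overrides.get(name, self._defaults[name])

    def _require_known(self, name):
        # fail loudly on typos instead of silently treating them as "off"
        if name not in self._defaults:
            raise KeyError(f"unknown feature: {name}")


def add_to_collection(flags, work, collection):
    """Record membership using the old or new (flagged) code path."""
    if flags.enabled("reversed_collection_membership"):
        # new behavior: the work points at its collection
        work.setdefault("member_of", []).append(collection["id"])
    else:
        # existing behavior: the collection lists its members
        collection.setdefault("members", []).append(work["id"])


flags = FeatureFlags({"reversed_collection_membership": False})
work, collection = {"id": "w1"}, {"id": "c1"}
add_to_collection(flags, work, collection)  # takes the existing path
```

even in this toy version, both code paths have to be kept working and tested for as long as the flag exists, which is part of why i doubt it isolates breakage in every case.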

i think a more effective approach is to focus development on a single task, to complete the big disruptive feature work more quickly. when this work is ongoing, we should stop doing other incremental work to avoid conflicts. this will require a roadmap and community commitment. and it will require marshalling more focused development effort than we have typically done. the samvera developer event at penn state in the fall of 2016 might be a good model for this, since it featured a large number of developers, and planning before the event so development work could proceed rapidly. but one of the major products of that event was the sipity workflow engine, so clearly that’s only part of the answer. ultimately, we must work harder to narrow focus and break disruptive feature work into smaller pieces that can be landed in a few weeks instead of months. that work needs to be released and subjected to rigorous testing more frequently, to find and fix breakage more quickly.

and more frequent releases have a very real cost: if those are breaking changes, then more migration and painful upgrades will result. even if they are non-breaking changes, new releases often result in regressions of all kinds. so any move to more frequent releases must be accompanied by more effort to mainstream user experience and accessibility work, migration tooling and upgrade paths. we should not expect this work to be part of the release process, but should instead be working on it throughout the development cycle, and integrating automated tools to help prevent regressions and catch problems as they are introduced.