Keeping research code accessible and relevant
It is an all-too-common occurrence these days. Code that someone wrote years ago stops working and sets off a cascade of downstream failures, or, possibly worse, old data ends up downstream and no one immediately notices. I remember this happening to a colleague who maintains a numerical weather prediction model for research purposes. As spring wore on, it seemed that the overnight low temperatures were consistently too cool over the northern United States. Then June arrived and model-forecasted lows were still tumbling into the 30s (°F) every night. It was only then that he dug into the code and realized the problem: the snow cover analysis was from the middle of winter. Oops.
He tracked down the offending lines of code, which did not properly check the date of the analysis, and then remarked how the old Fortran code provided great “job security” because it had become so complex and unwieldy to maintain that it would take someone else eons to detect and fix issues. That may be so, but that does not make it desirable, especially for the organization, because it is a single point of failure. Eventually, the lack of reliability will cause others to abandon the resulting products in favor of something new.
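The kind of date check that was missing can be sketched in a few lines. This is a minimal illustration in Python, not the colleague's Fortran code: the names `check_analysis_age` and `StaleInputError`, the two-day threshold, and the dates are all assumptions made up for the example.

```python
from datetime import datetime, timedelta, timezone


class StaleInputError(RuntimeError):
    """Raised when an input analysis falls outside the accepted age window."""


def check_analysis_age(valid_time, now, max_age=timedelta(days=2)):
    """Return the age of an analysis, or raise if it is stale or future-dated.

    max_age is an illustrative threshold, not a recommended value.
    """
    age = now - valid_time
    if age < timedelta(0):
        raise StaleInputError(f"analysis valid time {valid_time} is after {now}")
    if age > max_age:
        raise StaleInputError(
            f"analysis is {age.days} days old; limit is {max_age.days}"
        )
    return age


# Hypothetical run in early June that picks up a mid-January snow analysis.
run_time = datetime(2024, 6, 3, 0, tzinfo=timezone.utc)
snow_valid = datetime(2024, 1, 15, 0, tzinfo=timezone.utc)

try:
    check_analysis_age(snow_valid, run_time)
except StaleInputError as err:
    print(f"Refusing to run with stale snow cover: {err}")
```

Failing loudly at this point is a design choice: a run that stops with a clear message is easier to notice than one that quietly produces lows that are too cool for months.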
Academics sometimes find it challenging to share and collaborate on their research efforts, or they do not have funding to bring another person onboard for a task they are passionate about. Some do not know how to delegate, feel that they need to be closely involved with all portions of the research, are concerned that successive research ideas may be poached, or are uncomfortable sharing credit. Others are “hot in pursuit” and do not want to make the effort to add team members who may not understand the project immediately and who would temporarily slow down the research process as they learn. Regardless of the reason, this poses a challenge to research-to-operations (R2O) efforts.
Promoting a sharing culture in a research organization is the first major recommendation for establishing efficient R2O processes. If there is an existing code base that can serve as a platform for a startup research effort or R2O exercise, all parties benefit from its use and availability. The adopted code is no longer under the ownership of a single person or small team, and the value of the code increases from the expanded set of users as part of the new project. The new project is able to start at an accelerated pace because the team does not have to “reinvent the wheel”. It is a serious mistake to allow individual researchers or small teams to maintain “proprietary” code because it stymies code longevity and does not enable a natural evolution and expansion of the underlying effort that would keep it relevant. Those who leverage the code of others should make sure to acknowledge the contribution at the appropriate time.
Other recommendations are as follows.
Encourage the use of version control. When an organization has multiple developers, a recognized code management system or version control software is essential for maintaining a systematic procedure, because it keeps a good record of what code is changing and who is changing it. When developing code, whether adding new methods or modifying existing ones, it is important to understand how the existing, surrounding code works, its dependencies, and its role in the overall code base before making any changes, regardless of whether the changes are copied to a new tree.
Make incremental code improvements. Sometimes code does need to be rewritten, and during rapid prototyping, this may be necessary more than once. Reject poor coding practices and do not make logic more complex than required. The short-term time savings from leaving legacy code in place can result in major time sinks later during troubleshooting. Rewriting code may mean implementing new control loops and decision structures to decrease the execution time. Code may even need to be redeveloped in a different programming language to realize certain performance benefits or to expand the number of supported platforms or user groups.
Phase out old code. It is necessary to keep an eye out for code that is running on old hardware without routine maintenance, or that has no obvious owner or user. There should be a transition plan for that code to become incorporated into a maintained code base, or simply phased out. Research organizations should keep a record of end users who receive real-time research products, regardless of whether they are internal or external. Keeping that list accurate requires following up with those users annually. The follow-up is also a great opportunity to discuss new research products that are in line for transition as part of evolving R2O activities, or to allow for another iteration of the existing R2O cycle for that product.
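The end-user record and the annual follow-up could be kept as simply as the sketch below. It is only an illustration: the `ProductUser` fields, the `due_for_follow_up` helper, the 365-day interval, and the sample entries are all hypothetical, and a spreadsheet or database would serve the same purpose.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProductUser:
    """One recipient of a real-time research product (hypothetical fields)."""
    name: str
    product: str
    internal: bool
    last_contacted: date


def due_for_follow_up(users, today, interval=timedelta(days=365)):
    """Return the users whose last contact is at least one interval old."""
    return [u for u in users if today - u.last_contacted >= interval]


# Made-up entries for illustration only.
users = [
    ProductUser("Forecast office A", "snow cover analysis", False, date(2023, 3, 1)),
    ProductUser("Modeling group B", "overnight low guidance", True, date(2024, 2, 10)),
]

for user in due_for_follow_up(users, today=date(2024, 6, 1)):
    print(f"Follow up with {user.name} about {user.product}")
```

A product whose every listed user has gone quiet for more than a year is a natural candidate for the phase-out discussion described above.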