REPORT - ERLANG/OTP LIFE CYCLE COSTS

PROF. FERGUS O'BRIEN
SOFTWARE ENGINEERING RESEARCH CENTRE
March 1998

1. INTRODUCTION

SERC is currently undertaking a research project with the University of Technology, Sydney (UTS) on the characteristics of long-lived software systems. A number of systems are being analysed, including AXE-10 and a military combat system written in Ada. The time-scale for all these systems is a lifetime of 20 years plus.

A primary objective of this research program is to isolate the characteristics of these systems that have enabled their survival. An associated objective is to analyse reasons for such systems becoming unmaintainable. This report includes results that have been obtained during this ongoing research program.

2. SCOPE

For an individual software system contract the ongoing support contract is of the order of ten percent of the original development cost per annum. This support or maintenance contract covers three separate activities:-

* fixing faults (bugs)
* cleaning up code (perfective)
* enhancement (tracking market features)

Over a 20 year lifetime a quick sum gives a support cost of 2 x development cost, for this single installation. In the case of the military combat systems, in Australia, there are only a limited number, but for AXE-10 there are, reportedly, some seven and a half thousand installations world-wide. A very rough estimate of the per annum support costs for AXE-10 is forty percent of $8bn, based on revenue and software component percentage. This gives a support cost of roughly $400,000 per system per annum, or equally roughly three people per system.

The problem faced with these twenty year old systems is that the support requirements are now beginning to exceed the contractual expectations, leading, at best, to reduced revenues. However, the systems have been able to generate a positive revenue stream from their support contracts over the twenty year period, and it is this positive result that needs to be analysed.
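The rough sums in this section can be reproduced as a short worked calculation. The sketch below is illustrative only. It takes the quoted figures at face value (ten percent per annum, a twenty year lifetime, forty percent of $8bn, seven and a half thousand installations). The variable names are invented for the example, and reading "forty percent of $8bn" as the total annual support cost is an assumption. The per-person figure is simply what "three people per system" would imply.

```python
# Worked version of the rough estimates in section 2.
# All inputs are the report's quoted figures; the names are illustrative.

SUPPORT_RATE_PER_YEAR = 0.10   # support contract, ~10% of development cost per annum
LIFETIME_YEARS = 20

# Lifetime support cost as a multiple of the original development cost.
lifetime_support_multiple = SUPPORT_RATE_PER_YEAR * LIFETIME_YEARS

ANNUAL_REVENUE_USD = 8e9       # "$8bn"
SOFTWARE_SHARE = 0.40          # "forty percent"
INSTALLATIONS = 7_500          # "seven and a half thousand installations"

# Assumption: the software share of revenue is treated as the annual support cost.
annual_support_total = ANNUAL_REVENUE_USD * SOFTWARE_SHARE
support_per_system = annual_support_total / INSTALLATIONS

# Implied annual cost per support person, taking three people per system.
PEOPLE_PER_SYSTEM = 3
implied_cost_per_person = support_per_system / PEOPLE_PER_SYSTEM

print(f"Lifetime support = {lifetime_support_multiple:.1f} x development cost")
print(f"Support per system per annum = ${support_per_system:,.0f}")
print(f"Implied cost per person per annum = ${implied_cost_per_person:,.0f}")
```

On these inputs the per-system figure comes out nearer $427,000 than $400,000, which is consistent with the report describing the estimate as very rough.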
3. TWENTY YEAR RESULTS

The systems studied have had very strong and robust architectures, together with an original set of design guidelines; typically:-

* a minimal set of messages or signals between modules, thus following the software engineering coupling/cohesion principles
* no patches, all changes to be made in the language itself
* no numeric parameters

In all the case studies such guidelines have been progressively ignored, and even totally forgotten in some instances. For example, a combat system of 20,000 Ada packages showed that, when traced, virtually every guideline had been totally broken.

The reasons for this progressive dilution of the guidelines have yet to be examined in detail, but three aspects have been put forward:-

a) Market pressures: the cycle time to bring new features to market has been significantly reduced over the twenty years. The consistent use of the guidelines in a procedural manner has been seen as a factor slowing down this time-scale.

b) No system monitoring: the guidelines have never been implemented in a computer based monitoring system; they have remained procedural and hence subject to knowledge loss.

c) Short-cut programming: the possibility of using hacking techniques, such as patching, was permitted within the maintenance tool set and even, as in a), encouraged.

The complex coupling between modules, and the indeterminacy introduced by patching and numeric parameters, eventually break the original architectures; the system becomes uneconomic to support.

4. FEATURES FOR LONG LIFETIMES

The underlying message/block structure of PLEX and Ada systems has survived remarkably well, as these systems had learnt from the difficulties of previous generations. Over the past twenty years further lessons have emerged, giving rise to desirable features including:-

a) a minimal semantic gap between the way an application is described and the language definition.
   The narrowing of the gap has been an evolutionary process ever since the first generation, programmed in absolute machine code.

b) a set of guidelines that are executable, and incorporated in the system design. For example, the maximum number of messages per module should be a system limit.

c) a language that has the minimal number of constructs in order to permit both rapid training and rapid time to market.

d) no constructs other than the language itself for implementation, but a strong communication capability towards other implementations.

The last feature has come from experience with languages such as Lisp, which fulfil the general ideas but are closed environments. This means that importing from a database, or driving a Web interface, is extremely difficult.

5. ERLANG/OTP FEATURES

The design of any approach to the set of desirable features, noted in section 4, is application domain dependent. For example, banking systems require transaction semantics; radar systems require hard real time performance. The domain for Ericsson is seen to be primarily in the soft real time, concurrent, distributed, reliable intelligent network applications.

If the feature set in section four is applied to general purpose languages such as C++, none of the features apply, primarily because C++ derived from a completely different domain. In the case of Erlang/OTP, a), c) and d) apply, since they were three of the drivers behind the design. Feature b) is the primary focus for the Software Engineering streams research at SERC, and the framework project has given strong indications that b) is achievable.

Another key attribute that has emerged from SERC's research is that Erlang shows a very tight coupling between a fault and a consequent failure. This has major positive implications for the bug fixing component of support.

Experience on large projects using Erlang has shown a productivity, reflected in time to market, some five times that of C++.
This minimises the pressures noted in section three, namely the pressures to hack a system in response to a perceived market pressure. This experience has also shown that the incorporation of the desirable set, plus other attributes due to the functional nature of Erlang, has given rise to extremely robust and reliable systems.

6. A TWENTY YEAR VIEW OF ERLANG/OTP

Systems, such as the twenty year ones studied in the research project, were designed to be complete and self contained to a large degree. As a result considerable time and ingenuity have been devoted to breaking their architectures. Application Modules (AM) are an example in AXE-10, and client to client systems in Java environments are current examples. These experiences have resulted in feature d) in section four. A system might choose to use Java in a programmable mobile device, with Erlang in the associated high reliability server. The key is a rich set of communications capabilities.

With major products now in service for a number of years, this intrinsic openness of Erlang/OTP has been clearly demonstrated. The latest example is the ATM switch, the AXD301, an outstanding product. This openness implies:-

* There is no pressure to continuously extend Erlang to cope with as yet unforeseen requirements. Such extensions may be implemented if they fit the functional paradigm, but equally, they may be achieved in another language, or in hardware.

* The SERC research program has shown how to incorporate executable guidelines into the Erlang/OTP environment. This will ensure their survival over the long term.

* The openness has already been used by SERC to close the semantic gap even further by linking total business case requirements through to production usage. These requirements include functional aspects, and non-functional aspects such as performance and reliability.
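To make the idea of an executable guideline concrete, the following is a minimal sketch, written in Python purely for illustration. It is not Erlang/OTP code and does not represent SERC's framework. It checks the example given in section four, a system-wide limit on the number of distinct messages per module. The limit value, the module and message names, and the `check_message_limit` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical system limit; the report does not state a value.
MAX_MESSAGES_PER_MODULE = 3


def check_message_limit(message_log, limit=MAX_MESSAGES_PER_MODULE):
    """Return {module: sorted message types} for modules exceeding the limit.

    message_log is an iterable of (sending_module, message_type) pairs.
    """
    types_by_module = defaultdict(set)
    for module, message_type in message_log:
        types_by_module[module].add(message_type)
    return {
        module: sorted(types)
        for module, types in types_by_module.items()
        if len(types) > limit
    }


# Illustrative data only: the module and message names are invented.
log = [
    ("billing", "charge"), ("billing", "refund"),
    ("switch", "setup"), ("switch", "teardown"),
    ("switch", "reroute"), ("switch", "audit"),
]
violations = check_message_limit(log)
print(violations)
```

Run automatically, for instance at build time, a check of this kind keeps a guideline from remaining purely procedural, which is the weakness noted under b) in section three.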
The twenty year case is at the three year point, but all the indications are that an order of magnitude reduction in life cycle support costs may be achieved, without compromising time to market, or openness.

7. CONCLUSIONS

The research on long lived systems undertaken between SERC and UTS has already given indicators to be used in designing future languages. These indicators have been matched against popular languages such as C++; none are followed. They have also been matched against Erlang/OTP, and a majority of features have been incorporated. The remaining features are the subject of research at SERC, and this research is very positive. The net result is seen to be an order of magnitude reduction in life cycle support costs.

APPENDIX A

THE PLEX TRAP

The considerations for taking full advantage of an Ericsson designed tool such as Erlang/OTP must include an analysis of the predecessor language PLEX. AXE-10 has been one of the most successful switches ever built, and has been a very strong revenue earner over the twenty years since it first went into traffic.

At the time AXE-10 was being designed, the primary competition was IT&T in Brussels. They chose a computer language, PL/I, and computer science graduates for their strategy, the direct opposite of Ericsson's. IT&T are now out of business.

The PLEX trap that Ericsson describes now is a very real one: it is difficult to maintain and enhance systems, and difficult to interface to AXE-10, as the Legacy Project at SERC has illustrated. The Legacy Project together with the GA project are direct examples of how to escape the trap.

One apparent lesson that this trap might teach is that Ericsson should not design a new language, but this is not correct. The actual lesson is that AXE-10 succeeded because its design, both hardware and software, gave a sustained competitive advantage, but that the associated guidelines were slowly eroded over time.
As a consequence Ericsson is now placed to learn from this twenty year experience, and further develop Erlang/OTP to give a powerful competitive edge.