Why there is a point in viewing an integrated system as a system-of-systems

Many discussions on systems-of-systems (SoS) take as a starting point a situation where the circumstances require an SoS approach. However, it can also be beneficial to choose to regard an integrated system as an SoS even in a situation where it is not an external requirement. In this essay, I will discuss why.

Traditional systems engineering practices are inherently top-down, focusing initially on the overall requirements and breaking them down into subsystems and components. Then these are integrated and tested in various ways. SoS engineering, on the other hand, is more bottom-up, and focused on how existing systems can be combined to collaborate towards an objective which cannot be achieved by one system in isolation. Often, the particular characteristics of an SoS are regarded as problems that need to be solved, which implies that the traditional top-down systems engineering process is the ideal to strive towards.

But is this necessarily so? Maybe there could be situations where the SoS bottom-up perspective is actually superior to top-down systems engineering? Let us return to Maier’s five defining characteristics of an SoS, and analyze how they can be reflected as opportunities in contemporary software-intensive systems practices:

  • Operational independence. A strong trend in the software architecture field has for a long time been to componentize applications, and this is in order to enable reuse of artifacts between different systems. Sometimes, this reuse can be more systematic, by using platforms and product lines. More recently, the concept of microservices has become popular, and has been well described by Martin Fowler. The term is perhaps a bit misleading, in that a microservice is not necessarily small, but can be arbitrarily complex. Each microservice is deployable on its own, and relates to specific business objectives, and thus fulfils the concept of operational independence. The microservices typically communicate through Internet protocols and web services. One benefit of microservices compared to monolithic systems is that they allow a more flexible capacity scaling when being deployed in cloud environments, since additional instances of a particular service can be added without having to replicate the whole system.
  • Managerial independence. A strong trend in contemporary system development is to use agile principles, where small teams work fairly independently on a part of the system. This has proven to be extremely efficient in many situations, but it can create problems in more complex systems where the integration between teams becomes tricky. However, when combined with an architectural approach like microservices, each team can be given the responsibility for one service, and hence the components become managerially independent from each other. The benefit is to make maximal use of the agile strengths and minimize costly coordination activities. Of course, one could argue that typical agile teams are not totally independent, but usually part of the same larger organization. In open source communities, however, this is not the case: the participating organizations are volunteers with clear managerial independence.
  • Evolutionary development. Having the microservices in place, and under the responsibility of agile teams, it becomes quite natural to let them evolve their services fairly independently. Thus, the evolutionary development becomes a natural consequence of the independence of the components, and makes it possible to adapt to new needs or fix issues much more rapidly than in a monolithic system.
  • Emergence. Even though the independence and evolutionary development of the microservices provide many benefits, there is still a need for some central authority which ensures that the combination of components provides the desired emergent capabilities and properties. This is achieved by selecting what constituent systems are needed, and designing the rules of interaction between them. As discussed in one of my previous essays, this can be achieved by orchestration and choreography between the selected microservices. The fact that the system has been componentized is a strength, since the components can be used to create new SoS with different emergent properties, either as part of an organized product line, or more organically.
  • Geographical distribution. A typical deployment of microservices is in a cloud platform, and even though this can occur within a single cloud datacenter at one geographical location, it still fulfills the criterion of geographical distribution. This is because the components do not make any assumptions about the location of other components and use general Internet techniques for communication. In this way, it becomes possible to build a globally distributed system that offers short response times for local users as well as resilience, since it does not depend on a single deployment but can quickly be reassigned to different server halls.
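The scaling argument made under operational independence can be illustrated with a little arithmetic. The sketch below is only a toy model: the service names, loads, capacities and costs are invented for the example, and the assumption that a monolith replica carries every part of the system is a simplification.

```python
import math

# Hypothetical numbers, chosen only for illustration.
# Peak requests per second that each part of the system must handle.
peak_load = {"catalog": 900, "checkout": 150, "search": 2400}
# Requests per second that one instance of each part can handle.
capacity_per_instance = {"catalog": 300, "checkout": 300, "search": 300}
# Resource cost (arbitrary units) of running one instance of each part.
cost_per_instance = {"catalog": 2, "checkout": 3, "search": 4}


def instances_needed(name):
    """Number of instances of one part required to carry its peak load."""
    return math.ceil(peak_load[name] / capacity_per_instance[name])


def microservice_cost():
    """Each service is scaled on its own and gets only what it needs."""
    return sum(instances_needed(n) * cost_per_instance[n] for n in peak_load)


def monolith_cost():
    """The busiest part sets the replica count, and every replica
    carries all parts of the system."""
    replicas = max(instances_needed(n) for n in peak_load)
    return replicas * sum(cost_per_instance.values())


ms_cost = microservice_cost()
mono_cost = monolith_cost()
print(ms_cost, mono_cost)
```

With these made-up figures the search service dominates the load, so the monolith has to be replicated eight times even though checkout needs only a single instance.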

To summarize, the combination of microservice-based architectures with development in agile teams that allow components to evolve at their own pace and be deployed independently in a cloud environment is effectively an SoS approach to software development.

So maybe SoS engineering is the answer to how to do systems engineering in a more agile way, something that has been a long-standing challenge to the systems engineering community. Maybe it is time to do away with the top-down approach as the ideal for systems engineering? However, this way of working of course raises new challenges for the SoS architects, since it becomes crucial to find a division of the system into microservices that is effective over time. Possibly, the principles for such architectures could be common for traditional SoS and the software-intensive variants described in this text.

Acknowledgment: This essay was inspired by a talk by Ben Ramsey at the Colloquium on Software-intensive Systems-of-Systems in Copenhagen, Nov. 29, 2016, where he argued that certain open source software projects can be fruitfully regarded as SoS.

Orchestration vs. choreography in systems-of-systems

[Figure: Orchestration]

When creating an SoS, there is usually a certain functionality or capability which is sought. This functionality emerges as a consequence of the collaboration that the constituent systems engage in. The functionality does however not emerge out of the blue, but it has to be designed like any other engineered system. In some way, the constituents have to be told what they should do or how they should behave in order for the SoS to reach its objectives. How this information should be conveyed to the constituents is a fundamental design decision, which is sometimes also limited by the context of the SoS, including what authority the different stakeholders have in relation to each other.

There have been attempts to identify recurring patterns, or archetypes, that are common in SoS with respect to this coordination. The most influential set of archetypes was initially proposed by Maier (1996, 1998), and then extended by Dahmann and Baldwin (2008). It is based on the authority and responsibility in managing the evolution of the SoS, and consists of the following archetypes, as interpreted by Lane and Epstein (2013):

  • Directed: The SoS is built for a specific purpose, and has a dedicated central management. The constituent systems retain their individual capabilities but are normally subordinated to the SoS.
  • Acknowledged: The SoS is built for a specific purpose (similar to directed), and has central management in the form of a dedicated organization. However, the constituent systems are not normally subordinated (similar to collaborative). Typically, it is a result of building an SoS out of a combination of existing and new systems. Evolution takes place through collaboration between the constituent systems’ owners.
  • Collaborative: The SoS has an agreed upon purpose, and central management, but with limited power. Typically, the central management is formed through a cooperation between the organizations behind the constituent systems, rather than being a dedicated organization for the SoS. The constituent systems collaborate voluntarily to fulfil the agreed upon purposes.
  • Virtual: There is no agreed upon SoS purpose and no central management. The SoS behavior is emergent, and not caused by explicit mechanisms. The formation is ad hoc and the constituent systems are not necessarily known.
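The four archetypes can be read as combinations of three questions: is there an agreed purpose, what kind of central management exists, and are the constituents subordinated to the SoS? The sketch below encodes that reading as a small lookup function. The parameter names and the three management labels are my own shorthand for the descriptions above, not terminology from the cited literature, and real cases are rarely this clear-cut.

```python
from enum import Enum


class Archetype(Enum):
    DIRECTED = "directed"
    ACKNOWLEDGED = "acknowledged"
    COLLABORATIVE = "collaborative"
    VIRTUAL = "virtual"


def classify(agreed_purpose: bool, central_management: str,
             subordinated: bool) -> Archetype:
    """Map three coarse properties of an SoS to an archetype.

    central_management is one of (hypothetical labels):
      "dedicated"   - a dedicated organization manages the SoS
      "cooperative" - management formed by the constituents' owners,
                      with limited power
      "none"        - no central management at all
    """
    if not agreed_purpose or central_management == "none":
        return Archetype.VIRTUAL
    if central_management == "cooperative":
        return Archetype.COLLABORATIVE
    # Dedicated management: the difference lies in subordination.
    return Archetype.DIRECTED if subordinated else Archetype.ACKNOWLEDGED


print(classify(True, "dedicated", False).value)
```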

The virtual archetype is somewhat questionable, since it can be discussed if an SoS without a purpose is even to be considered a system. The example of a virtual SoS proposed in the literature is the World Wide Web. On the other extreme, a very directed system would probably have constituents with very limited use outside the SoS context, and hence it is more of a system than an SoS.

In a directed SoS, the central management organization typically defines the design of the constituent systems, whereas in the acknowledged archetype, it reaches agreements with the organizations responsible for the constituent systems.

Although this set of archetypes gives interesting perspectives on the power distribution in an SoS, it does not provide details about how the information about desired behavior is conveyed to the constituents. However, in the domain of service-oriented architecture (SOA), this has been discussed using the concepts of orchestration and choreography. The differences between these two concepts are as follows (as explained here):

  • Orchestration: A single centralized component (called the orchestrator) coordinates the interactions between the other components. It is thus responsible for implementing the SoS services.
  • Choreography: A global description is created, which contains information about the participating components, the information exchanges between them, rules of interaction, and agreements between them. This description is used by all the participating components, and thus constitutes a decentralized approach. In practice it could be implemented in different ways, e.g. as a specification document, a software plug-in, or a data file read by a generic interpreter.
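To make the contrast between the two concepts concrete, here is a minimal sketch under some stated assumptions. The participant names, the order-handling scenario, the rule format and the `broadcast` helper (a stand-in for whatever message transport a real system would use) are all invented for illustration. In the orchestrated variant one function holds the plan and calls each participant. In the choreographed variant every participant reads the same shared description and decides for itself whether to react to an event.

```python
# Shared global description (choreography): which participant reacts to
# which event, what it does, and which event that reaction emits.
RULES = {
    "order_placed": {"actor": "payment", "task": "charge",
                     "emits": "payment_done"},
    "payment_done": {"actor": "shipping", "task": "ship", "emits": None},
}


class Participant:
    def __init__(self, name, rules=None):
        self.name = name
        self.rules = rules or {}  # the shared description, if any

    def perform(self, task):
        return f"{self.name}:{task}"

    def on_event(self, event):
        """React only if the shared rules assign this event to me."""
        rule = self.rules.get(event)
        if rule is None or rule["actor"] != self.name:
            return None, None
        return self.perform(rule["task"]), rule["emits"]


def orchestrate(participants, plan):
    """Orchestration: a central component owns the plan and issues calls."""
    by_name = {p.name: p for p in participants}
    return [by_name[actor].perform(task) for actor, task in plan]


def broadcast(participants, first_event):
    """Stand-in for a message bus: delivers every event to everyone."""
    trace, pending = [], [first_event]
    while pending:
        event = pending.pop(0)
        for p in participants:
            result, emitted = p.on_event(event)
            if result is not None:
                trace.append(result)
            if emitted is not None:
                pending.append(emitted)
    return trace


orchestrated = orchestrate(
    [Participant("payment"), Participant("shipping")],
    [("payment", "charge"), ("shipping", "ship")],
)
choreographed = broadcast(
    [Participant("payment", RULES), Participant("shipping", RULES)],
    "order_placed",
)
print(orchestrated, choreographed)
```

Both variants produce the same sequence of actions here. The difference is where the knowledge of the sequence lives: in the orchestrator's plan, or in a description that every participant interprets on its own.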

The parallel to the SoS archetypes is pretty obvious: a directed SoS is designed as an orchestration, whereas a virtual SoS is based on choreography. The other archetypes constitute a mix of the two.

Personally, I have always found the SoS archetypes difficult to apply in practice, possibly since most concrete SoS examples are in the mixed categories, and these are not very distinct. Possibly, a route forward is to instead refine the concepts of orchestration and choreography to concretize for a given case how collaboration actually is coordinated?

[Figure: Choreography]

Society as a metaphor for systems-of-systems

Whereas it is possible to reason about a single autonomous system by making parallels to an individual, and in particular the functioning of the human brain, the natural analogy for systems-of-systems is a society. Here, I will make an initial characterization of this analogy.

One of the reasons why I am fascinated by the topic of systems-of-systems (SoS) is because it needs to mirror in a (socio-)technical system many of the characteristics of human life. It is, or should at least be, an interdisciplinary subject, where it is possible to draw on many fields of research and practice.

In many situations, the elements, or constituent systems, of an SoS have to be intelligent in some sense, in order to be able to adapt to changes in other parts of the SoS or in the external environment. This intelligence can be achieved by using a socio-technical solution, where the technical parts interact with humans who provide the intelligence. Increasingly, however, we see a trend where the systems themselves become autonomous and do not rely on human input, at least not for all kinds of adaptation. Personally, I am currently involved in the field of autonomous vehicles, and my experience there is that when designing the autonomous systems, it is often useful to start by describing what happens in the socio-technical system. Then, the role of the human part of that socio-technical system is analyzed and codified, so that parts of it can be transferred to the technical system. Since the human is the model for what the system will do, the solution typically ends up including elements of what we believe are parts of the human mind, such as perception, world model, consciousness etc.

But when we move from single systems to SoS, the analogy is no longer with a single human, but with a group of interacting, collaborating people. If the system grows complex enough, the parallel is even to a society as a whole. One of the key thoughts I want to explore on this site is what that parallel looks like, and if and how it can be useful when designing SoS. When I write this, I do not know where this thought will lead, but my gut feeling is that it is so important that it makes sense to use it as a metaphor for the whole endeavor, and hence the name of the site became societies of systems.

As a starting point, I think it makes sense to look at what constitutes a society, just as the human mind was a reasonable starting point when looking at automating a socio-technical system. So what is a society? This is what Wikipedia has to say about that:

“A society is a group of people involved in persistent social interaction, or a large social grouping sharing the same geographical or social territory, typically subject to the same political authority and dominant cultural expectations. Societies are characterized by patterns of relationships (social relations) between individuals who share a distinctive culture and institutions; a given society may be described as the sum total of such relationships among its constituent members. In the social sciences, a larger society often evinces stratification or dominance patterns in subgroups.

Insofar as it is collaborative, a society can enable its members to benefit in ways that would not otherwise be possible on an individual basis; both individual and social (common) benefits can thus be distinguished, or in many cases found to overlap.”

I believe there are clear parallels here to how an SoS is typically described. So what are then the elements of a society? The above text makes a reference to the subject of social sciences, so let us once again consult Wikipedia on this topic:

“Social science is a major category of academic disciplines, concerned with society and the relationships among individuals within a society. It in turn has many branches, each of which is considered a “social science”. The main social sciences include economics, political science, human geography, demography and sociology. In a wider sense, social science also includes some fields in the humanities such as anthropology, archaeology, jurisprudence, psychology, history, and linguistics.”

Further down the Wikipedia article, a slightly different list of branches is described in more detail. There are thus many aspects of society studied by different branches of social science (possibly more than the ones described above), and some of them appear more relevant than others as a parallel for technical SoS. In particular, these appear to be important:

  • Economics: How are resources shared between the participants in the SoS, and what services do they provide to each other?
  • Jurisprudence: What are the rules that are needed to make the SoS function, and how can they be enforced?
  • Political science: How should the system of government of the SoS be designed?
  • Communication studies: How should the constituent systems in the SoS communicate with each other?

In future posts, I plan to dig deeper into some of these subjects, and try to understand how they can be useful in the engineering of SoS.

Digitization: An introduction

One of the most important trends in society today is the digitization of almost everything. It is also the trend that makes systems-of-systems an urgent topic. But what is digitization all about? Here, I will give an overview of what I perceive as some of the most important trends that make digitization happen today, as well as some of the challenges.

On Nov. 24, 2015, I had the honor of addressing a group of some 200 key decision makers in the automotive industry at the annual conference of the Swedish vehicle research and innovation program (FFI). The topic I was asked to talk about was digitization, to give the audience an understanding of what it is and why it is happening right now. The talk was in Swedish, and can be seen online. In this post, I will summarize some of the topics I brought up in the talk, as a background on digitization that is important to understand the challenges related to systems-of-systems.

Digitization is a trend that is affecting everybody and every part of society, and due to the many uncertainties connected to it, a lot of people are experiencing what one might call a “digital anxiety”. The effects will be large. Some thinkers are suggesting that we will all be out of jobs, and that the gaps between rich and poor will increase. Others, on the contrary, argue that this could be the beginning of a new golden age, where new and better jobs will replace the old ones. Politicians are talking about a new industrialization where the digital technology is breathing new life into manufacturing.

In my work as a researcher, I often meet professionals who describe a similar digital anxiety, but from the perspective of their company. They ask questions such as: How should we relate to the new technology? How will the market and business models change? Will Google or some other actor steal our customers? Of course, nobody knows the answers to these questions, but the best one can do is to increase one’s knowledge of the underlying forces. My aim with this post is to briefly summarize the technical trends that are behind digitization, and point at some challenges for industry, in particular manufacturing companies as exemplified by the automotive business.

Technical trends

So what is behind digitization? Well, in essence, it is a technology development that has been going on at an even pace for a long time, and which is often referred to as Moore’s law. This well known trend is 50 years old, and says, a bit simplified, that the performance of electronics doubles every second year without increasing the cost. The below graph shows this development from 1970 to today. Note that the y axis is logarithmic, so that today’s electronics has 10 million times the performance of the technology 50 years back. This means that it is all the time becoming possible to perform faster calculations, store and send more data, and provide more sensors at the same cost.

[Figure: Moore’s law]
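The arithmetic behind the simplified doubling rule is easy to check. The sketch below assumes exactly the rule as stated in the text (a doubling every second year) and computes the resulting growth factor for two time spans. The function name and the chosen spans are mine, picked only to bracket the period discussed.

```python
def growth_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth after `years` if performance doubles every
    `doubling_period_years` years."""
    return 2.0 ** (years / doubling_period_years)


# 45 years is 22.5 doublings, roughly a factor of six million.
factor_45 = growth_factor(45)
# 50 years is 25 doublings, roughly a factor of 34 million.
factor_50 = growth_factor(50)

print(f"{factor_45:,.0f}  {factor_50:,.0f}")
```

Under this simplified rule, a factor on the order of ten million corresponds to somewhere between 45 and 50 years of doubling, which is consistent with the order of magnitude quoted above.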

A similar development can be seen for Internet bandwidth, as shown in the next graph. Communication is now so fast that it pays off to build very distributed systems. Data can be gathered in one place, and processed or stored in another, while being presented in a third.

[Figure: Internet connectivity]

It has also become cheap to put powerful sensors on physical things. The next graph shows how the sensor industry predicts that the number of sensors will grow over the next ten years, from ten billion today to ten trillion. With improved communication, it becomes possible to access the data from a distance, which increases the value of the sensors.

[Figure: Sensor trend]

Lots of sensors produce lots of data that can now be gathered in the same place. Big data is a popular term for this, which also relates to the influential power provided by having access to data. Note in the next graph that the y-axis is linear, so this is what an exponential growth really looks like.

[Figure: Global data]

System trends

So what are the consequences of these technology trends on how systems are built? One consequence is that it becomes interesting to put plenty of sensors on physical items, and connect them to the Internet, which is often referred to as the Internet of Things. Examples of this can be found in building automation, so-called smart homes, but similar solutions are discussed in many industries.

Access to computational resources has also become very flexible through cloud computing. In essence, this is just large server halls where users can buy capacity instead of buying their own computers. But this leads to extremely low entry barriers for new IT companies, since almost no capital investments are required. Instead, more computational power is rented as demand increases.

For all the data that these systems generate to be of any use, it must be processed, but this is limited by our ability to develop algorithms that can solve complex problems. Instead of hand-crafting algorithms, advanced techniques from artificial intelligence such as machine learning are needed, to automatically find interesting information in all the data. Simply speaking, this means that the computer is not told how to solve the task, but instead it is trained on a number of examples to find patterns. Like many of the issues already described above, these are old techniques, but it is only now that there are sufficient computational resources available to solve interesting, real-life problems. It is also only now that there is a sufficient amount of data to train the algorithms.
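The difference between telling the computer how to solve a task and training it on examples can be shown with a deliberately tiny sketch. Nothing here comes from a real system: the sensor readings, the labels and the function names are invented, and a single learned cut-off is about the simplest model imaginable. The point is only that the cut-off is derived from the examples rather than written into the code.

```python
def train_threshold(examples):
    """Learn a cut-off from labelled examples.

    examples: list of (value, is_faulty) pairs.
    Returns the candidate cut-off that misclassifies the fewest examples,
    where a value above the cut-off is predicted to be faulty.
    """
    values = sorted(value for value, _ in examples)
    # Candidate cut-offs are the midpoints between neighbouring values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(cut):
        return sum((value > cut) != label for value, label in examples)

    return min(candidates, key=errors)


# Hypothetical vibration readings from a machine, labelled by hand.
training_data = [
    (0.2, False), (0.4, False), (0.5, False),
    (1.1, True), (1.3, True), (1.6, True),
]

cut_off = train_threshold(training_data)


def predict(value):
    """Classify a new reading using the learned cut-off."""
    return value > cut_off


print(cut_off, predict(0.3), predict(1.4))
```

With more data, or data of a different kind, the same training code would arrive at a different cut-off without any change to the program itself.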

One important reason why digitization is happening now, and not before, is also that cheap, powerful, and always connected terminals have become available for most people through smart phones. This increases the potential clientele for digital services, and extends the times and places where those services are available, since they no longer require us to sit in front of our computers.

Last but not least, the standards for software have matured, in particular the standards used on the web. This makes it very easy to give a system a data interface towards other systems. One very important consequence is that it now becomes possible to view a system as a component in a larger system-of-systems.

Benefits

So what benefits are people trying to achieve through digitization? Well, if you scratch the surface and try to find common factors, the services are often about automation of workflows. It is about connecting different systems into a system-of-systems, where data is transmitted automatically between the parts, whereas before a person had to move the data manually. Along the way, automatic data processing is also introduced, which can make better decisions than humans due to a more detailed world view. The workflow further becomes faster and more predictable, and the risk of human mistakes is eliminated. One example of a company utilizing this is Klarna, which makes extremely fast and accurate credit assessments.

Another type of efficiency improvement is to better match supply and demand. Companies like Uber and AirBnB make unused resources in the form of cars and beds available for customers who could perhaps not afford a traditional taxi or hotel.

A third effect is the possibility to open up the systems for external extensions, which is sometimes referred to as open innovation. A good example of this is the possibility to download apps from third party developers in a smart phone, but similar possibilities exist also in other types of systems. The base system developer gets a more attractive product that can be customized for many uses. The app developer gets a market for their ideas. The customer gets a broader range of products to choose from. One can reason in similar ways regarding sharing of data and the creation of open service interfaces.

Consequences for industry

So what are the consequences for the manufacturing industries like the automotive business? One major concern is to what extent companies want to, and dare to, open up their products. This question is partly technical, focusing on what the system architecture should look like to be able to handle openness. But it also contains business concerns. Do you want to continue to “own” the customer, i.e. put your company in the center of the picture? Can you even own the customer? I do not believe that this will work in the future. Instead, I believe that we will increasingly view products such as cars as components in systems-of-systems, where they interact with other similar products (other cars); with the OEM’s cloud services; with public services; and with services from third parties. Focus will be moved from the individual products, to the larger context where the product is residing.

Connected to this is also the OEM’s strategy for handling partners. Who do you want to work with in order to offer attractive services? How will you handle an increasingly fragmented landscape, where new actors rapidly appear, and then maybe disappear again, and where the software lifecycle is completely different from that of the physical product? Speed and flexibility will become keywords.

A third aspect that I would like to emphasize is how to make the systems trustworthy. In the future, many products will become reliant on other actors’ data, software and services. These will change at an increasingly higher pace, and the OEM will not be able to apply quality assurance in the same way as they do to physical components from suppliers today. So how can you ascertain that it is safe to use a product such as a vehicle as a component in a system-of-systems? Who is responsible for quality? How do you test it? My research indicates that there is a need to work with more dynamic solutions, where data is continuously brought back from operations, where incidents are evaluated before they become accidents, and where improved software is rapidly deployed to products in the field.

So to conclude, it is apparent that digitization is a development that results from improved technology, and fundamentally from Moore’s law. However, the challenges for industry concern to a large extent how the technology can best be applied to the business, and how to do systems engineering under these new circumstances.

System-of-systems challenges

The key topic of this site is systems-of-systems. But what is that? Here, I will explain some key definitions, and provide an overview of some of the challenges in the area.

The term systems-of-systems (SoS) has been around for a few decades, and is seen as an important concept, e.g., by the European Union in its Digital Agenda. Still, a lot of people seem to equate SoS with large and complex systems in general, and not give it any more specific meaning. However, at least in the research community, there is a rather clear and widely accepted definition put forward already 20 years ago by Mark Maier. He identified five key dimensions:

  1. Operational independence of the elements. The constituent systems can operate independently in a meaningful way, and are useful in their own right.
  2. Managerial independence of the elements. The constituent systems not only can operate independently, but they do operate independently even while being part of the SoS. They are acquired separately.
  3. Evolutionary development. The SoS does not appear fully formed, and functions and purposes are added based on experience.
  4. Emergent behavior. The principal purposes of the SoS are fulfilled by behaviors that cannot be localized to any individual constituent system.
  5. Geographical distribution. The constituent systems only exchange information and not substantial quantities of mass or energy.

An intuitive interpretation of this is that an SoS is a group of independent collaborating systems. The elements of an SoS, called constituent systems, retain an operational and managerial independence, but when combined in a certain way, they provide together a new capability that is emergent from their cooperation.

In the spring of 2015, I had the pleasure of leading a project on developing a strategic research and innovation agenda for SoS in Sweden. The project was sponsored by the Government Agency for Innovation, VINNOVA, and it was carried out through a set of workshops with representatives from large and small companies with an interest in the subject, many of them being members of the Swedish INCOSE chapter. There were also workshops gathering Scandinavian academic researchers in the field.

Much of the research previously done on SoS has been focusing on military and similar applications in a US setting. In our work, it soon became apparent that a broader view was needed, taking into account civilian and industrial applications as well. This opens up many important avenues for research, and a key conclusion of the agenda is that there is a need for capabilities to rapidly develop trustworthy SoS. This is a grand challenge to the area in light of the digitization of society, and it requires advances in many areas. A set of challenges was identified that require particular attention:

  • Theoretical foundations. There is a need in general for a more advanced theoretical foundation for the SoS field, including a more precise language for describing and reasoning about SoS. Specific topics include emergence, which is currently not well understood, but is essential since creating an emergent behavior is usually the raison d’être for the SoS. The principles for the design of mechanisms that create the desired emergent behavior and properties are also a key topic which is in its infancy.
  • Socio-technical aspects. Many of the SoS challenges in practice relate to the organizations that manage the SoS and its constituent systems, and the need for agreements and negotiations between them. Finding efficient ways to deal with this is essential for rapid SoS development. Also, as automation progresses, a more fundamental understanding of the interplay between the technical systems and the people and organizations is needed, as the distribution of work between them changes.
  • Architecture. Architecture is and will remain a central part of SoS engineering, and further refinement of methods for describing and evaluating the architecture is needed. In particular, the architecture is an enabler for rapid assembly of constituent systems into an SoS, simply because a good architecture will make the pieces fit better together, thus requiring less time for adjustments. It is also an enabler for trustworthiness, by describing clear principles and distribution of responsibilities between the constituent systems. There is also a need to focus on the architecture of systems that could become a constituent, and find ways of building flexibility into those systems from the beginning to make them adaptable to the needs of a future SoS, thereby reducing the duration of SoS integration.
  • Modeling and simulation. Modeling involves describing the SoS in a simplified way, and has a strong relation to architecture. Capturing the essential structures and behavior in a concise way is an enabler for efficient communication between the involved organizations, and thus leads to more rapid agreements. Many of the existing modeling frameworks can be improved. In particular, there is a need for light-weight versions that can be used to rapidly capture the essentials. Models are also used as input to simulations, which allow for early verification of the emergent properties. In particular, co-simulations where existing models of constituent systems can be integrated are of importance, especially when complemented with efficient ways of visualizing the effects for decision makers. Modeling and simulation are thus a foundation for rapid SoS engineering, allowing fast iterations of design and evaluation. Their value in establishing trust, through extensive analysis of, e.g., safety, merits further research.
  • Interoperability. To be able to link the constituent systems together, interoperability is key. Techniques for achieving this exist, especially on the syntactic and to some extent semantic levels, but further development is needed to handle pragmatic and organizational interoperability. Achieving interoperability is largely founded on standards, which take a very long time to develop, and there is an urgent need to find more flexible mechanisms that allow rapid achievement of interoperability, even between existing systems.
  • Trust. There are multiple dimensions of trust that need to be handled, including dependability, robustness, security, and privacy. The particular aspects related to trust in SoS are caused by the operational and managerial independence of the constituent systems, and maintaining trust over the evolution. An overarching challenge lies in combining trust with rapid development. For this, new techniques need to be developed, and the most promising way forward is based on systems thinking, where progress is already being made in the safety area. This should be combined with simulation based approaches which allow rapid reevaluation of trust during system evolution.
  • Business and legal aspects. Much of existing knowledge about SoS comes from government driven applications such as defense, and there is a lack of understanding of business models for commercially oriented applications. This includes also the design of mechanisms for keeping the SoS together, including both motivations and punishments for constituent systems. In many situations, legal contracts are needed, and to avoid lengthy negotiations, template contracts should be developed for rapid conclusion of the necessary agreements. There is also a lack of understanding of liability issues related to an SoS, when severe losses result from the emergent behavior of the SoS rather than from an individual constituent system.
  • Processes and methods. Although many of the principles of general systems engineering (SE) also apply to SoS, there are fundamental differences. The SE processes are typically characterized by working top-down, and by tackling complexity through decomposition. The SoS processes need to be bottom-up and focus on integration of existing elements. To this should be added the need for speed, which is not one of the strengths of SE. There is a need for a better understanding in general of the SoS processes, and in particular how to efficiently organize cross-organizational development. In this, management and leadership aspects play important roles.

The agenda document discusses these challenges in more detail, and also the topic of standardization, which is important but to some extent in conflict with rapidity, since standards can only be developed once the understanding of technical solutions has matured.

The above challenges are driving my current and future research in the SoS area, and I will return to some of them in more detail in future posts.