DISTRIBUTED PROCESS COMPUTER SYSTEMS
PRINCIPLES AND APPLICATIONS OF DECENTRALIZED PROCESS CONTROL COMPUTER SYSTEMS
G. Färber
Lehrstuhl für Prozessrechner, Technische Universität München, München, Germany
Abstract. The main reasons for the decentralization of process control computer systems are the need to control topologically larger processes, the growing demand for higher availability, and the growing functional complexity of processes, which requires more processing performance. These points influence the structure of a decentralized computer system for a given process in different ways: the optimal topological structure is generally not functionally optimal. The first part of the paper makes some suggestions on how to coordinate these different demands, and the application-oriented second part demonstrates these principles with several examples.
Keywords. Distributed process control; message transport system; decentralized task administration; redundant system concepts; power management.
INTRODUCTION
Up to some years ago the high cost of process control computers prevented their use for single-task applications. Much effort was spent on providing effective tools to make the power of one computer available to many different applications. Very often economic reasons led to an overload of single computer systems running too many tasks, and the resulting complexity usually necessitated time-consuming efforts to get the system operational. Today the development of hardware technology allows the economical realization of decentralized systems. The following reasons explain why distributed processing in process control applications is not only of theoretical interest but also results in products that can be offered on the market.
a) Reduction of Functional Complexity. By dividing complex tasks into smaller, clearly defined parts, the total functional complexity may be reduced and subsequent system planning and installation made easier.
b) Cost Reduction. The concept of decentralized process control computers allows the installation of initially very small systems that may be expanded at a constant cost/capacity ratio. Most cabling costs (which increase with labour and copper costs) may be saved by installing the computers close to the process, to which they are connected by sensors and actuators.
c) Increase of Reliability. A growing number of applications require high reliability, which is conventionally realized by expensive back-up instrumentation systems. Decentralized computer systems are inherently more reliable: if one component fails, only a part of the control system stops operating, so the behaviour bears comparison with conventional instrumentation systems. Additionally the decentralized system concept may provide an automatic transfer of functions from a failing system to one or more back-up systems.
Distributed systems help overcome some well-known difficulties, such as the priority assignment problem within single computers that have to handle several time-critical processes. But some other problems depend only on the process itself and not on the structure of the controlling system. By dividing complex processes into several parts and delegating the tasks controlling the resulting parts to several dedicated processors, the problems are partially shifted on to the communication system between the computers.
If one considers decentralized computer systems in more detail, one discovers new problems that do not arise with centralized systems. The implementation of a distributed system doesn't just mean installing some computers and linking them together: the highly organized system results in challenging new problems, for example remote diagnostics, error recovery, software organization, planning and installation tools and methods, and the synchronisation of tasks residing in processors with different time bases, to name just a few.
PRINCIPLES OF DISTRIBUTED PROCESSING FOR PROCESS CONTROL
Topological and Functional Decentralization
The spatial distribution of sensors and actuators is the basis for a well-defined mathematical model that is used to determine the topologically optimal points for computer installations: the sum of all cable lengths should be a minimum. Usually the results of such computations can only be used as a first approximation, since other restrictions (such as the conditions for local installation) have to be considered.
Functional decentralization has the aim of reducing system complexity by dividing a large process into smaller parts. In some cases this results in naturally separated subprocesses with only weak interdependencies; they may be assigned to computers that are programmed separately, and the system complexity is thus significantly reduced. But other technical processes are complex in the sense that any subdivision generates subprocesses having strong interdependencies. Here algorithms have to be found to determine the "minimum-interface" configuration: as in the rules of "structured programming" for breaking down flow diagrams into more detailed blocks, the new subprocesses should have minimal and clearly defined interfaces /2/. It was difficult to adapt to the serial operation of digital computers and consequently to serialize parallel processes (as today's multitasking systems do), but it is much more difficult to parallelize a serial process in order to distribute its control to several processors. The job of subdividing processes, or the corresponding controlling functions, is similar to the definition of tasks in a single-processor multitasking system; additionally, the individual tasks have to be delegated to different processors.
The resulting functionally optimal decentralization will generally not be the same as the topological optimum. Put simply, one has the choice of installing the computers according to the functional optimum and lengthening the cables from sensors and to actuators, or of leaving the computers in their topologically optimal positions and providing a means of transporting data to/from another computer responsible for the functional subprocess. In fact only the second proposal is economically feasible, because of the great influence of cabling on the total system cost; moreover, data from some sensors may be needed for several subprocesses.
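The topological criterion (minimum sum of all cable lengths) can be illustrated for the simplest case of a single computer. The following is only a sketch under stated assumptions, not the model used in the paper: cables are assumed to run in straight lines, only one installation point is sought, and the function names and the plant layout are hypothetical. Under these assumptions the optimal point is the geometric median of the sensor and actuator positions, which the Weiszfeld iteration approximates.

```python
import math

def total_cable_length(site, points):
    """Sum of straight-line distances from a candidate site to every I/O point."""
    return sum(math.dist(site, p) for p in points)

def min_cable_site(points, iterations=1000, tol=1e-9):
    """Weiszfeld iteration for the point with minimal total distance (geometric median)."""
    # Start at the centroid of all sensor/actuator positions.
    x = sum(p[0] for p in points) / len(points)
    y = sum(p[1] for p in points) / len(points)
    for _ in range(iterations):
        wx = wy = wsum = 0.0
        for px, py in points:
            # Clamp the distance so a site coinciding with a point cannot divide by zero.
            d = max(math.hypot(px - x, py - y), tol)
            wx += px / d
            wy += py / d
            wsum += 1.0 / d
        new_x, new_y = wx / wsum, wy / wsum
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:
            break
    return x, y

# Hypothetical layout: three I/O points close together and one far away (metres).
io_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
site = min_cable_site(io_points)
centroid = (2.75, 2.75)
print(site, total_cable_length(site, io_points), total_cable_length(centroid, io_points))
```

In this layout the minimum-cable site lies near the cluster of three points rather than at the centroid, which shows why the topological optimum has to be computed rather than guessed. Local installation constraints, as noted in the text, would still have to be applied to the result.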
Fig. 1. Effects of transport delays on system response time
Fig. 1 demonstrates one important aspect. Consider a subprocess managed by computer B. An alarm message from a sensor connected to computer A must result in a reaction at an actuator at computer C. The total system reaction time then consists of the conventional response time in system B plus the two transport times through the message transport system (MTS). In order to avoid dangerous delays the two transport times should not exceed the computers' reaction time. This means that
- the transport time has to be very short;
- there must be a guaranteed absolute maximum transport time, which should not just be determined through statistical evaluation.
Both requirements restrict the possible solutions for the MTS problem. The capacity of the transport system must be rather high, since a lot of sensors and actuators may request simultaneous message transmission; it is typical for technical processes that in critical situations important messages originate at many points simultaneously. Additionally the MTS has to provide logical means to coordinate and synchronize tasks that are being performed by different processors. This is not trivial, since different processors use different time bases and the transport of synchronisation messages requires a considerable amount of time. It can be seen that the MTS fulfils an important coordinating function for the decentralized computer system.
The Aim of Graceful Degradation
Graceful degradation in its ideal form
means that a system having n processors would still execute all tasks even if m processors fail (n > m), with the performance reduced to 100 × (n − m)/n %. This requirement can only be realized in the reduced form that all tasks of one failing processor are taken over by one or more back-up systems. The processor that has to take over needs at least the following data:
- information about the state of the task;
- information about the state of the process;
- after the failure: access to the process peripherals of the failed system.
The first two types of information may be obtained by "checkpointing": at well-defined points in the program (e.g. controlled by a timer) an unequivocal state has to be fixed (to maintain data integrity) and prepared for transmission to the back-up system, which contains a copy of the program. The time interval between checkpoints should be as short as possible, since the take-over by the back-up processor should be as smooth and continuous as possible. A high MTS capacity supports this requirement.
The third condition requires additional expenditure: the process peripherals have to be connected not just to one but to several processors. This is not difficult for signals from sensors but somewhat involved for the output lines. The failed processor may not recognize its failure and may still try to control the output lines, probably switching incorrect information. There must be a way to safeguard against this possibility. Now the question arises as to who should actuate this safeguard and who should decide which processor has failed and when: how are failures detected? Again one comes to an important demand on the properties of the MTS: it must be used by all processors to check their neighbours as intensively as possible. Moreover, it has to transport the results (especially the negative ones) to all computers in the system.
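The interplay of checkpointing and failure reports can be sketched as follows. This is a minimal illustration, not the mechanism of any particular system: the class and field names are hypothetical, and the rule that a take-over needs reports from at least two different neighbours is an assumption chosen only to show that detection relies on several independent sources.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Checkpoint:
    sequence: int        # increasing checkpoint number
    task_state: dict     # state of the control task
    process_state: dict  # last known state of the technical process

@dataclass
class BackupProcessor:
    required_reports: int = 2                 # hypothetical threshold
    checkpoint: Optional[Checkpoint] = None   # most recent checkpoint received
    reporters: set = field(default_factory=set)
    active: bool = False

    def receive_checkpoint(self, cp: Checkpoint) -> None:
        # Keep only the newest checkpoint; late or duplicated ones are ignored.
        if self.checkpoint is None or cp.sequence > self.checkpoint.sequence:
            self.checkpoint = cp

    def receive_failure_report(self, reporter: str) -> bool:
        # Count each reporting neighbour once, so one faulty observer cannot
        # trigger a take-over on its own.
        self.reporters.add(reporter)
        enough = len(self.reporters) >= self.required_reports
        if enough and self.checkpoint is not None:
            self.active = True   # resume the task from the last checkpoint
        return self.active

backup = BackupProcessor()
backup.receive_checkpoint(Checkpoint(1, {"step": 10}, {"valve": "open"}))
backup.receive_checkpoint(Checkpoint(3, {"step": 30}, {"valve": "closed"}))
backup.receive_checkpoint(Checkpoint(2, {"step": 20}, {"valve": "open"}))
first = backup.receive_failure_report("A")
repeat = backup.receive_failure_report("A")
second = backup.receive_failure_report("C")
```

The sketch deliberately leaves out the safeguard on the output lines; it only shows that the take-over starts from the newest checkpoint and waits for independent confirmation.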
The back-up processors thus receive the information on failed systems in several independent ways before they take over by activating the back-up tasks, based on the process and task states from the last checkpoint. This means that the detection of failures is a distributed function. The MTS again plays an important role, performing status-exchange services as intensively as possible.
Additional Requirements
Even if no back-up-system philosophy is implemented, very effective diagnostic tools and methods for error recovery become necessary. Consider a computer system with
nodes distributed over an area of some square miles: if a component fails anywhere, it must be possible to localize it exactly and to determine the type of error; otherwise the mean time to repair becomes too large. These methods may be integrated in the Message Transport System.
Another requirement is access to central resources, which cannot be completely dispensed with even in decentralized systems. Examples are:
- Central data bases. Mass memories will (because of environmental conditions) be concentrated at one or a few places. This means a lot of traffic for the MTS and some additional coordination tasks
/3/.
- Fast specialized processors, such as array processors, may be installed only once.
- Operator consoles have to be connected to one or perhaps two processors; all messages about the state of the process and its control system have to be routed to these processors.
Central resources are responsible for additional MTS traffic and may require large transport capacities.
Finally the organizational basis of decentralized systems has to be discussed. Usually a hierarchical organization is used, but this can cause reliability problems since a single failure may stop a central co-ordinator. Generally the functions performed by the operating system are considered to be central functions that are difficult to distribute to several computers. But if one analyses the functions in detail, only a few remain; the trend in operating systems for single computers also points in the direction of decentralization (e.g. the definition of I/O handlers as tasks). The remaining central functions are required mostly by general purpose systems that have to allocate and schedule new jobs entered into the system on a dynamic basis. In process control applications this is not a stringent condition: the tasks are defined by the process and can be installed statically. Only the development of new programs and their linking to the existing system may necessitate a central co-ordinator, but these functions need not be decentralized, since reliability problems do not arise in this sense during the development phase.
The Message Transport System (MTS)
The previous sections indicated the central role of the Message Transport System (MTS) for decentralized computer systems /4, 5/. The most important requirements can be summarized as follows:
- The MTS must have a high, extensible transport capacity.
- The transport delay induced by the MTS must be short and have a guaranteed maximum value.
- The reliability of the MTS must be high: one failing link should not stop communication between computers.
- Mechanisms have to be provided in the MTS that allow uninterrupted monitoring of all resources (computers and links) and the dissemination of the resulting status information.
Fortunately there are also special conditions in local networks that allow these extensive requirements to be met on an economical basis:
- High-capacity lines (e.g. optical fiber links) may be installed: line capacities are not a limiting factor.
- The transmission error rates are very low, much lower than in the public phone network.
- Propagation delays are negligible.
The MTS may be built up in different ways. Bus systems are extensively discussed as a central transport medium for decentralized process control systems, and some standardization proposals have been made on this basis /6/. But since only one common line is used to connect all computers, capacity and reliability problems arise from this approach. Bus systems are very well suited for connecting single sensors or actuators to a local computer, and local duplex-bus systems with "intelligent" sensors and actuators are probably a fine solution for the local instrumentation problem (Fig. 2).
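The delay requirement in the list above can be made concrete with the situation of Fig. 1, where an alarm travels from computer A via computer B to computer C. The sketch below is only an illustration: the function names and the millisecond figures are hypothetical, and the criterion is read here as "the two transport times together should not exceed the response time of B", which is one possible interpretation of the text.

```python
def total_reaction_time(response_time_b, max_transport_time):
    """Worst-case reaction time: response of B plus two guaranteed MTS transports."""
    return response_time_b + 2 * max_transport_time

def transport_budget_ok(response_time_b, max_transport_time):
    """Assumed criterion: both transports together stay within B's response time."""
    return 2 * max_transport_time <= response_time_b

# Hypothetical figures in milliseconds.
print(total_reaction_time(10, 2))    # 14
print(transport_budget_ok(10, 2))    # True
print(transport_budget_ok(10, 6))    # False
```

Because the check uses the guaranteed maximum rather than an average, it only makes sense for an MTS that can actually state such a bound.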
Fig. 2. Structure of Local Duplex-Bus Systems (labels: intelligent sensor, intelligent actuator, local duplex bus, transmission system)
Another solution to the MTS problem is to implement a general network structure. A fully connected network is not economical and not even reliable (Fig. 3): if one line fails, the communication between two computers would stop. If the network is not fully connected, messages have to be switched in the network nodes, and by choosing alternate routes through the network very reliable logical connections arise.
Fig. 3. Fully Connected and Switched Network. a) Fully connected network. b) Switched network with alternate routes, e.g. A to C: 1. A-B-C (B switches), 2. A-F-C (F switches).
There is a choice of two message switching principles within the nodes:
- Circuit switching behaves like the switching system in the phone network: in a first phase a connection between two participants is made, the second phase is used to transport messages, and the third phase terminates the connection. The problem is that most messages in a process control system are very short (e.g. a digital value or an alarm identification); therefore the first and third phases generate considerable overhead.
- Packet switching behaves like mail or telegrams: the message itself is extended by the address of the destination and put into the network, where it is transported independently from node to node. Here the overhead is much smaller, but the delays in the network may become larger, since a packet normally is completely assembled at each node before it is transmitted to the next one (store and forward, Fig. 4a).
Fig. 4. Store-and-Forward Switching. a) "Store and forward". b) "Store while forward".
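The overhead argument for the two switching principles can be illustrated with a simple bit count. This is a rough sketch under stated assumptions: the function names are hypothetical, and the sizes of the set-up, tear-down and address fields are invented figures, not values from any real network.

```python
def circuit_efficiency(payload_bits, setup_bits, teardown_bits):
    """Share of transmitted bits that is payload when a circuit is set up per message."""
    return payload_bits / (setup_bits + payload_bits + teardown_bits)

def packet_efficiency(payload_bits, address_bits):
    """Share of transmitted bits that is payload when the message carries its address."""
    return payload_bits / (payload_bits + address_bits)

# Hypothetical short alarm message of 16 bits.
short_circuit = circuit_efficiency(16, setup_bits=64, teardown_bits=32)
short_packet = packet_efficiency(16, address_bits=16)

# Hypothetical long transfer of 100 000 bits with the same control fields.
long_circuit = circuit_efficiency(100_000, setup_bits=64, teardown_bits=32)
long_packet = packet_efficiency(100_000, address_bits=16)
print(short_circuit, short_packet, long_circuit, long_packet)
```

With these assumed sizes the short message uses the line far better as a packet, while for long transfers the difference almost disappears, which matches the observation that the short messages typical of process control favour packet switching.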
To shorten the total transit delay in the network one may abandon the store-and-forward principle and start to forward the packet as soon as the address of the destination has arrived and the route to the next node can be selected (Fig. 4b). The disadvantage of this procedure is that errors occurring between two nodes are not immediately detectable: if the (wrong) message is forwarded to the next node, link capacities may be wasted. But since the error rates are very low, this rare occurrence can be accepted in order to reduce transit delays. Fig. 4 shows the optimal case, where the links between nodes are free for the message. Consider, however, that in a network that isn't fully connected each physical link is shared by several logical connections, which results in waiting times for the use of the physical lines, which in turn extend the transit delay in an unpredictable way.
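The gain of forwarding a packet while it is still being received can be estimated for the optimal case of Fig. 4. The sketch assumes free links, negligible propagation delay and equal line rates on all links; the function names and the numbers are hypothetical.

```python
def store_and_forward_delay(packet_bits, links, rate_bps):
    """Every link carries the complete packet before the next link starts."""
    return links * packet_bits / rate_bps

def store_while_forward_delay(packet_bits, address_bits, links, rate_bps):
    """Each intermediate node waits only for the address before forwarding."""
    intermediate_nodes = links - 1
    return packet_bits / rate_bps + intermediate_nodes * address_bits / rate_bps

# Hypothetical packet: 1000 bits including a 100-bit address, 3 links at 1 Mbit/s.
saf = store_and_forward_delay(1000, links=3, rate_bps=1_000_000)
swf = store_while_forward_delay(1000, 100, links=3, rate_bps=1_000_000)
print(saf, swf)   # about 0.003 s versus about 0.0012 s
```

The advantage grows with the number of links and with the ratio of packet length to address length. Waiting times on shared links are not modelled here and would add to both figures.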
[Fig. 5 plot: probability that a message has not yet arrived (axis marks 0.01 and 0.0001) as a function of time t, with the maximum time T_max marked.]
Fig. 5. Effect of Double-Route Transmission
Fig. 5 shows (upper curve) the probability over time that a message has not yet arrived at the destination. With a certain probability the message will not be received at all (e.g. because of a failed node or link). The only way to improve on this probability is to use two completely independent routes through the network for each message: if the failure probabilities of the two routes are independent, the lower curve of Fig. 5 becomes valid and the probability that a message arrives within a maximum time becomes very high, possibly higher than for a single physical line. To ensure that the two copies of a message really use disjoint routes, new routing algorithms have to be developed. In a general network the only way to arrive at disjoint routes is to preselect both routes at the message's source: every
computer has to know the topology of the whole network. The messages do not carry a destination address but a sequence of digits, each of which selects, in the node the message passes, one of several possible lines. As in the phone network, one digit is stripped away at each node.
One should not forget that duplicating each message also doubles the network traffic and lengthens waiting queues. The only way to shorten the queues is to provide high transmission capacities. But that requires not only high line capacities but also high processing capacities within the nodes. If one used standard interrupt-driven serial interfaces and provided about 50% of the total processor capacity for switching purposes, the total data rate crossing one node could not be higher than about 50-100 kbit/s, much less than the realizable line capacities. The only way to improve the performance is to install very fast specialized communication processors /7/, which offer aggregate data rates of up to some Mbit/s. The development of such processors is a prerequisite for decentralized process control systems.
Another essential requirement is a standard interface in order to connect computers from different suppliers to one decentralized system. All distributed systems offered on the market today have only one common feature: they are incompatible with all others! The reason is not an interest in preventing the coupling of different systems but the difficulty of defining a common standard. The definition has to include not only the physical interface but also the logical one; in computer network terminology the word "protocol" is used for the mutual agreement on the communication structure. Protocols are defined on several hierarchical levels, and standards should also be fixed for the highest levels possible (e.g. for a common access method to data base systems).
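The source-routing scheme described above, in which each node uses and strips the leading digit of the route, can be sketched together with the effect of sending two copies over disjoint routes. Everything here is illustrative: the network table loosely follows the A-B-C / A-F-C example of Fig. 3, the line numbering and function names are hypothetical, and the loss probabilities are invented figures that assume independent route failures.

```python
def follow_route(network, source, route_digits):
    """Return the nodes visited when a message follows a preselected route.

    network[node][digit] names the neighbour reached over outgoing line `digit`.
    """
    visited = [source]
    digits = list(route_digits)
    while digits:
        line = digits.pop(0)   # the node uses the leading digit and strips it
        visited.append(network[visited[-1]][line])
    return visited

def routes_are_disjoint(path_a, path_b):
    """True if the two paths share no intermediate node."""
    return not (set(path_a[1:-1]) & set(path_b[1:-1]))

def prob_lost_on_both(p_route_1, p_route_2):
    """Probability that both copies are lost, assuming independent route failures."""
    return p_route_1 * p_route_2

# Hypothetical line numbering for part of the switched network of Fig. 3.
network = {
    "A": {0: "B", 1: "F"},
    "B": {0: "A", 1: "C"},
    "F": {0: "A", 1: "C"},
    "C": {0: "B", 1: "F"},
}
via_b = follow_route(network, "A", [0, 1])
via_f = follow_route(network, "A", [1, 1])
print(via_b, via_f, routes_are_disjoint(via_b, via_f))
print(prob_lost_on_both(0.01, 0.01))
```

Because the source fixes both digit sequences, it can verify in advance that the routes are disjoint, which is the reason every computer needs to know the whole topology.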
If one tries to implement a distributed system with n different processors, n × (n − 1) logical interfaces have to be provided before all systems can communicate. With a network standard only n logical interfaces have to be implemented.
The ideal MTS in a plant should not be installed for one purpose only: very different applications such as process control, acquisition of production or personnel data, access control, etc. could use the same facilities. Therefore the MTS may be considered an independent service: a local data network consisting of communication processors and transmission lines can be installed independently, and decentralized process control applications, amongst others, may use it.
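The interface count given above grows quickly with the number of different processor types, as a small calculation shows. The function names are hypothetical and the values of n are arbitrary examples.

```python
def pairwise_interfaces(n):
    """Each of n systems needs its own logical interface to each of the other n - 1."""
    return n * (n - 1)

def standard_interfaces(n):
    """With a common network standard each system implements one interface."""
    return n

for n in (3, 10, 20):
    print(n, pairwise_interfaces(n), standard_interfaces(n))
```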
APPLICATIONS
The difference between theory and practical applications in decentralized processing is still very large. Systems offered on the market may be classified as follows:
- General purpose systems. They are offered by several suppliers and provide a transport system and (some of them) the feature of relocatable resources (i.e. a small computer can use a disk connected to another machine). Most applications today are transaction-oriented and not typical for process control: the MTS performance is not yet sufficient.
- Systems covering a class of applications. Here some successful systems are available for process control; one of them will be described in more detail later on.
- Systems for one special application. Normally the costs of the implementation are too high to be carried by one installation. However, some restricted applications have been realized.
One of the best-known application classes is control. Much theoretical work has been done to design hierarchically structured control systems under the minimal-interface condition /8/. They may be implemented by decentralized computer systems, and consequently some suppliers offer such systems (e.g. Honeywell TDC 2000). The MTS here is bus-oriented and especially adapted to the requirements of control. A specialized software package eases the implementation of individual applications.
Another current example of distributed process control applications is traffic control. Centralized computer systems for traffic control are very common, and now decentralized systems with decentralized control are offered. One may use one computer system for each crossroads, which measures and controls the traffic locally.
By communicating with its neighbours, each system can exchange information about the traffic in the next section; the influence of the logical connection decreases with the distance between nodes. Here the functional optimum is identical with the operational one, and the subprocesses have a well-defined, small interface to their neighbours /9/.
Supervisory systems for large buildings (e.g. universities) are a typical application class. They have the aim of minimizing energy consumption by supervising and controlling different subsystems such as heating/cooling systems, air conditioning, elevators, electrical energy, illumination, security systems etc. The subsystems are spread over a large area, and this is therefore a good application for distributed systems.
Today central computers typically manage these processes, sometimes supported by unintelligent local substations. The arrival of microprocessors started the development of systems with "decentralized intelligence". Again, some special conditions simplify the realization for these applications (such as only limited response-time requirements, moderate availability requirements and a fixed distribution of tasks to the individual computers). However, large numbers of sensors and actuators have to be controlled (e.g. 10 000 - 20 000 in one installation). The following section describes such a system that is available on the market /10/. It is a hierarchical 3-level system (Fig. 6).
[Fig. 6 diagram labels: Intelligent Operator Console (IOC); global control and optimization; data acquisition, execution and supervision of commands; max. 32 x 24 (max. 32 x 24 x 96); to sensors and actuators.]
Fig. 6. Decentralized Supervisory System with Fixed Task Distribution
No homogeneous MTS network is used but a multi-bus system corresponding to the logical level of the connected computers (no computer of level 2 may communicate directly with the level 0 system). The logical functions are delegated to the 3 levels on a fixed basis:
Level 2 system. On the lowest level very basic functions are realized which are independent of the individual application. Tasks for analog and digital data acquisition are performed, and commands to the outside world are executed and supervised. Some event counters are realized by software routines, and the communication on a 4-wire bus is controlled. The main difference in comparison with a hardware implementation of the same functions lies in the very elaborate diagnostics that test all peripherals, the processor and memory, the power supply (fail-safe because of a battery back-up; the system is all-CMOS) and the communication lines. Therefore the level 2 system is also able to deliver exact diagnostic
information to the controlling level 1 system, which is very important because one installation may have some hundreds of level 2 systems. The hardware consists of a 2-board sandwich placed in a box that provides all conventional means for electrical installation. The system has a fixed number of I/O points to and from the process.
Level 1 system. The medium-level systems are more intelligent and locally autonomous (a level 0 system is not always absolutely necessary). They control the buses to the level 2 systems (max. 24 each) as masters and analyse and control all their status messages. By limit checking of analog signals and analysis of digital signals, "events" are generated that may be reported to a system on the next level or that initiate a reacting sequence of commands; such command sequences may also be started by predefined time marks. Again a set of diagnostic routines is provided on this level in order to allow fast error detection and recovery. Programs for level 1 systems are defined in a table-driven "fill in the blanks" language that may be used by non-EDP planning engineers. The hardware is also built with CMOS, so the same cheap fail-safe power supply can be used. No cooling fans are necessary for the hermetically encapsulated system, which can thus be installed in a severe environment. The communication with the level 0 system is again realized by a bus system on which level 1 systems act as slaves.
Level 0 system. On the highest level a larger process control computer (AEG 80/20) is used. It has to optimize globally (e.g. reduce heating power if the lights are switched on) and to process all accepted messages from the lower-level systems. It prepares messages for operator interaction and decodes operator commands to be transmitted to the executing systems. The programs for level 1 systems are also prepared, translated and downloaded by the level 0 system, which has a comfortable operating environment.
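The table-driven event generation of the level 1 systems described above can be pictured with a small sketch. It is not the "fill in the blanks" language of the product, whose syntax is not given in the text: the table layout, point names, limits and command names are all hypothetical and serve only to show how limit checking can produce events and start command sequences.

```python
# Hypothetical table as a planning engineer might fill it in.
LIMIT_TABLE = {
    "room_temp_c": {
        "low": 18.0,
        "high": 26.0,
        "on_low": ["open_heating_valve"],
        "on_high": ["close_heating_valve", "start_fan"],
    },
}

def check_limits(table, readings):
    """Compare analog readings with their limits; return events and commands."""
    events, commands = [], []
    for point, value in readings.items():
        entry = table.get(point)
        if entry is None:
            continue   # no limits defined for this point
        if value > entry["high"]:
            events.append((point, "high", value))
            commands.extend(entry["on_high"])
        elif value < entry["low"]:
            events.append((point, "low", value))
            commands.extend(entry["on_low"])
    return events, commands

events, commands = check_limits(LIMIT_TABLE, {"room_temp_c": 28.5})
print(events)     # [('room_temp_c', 'high', 28.5)]
print(commands)   # ['close_heating_valve', 'start_fan']
```

Keeping the behaviour in a table rather than in program code is what allows staff without programming training to adapt the system, as the text points out.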
The system collects and analyses all diagnostic information concerning the system as well as the process. For reliability reasons a small stand-by unit (Intelligent Operator Console, IOC) is provided that normally acts as a slave on the level 0 bus until the level 0 system fails. At this point the small unit becomes master on the bus; all messages arrive there and the commands are then given from there. For small applications the level 0 system may be replaced completely by an IOC. Since the computers on the lower levels have no operator consoles or other peripherals, some additional tools are necessary to
get and keep the system operational. It may be of interest that the expense for the development of these tools was about as high as for the basic system itself.
Finally some results that originate from the decentralized structure bear mentioning (experience gained with an installation of 250 level 2 systems, 15 level 1 systems, one level 0 system and about 15 000 I/O points). Naturally most of the usual cabling costs were saved. It was very useful that one could start with a small subsystem, since some buildings had not been completed in time. During the installation phase many initial specifications and I/O point positions were changed; it was only a small job to adapt the system topologically and functionally. Adding new I/O points or even complete level 2 systems did not affect other parts of the system, and the same was true for functional extensions. The elaborate diagnostics supported not only the installation of the computer system but also the work of the people who installed the lines to or from the process: errors could be found much faster.
CONCLUSION
A lot of theoretical work has been done on distributed processing for process control. Applications are still far behind the theoretical solutions. It may take another 5 years to get general purpose concepts for decentralized systems with distributed control into operation. One important prerequisite is the availability of efficient message transport systems and a better understanding of their manifold functions. Decentralized processing will be driven from two sides: firstly by the experience with conventional process control computer systems that are starting to be decentralized, as reported in this paper, and secondly from the area of classical industrial electronics, where more and more systems and even components are equipped with microcomputers that are able to communicate with higher-level systems. Provided interface standards are available, both approaches (top-down and bottom-up) will meet and form the basis for future automation systems.
*) The author would like to express his gratitude to the "Bundesministerium für Forschung und Technologie", Projekt Prozeßlenkung mit DV-Anlagen (PDV), for the support of the research reported in this paper.
REFERENCES
/1/ Chang, S.K. (1976). A model for distributed computer design. IEEE Transactions on Systems, Man and Cybernetics, SMC-5, No. 6, 344.
/2/ Färber, G. (1978). Probleme der hierarchischen Rechner- und Prozessorstrukturen in der Automation. To be published in E + M-Journal.
/3/ Holler, E. (1974). Koordination kritischer Zugriffe auf verteilte Datenbanken in Rechnernetzen bei dezentraler Überwachung. GfK-Bericht Nr. 1967, IDT Karlsruhe.
/4/ Färber, G. (1977). Prozeßrechnernetze für Echtzeitanwendungen. In M. Syrbe and B. Will (Eds.), Automatisierungstechnik im Wandel durch Mikroprozessoren, Fachberichte Messen Steuern Regeln, Springer-Verlag, Berlin Heidelberg New York, pp. 410-428.
/5/ Färber, G. (1977). Process control with minicomputer-networks. Proceedings of the COMNET'77, Vol. 2, John-Neumann Society.
/6/ Walze, H. (1977). Sammelleitungssysteme als Schlüssel für dezentralisierte Prozeßautomatisierung. In M. Syrbe and B. Will (Eds.), Automatisierungstechnik im Wandel durch Mikroprozessoren, Fachberichte Messen Steuern Regeln, Springer-Verlag, Berlin Heidelberg New York, pp. 429-445.
/7/ Anonymous (1976). Study "Kommunikationselement". Made by PCS GmbH, Munich 1976. Available from GfK, Karlsruhe.
/8/ Mesarovic, M.D., Takahara, Y., and Macko, D. (1970). Theory of hierarchical multilevel systems. Academic Press, New York and London.
/9/ Sommer, D.A. (1976). Teilprozeßbildung und Koordination der Teillenkungsaufgaben am Beispiel einer verkehrsabhängigen Signalsteuerung in Stadtstraßennetzen. Diplomarbeit, Institute for process control computer, Technical University Munich.
/10/ Anonymous (1977). Ein Leitsystem mit dezentraler Intelligenz. Elektronik, 10, 54.