Software Architectures and Tools for Computer Aided Process Engineering B. Braunschweig and R. Gani (Editors) © 2002 Elsevier Science B.V. All rights reserved.
Chapter 4.3: STEP for the Process Industries
R. Murris
4.3.1 THE CHALLENGE
Although the discipline of process engineering is well supported by computer-enhanced tools and software architectures, these focus on, and are thus applicable to, only a small part of the process plant lifecycle. The life span of a typical refinery, 30 years on average, generally outlives all the operating systems, software and database designs that were in use when the first design drawing was conceived. The longevity of process plants and the ongoing functional and physical changes during their life thus pose a real challenge for systems and software designers alike. During the conceptual engineering phase, process plant design information flows intensively between specialized departments, and once the plant is built operational data is recycled as input for plant optimization and revamps. Data from many sources, created by a multitude of applications, must therefore be exchanged not only within the company but also, increasingly, with supporting companies such as equipment suppliers and engineering and maintenance contractors. In the 1980s the exchanged data consisted mainly of non-intelligent 2D schematics and 3D shape representations, and exchange was dominated by de facto standards such as AutoCAD's dxf/dwg formats and the more official but ambiguous IGES standard. During the mid-1980s Computer Aided Design tools became mainstream in the process plant design cycle, and the requirements for data exchange between the variety of implementations grew beyond the scope and capabilities of these graphics data exchange standards.
4.3.2 HISTORY
In the 1980s the US Department of Defense (DoD) was one of the first to identify the growing data exchange problem. Efficient logistics around defense material and spare parts for military weapons systems is one of the prerequisites for fast military deployment. In 1980 the DoD relied on approximately 3500 third parties supplying equipment and parts. CALS (Continuous Acquisition and Logistics Support) was founded to reduce the huge amount of paper-based documents, to improve the poor communication between the suppliers and to reduce the cycle time between order and supply. The CALS exchange requirements stretched beyond the boundaries of graphics alone. They also included business and product data such as purchase orders, invoices, product specifications, sound and images. Later, additional requirements such as 'openness' and 'software vendor independence' were added. In 1986 these became the founding principles of the ISO-10303 (STEP) standard.
4.3.3 ISO-10303 or STEP
STEP is the acronym for the STandard for the Exchange of Product model data, the full ISO-10303 standard. STEP consists of parts, each having a distinct function within the formalization of exact data exchange within a predefined set of business activities (the scope). All the steps from the outset of the data exchange scope until the validation of an application protocol in the form of data test suites are supported by this standard. The methods for creating an application protocol are all part of the STEP standard and are validated and quality assured through the ISO technical subcommittee TC184/SC4. The acceptance of an application protocol as an International Standard (IS), although supervised by ISO, is done by the P-members. These participating members (countries) of ISO take part in several balloting procedures carrying the protocol from a NWI (new work item) to the final IS. Due to the formal and lengthy procedures it can take up to seven years for a particular application protocol to acquire IS status.
4.3.4 APPLICATION PROTOCOL 221 (AP221)
In 1995, subsidised by the EC, the ESPRIT III project ProcessBase started the development of ISO-10303 Application Protocol 221. The title "Functional data and 2D schematic representation" relates to the conceptual and detailed design and engineering stages of the process plant lifecycle. During these activities functional design data and their representations on 2D drawings are frequently passed within the plant owner's organisation and between contractor and client. The roles of client and contractor are played by plant owners, engineering contractors and equipment suppliers in any combination. Due to the increasing variety of software applications and the differences in the data models owned and operated by these organisations, standardised exchange of data within the scope of AP221 became an urgent requirement. When the ProcessBase project ended in 1997 there was still no AP221 standard, but the work was adopted and further developed by a consortium of European companies. The development of an Application Protocol such as AP221 has turned out to be far more difficult than initially expected. Due to the enormous number of different data types (meta-data) within the AP's scope it was deemed impossible to create one single exchange data model to cover it all. The second challenge was the requirement of full lifecycle support. Thus within a single data exchange multiple versions of, e.g., the same equipment must be passed without redundant data. In particular, the combination of Data Explicitly Defined Once (DEDO) and versioning within the same exchange scenario resulted in a new modelling concept, which will here be referred to as the Associative Paradigm. This modelling concept has been implemented, with some exceptions, throughout the full requirement model (Application Reference Model) of AP221. The methodology for handling huge amounts of process plant functional data and schematics using this Associative Paradigm is an unexplored area. However, implementations with traditional relational data tables or their object-oriented counterparts have so far proven to be either too inflexible or simply impossible.
4.3.5 THE ASSOCIATIVE PARADIGM
Traditionally we find requirements of data models expressed in languages like EXPRESS (ISO-10303-11), NIAM, ERD and lately UML. These descriptive methods, sometimes accompanied by a strict syntactic language, all use a basic set of axioms to model models. In some cases they also define a symbol library to conveniently communicate the results of the modelling process to the end-user or developer. In the case of EXPRESS the symbolic notation (EXPRESS-G) can express far less than the modelling language itself. However, a clever subset has been defined which combines a powerful, compact presentation of the model without the viewer losing overview. Further details, when required, can be extracted from the EXPRESS code. So when modelling languages fulfil the same requirements, is there a common meta-model for these languages as well? Probably there is, and in the case of the EXPRESS language this meta-model is quite compact when we only consider the data requirement part and exclude the syntactic rules that deal with rules, procedures, functions and uniqueness constraints. Basically the EXPRESS language defines three relation types and five basic types of modelling objects. The five basic objects are 'schema', 'entity', 'type', 'attribute' and 'data type'. The three basic relation types are 'specialisation', 'composition' and 'possession'. A schema can be composed of 0..n entities and types, an entity can possess 0..n attributes and an attribute possesses exactly one data type. Each entity can have 0..n specialisations to other entities. The following table expresses the relations between the association types and the object types.
Table 1 Meta model definitions for the EXPRESS language
    Object type   Role        Association type   Role        Object type
    Schema        Whole       Composition        Part        Entity
    Schema        Whole       Composition        Part        Type
    Schema        Whole       Composition        Part        Data Type
    Entity        Possessor   Possession         Possessed   Attribute
    Attribute     Possessor   Possession         Possessed   Data Type
    Entity        Supertype   Specialisation     Subtype     Entity
In 'natural' (English) language the first record under the header reads as: "'an instance of' 'object type' 'encoded as' 'schema' 'plays the role as' 'an instance of' 'role' 'encoded as' 'whole' 'in relation with' 'an instance of' 'association type' 'encoded as' 'composition' 'with' 'an instance of' 'object type' 'encoded as' 'entity' 'playing the role as' 'an instance of' 'role' 'encoded as' 'part'". This makes explicit all the relations that are implicit between the table header information (meta-data) and the records in the lines under the header. When we omit all this implicit information we get more readable sentences; however, information is lost when only these sentences are transferred to a receiver who is not aware of the table header and structure. The information in the records of table 1 is given in more readable form in table 2:
Table 2 Natural language equivalent of the information in table 1
    'Schema' 'can be composed of' 'entities'
    'Schema' 'can be composed of' 'types'
    'Schema' 'can be composed of' 'data types'
    'Entity' 'can possess' 'attribute'
    'Attribute' 'can possess' 'data type'
    'Entity' 'can be specialised as' 'entity'
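The difference between the fully explicit reading and the short reading of a record can be illustrated with a small Python sketch. The tuple layout, the `PHRASES` lookup and the function names are assumptions made for illustration only; they are not part of EXPRESS or of AP221, and the short form uses singular nouns throughout.

```python
# Hypothetical encoding of the records of table 1:
# (object type, role, association type, role, object type).
RECORDS = [
    ("schema", "whole", "composition", "part", "entity"),
    ("schema", "whole", "composition", "part", "type"),
    ("schema", "whole", "composition", "part", "data type"),
    ("entity", "possessor", "possession", "possessed", "attribute"),
    ("attribute", "possessor", "possession", "possessed", "data type"),
    ("entity", "supertype", "specialisation", "subtype", "entity"),
]

# Verb phrases used in table 2, keyed by association type.
PHRASES = {
    "composition": "can be composed of",
    "possession": "can possess",
    "specialisation": "can be specialised as",
}


def explicit_sentence(record):
    """Spell out the header information that the table leaves implicit."""
    obj1, role1, assoc, role2, obj2 = record
    return (
        f"an instance of 'object type' encoded as '{obj1}' "
        f"plays the role as an instance of 'role' encoded as '{role1}' "
        f"in relation with an instance of 'association type' encoded as '{assoc}' "
        f"with an instance of 'object type' encoded as '{obj2}' "
        f"playing the role as an instance of 'role' encoded as '{role2}'"
    )


def short_sentence(record):
    """Readable form: the roles and the header information are dropped."""
    obj1, _, assoc, _, obj2 = record
    return f"'{obj1}' '{PHRASES[assoc]}' '{obj2}'"


for record in RECORDS:
    print(short_sentence(record))
```

The short form is only interpretable by a receiver who already knows the column meanings, which is exactly the information that `explicit_sentence` carries along.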
Clearly the cardinality constraints are missing from the previous table although they are defined in the explanatory text. This means that the natural language equivalents of the association types all begin with 'can...' whereas some relations require 'must...'. An example is the relation between 'attribute' and 'data type'. In table 1 this could be remedied by inserting two extra columns (cardinality 1 and 2, abbreviated as C1 and C2) with the possible values 1 and the range value 0..n.
Table 3 Expanded with cardinalities
    Object type   Role        C1   Association type   C2     Role        Object type
    Schema        Whole       1    Composition        0..n   Part        Entity
    Schema        Whole       1    Composition        0..n   Part        Type
    Schema        Whole       1    Composition        0..n   Part        Data Type
    Entity        Possessor   1    Possession         0..n   Possessed   Attribute
    Attribute     Possessor   1    Possession         1      Possessed   Data Type
    Entity        Supertype   1    Specialisation     0..n   Subtype     Entity
The explicit sentences would become truly unreadable because of their size; however, the natural language equivalents, in which the same implicit information is omitted, become:
Table 4 Natural language equivalent of table 3
    'One' 'schema' 'can be composed of' 'zero to many' 'entities'
    'One' 'schema' 'can be composed of' 'zero to many' 'types'
    'One' 'schema' 'can be composed of' 'zero to many' 'data types'
    'One' 'entity' 'can possess' 'zero to many' 'attributes'
    'One' 'attribute' 'must possess' 'one' 'data type'
    'One' 'entity' 'can be specialised as' 'zero to many' 'entities'
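How the cardinality columns turn 'can' into 'must' can be sketched as follows. The parsing rules and function names are assumptions for illustration; the only cardinality values taken from the text are 1 and 0..n.

```python
def parse_cardinality(text):
    """Parse '1' or '0..n' into (minimum, maximum); maximum None means unbounded."""
    if ".." in text:
        low, high = text.split("..")
        return int(low), (None if high == "n" else int(high))
    return int(text), int(text)


def count_phrase(cardinality):
    """Word the cardinality the way table 4 does."""
    low, high = cardinality
    if high is None:
        return "zero to many" if low == 0 else f"{low} to many"
    if low == high:
        return "one" if low == 1 else str(low)
    return f"{low} to {high}"


def modal(cardinality):
    """A minimum of zero makes the relation optional ('can'); otherwise 'must'."""
    return "can" if cardinality[0] == 0 else "must"


def sentence(obj1, c1, verb, c2, obj2):
    """Build one table 4 style sentence from the C1 and C2 column values."""
    card1, card2 = parse_cardinality(c1), parse_cardinality(c2)
    return (
        f"'{count_phrase(card1)}' '{obj1}' '{modal(card2)} {verb}' "
        f"'{count_phrase(card2)}' '{obj2}'"
    )


print(sentence("schema", "1", "be composed of", "0..n", "entities"))
print(sentence("attribute", "1", "possess", "1", "data type"))
```

The modal verb is derived from the minimum of C2 alone, which is sufficient for the six records discussed here but would need refinement for other cardinality ranges.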
The definitions and relations in tables 1-4 all describe what can be instantiated and not what is instantiated. So the information expressed in these tables can be used as a 'template' for instantiation of individual records in some implementation. Or the other way around: individuals matching the constraints can be classified as validated members of that template. One can either explicitly relate an individual with an instance of association type 'classification' or allow software to establish this association automatically by invoking a matching procedure. When we re-read the tables we probably infer that the records all belong to a specific information set. So implicitly they are grouped to convey some meaning. When the word 'template' was mentioned probably some grouping was meant, in this case the inclusion of the information expressed by the six different sentences. So without writing it out explicitly we probably inferred that the scope or view of the template included only those six expressions. One can assume, however, that tables 1...4 can contain millions of records, and without a mechanism to group these expressions no segregation can be made and their use would be limited. Let's assume that the definition of this template is part of the context of something we have defined as 'EXPRESS modelling' and recreate table 4 with this information:
Table 5 Expanded with context
    'One' 'schema' 'can be composed of' 'zero to many' 'entities' 'is included in' 'EXPRESS modelling'
    'One' 'schema' 'can be composed of' 'zero to many' 'types' 'is included in' 'EXPRESS modelling'
    'One' 'schema' 'can be composed of' 'zero to many' 'data types' 'is included in' 'EXPRESS modelling'
    'One' 'entity' 'can possess' 'zero to many' 'attributes' 'is included in' 'EXPRESS modelling'
    'One' 'attribute' 'must possess' 'one' 'data type' 'is included in' 'EXPRESS modelling'
    'One' 'entity' 'can be specialised as' 'zero to many' 'entities' 'is included in' 'EXPRESS modelling'
    ...
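Grouping by context and the matching procedure mentioned earlier can be sketched together in Python. This is only an illustration under stated assumptions: the tuple layout, the extra 'some other context' record and the functions `template` and `conforms` are invented here and do not come from AP221.

```python
from collections import Counter

# Each expression: (possessor type, association, minimum, maximum, target type, context).
# A maximum of None means unbounded. The first six follow table 5.
EXPRESSIONS = [
    ("schema", "composition", 0, None, "entity", "EXPRESS modelling"),
    ("schema", "composition", 0, None, "type", "EXPRESS modelling"),
    ("schema", "composition", 0, None, "data type", "EXPRESS modelling"),
    ("entity", "possession", 0, None, "attribute", "EXPRESS modelling"),
    ("attribute", "possession", 1, 1, "data type", "EXPRESS modelling"),
    ("entity", "specialisation", 0, None, "entity", "EXPRESS modelling"),
    # Invented record from an unrelated context, to show the segregation.
    ("drawing", "composition", 1, None, "sheet", "some other context"),
]


def template(context):
    """Select only the expressions that are included in the given context."""
    return [e for e in EXPRESSIONS if e[5] == context]


def conforms(individual_type, links, context):
    """Check one individual against the template of a context.

    links is a list of (association, target type) pairs held by the individual.
    """
    counts = Counter(links)
    rules = [e for e in template(context) if e[0] == individual_type]
    allowed = {(e[1], e[4]) for e in rules}
    if any(link not in allowed for link in counts):
        return False  # the individual holds a link the template does not know
    for _, assoc, low, high, target, _ in rules:
        n = counts[(assoc, target)]
        if n < low or (high is not None and n > high):
            return False  # cardinality constraint violated
    return True


context = "EXPRESS modelling"
print(len(template(context)))
print(conforms("attribute", [("possession", "data type")], context))
print(conforms("attribute", [], context))
```

An attribute without a data type fails the check because of the 'must possess' constraint, while an entity without attributes passes.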
What does the association type instance expressed as 'is included in' actually include? All the separate concepts expressed as words within single quotes, or the explicit association instances between those concepts as described above in the very long sentence, or both? To include all the explicit associations between the concepts we must first make them visible. In the next table the first record (sentence) of table 5 is taken and the explicit associations are made visible (excluding the association with the context):
Table 6 Making the sentence explicit
    Sub phrase                  Explicit
    'One' 'schema'              'One' 'is count of' 'schema'
                                'counter' 'is role of' 'object' 'is count of'
                                'counted' 'is role of' 'subject' 'is count of'
    'can be composed of'        'whole' 'is role of' 'object' 'can be composed of'
                                'part' 'is role of' 'subject' 'can be composed of'
    'zero to many' 'entities'   'entities' 'must be in range' 'in between' 'zero' and 'many'
                                'subject to range' 'is role of' 'object' 'must be in range'
                                'range' 'is role of' 'subject' 'must be in range'
                                'minimum count' 'is role of' 'object' 'in between'
                                'maximum count' 'is role of' 'subject' 'in between'
Now we bring all the associations to the left and use a different notation. Here we introduce the term 'FACT' and make use of line or record identifiers to allow for easy referencing. The notation presented is identical to the data section body defined in ISO-10303 part 21:
Table 7 Fact table for one sentence without context
    Facts
    #1=FACT('is count of',#2,#3);
    #2=FACT('is role of','counter','object');
    #3=FACT('is role of','counted','subject');
    #4=FACT(#1,'one','schema');
    #10=FACT('must be in range',#11,#12);
    #11=FACT('is role of','subject to range','object');
    #12=FACT('is role of','range','subject');
    #13=FACT('in between',#14,#15);
    #14=FACT('is role of','minimum count','object');
    #15=FACT('is role of','maximum count','subject');
    #16=FACT(#13,'zero','many');
    #17=FACT(#10,'entities',#16);
    #20=FACT('can be composed of',#21,#22);
    #21=FACT('is role of','whole','object');
    #22=FACT('is role of','part','subject');
    #100=FACT(#20,#4,#17);
By following '#100=FACT(#20,#4,#17)' one can fully reconstruct the sentence by travelling through the table along the references. For completeness we now also add the context and the inclusion of the just described #100=FACT:
Table 8 Fact table for one sentence with context
    Facts
    #100=FACT(#20,#4,#17);
    #200=FACT('is included in',#201,#202);
    #201=FACT('is role of','includer','object');
    #202=FACT('is role of','included','subject');
    #300=FACT(#200,'EXPRESS modelling',#100);
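Travelling through the references of tables 7 and 8 can be sketched in Python. The dictionary representation (integers for '#n' references, strings for quoted terms) and the function `resolve` are assumptions made for this illustration, not a part 21 parser. The range fact #16 is written here with 'zero' as its minimum, matching the 'zero to many' of the sentence.

```python
# FACT records of tables 7 and 8: id -> (association, object side, subject side).
FACTS = {
    1: ("is count of", 2, 3),
    2: ("is role of", "counter", "object"),
    3: ("is role of", "counted", "subject"),
    4: (1, "one", "schema"),
    10: ("must be in range", 11, 12),
    11: ("is role of", "subject to range", "object"),
    12: ("is role of", "range", "subject"),
    13: ("in between", 14, 15),
    14: ("is role of", "minimum count", "object"),
    15: ("is role of", "maximum count", "subject"),
    16: (13, "zero", "many"),
    17: (10, "entities", 16),
    20: ("can be composed of", 21, 22),
    21: ("is role of", "whole", "object"),
    22: ("is role of", "part", "subject"),
    100: (20, 4, 17),
    200: ("is included in", 201, 202),
    201: ("is role of", "includer", "object"),
    202: ("is role of", "included", "subject"),
    300: (200, "EXPRESS modelling", 100),
}


def resolve(ref):
    """Follow references from an instance FACT and return a nested dict
    that labels both sides with the role names of the association type."""
    if isinstance(ref, str):
        return ref  # a quoted term, nothing to follow
    head, obj, subj = FACTS[ref]
    if isinstance(head, str):
        raise ValueError(f"#{ref} defines an association type, not an instance")
    name, obj_role_ref, subj_role_ref = FACTS[head]
    obj_role = FACTS[obj_role_ref][1]
    subj_role = FACTS[subj_role_ref][1]
    return {"association": name, obj_role: resolve(obj), subj_role: resolve(subj)}


print(resolve(300))
```

Labelling each side with its role name avoids having to guess a reading direction for associations such as 'is included in'.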
This whole procedure could be repeated for the next five sentences of table 4, resulting in listings similar to those in tables 7 and 8. Although we have reduced the redundancy of data considerably, there are still multiple instances of the same terms. This conflicts with the DEDO requirement. Thus we have to define all the facts and strings only once. For this purpose we introduce a new entity type that facilitates this requirement. Table 8 is further normalised using DATA as the placeholder for the text encoding.
Table 9 An almost fully normalised dataset
    Facts and data
    #100=FACT(#20,#4,#17);
    #200=FACT(#1000,#201,#202);
    #201=FACT(#1001,#1002,#1003);
    #202=FACT(#1001,#1004,#1005);
    #300=FACT(#200,#1006,#100);
    #1000=DATA('is included in');
    #1001=DATA('is role of');
    #1002=DATA('includer');
    #1003=DATA('object');
    #1004=DATA('included');
    #1005=DATA('subject');
    #1006=DATA('EXPRESS modelling');
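The normalisation step from table 8 to table 9 can be sketched as string interning: every distinct string is stored once as a DATA record and replaced by a reference. The functions below and the choice to start DATA identifiers at 1000 are assumptions for illustration; the sketch also assumes that no FACT identifier reaches that number.

```python
# FACT records of table 8: integers are '#n' references, strings are quoted terms.
TABLE_8 = {
    100: (20, 4, 17),
    200: ("is included in", 201, 202),
    201: ("is role of", "includer", "object"),
    202: ("is role of", "included", "subject"),
    300: (200, "EXPRESS modelling", 100),
}


def normalise(facts, first_data_id=1000):
    """Replace each string by a reference to a DATA record holding it once."""
    data_ids = {}  # string -> DATA identifier
    records = {}
    next_id = first_data_id
    for fact_id, args in facts.items():
        new_args = []
        for arg in args:
            if isinstance(arg, str):
                if arg not in data_ids:
                    data_ids[arg] = next_id
                    next_id += 1
                arg = data_ids[arg]
            new_args.append(arg)
        records[fact_id] = ("FACT", tuple(new_args))
    for text, data_id in data_ids.items():
        records[data_id] = ("DATA", (text,))
    return records


def to_part21(records):
    """Render the records in the notation used in the tables."""
    lines = []
    for record_id in sorted(records):
        kind, args = records[record_id]
        rendered = ",".join(f"#{a}" if isinstance(a, int) else f"'{a}'" for a in args)
        lines.append(f"#{record_id}={kind}({rendered});")
    return lines


for line in to_part21(normalise(TABLE_8)):
    print(line)
```

With the records supplied in this order the identifiers happen to coincide with those of table 9; a different input order would give a different but equivalent numbering.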
All the terms within single quotes are encodings of concepts within the scope of a particular language (English). Another requirement of neutral data exchange is that any receiving party, irrespective of language background, can reconstruct the information. Although ISO-10303 allows encoding of the strings in almost any character set, this still won't satisfy the DEDO requirement. Suppose we want to use two languages (English and Dutch) in the same exchange set and, e.g., use the Dutch translation 'is rol van', which has the same meaning as 'is role of'. When the extra Dutch term is added to table 9 this could become:
Table 10 Dutch term 'is rol van' added
    Facts and data
    #201=FACT(#1001,#1002,#1003);
    #202=FACT(#1007,#1004,#1005); /* old one was #202=FACT(#1001,#1004,#1005); */
    #1007=DATA('is rol van');
Although there is no duplication of terms in this set there seems to be a duplication of concepts, because 'is role of' and 'is rol van' are the same concept in different languages. To overcome this, a possible solution is to let #1001=DATA('is role of') and #1007=DATA('is rol van') refer to a language independent 'handle'. A simple but effective solution is the introduction of a unique identifier to which both DATA statements refer by means of an identification relation:
Table 11 Adding identification of the concept
    Facts and data
    #201=FACT(#1009,#1002,#1003);
    #202=FACT(#1009,#1004,#1005);
    #1001=DATA('is role of');
    #1007=DATA('is rol van');
    #1008=DATA('is identification of');
    #1009=DATA('1');
    #3000=FACT(#1008,#1001,#1009);
    #3001=FACT(#1008,#1007,#1009);
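The identification relation of table 11 can be sketched as a lookup from terms to a shared identifier. Note one assumption that goes beyond the table: table 11 does not record which language a term belongs to, so the `LANGUAGE_OF` mapping and all function names below are invented for illustration.

```python
# DATA records of table 11.
DATA = {
    1001: "is role of",
    1007: "is rol van",
    1008: "is identification of",
    1009: "1",
}

# FACT records #3000 and #3001: (association, term, identifier).
FACTS = {
    3000: (1008, 1001, 1009),
    3001: (1008, 1007, 1009),
}

# Assumption: the language of each term, which table 11 does not record.
LANGUAGE_OF = {1001: "en", 1007: "nl"}


def identifier_of(term):
    """Return the language independent identifier of a term, or None."""
    for assoc, term_ref, ident_ref in FACTS.values():
        if DATA[assoc] == "is identification of" and DATA[term_ref] == term:
            return DATA[ident_ref]
    return None


def same_concept(term_a, term_b):
    """Two terms denote the same concept when they share an identifier."""
    ident = identifier_of(term_a)
    return ident is not None and ident == identifier_of(term_b)


def term_in(identifier, language):
    """Return the term for an identifier in the requested language, or None."""
    for _, term_ref, ident_ref in FACTS.values():
        if DATA[ident_ref] == identifier and LANGUAGE_OF.get(term_ref) == language:
            return DATA[term_ref]
    return None


print(same_concept("is role of", "is rol van"))
print(term_in("1", "nl"))
```

A receiver can thus store FACTs against the identifier and choose the display language afterwards.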
The association type 'is identification of' is not modelled in the same way as the other association types in the previous examples because one would rapidly run into recursion. Therefore the object and subject side roles of the FACT 'is identification of' point directly to DATA instances instead of to other FACTs as in the examples above. The same identification relation, and FACTs based on the same principles, can be added for all DATA terms. In the end all FACT attributes will refer to unique identifiers expressed as DATA instances such as DATA('1'), except for 'is identification of'.
All information expressed above seems to be time independent. No explicit expressions can be found that refer to any time concept. In traditional data models and implementations timestamps are found that record either the conception date/time of the record, or a timestamp for what is represented by the record, or both. Thus it would seem logical to add timestamp attributes to both FACT and DATA entities. For the DATA entity, at least, this is not necessary because DATA records represent strings that are considered to be always valid. The information is not captured with the DATA but with FACTs. Assume that all the FACT records are stripped from the examples above or from a hypothetical large dataset. What remains is a long list of strings, representing all words, numbers etc. It would resemble the index of a book but without the content or any page references. In principle it doesn't convey any information. The information in this associative paradigm is stored as FACTs. So can we add the time recording to FACTs and thereby support versioning and time dependent variances on, e.g., design data? It would imply that when a certain fact becomes obsolete at a certain point in time we must have a method to terminate its existence. This requires an extra timestamp attribute to record its termination date/time. One could also simply delete this fact from the information set. However, there would then be no way to capture history and go back to earlier versions of, e.g., design data. It is clear that this solution will not fulfil the lifecycle requirement. In principle versioning information and the accompanying timestamp series are not aspects of the real or abstract world we are trying to represent with a data set. The time continuum is an integral part of existence because without it there is no existence. Thus if we want to record when, e.g., the design of a particular piece of equipment came into existence, we are adding date/time information about its creation process. So let's assume that we want to record that a certain fact from table 11 started its life on March 21st 2001 at 12:34:23.000; the table could then be expanded with the following records:
Table 12 Time information added
    Facts and data
    #201=FACT(#1009,#1002,#1003);
    #4000=FACT('starts life at',#4001,#4002);
    #4001=FACT('is role of','object','object');
    #4002=FACT('is role of','time','subject');
    #4003=FACT(#4000,#201,#4004);
    #4004=DATA('2001-03-21T12:34:23.000');
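Recording lifetimes as additional facts, instead of deleting records, can be sketched as follows. Only the 'starts life at' record for #201 comes from table 12; the 'ends life at' association name, the records for #202 and the function names are assumptions made for illustration.

```python
from datetime import datetime

# (fact id, association, ISO 8601 timestamp). Records are only ever appended.
LIFE_EVENTS = [
    (201, "starts life at", "2001-03-21T12:34:23.000"),  # from table 12
    (202, "starts life at", "2001-03-21T12:34:23.000"),  # assumed
    (202, "ends life at", "2001-06-01T00:00:00.000"),    # assumed
]


def alive_at(fact_id, moment):
    """A fact is alive from its start time up to, but excluding, its end time."""
    start = end = None
    for fid, assoc, stamp in LIFE_EVENTS:
        if fid != fact_id:
            continue
        when = datetime.fromisoformat(stamp)
        if assoc == "starts life at":
            start = when
        elif assoc == "ends life at":
            end = when
    return start is not None and start <= moment and (end is None or moment < end)


def snapshot(moment):
    """Return the ids of all facts that exist at the given moment."""
    ids = {fid for fid, _, _ in LIFE_EVENTS}
    return sorted(fid for fid in ids if alive_at(fid, moment))


print(snapshot(datetime(2001, 4, 1)))
print(snapshot(datetime(2001, 7, 1)))
```

Because no record is removed, any earlier state of the data set can be recomputed by asking for a snapshot at an earlier moment.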
In table 12 the old notation (see tables 7 and 8) is used to make it more readable. However, full normalisation is assumed. With another set of FACTs and DATA one could also make explicit what type of calendar and time encoding has been used for DATA record #4004. The existence of #201=FACT has now been recorded, and a similar scheme can be used to explicitly record its termination date/time. To make it explicit that start and end of life FACTs are part of a specific snapshot (or version) the same method is used as with the inclusion of FACTs in a context. Table 8 shows this for the context of 'EXPRESS modelling', but a similar context for a particular version or even variant can be constructed. When there is a requirement to record the deletion of a FACT between two versions or variants one can achieve that using one of two methods: either create a date/time termination FACT in the context of the last version/variant, or exclude that FACT using an explicit 'excluded from' FACT in the new version/variant. In both cases no information is lost; history remains intact and can be fully reconstructed. It seems a contradiction, but in the associative paradigm deletion means addition of facts about the deletion process.
So even with this fully normalised model, and the fulfilment of multilingual expression and versioning capability without violating the DEDO principles, can we now actually convey information without any ambiguity? This surely depends on the way the receiver interprets the association types and the data within the given context. This methodology of full normalisation will result in facts and data defined only once. One can also imagine that descriptions can be added following the same methodology. Although this could support more unambiguous interpretation of the received information, one can never rule out errors. In the examples above we tried to explicitly define information about a subset of the EXPRESS modelling language functionality. However, when we have to deal with the explicit definition of all information in a process plant's design cycle, including its full design history, it would be more than useful to have some kind of reference dictionary: a dictionary that can be used as a lookup table for terms and references to minimise the differences between the intended meaning and the meaning actually perceived by the receiver. The upcoming AP221 standard in its current state has only about 200 entities and 200 different association types. The model has been designed to be very flexible and describes high-level concepts such as 'individual', 'class', 'activity', 'physical object', 'event' and 'aspect'. In the AP221 standard, however, one cannot find entities called, e.g., 'centrifugal pump', 'vessel' or 'column', pieces of equipment regularly found in process plants. The number of equipment and part types for which we want to record the existence currently exceeds 5,000.
It would be impossible to add all these 'classes' as explicit entities to the AP221 model. ISO procedures only allow updating of the standard every three years, so intermediate updates and additions must be put in a queue until the next release of the standard. In the meantime, however, new equipment and parts are invented and introduced on a daily basis. To allow faster update and distribution of the dictionary of process plant classes it has been decided to maintain the dictionary as a separate register. A set of process plant related classes and definitions has been created over the past six years using the knowledge of 100+ design experts all over the world, and currently we recognise more than 17,000 classes. An Internet portal (www.STEPlib.com) facilitates the maintenance and updating of this dictionary, better known as STEPlib. The combination of the associative paradigm, a high-level model such as AP221 at its base and the solid definition of almost any piece of equipment maintained in STEPlib should facilitate unambiguous exchange of almost any information between partners in the process industries.