Designing an object-relation hybrid database for chemical process engineering


Comput. Printed in Great Britain. All rights reserved.
0098-1354/88 $3.00 + 0.00
Copyright © 1988 Pergamon Press plc

Y. W. HUANG and L. T. FAN†

Laboratory for Artificial Intelligence in Process Engineering, Department of Chemical Engineering, Durland Hall, Kansas State University, Manhattan, KS 66506, U.S.A.

(Received 31 March 1988)

Abstract: Chemical process engineering activities, such as process and plant design, usually rely on a variety of information. A systematic representation of such information in a database is highly desirable. The relational data model has been recognized as the most popular database model amid the database community, due to its simplicity and sound mathematical foundation. Nevertheless, the simple relational scheme tends to generate a semantic gap between the data extracted from the database and a process engineer’s perception of these data. Advances in artificial intelligence (AI) have given rise to the notion of object in knowledge representation and AI programming. This notion has been amalgamated into the database design. The resultant object-relation hybrid database can preserve semantics to the maximum extent possible; moreover, its integrity is easy to maintain. This paper illustrates an effective application of this database model in designing a prototype database for chemical process design.

INTRODUCTION

Effective data processing has always been a key to accomplishing tasks involving a massive amount of information. Chemical process engineering (CPE) activities typify such tasks; the number of data items required in these activities ranges from a few hundred for a simple equipment design to several hundred thousand for a complete plant design. In spite of this, the data management techniques widely employed in the commercial field were not recognized to be useful for CPE applications until the mid-1970s. The first implementations of database systems for CPE applications can be attributed to Chiyoda Chemical Engineering & Construction Co. Ltd, Japan. Two database systems, CHEIS (Niida et al., 1977) and DPLS, were created based on relational database technology (see e.g. Benayoune and Preece, 1987). Nevertheless, the projects were quickly terminated, allegedly because the developments were ahead of the availability of suitable systems technology, and for financial reasons (Angus and Winter, 1985; Benayoune and Preece, 1987). Other known CPE-oriented databases include PEDB by ICI (Cherry et al., 1982), DesignMASTER by ChemShare (Craft, 1985) and Prodabas by Prosys Technology (Angus and Winter, 1985). Among them, PEDB is the predecessor of DesignMASTER. Both are fundamentally ad hoc distributed databases; no uniform data structure is reported to exist among the nodes of the individual distributed databases. The last, Prodabas, is one of the latest general-purpose databases specially developed for chemical engineering process design; it is based on an extended relational database technology. For further review of these and other engineering database systems, readers are referred to Benayoune and Preece (1987). A survey of the available databases, mentioned above, has led to the following observations:

On the surface, the relational data model is useful for engineering applications. As stated earlier, the first CPE database design project was based on this model, and the model is also employed in one of the latest CPE database designs. In reality, the relational data model is ineffective for engineering applications. Most successful CPE database systems are not based on the relational model; instead, they rely on ad hoc approaches. The deficiencies of the relational model arise from its inability to preserve the semantics of the stored data. Specifically, the model cannot accommodate a wide spectrum of data types, intricate relationships among the data, or the representation of “knowledge” in addition to the storage of data. All of these are essential characteristics of an engineering database in general and a CPE database in particular (Hartzband and Maryanski, 1985). It is worth noting that the same deficiencies are found in other commercially popular database models, such as the hierarchical and the network models, when they are employed in designing an engineering database. To reduce or eliminate the deficiencies of the available database models, a novel database model has been proposed recently (Huang, 1987). Two distinct notions, namely relation and object, are amalgamated in the model for database design. The former is desirable for its simplicity, popularity and sound mathematical foundation; the latter for its power to encapsulate various forms of data, information and knowledge in a uniform “frame” data structure. The amalgamation has been achieved by designing an object-relation hybrid database in an object-oriented multiparadigm programming environment.

†To whom all correspondence should be addressed.


This paper presents an application of the novel object-relation hybrid database model to the design of a prototype database for process design. The notions of relation and object are first reviewed briefly, and their similarities and differences are identified. This is followed by an elaboration of a two-step approach, originally proposed in this work, to the generation of a prototype database for chemical process design. The capability of the resultant database to store complex data, e.g. a complete process flowsheet, is demonstrated. Sample data are presented in parallel with a discussion of the important issues concerning the integrity of a database, such as dependencies, legal-value constraints and integrity rules. An object-oriented relational algebra, designed to work with the objects in the hybrid database, is also proposed. The algebra contains the operations of projection, selection, Cartesian product, union and difference; thus, it is relationally complete (Osborn and Heaven, 1986).

RELATION AND OBJECT

The notions of relation and object appear similar; nevertheless, even a casual examination will immediately reveal that they are hardly compatible. The purpose of this review is to identify a common ground between the two notions so that they can be amalgamated successfully.

Feature of relation

A relation is a subset of a Cartesian product of domains (Codd, 1970). Let $R$ be a relation on $(A_1, A_2, \ldots, A_n)$. We call $(A_1, A_2, \ldots, A_n)$ the intension or scheme of $R$, where the $A_i$ represent the attributes, whose domains are not necessarily distinct. The extension of $R(A)$ comprises a set of tuples in a 2-D table format; each tuple manifests itself as a row and each attribute as a column (see Table 1). The domain associated with $A_i$ is denoted by $D_i$. The value appearing in any tuple in column $i$ must be drawn from the set $D_i$. Thus, $R$ is a subset of $D_1 \times D_2 \times \cdots \times D_n$. In addition, within any given relation:

1. Tuples are not duplicated. This property follows from the fact that each relation is a mathematical set. Each tuple in a relation can be uniquely identified by a selected, but minimal, set of attributes from the relation known as a key.
2. All attribute values are atomic. This property follows from the fact that each tuple is also a mathematical set. The relational algebra, as well as most other data manipulation languages for the relational model, has adopted classical set theory as its foundation. Operations such as union and intersection defined in classical set theory simply cannot effectively process sets within a set. The atomic-domain restriction is known as the first normal form constraint.
3. Tuples are unordered.
4. Attributes are unordered.

Although these constraints do not limit the power of the relational data model, they do imbue the model with the following three intrinsic weaknesses:

1. Explicit natural bindings, either at the intension level or at the extension level, do not exist amid relations.
2. Complex information cannot be stored in relations due to the first normal form constraint.
3. The database integrity is difficult to maintain because keys are composed of attributes, whose values may change.

Table 1. Relation CHEMICALS

      | Column 1: CHEM.LF | Column 2: CHEM.NAME | Column 3: PFS | Column 4: PROCESS
Row 1 | HOCH2CH2OH | Ethylene.glycol | PFS0001 | Ethylene.oxide.hydration
Row 2 | HOCH2CH2OH | Ethanediol | PFS0001 | Ethylene.oxide.hydration
Row 3 | HOCH2CH2OH | Ethylene.glycol | PFS0002 | Ethylene.chlorohydrin
Row 4 | HOCH2CH2OH | Ethanediol | PFS0002 | Ethylene.chlorohydrin
Row 5 | CH3OH | Methanol | PFS0003 | CO.H2.synthesis
Row 6 | CH3OH | Methyl.alcohol | PFS0003 | CO.H2.synthesis
Row 7 | CH3OH | Methanol | PFS0004 | Wood.distillation
Row 8 | CH3OH | Methyl.alcohol | PFS0004 | Wood.distillation

For illustration, a sample database storing partial information for plant design is represented in the form of a relation in Table 1. Relation CHEMICALS comprises a single tuple (row) for each distinct combination of a chemical’s line formula (CHEM.LF), one of its names (CHEM.NAME), one of its process flowsheets (PFS) and the associated process type (PROCESS). Additional discussion of relational databases can be found in numerous texts and publications (see e.g. Codd, 1970; Fagin, 1977; Date, 1986; Plouffe, 1987).

Feature of object

While no formal definition is available, an object in a fully-fledged object-oriented multiparadigm programming environment is often found to exhibit the following features (see e.g. Stefik and Bobrow, 1985):

1. Objects are divided into two categories, namely class and instance. A class is a description of one or more similar objects sharing attributes; an instance is a specific example, or instance, of a class.

2. An object class or instance is uniquely identified by a name.
3. An object manifests itself as a frame, accommodating slots as the frame’s attributes. Slots in an object are not ordered.
4. The domain of a slot is specified by facets (the slot’s attributes). The value of a slot can be an atomic value, a list, or a variable pointing to another data structure, depending on the domain being defined.
5. Slots are usually defined in a class; they are inherited across the class-subclass and class-instance hierarchies. Various inheritance mechanisms can be specified by slot facets.
6. Procedures, known as methods, can be defined in an object to manage the data stored in its slots. Objects interact by calling each other’s methods through the mechanism of message sending.
7. Additional computations may be activated when a slot is created or deleted, or when its data are fetched or stored, by attaching active values, or demons, to the slot (Stefik et al., 1986).

Additional discussion of the features of an object from a chemical engineering perspective is available in the recent paper on DESIGN-KIT (an object-oriented environment for process engineering) by Stephanopoulos et al. (1987). Comparison of these features with those of the relation indicates that an object is a more desirable vehicle for representing real-world information; it offers a richer framework to encapsulate complex information. Figure 1 depicts an object representation of the same information stored in Table 1. A comparison of objects and relations from a database standpoint is summarized in Table 2.

Fig. 1. A possible object-oriented representation of the relation CHEMICALS, representing multiple values as a list. [Object class CHEMICALS with slots LINE.FORMULA, NAME and PFS (Value: unknown). Instance E0001 of CHEMICALS: LINE.FORMULA HOCH2CH2OH; PFS ((Ethylene.oxide.hydration PFS0001) (Ethylene.chlorohydrin PFS0002)). A second instance: PFS ((CO.H2.Synthesis PFS0003) (Wood.distillation PFS0004)).]

Table 2. Comparison between the objects and relations

 | Object | Relation
Data structure | Frame | Table
Intension (scheme) | Object class | Relation
Communication among intensions | Message sending, inheritance, active-values | NA
Extension | Object instance | Tuple
Extension ID | Object name | Values
Intension attributes | Slots containing values or functions | Simple domain attributes
Attribute specifications | Slot attributes (facets) | NA
Queries and operations | Stored as methods within objects | Separately defined DML

NA = not applicable.

INITIALIZING THE HYBRID DATABASE DESIGN: CONSTRUCTING A SCHEMA

Two methods have been proposed to initiate an object-relation hybrid database design. These methods are based on the concepts of synthesis and decomposition, which are widely employed in relational database theory (see e.g. Ullman, 1982). Relations obtained through either method are guaranteed to be in third normal form (3NF). Such relations comprise semantically coherent attributes, thereby minimizing difficulties such as the redundancy and anomalies encountered in database management. The proposed methods are outlined as follows:

1. Synthesis method. Given a set of data, classes are defined in accordance with the 3NF relations attained synthetically, e.g. through the Bernstein algorithm (Bernstein, 1976).
2. Decomposition method. Given a class mapped from a real-world entity, the class, when needed, is decomposed into subclasses and/or independent classes. The class decomposition follows the same guideline employed in decomposing relations.

With either method, it is proposed that the definition of the primary key be modified after the initial database scheme is established. A primary key is a key selected as the primary direct retrieval mechanism. The notion of a primary key should not be overstretched, since doing so could damage the correspondence between relations and entities (objects). Each relation in a database should represent a certain type of entity, as is generally the case for 3NF relations; a primary key then serves as a unique identifier of the tuples (instances) of the relation. We suggest that a semantically artificial variable name be used as the key to the instances of an object class; the advantages of this modification will become clear in the following section. The present application resorts to the second method in initializing the database design. Three major object classes, CHEMICALS, PFSs (process flowsheets) and PROCESS.UNITS, are defined; all are created by direct mapping from real-world entities. Figure 1 depicts the object class CHEMICALS along with two of its instances; each instance is uniquely identified by an artificial variable name. Figure 2 depicts the object classes PFSs and PROCESS.UNITS. The definitions indicate that each flowsheet refers to a process for a specific chemical and that it contains a number of process units (COMPONENTS). Each process unit, in turn, is an object with specific characteristics, such as a dimension and a governing equation.
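To make the frame notions above concrete, the following is a minimal Python sketch. It assumes that plain Python objects stand in for the LOOPS/KEE frames used in this work. The names `Frame`, `HybridDB`, `define`, `get_value` and `children` are hypothetical; the class and slot names follow Figs 1 and 2. The sketch shows instances keyed by artificial variable names, and slots that are defined once in a class and inherited by its subclasses and instances.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional

UNKNOWN = "unknown"  # default slot value, as drawn in Figs 1 and 2


@dataclass(eq=False)
class Frame:
    """An object class or instance: a named frame with unordered slots."""
    name: str                         # artificial variable name, used as the key
    parent: Optional[Frame] = None    # superclass, or the class of an instance
    is_instance: bool = False
    slots: dict[str, Any] = field(default_factory=dict)

    def get_value(self, slot: str) -> Any:
        """Return a local slot value, else inherit it up the class lattice."""
        frame: Optional[Frame] = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(f"{self.name} has no slot {slot}")


class HybridDB:
    """Registry of frames keyed by their unique names (hypothetical helper)."""

    def __init__(self) -> None:
        self.frames: dict[str, Frame] = {}

    def define(self, name: str, parent: Optional[str] = None,
               is_instance: bool = False,
               slots: Optional[dict[str, Any]] = None) -> Frame:
        # A variable name is either defined or not; redefining it is an error.
        if name in self.frames:
            raise ValueError(f"{name} is already defined")
        frame = Frame(name, self.frames[parent] if parent else None,
                      is_instance, dict(slots or {}))
        self.frames[name] = frame
        return frame

    def children(self, name: str, instances: bool = True) -> list[str]:
        """Names of the direct instances (or direct subclasses) of a class."""
        return [f.name for f in self.frames.values()
                if f.parent is not None and f.parent.name == name
                and f.is_instance == instances]


db = HybridDB()
db.define("CHEMICALS", slots={"LINE.FORMULA": UNKNOWN, "NAME": UNKNOWN, "PFS": UNKNOWN})
db.define("E0001", "CHEMICALS", is_instance=True, slots={"LINE.FORMULA": "HOCH2CH2OH"})
db.define("M0001", "CHEMICALS", is_instance=True, slots={"LINE.FORMULA": "CH3OH"})
db.define("PFSs", slots={"COMPONENTS": UNKNOWN})
db.define("PROCESS.UNITS", slots={s: UNKNOWN for s in (
    "DIMENSION", "POSITION", "INPUTS", "OUTPUTS", "GOVERNING.EQN")})
for subclass in ("REACTORS", "SEPARATORS", "MIXERS"):
    db.define(subclass, "PROCESS.UNITS")

print(db.frames["E0001"].get_value("LINE.FORMULA"))      # HOCH2CH2OH (local value)
print(db.frames["E0001"].get_value("NAME"))              # unknown (inherited)
print(db.frames["REACTORS"].get_value("GOVERNING.EQN"))  # unknown (inherited)
print(db.children("CHEMICALS"))                          # ['E0001', 'M0001']
```

The registry rejects a second definition of the same name. This mirrors the point that an artificial key is either defined or not, so duplicate instances cannot arise through the key itself.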

FINALIZING THE HYBRID DATABASE DESIGN: ASSURING THE DATABASE INTEGRITY

A database is not considered well-designed unless its integrity can be guaranteed. Integrity refers to the accuracy or correctness of the data in the database with respect to a given set of constraints. Dependencies are examples of such constraints. In a relational database, functional dependencies are expressed implicitly by a set of relations obtained through synthesis or decomposition. In other words, a functional dependency exists between every key and each of the other attributes in an individual relation. Apart from functional dependencies, at least five additional types of constraints are frequently imposed on a relational database to assure its integrity. They can be categorized into multivalued dependencies, legal-value constraints, entity integrity rules, key integrity rules and referential integrity rules (see e.g. Ullman, 1982; Date, 1986). The first two, the multivalued dependencies and legal-value constraints, are concerned with the actual values stored in the database. The last three, the entity integrity, key integrity and referential integrity rules, are intrinsic properties of the relational model. It should be emphasized that hardly any current database system, engineering or business oriented, actually operates with all of these constraints taken into consideration.

Fig. 2. Object-oriented representations of process flowsheets (PFSs) and the process units contained in the PFSs. [Object class PFSs, with members PFS0001, PFS0002, PFS0003 and PFS0004, has the slot COMPONENTS (Inheritance: OVERRIDE.VALUE; ValueClass: PROCESS.UNIT; Value: unknown). Object class PROCESS.UNITS, with subclasses REACTORS, SEPARATORS and MIXERS, has the slots DIMENSION, POSITION, INPUTS, OUTPUTS and GOVERNING.EQN (Value: unknown).]

Multivalued dependencies

Multivalued dependencies (MVDs) are a generalization of functional dependencies. They provide a necessary and sufficient condition for a relation to be decomposed into a set of fourth normal form relations in a relational database (Fagin, 1977). An MVD occurs when the values of one set of attributes multidetermine the values of a second set of attributes, and the dependency is orthogonal to the remaining attributes in the relation. Examples of MVDs in the relation appearing in Table 1 are

$$\text{CHEM.LF} \twoheadrightarrow \text{CHEM.NAME} \qquad \text{and} \qquad \text{CHEM.LF} \twoheadrightarrow \{\text{PFS}, \text{PROCESS}\}$$

(read “CHEM.LF multidetermines CHEM.NAME”). The MVDs imply that a chemical with a certain line formula can have different names, but what it is called is irrelevant to the design of its flowsheets. Two approaches are proposed to express MVDs in an object-relation hybrid database:

1. Representing multiple values as a list. Figure 1 depicts a possible frame structure to represent CHEMICALS as an object. The NAME and PFS slots of an instance of CHEMICALS could have the values (Methanol Methyl.alcohol) and ((CO.H2.synthesis PFS0003) (Wood.distillation PFS0004)), respectively. Notice that a nested list is needed to portray appropriately a process flowsheet of a specific process type.
2. Representing multiple values as an object class. As exhibited in Fig. 3, an instance of CHEMICALS has a class, e.g. E0001.PFSs, as the value of its PFS slot; this class, in turn, contains, among other slots, the slots PROCESS.TYPE and COMPONENTS. Each instance of the class E0001.PFSs then represents a process flowsheet of a specific process type for manufacturing ethylene glycol. Note that this representation yields a natural integration of relevant data items such as PROCESS and PFS.

Of these two approaches, it is recommended that the first be employed when multiple values can be represented by a simple list, and the second otherwise. Accordingly, the representation illustrated in Fig. 3 is adopted in the current prototype database design.
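The two MVDs and the list representation of approach 1 can be checked mechanically. The minimal Python sketch below is an illustration only, not part of the original design. It stores Table 1 as a set of tuples and tests an MVD X ->> Y by requiring that, for each X value, every Y value appears together with every value of the remaining attributes Z. It then builds list-valued NAME and PFS slots like those of Fig. 1. The function names `proj`, `holds_mvd` and `as_list_slots` are hypothetical.

```python
from collections import defaultdict
from itertools import product

ATTRS = ("CHEM.LF", "CHEM.NAME", "PFS", "PROCESS")

# Table 1 as a relation: a set of tuples, so duplicate rows cannot occur.
CHEMICALS = {
    ("HOCH2CH2OH", "Ethylene.glycol", "PFS0001", "Ethylene.oxide.hydration"),
    ("HOCH2CH2OH", "Ethanediol", "PFS0001", "Ethylene.oxide.hydration"),
    ("HOCH2CH2OH", "Ethylene.glycol", "PFS0002", "Ethylene.chlorohydrin"),
    ("HOCH2CH2OH", "Ethanediol", "PFS0002", "Ethylene.chlorohydrin"),
    ("CH3OH", "Methanol", "PFS0003", "CO.H2.synthesis"),
    ("CH3OH", "Methyl.alcohol", "PFS0003", "CO.H2.synthesis"),
    ("CH3OH", "Methanol", "PFS0004", "Wood.distillation"),
    ("CH3OH", "Methyl.alcohol", "PFS0004", "Wood.distillation"),
}


def proj(t, attrs):
    """Values of tuple t on the given attributes."""
    return tuple(t[ATTRS.index(a)] for a in attrs)


def holds_mvd(relation, x, y):
    """True if X ->> Y holds: in every X-group, each Y-value pairs with each Z-value."""
    z = [a for a in ATTRS if a not in x and a not in y]
    groups = defaultdict(set)
    for t in relation:
        groups[proj(t, x)].add((proj(t, y), proj(t, z)))
    for pairs in groups.values():
        ys = {yv for yv, _ in pairs}
        zs = {zv for _, zv in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


def as_list_slots(relation):
    """Approach 1: one instance per line formula; multiple values kept as lists."""
    instances = {}
    for lf, name, pfs, process in sorted(relation):
        inst = instances.setdefault(lf, {"NAME": [], "PFS": []})
        if name not in inst["NAME"]:
            inst["NAME"].append(name)
        if [process, pfs] not in inst["PFS"]:
            inst["PFS"].append([process, pfs])
    return instances


print(holds_mvd(CHEMICALS, ["CHEM.LF"], ["CHEM.NAME"]))        # True
print(holds_mvd(CHEMICALS, ["CHEM.LF"], ["PFS", "PROCESS"]))   # True
broken = CHEMICALS - {("CH3OH", "Methyl.alcohol", "PFS0004", "Wood.distillation")}
print(holds_mvd(broken, ["CHEM.LF"], ["CHEM.NAME"]))           # False
print(as_list_slots(CHEMICALS)["CH3OH"])
```

Deleting a single row breaks the MVD. This is the update anomaly that motivates moving the independent multiple values out of the flat relation and into list-valued or class-valued slots.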

Legal-value constraints

A legal-value constraint restricts an attribute (slot) value to lie in a range or to take a specific data structure, or it expresses a relationship among various attributes. In a slot of an object, legal-value constraints are expressed by the combined use of inheritance mechanisms, facets and active values. Inheritance is a concept that arises in all object-oriented systems (Stefik and Bobrow, 1985). It helps define a prototype for similar objects. Different inheritance mechanisms specify various allowances for an object to deviate from its prototype. Facets and active values have their roots in an access-oriented programming paradigm (Stefik et al., 1986). They allow additional computation to be activated when a slot is created or deleted or when its data are fetched or stored; this computation can be used to validate the legal-value constraints. As an example, the PFS slot of the CHEMICALS object class depicted in Fig. 3 has a number of facets, namely Inheritance, ValueClass and ActiveValue. The inheritance OVERRIDE.VALUE indicates that the slot value for a specific instance can override an inherited value. The value class PFSs constrains the slot value to be a subclass or instance of the class PFSs. The active value SUBCLASS.ONLY attached to the slot is activated when a value is put into the slot; the activated procedure ensures that the value is a subclass of PFSs. Further discussion of the use of inheritance, facets and active values to control slot values can be found elsewhere (see e.g. Stefik and Bobrow, 1985; Bobrow et al., 1985; Stefik et al., 1986).

Fig. 3. Another possible object-oriented representation of the relation CHEMICALS, representing multiple values as an object class. [Object class CHEMICALS, with members E0001 and M0001, has the slots LINE.FORMULA and NAME (Inheritance: OVERRIDE.VALUE; Value: unknown) and PFS (Inheritance: OVERRIDE.VALUE; ValueClass: PFSs; ActiveValue: SUBCLASS.ONLY; Value: unknown). Instance E0001 of CHEMICALS has LINE.FORMULA HOCH2CH2OH, NAME (Ethylene.glycol Ethanediol) and PFS E0001.PFSs. Class E0001.PFSs, with members PFS0001 and PFS0002, has the slots PROCESS.TYPE and COMPONENTS (Inheritance: OVERRIDE.VALUE; Value: unknown). Instance PFS0001 of E0001.PFSs has PROCESS.TYPE Ethylene.oxide.hydration and COMPONENTS REACTOR1 SEPARATOR1 ...]
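The following minimal Python sketch illustrates the facets just described. It assumes a single slot object rather than a full LOOPS/KEE frame. The value class restricts the value, and a SUBCLASS.ONLY-style active value runs on every store and may veto it. `ObjClass`, `Slot`, `put_value` and `subclass_only` are hypothetical names. Treating "subclass" as a proper subclass, so that PFSs itself is rejected, is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(eq=False)
class ObjClass:
    """An object class linked into the inheritance lattice via `superclass`."""
    name: str
    superclass: Optional[ObjClass] = None

    def is_subclass_of(self, other: ObjClass) -> bool:
        """True if `other` is a proper ancestor of this class."""
        ancestor = self.superclass
        while ancestor is not None:
            if ancestor is other:
                return True
            ancestor = ancestor.superclass
        return False


@dataclass
class Slot:
    """A slot with facets named after Fig. 3 (behaviour is an assumption)."""
    inheritance: str = "OVERRIDE.VALUE"
    value_class: Optional[ObjClass] = None
    active_value: Optional[Callable[[Slot, Any], Any]] = None
    value: Any = "unknown"

    def put_value(self, new_value: Any) -> None:
        # AVPUT-style hook: the active value runs on every store and may veto it.
        if self.active_value is not None:
            new_value = self.active_value(self, new_value)
        self.value = new_value


def subclass_only(slot: Slot, new_value: Any) -> Any:
    """Hypothetical SUBCLASS.ONLY: accept only subclasses of the value class."""
    if (isinstance(new_value, ObjClass) and slot.value_class is not None
            and new_value.is_subclass_of(slot.value_class)):
        return new_value
    target = slot.value_class.name if slot.value_class else None
    raise ValueError(f"{getattr(new_value, 'name', new_value)} is not a subclass of {target}")


pfss = ObjClass("PFSs")
e0001_pfss = ObjClass("E0001.PFSs", superclass=pfss)
pfs_slot = Slot(value_class=pfss, active_value=subclass_only)

pfs_slot.put_value(e0001_pfss)        # accepted: E0001.PFSs lies under PFSs
for bad in ("PFS0001", pfss):         # a bare name, and PFSs itself
    try:
        pfs_slot.put_value(bad)
    except ValueError as err:
        print("rejected:", err)
print(pfs_slot.value.name)            # E0001.PFSs
```

Because the check runs inside the store operation, no code path can bypass it. This is the reason the constraint is attached to the slot as an active value instead of being left to callers.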

Entity integrity

The entity integrity rule in the relational data model requires all attributes in the primary key of a relation to be nonnull (Date, 1986). The rationale for this rule follows from the proposition that a tuple in a relation corresponds to an entity in the real world, which must have a unique identification of some kind. Since primary keys perform the unique identification in the relational model, any null value in them would indicate the existence of nonexisting entities; this is a contradiction. In spite of its simplicity and significance, maintaining entity integrity remains a problem without a straightforward solution in the relational model. The problem becomes trivial when the definition of the primary key is modified as suggested. By adopting variable names as the keys of instances, which correspond to tuples, a unique definition of each instance can be effectively enforced. A variable is either defined or not defined; it cannot be both.

Key integrity

The key integrity rule demands that no two tuples of a relation have the same values for every attribute in the primary key. The rule ensues from the uniqueness property of the primary key. It assumes a new meaning when the object concept is amalgamated into relations and the definition of the primary key is modified. In an object-relation hybrid database, the key integrity rule becomes significant when duplicate values in a certain slot are not permissible among object instances; for example, replicate line formulas are unacceptable. One possible way to maintain key integrity in the object-relation hybrid database involves a two-step process. First, identify the scope of integrity, i.e. the scope within which the value of a selected slot is not allowed to repeat. This is accomplished by identifying the class whose instances, and whose subclasses’ instances, cannot have duplicate values in the slot. The identified class should occupy the highest possible position in the hierarchical inheritance lattice. Second, attach an active value to the slot at the identified class; this active value is inherited downward along the inheritance lattice. Thereafter, a procedure defined by the active value is invoked whenever a new value is assigned to the chosen slot in any instance within the scope. Execution of the procedure assures that values of the chosen slot are not repeated among the instances within the scope. Figure 4 illustrates an example of using an active value to maintain key integrity, thereby assuring that no two chemicals in the database have the same line formula. The active value is of the type AVPUT, indicating that it is triggered whenever a new value is assigned to the slot. The template defining the active value is written in a pseudo Lisp-based, object-oriented language.

    Object: CHEMICALS
      ...
      Slot: LINE.FORMULA
        ActiveValue: No.Duplicate.Value (type AVPUT)
        Template:
        (LAMBDA (SELF SLOT NEWVALUE OLDVALUE)
          ;; Define a local function to get the value from a specified slot of different objects
          (DEFINEQ (GET.SLOT.VALUE (OBJ) (GET.VALUE OBJ SLOT)))
          (COND ((MEMBER NEWVALUE
                         (MAPCAR (REMOVE SELF (OBJECT.CHILDREN 'CHEMICALS 'INSTANCE))
                                 'GET.SLOT.VALUE))
                 (PROMPTPRINT "Duplicate value is rejected")
                 OLDVALUE)
                (T NEWVALUE)))

Fig. 4. Example of using an active-value to maintain the key integrity; the scope of the integrity is the only variable in the template.

Referential integrity

The referential integrity rule is directly concerned with the existence of foreign keys in the relational model. A foreign key is an attribute or attribute set in one relation R1 whose value at any moment is required either to be wholly null, if permissible, or to match that of the primary key of some relation R2. R1 and R2 need not be distinct. Due to the nature of engineering data, foreign keys are ubiquitous in an engineering database. In the light of this definition, a database designer is required to answer the following questions concerning each foreign key in the database (Date, 1986):

1. Can the foreign key accept null values?
2. What will happen if the primary key of the target of the foreign key is updated?
3. What will happen if the target of the foreign key is deleted?
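The three integrity rules can be illustrated together in a minimal Python sketch. It assumes a flat store of named objects instead of a LOOPS/KEE class lattice. `MiniDB`, `create`, `put_unique`, `put_ref` and `delete` are hypothetical names:

- `put_unique` plays the role of the No.Duplicate.Value active value of Fig. 4.
- `ref_by` mimics the hidden REF.BY back-reference slot.
- `delete` implements the CASCADE, RESTRICT and NULL responses to question 3 in the usual relational sense: deleting a target affects the objects whose foreign keys point at it.

The flowsheet slot CHEMICAL in the example is also hypothetical. It is chosen so that, as in the cascade deletion discussed next, removing a chemical removes its flowsheets.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterable, Optional


@dataclass(eq=False)
class Obj:
    name: str                                  # artificial variable name (the key)
    slots: dict = field(default_factory=dict)
    ref_by: set = field(default_factory=set)   # hidden back-references, cf. REF.BY


class MiniDB:
    """Hypothetical store illustrating entity, key and referential integrity."""

    def __init__(self) -> None:
        self.objects: dict[str, Obj] = {}

    def create(self, name: str) -> Obj:
        # Entity integrity: a name is either defined or not, never twice.
        if name in self.objects:
            raise ValueError(f"{name} is already defined")
        obj = self.objects[name] = Obj(name)
        return obj

    def put_unique(self, obj: Obj, slot: str, value: Any, scope: Iterable[Obj]) -> Any:
        # Key integrity, after No.Duplicate.Value in Fig. 4: keep OLDVALUE if
        # another object in the scope already holds NEWVALUE in this slot.
        if any(o is not obj and o.slots.get(slot) == value for o in scope):
            print("Duplicate value is rejected")
            return obj.slots.get(slot)
        obj.slots[slot] = value
        return value

    def put_ref(self, obj: Obj, slot: str, target: Optional[str]) -> None:
        # Foreign key: wholly null, or the name of an existing object.
        if target is not None and target not in self.objects:
            raise KeyError(f"{target} is not defined")
        old = obj.slots.get(slot)
        if isinstance(old, str) and old in self.objects:
            self.objects[old].ref_by.discard(obj.name)
        obj.slots[slot] = target
        if target is not None:
            self.objects[target].ref_by.add(obj.name)

    def delete(self, name: str, rule: str = "CASCADE") -> None:
        # Question 3: what happens to referencing objects when a target is deleted?
        if rule not in ("CASCADE", "RESTRICT", "NULL"):
            raise ValueError(f"unknown rule {rule}")
        target = self.objects[name]
        live = [self.objects[r] for r in sorted(target.ref_by) if r in self.objects]
        if rule == "RESTRICT" and live:
            raise ValueError(f"{name} is still referenced by {[o.name for o in live]}")
        del self.objects[name]
        for referrer in live:
            if rule == "CASCADE":
                if referrer.name in self.objects:   # may already be gone
                    self.delete(referrer.name, rule)
            else:  # "NULL": null every foreign key that pointed at `name`
                for slot, value in referrer.slots.items():
                    if value == name:
                        referrer.slots[slot] = None


db = MiniDB()
e0001, m0001 = db.create("E0001"), db.create("M0001")
chemicals = [e0001, m0001]                 # integrity scope: CHEMICALS instances
db.put_unique(e0001, "LINE.FORMULA", "HOCH2CH2OH", chemicals)
db.put_unique(m0001, "LINE.FORMULA", "CH3OH", chemicals)
db.put_unique(m0001, "LINE.FORMULA", "HOCH2CH2OH", chemicals)  # rejected

# Hypothetical CHEMICAL slot in each flowsheet, pointing back at its chemical.
for pfs in ("PFS0001", "PFS0002"):
    db.put_ref(db.create(pfs), "CHEMICAL", "E0001")
db.delete("E0001")                         # CASCADE removes PFS0001 and PFS0002
print(sorted(db.objects))                  # ['M0001']
```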

Updating (question 2) and deleting (question 3) each invoke one of the following responses: (a) CASCADE: the operation cascades to update or delete those matching tuples containing the foreign key; (b) RESTRICT: the operation is restricted to cases where no matching tuples contain the foreign key; and (c) NULL: the value of the foreign key is set to null, if allowed, in all matching tuples. Any of these responses necessitates extensive monitoring of the interactions between relations.

A foreign key in an object manifests itself as a slot whose value is required either to be wholly null or to match the variable name (key) of a second object. The second object can be a class or an instance. As depicted in Fig. 5, the PFS slot is a foreign key whose values either are wholly null or match those of the primary key of the class PFSs. Note that in the figure, variables serve both as the linkages between primary and referenced objects and as the buffer between the identifier and the attributes. The three questions raised above have been resolved recently. The answer to the first question is affirmative. In fact, whether or not a foreign key should accept null values is no more than a legal-value constraint. The answers to the second and third questions depend on the desired behavior of the database, e.g. CASCADE, RESTRICT or NULL. In Fig. 6, we present an example of using active values to implement a cascade deletion. The cascade deletion removes a process flowsheet from the database when the corresponding chemical is removed. In the example, the PFS slot, a foreign key, has an active value REF.TO of the type AVPUT. The active value cross-references a primary and a referenced object whenever a new value is assigned to the slot. This is necessary if the result of deleting the referenced object needs to be noticed instantaneously by its primary object or objects (before the content of the foreign key is dereferenced). An additional slot, REF.BY, is automatically created in the referenced object class to store the outcome of the cross-referencing. This slot has an active value of the type AVREM. The active value deletes all matching primary objects when the referenced object is deleted. The REF.BY slot is designed to maintain referential integrity in the background; it is hidden from other operations of the database.

Fig. 5. Example of a foreign key. [The PFS slots of the CHEMICALS instances E0001 and M0001 point to the classes E0001.PFSs and M0001.PFSs, both under PFSs; their instances are PFS0001 and PFS0002, and PFS0003 and PFS0004, respectively.]

MANIPULATING THE DATABASE

We are now in a position to discuss the manipulative aspect of the object-relation hybrid database. The emphasis will be on shedding light on the characteristics of this database from two perspectives, namely the perspective of “object” and that of “relation”.

Object-oriented programming

The notion of objects has resulted from research on the semantics of programming languages (Stefik and Bobrow, 1985). In fact, the word “object” almost “object-oriented programming”. always implies It should be emphasized that object-oriented programming per se is a discipline tedious to master. It is impractical to expect an engineer to manipulate a database through extensive object-oriented programming. Fortunately, most of the recent objectoriented programming environments provide userfriendly interfaces to facilitate the manipulation of information in a complex object world. We expect such a formal language, underlying an environment in which the hybrid database resides, to be exploited in manipulating the proposed hybrid database. While an extensive discussion of object-oriented languages is beyond the scope of this paper, the major strength and weakness of these languages should be delineated in terms of database management. The strength lies in that these languages are especially effective for retrieving information on the infrastructure of a database. i.e. the logical relationships between an object class (relation) and its subclasses or instances (tuples). A conventional relational database does not have such an infrastructure. Table 3 highlights some object-oriented programming primitives effective for performing database. queries. The weakness arises from a fact that the languages have not been specifically designed for database management. While offering an effective framework for knowledge-

Y. W. HUANG and L. T. FAN

980 .' CHEM~CCALS'\ \

eEOOO1

_'- - -MoooI

_ - -.PFSOOOl E0001.PFSe'-= - - -.PFSOOOZ

/‘: 1

:

_ - -~PFsOoo3 PEsS/ /r

"0001.PFS8-=__ -

,'

PFSOOO4

‘.._

‘,,

--I__ ----_A____

CHEMICALS

bject:

------______

>(Ob ject

Slot:

I PFSe

I

REF.BY

hidden: YES Activevalue:

ANA-

CASCADE.DEtL.ETION

Template LAnBDR (SELP SLOT /MAPCAR

Fig. 6. Example

of using active-values

based programming; they provide only a mediocre structure for database management. Our extensive experiences with two of the most popular objectoriented programming tools, LOOPS (Bobrow et al., 1985) and KEE (IntelliCorp, 1986) indicate that caution need be exercised in using the languages or tools built on them. An effective and appropriate exploitation of the languages or tools requires judicious scrutiny of their conceptual foundations and limits. Relational

Table 3. Object-oriented programming primitives effective for retrieving information regarding the infrastructure of an object-relation hybrid database

Object-oriented programming primitive | Description
(COPY.OBJECT Object Copy.Instances? New.Supers New.Instances) | Makes a copy of Object; the resultant object can retain the original supers/instances and/or have new ones
(COPY.SLOT From.Object From.Slot To.Object To.Slot Facets.Included) | Copies the From.Slot of From.Object to To.Object; the slot is renamed to To.Slot and contains the facets specified by Facets.Included
(GET.VALUE Object Slot Trigger.ActiveValue?) | Gets the value from the Slot of the Object; the operation may or may not trigger the active values attached to the Slot
(OBJECT.SUPERS Object) | Returns the super classes of the Object
(OBJECT.CHILDREN Object Type) | Returns the children of the Object; the Type of the children may be SUB-CLASS, INSTANCE, or BOTH
(OBJECT.SLOTS Object Inherited?) | Returns the slots of the Object; inherited slots may or may not be included
(PUT.VALUE Object Slot Value Trigger.ActiveValue?) | Puts the Value into the Slot of the Object; the operation may or may not trigger the active values attached to the Slot
(SLOT.FACET Slot) | Returns the facets of the Slot
(WHERE.OBJECT Object) | Returns the position of the Object in the inheritance lattice

Table 4. Classes of objects used to construct data models for process engineering activities (abridged from Stephanopoulos et al., 1987)

A. (GENERIC-PROCESS-UNIT)
   Attributes: Input-output topology; State characterization (intensive variables); Extensive state (characterization of size); Model; Physical description; Icon; Design methodology; Rules; Assumptions
B. (GENERIC-EQUIPMENT)
   List of attributes
C. (INPUT-OUTPUT-TOPOLOGY)
   List of attributes
   Subclasses: (SISO-TOPOLOGY) (TTTO-TOPOLOGY) (SEPARATOR-TOPOLOGY) (MIXER-TOPOLOGY)
D. (REACTION)
   List of attributes
   Subclasses: (SINGLE-STEP-IRREVERSIBLE-REACTION) (SINGLE-STEP-REVERSIBLE-REACTION) (TWO-SEQUENTIAL-IRREVERSIBLE-REACTIONS) (TWO-PARALLEL-IRREVERSIBLE-REACTIONS)

Relational algebra

Though the object-oriented infrastructure is important in terms of enhancing the semantics and integrity of the current database, it is often of no concern to an end user of the database. For the end user, an object-oriented relational algebra is therefore proposed. The merit of this algebra lies in its service as a mathematically sound vehicle for rapid execution of simple queries. Simple queries do not involve mixed relational algebraic operations; they require little knowledge about the infrastructure of the database, at least from the end user’s viewpoint. The proposed object-oriented relational algebra differs from the classical one in two respects. First, the operands of this algebra are object classes instead of relations. Second, the algebraic operations are defined within their operands. To be exact, each algebraic operation appears as a method of any database object, and thus it accommodates the peculiarities of the object. The proposed set of relationally complete operations includes projection, selection, Cartesian product, union and difference.

Projection. Two mechanisms, namely project and deep-project, have been conceived for this operation. Projection of an object class with the mechanism “project” yields a new object class; the derived class has the same structure as the original except that it contains only the selected slots and no duplicate instances in terms of the slot values. Projection of an object class with the mechanism “deep-project”, on the other hand, involves a more complex operation. It unfolds a slot upon projection, provided that the domain of the slot is an object class. Upon unfolding, the slot reveals a set of slots in the succeeding (deeper) layer, which are attributes of the object designated by the slot value. This mechanism enables the projection operation to traverse the logical linkage between objects; it performs a projection in the depth direction from the perspective of the original class. “Deep-projection” yields a new object class containing the selected slots from all layers and no duplicate instances in terms of the slot values. For example, a projection defined by:

(project CHEMICALS (NAME PFS)),

yields:

(((Ethylene.glycol Ethanediol) (E0001.PFSs)) ((Methanol Methyl.alcohol) (M0001.PFSs)) ... ).

The projection retrieves all the names of the chemicals stored in the database along with the pointers to their respective process flowsheets. On the other hand, a deep-projection defined by:

(project CHEMICALS (NAME (PFS PROCESS.TYPE))),

yields:

(((Ethylene.glycol Ethanediol) (Ethylene.oxide.hydration Ethylene.chlorohydrin)) ((Methanol Methyl.alcohol) (CO.H2.synthesis Wood.distillation)) ... ).

This projection traverses a pointer and retrieves all the names of the chemicals along with the process types producing the individual chemicals.

Selection. The object selection operator selects all instances (objects) from a base class such that some slots in each instance satisfy a certain selection criterion. Conventionally, a selection criterion is composed of expressions involving arithmetic comparison operators such as >, = and <; the operands of these operators must be atomic values. This convention must now cope with the fact that slots can hold an atomic value, a variable name (acting as a pointer), or a list, simple or nested. If nested lists are excluded from use, however, the convention for setting selection criteria can be retained. A possible solution is to define a special selection method, e.g. “hybrid-selection”, which works as follows. A selection criterion, once asserted in a conventional form, is stored as a facet of the selected slot at the class level; this facet is inherited by all instances of the class under consideration to pose constraints on selection. By asserting selection criteria as facets of slots, a selection operator can be interpreted differently in different slots, depending on other


facets of the slot. For example, a selection:

(select CHEMICALS (> NAME methane))


can have two interpretations depending on whether NAME is a single-value slot or a multivalued slot. If it is a single-value slot, the selection retrieves all instances of CHEMICALS whose name is alphabetically greater than “Methane”. In contrast, if NAME is a multivalued slot, the same selection retrieves all instances of CHEMICALS that have “Methane” as one of their names. In other words, the arithmetic > operator is interpreted as the set operator ⊇. The switching between arithmetic and set selection is totally hidden inside the method “hybrid-selection”. Other possible selection mechanisms may include “selection” and “deep-selection”. The former treats all slot values as atomic in testing a selection criterion. The latter bears the same connotation as “deep-projection”; it traverses a logical link between objects for deep selection.

Cartesian product. The product of two object classes, O1 with m slots and O2 with n slots, yields a new object class consisting of a set of instances; the values of the first m slots of each instance come from those of an instance in O1, and those of the last n slots come from those of an instance in O2. The position of the derived object class in the inheritance lattice is determined by the definitions of classes O1 and O2. Cartesian product is rarely used by itself in manipulating a traditional relational database; instead, a combination of the product, selection and (possibly) projection yields the more frequently used join operation. Nevertheless, this combination emulates only inefficiently the function of a pointer linking two relations. With the pointer being a standard feature in the object-relation hybrid database, mixed algebraic operations such as that involved in the join become redundant. For example, the deep-projection, capable of traversing the logical linkage between objects, involves the join operation from the viewpoint of the traditional relational algebra.
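The projection, hybrid-selection and product operations described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory layout (class name, then instances, then slots), the facet tables DOMAIN and MULTIVALUED, and all function names are hypothetical; rows are returned as tuples rather than as derived object classes; and the PFS slot here holds flowsheet names directly instead of the intermediate E0001.PFSs object shown in the text.

```python
import operator
from typing import Any, Dict, List, Tuple

# Toy data laid out after the CHEMICALS/PFS example (hypothetical layout):
# class name -> {instance name -> {slot name -> value}}.
DB: Dict[str, Dict[str, Dict[str, Any]]] = {
    "CHEMICALS": {
        "E0001": {"NAME": ["Ethylene.glycol", "Ethanediol"], "PFS": ["PFS0001", "PFS0002"]},
        "M0001": {"NAME": ["Methanol", "Methyl.alcohol"], "PFS": ["PFS0003", "PFS0004"]},
    },
    "PFSs": {
        "PFS0001": {"PROCESS.TYPE": "Ethylene.oxide.hydration"},
        "PFS0002": {"PROCESS.TYPE": "Ethylene.chlorohydrin"},
        "PFS0003": {"PROCESS.TYPE": "CO.H2.synthesis"},
        "PFS0004": {"PROCESS.TYPE": "Wood.distillation"},
    },
}
# Assumed slot facets: the class a pointer slot refers to, and multivalued slots.
DOMAIN = {("CHEMICALS", "PFS"): "PFSs"}
MULTIVALUED = {("CHEMICALS", "NAME"), ("CHEMICALS", "PFS")}

ARITH_OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}
SET_OPS = {">": lambda values, x: x in values}  # '>' read as 'values ⊇ {x}'


def _value(cls: str, inst: Dict[str, Any], spec: Any) -> Any:
    """A slot name returns its value; a pair (slot, inner) unfolds a pointer slot."""
    if isinstance(spec, str):
        v = inst.get(spec)
        return tuple(v) if isinstance(v, list) else v
    slot, inner = spec
    target = DOMAIN[(cls, slot)]
    out: List[Any] = []
    for ref in inst.get(slot, []):
        v = _value(target, DB[target][ref], inner)
        out.extend(v if isinstance(v, tuple) else [v])
    return tuple(out)


def deep_project(cls: str, specs: List[Any]) -> List[Tuple]:
    """Projection that may follow pointer slots; duplicate rows are dropped."""
    rows: List[Tuple] = []
    for inst in DB[cls].values():
        row = tuple(_value(cls, inst, s) for s in specs)
        if row not in rows:
            rows.append(row)
    return rows


def project(cls: str, slots: List[str]) -> List[Tuple]:
    """Plain projection: every slot value is taken as it stands."""
    return deep_project(cls, list(slots))


def hybrid_select(cls: str, op: str, slot: str, value: Any) -> List[str]:
    """Arithmetic test on a single-value slot, set test on a multivalued one."""
    test = SET_OPS[op] if (cls, slot) in MULTIVALUED else ARITH_OPS[op]
    return [name for name, inst in DB[cls].items() if test(inst[slot], value)]


def product(cls_a: str, cls_b: str) -> List[Tuple[str, str]]:
    """Cartesian product: every instance of cls_a paired with every one of cls_b."""
    return [(a, b) for a in DB[cls_a] for b in DB[cls_b]]


def join_via_product(cls_a: str, ptr: str, cls_b: str, slot_b: str) -> List[Tuple[str, Any]]:
    """Classical route: product, then selection on the pointer, then projection."""
    return [(a, DB[cls_b][b][slot_b]) for a, b in product(cls_a, cls_b)
            if b in DB[cls_a][a][ptr]]


if __name__ == "__main__":
    print(deep_project("CHEMICALS", ["NAME", ("PFS", "PROCESS.TYPE")]))
    print(hybrid_select("CHEMICALS", ">", "NAME", "Methanol"))
```

In this sketch deep_project reaches the process types in one call, whereas join_via_product builds all eight instance pairs before filtering them, which mirrors the argument that a stored pointer makes the product-selection-projection route unnecessary for such queries.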
The deep-projection example demonstrates the merit of the object-oriented relational algebra: a mixed algebraic operation can be represented by a simple query.

Union. The union of two object classes is formed by combining the instances of one class with those of a second class to produce a third class. For this operation to make sense, the classes must be union compatible. By definition, object classes are union compatible if their lowest common super-class exists and contains at least one slot. Based on this definition, the object union operation yields an object class whose structure is the same as that of the lowest common super-class of the operand classes. The derived object class will have all instances of its operands, but the instances have only the slots that are found in the lowest common super-class. Duplicate instances in terms of slot values are eliminated.

Difference. The difference of two object classes

yields a third object class containing the instances which are found in the first object class but not in the second. The classes must be union compatible. If a permanent effect of the operation is desired, the derived object class will be a peer class of the lowest common super-class of the operands; its instances have only the slots that are found in the lowest common super-class.

EXTENSION

We have presented a general framework for designing a novel database for chemical process design. Based on this framework, the following efforts are being undertaken to complete the design of the database.

1. Collect well-established “standard” or conventional chemical process flowsheets. This is to recognize the knowledge previously accumulated by process design domain experts.
2. Identify products, both final and intermediate, defined by the process flowsheets. Categorize the products. Define subclasses for the object class CHEMICALS based on the result of the categorization.
3. Identify processing units in the process flowsheets. Categorize the units, e.g. REACTORS, SEPARATORS, MIXERS and PUMPS. Define subclasses for the object class PROCESS.UNITS based on the result of the categorization. The characteristics of each category of units, e.g. the governing equations, heuristic or algorithmic, are stored at the subclass level. Notice that more than one set of governing equations may be defined for a processing unit. Additional methods can be defined to control the “firing” of the equations to solve the “routing problem” (Seider et al., 1979). For example, while both a rigorous plate-to-plate calculation algorithm and the Fenske-Underwood equation can be defined within the object class DISTILLATION.UNIT, an independently defined method will store the instruction that the latter is fired when a quick result is required.
4. Represent each end product as well as each important intermediate product as an object instance of an appropriate subclass of CHEMICALS.
5. Represent each flowsheet as an object instance of the class PFSs. To be specific, a flowsheet is a composite object comprising a variety of processing units (reactors, separators, etc.) as subobjects. A composite object is related to its subobjects via foreign keys, and a flowsheet is connected to the chemicals it produces via foreign keys.
6. Represent each processing unit as an object instance of an appropriate subclass of PROCESS.UNITS.

7. Check the integrity of the database.

It is worth mentioning that Stephanopoulos et al. (1987) have tabulated a set of object classes defined in DESIGN-KIT; the resultant table is partially reproduced as Table 4. The object classes in the table constitute one possible result that may be attained from steps 3 and 5. While object classes such as those in Table 4 can be defined by a direct mapping of real-world abstract or physical entities as discussed in Constructing a Schema, we wish to reiterate that their definitions will not be complete until their integrity is assured. The work outlined above coincides with the first two phases of building a knowledge base (see, e.g. Barr and Feigenbaum, 1981); it encompasses the tasks of knowledge acquisition and representation. At this juncture it becomes obvious that when the notion of objects is amalgamated into a database design, the spirit of knowledge engineering is actually infused into the database. The resultant database preserves not only the data but also the knowledge of a certain domain.

To render the present hybrid database effective in process design, an intelligent user interface is under construction for the database. The interface is built on top of the proposed hybrid algebra. The database will provide the process designer with a complete flowsheet in graphic form if the process to be designed has a standard flowsheet. Otherwise, the database will respond with some possible configurations. The designer can then interact with each process unit (an object from the database standpoint) to modify the design. Clearly, this step requires the incorporation of fuzzy set notions.

CONCLUDING REMARKS

The benefits of amalgamating the notions of objects and relations in designing a database for CPE applications have been discussed. It has been demonstrated that these two notions complement each other. On one hand, the normalization theory of the relational model lays a mathematically sound foundation for categorizing process engineering data items. The resultant categorization is represented in a preliminary object inheritance lattice, which can be fine-tuned in accordance with the integrity rules. The relational algebra suggests a different perspective for effectively performing simple queries concerning the objects. On the other hand, the semantics of process engineering data can be preserved maximally, if not completely, in a database when relations are reincarnated and various features of objects are amalgamated into the database. The resultant database preserves not only the data but also the knowledge; it is ready to interact with a process designer. Separately, each of the two notions has its limits; jointly, their power increases synergistically.

Acknowledgements. The authors thank Dr Elizabeth A. Unger, Department of Computing and Information Sciences, Kansas State University, for her inputs to the manuscript. This is contribution 88-116-J, Department of Chemical Engineering, Kansas Agricultural Experiment Station, Kansas State University, Manhattan, KS 66506, U.S.A.

REFERENCES

Angus C. J. and P. Winter, An Engineering Database for Process Design. PSE ’85: The Use of Computers in Chemical Engineering, pp. 593-605. Pergamon Press, New York (1985).
Barr A. and E. Feigenbaum, The Handbook of Artificial Intelligence, Vol. 1. Kaufmann, Los Altos (1981).
Benayoune M. and P. E. Preece, An Engineering Data Model for Computer-Aided Design Databases. PSE ’85: The Use of Computers in Chemical Engineering, pp. 581-592. Pergamon Press, New York (1985).
Benayoune M. and P. E. Preece, Review of information management in computer-aided engineering. Comput. Chem. Engng 11, 1-6 (1987).
Bernstein P. A., Synthesizing third normal form relations from functional dependencies. ACM Trans. Database Syst. 1, 277-298 (1976).
Bobrow D. G., K. M. Kahn, S. M. Lanning, S. Mittal and M. J. Stefik, LOOPS Buttress Release Notes. Xerox PARC, Calif. (1985).
Cherry D. H., J. C. Grogan, G. L. Knapp and F. A. Perris, Use of data bases in engineering design. Chem. Engng Prog. 78, 59-67 (1982).
Codd E. F., A relational model of data for large shared data banks. ACM Commun. 13, 377-387 (1970).
Craft J., The Impact of CAD and Database Techniques in Process Engineering. PSE ’85: The Use of Computers in Chemical Engineering, pp. 565-579. Pergamon Press, New York (1985).
Date C. J., An Introduction to Database Systems, 4th Edn, Vol. 1. Addison-Wesley, Reading, Mass. (1986).
Fagin R., Multivalued dependencies and a new normal form for relational databases. ACM Trans. Database Syst. 2, 262-278 (1977).
Hartzband D. J. and F. J. Maryanski, Enhancing knowledge representation in engineering databases. IEEE Comput. 18, 39-48 (1985).
Huang Y. W., Amalgamating objects and relations in database design. Master Project, Department of Computing and Information Sciences, Kansas State University (1987).
IntelliCorp, KEE Software Development System User’s Manual, Vol. 3.0. IntelliCorp, Mountain View, Calif. (1986).
Niida K., H. Yagi and T. Umeda, An application of data base management system (DBMS) to process design. Comput. Chem. Engng 1, 33-40 (1977).
Osborn S. L. and T. E. Heaven, The design of a relational database system with abstract data types for domains. ACM Trans. Database Syst. 11, 357-373 (1986).
Plouffe W., Databases in a distributed plant environment. Presented at the FOCAPO Conf., Park City, Utah (1987).
Seider W. D., L. B. Evans, B. Joseph, E. Wong and S. Jirapongphan, Routing of calculations in process simulation. Ind. Engng Chem. Process Des. Dev. 18, 292-297 (1979).
Stefik M. and D. G. Bobrow, Object-oriented programming: themes and variations. AI Mag. 6, 40-62 (1985).
Stefik M., D. G. Bobrow and K. M. Kahn, Integrating access-oriented programming into a multiparadigm environment. IEEE Software Jan., 10-18 (1986).
Stephanopoulos G., J. Johnston, T. Kriticos, R. Lakshmanan, M. Mavrovouniotis and C. Siletti, DESIGN-KIT: an object-oriented environment for process engineering. Comput. Chem. Engng 11, 655-674 (1987).
Ullman J. D., Principles of Database Systems, 2nd Edn. Computer Science Press, Rockville (1982).