Advances in Engineering Software 55 (2013) 45–55
Collaborative visualization of engineering processes using tabletop augmented reality

Suyang Dong a,*, Amir H. Behzadan b, Feng Chen a, Vineet R. Kamat a

a Department of Civil and Environmental Engineering, University of Michigan, 2340 G.G. Brown, 2350 Hayward, Ann Arbor, MI 48109-2125, USA
b Department of Civil, Environmental, and Construction Engineering, University of Central Florida, 4000 Central Florida Blvd., Orlando, FL 32816-2450, USA

Article history: Received 27 June 2012; Received in revised form 30 August 2012; Accepted 23 September 2012; Available online 9 November 2012

Keywords: Tabletop augmented reality; Visualization; Simulation; Planar tracking library; Validation; Collaboration

Abstract

3D computer visualization has emerged as an advanced problem-solving tool for engineering education and practice. For example in civil engineering, the integration of 3D/4D CAD models in the construction process helps to minimize the misinterpretation of the spatial, temporal, and logical aspects of construction planning information. Yet despite the advances made in visualization, the lack of collaborative problem-solving abilities leaves outstanding challenges that need to be addressed before 3D visualization can become widely accepted in the classroom and in professional practice. The ability to smoothly and naturally interact in a shared workspace characterizes a collaborative learning process. This paper introduces tabletop Augmented Reality to accommodate the need to collaboratively visualize computer-generated models. A new software program named ARVita is developed to validate this idea, where multiple users wearing Head-Mounted Displays and sitting around a table can all observe and interact with dynamic visual simulations of engineering processes. The applications of collaborative visualization using Augmented Reality are reviewed, the technical implementation is covered, and the program's underlying tracking libraries are presented. © 2012 Elsevier Ltd. All rights reserved.

* Corresponding author. Tel.: +1 734 764 4325; fax: +1 734 764 4292. E-mail addresses: [email protected] (S. Dong), [email protected] (A.H. Behzadan), [email protected] (V.R. Kamat).

1. Introduction

During the past several years, engineering systems have been rapidly growing in terms of complexity, scale, uncertainty, and interdisciplinarity. In this regard, construction systems and projects are no exception. Most construction projects involve parallel, assembly-like complex processes that interact in a dynamic environment. The circumstances in which these processes take place often become more complicated due to unforeseen conditions, deviations from project plans, change orders, and legal issues. Continuous planning and control of construction thus occurs throughout the lifecycle of construction projects. Construction planning and control can take place at the project level and/or the operation level [1]. At the project level, a planned facility is broken down into activities, each of which maps to a physical project component (e.g., Floor 6 Plumbing) or to a major time-consuming process (e.g., order and install compressor). At the operation level, planning and control are concerned with the technological methods, number and type of resources, and logical strategies that are needed to accomplish an activity or a group of related activities (e.g., install ventilation ducts and connect to compressor). Since projects must ultimately be constructed, success in

projects is entirely dependent on the success of construction at the operation level. Proper operations planning is therefore a necessity, and a challenging task that can be substantially improved by using techniques such as Discrete Event Simulation (DES), a powerful objective function evaluator that is well suited for the design of construction operations [2]. DES entails the creation of models that represent how construction operations will be performed. Once the models are created, the modeled operations can be simulated to yield numerical measurements, which typically include time, cost, and production rates. The measurements usually indicate important parts of the operations with potential for improvements that may result in cost or time savings [3].

1.1. Description of the problem

The developer of a computer simulation model often has misconceptions about how an actual operation takes place in the field [4]. The process of recognizing and correcting such errors is termed validation, which determines whether simulation models accurately represent the real-world system under study [5]. Validation exercises are typically carried out by consulting field experts or decision makers who are intimately familiar with the operations of the actual system being modeled and simulated. The participating field experts often do not have the means, the training, and/or the time to validate simulation models based solely



on the numerical and statistical output typically produced by simulation software [6]. Potential practitioners are therefore always skeptical of simulation analyses and have little confidence in their results. This lack of credibility is a major deterrent hindering the widespread use of simulation as an operations planning tool in the construction industry [7–9]. Operations-level visualization explicitly represents the interaction between equipment, labor, materials, and space (e.g., laying bricks, lifting columns). This visualization approach is especially powerful when model developers need to communicate the simulated processes to field experts by highlighting operational details such as the maneuverability of trucks and backhoes in excavation areas and the deployment of cranes and materials in steel erection. This visualization paradigm can thus help model developers and field experts in validating operational concepts, checking for design interferences, and estimating overall constructability [10]. Besides its usefulness in validating simulated processes, operations-level visualization is also helpful for educating engineering students about construction processes. According to a student survey (Fig. 1), more than 90% of those who responded indicated that

they learn better when the instructor uses 3D representations or visualization to teach engineering concepts and theories. Although instructional methods that take advantage of visualization techniques have been around for several decades, they still rely on traditional media and tools. For example, students who take a course in construction planning may use drawings, scheduling bar charts, sand table models, and, more recently, 3D CAD models. However, none of these techniques is capable of effectively conveying information on every aspect of a project. For instance, 2D or 3D models do not reflect temporal progress, while scheduling bar charts do not demonstrate the corresponding spatial layout. There are two major categories of 3D visualization that can be used to reconstruct engineering operations for validation, education, and training purposes, and to facilitate the study of environments in which such operations take place: Augmented Reality (AR) and Virtual Reality (VR). AR is the superimposition of computer-generated information over a user's view of the real world. By presenting contextual information in a textual or graphical format, AR enhances or augments the user's view of the real world beyond the normal experience [11]. The addition of such contextual information, spatially located relative to the user, can assist

Fig. 1. A survey of undergraduate civil, environmental, and construction engineering students revealed that a large percentage of students support the prospect of reforming current instructional methods.



Fig. 2. AR supplements the real world with relevant synthetic information.

in the performance of several scientific and engineering tasks. For this reason, AR-enabling technologies have been researched in an increasing number of studies in recent years. AR is different from VR, a visualization technology that has been around for several decades. As shown in Fig. 2, unlike VR, AR does not completely replace the real world; rather, the real world is supplemented with relevant synthetic information, and thus real and virtual objects coexist in an augmented space [12].

1.2. Main research contributions

One of the distinct advantages of AR is that it can enable the communication and discussion of a validation analysis or educational training exercise in a collaborative environment, where field experts or students can quickly appreciate the visual simulation displayed and are able to interactively participate in a discussion that helps understand, validate, and improve the simulated engineering processes. From the point of view of collaboration, however, there are still outstanding challenges that need to be addressed in the context of realizing the full benefits of graphical visualization in engineering education and practice. These challenges and the associated knowledge gaps have been the major motivation for the presented research. The developed ideas are implemented in ARVita (acronym for Augmented Reality Vitascope), in which multiple users wearing HMDs can observe and interact with dynamic simulated construction activities overlaid or "placed" on the surface of a table. Users such as model developers, field experts, or students can work across the table face-to-face, shift the focus of the shared workspace interactively, and jointly analyze and validate dynamic engineering operations. In the following section, a summary of previous efforts to incorporate AR visualization techniques into construction and civil engineering education and practice is presented.
In each case, the methodological benefits and limitations are also described in an effort to highlight the significance of the presented work.

2. Importance of the research

2.1. Virtual reality visualization of engineering operations

Visualization of simulated operations can be of significant help in the validation of discrete-event process models [4]. This is especially true in construction, where field experts and decision makers are expected to participate in the validation procedure but are not necessarily familiar with simulation principles and have great difficulty interpreting numerical outcomes. The design and analysis of construction operations using simulation is of value only if the insights gleaned are used in making decisions and increasing understanding (i.e., they are credible). Visual simulation at the operations level can be an effective means of achieving this [5].

The potential of VR visualization in animating simulated construction and civil engineering operations has been investigated before. Kamat and Martinez [2] designed the VITASCOPE (acronym for VIsualizaTion of Simulated Construction OPErations) visualization system for articulating operations-level construction activities using a comprehensive and extensible authoring language for depicting modeled processes along a simulated project timeline. The VITASCOPE authoring language is an abstract layer that makes the visualization engine (VE) independent of any particular driving process (e.g., a specific simulation system). Incoming operation events and data generated by simulation models, hardware controls, or real-time sensors can be adapted to conform to the VITASCOPE syntax and fed into the dynamic VE [10]. The VE interprets the instruction sets and renders the depicted activities sequentially in the virtual environment, which makes VITASCOPE capable of visualizing simulated construction operations in smooth, continuous, and animated 3D virtual worlds.

2.2. Collaborative validation through AR visualization

Despite its efficacy in graphically validating simulated construction processes, additional challenges need to be overcome to make VITASCOPE user-friendly in a collaborative environment. This work is critical because efficient validation requires model developers and decision makers to interactively communicate details of the simulation model and the actual operation. Moreover, collaborative learning, in which individuals are the cornerstones of the learning process, has proven to be one of the most effective instructional methods. This is also evident from Fig. 1, where, on average, 76% of students surveyed named collaborative learning as their most preferred method of learning. However, as far as collaborative communication is concerned, the convenience of traditional paper-based discussion is somewhat lost in computer-based VR environments, where users' discussion is restricted to the scale of the screen. On the other hand, table-based media are a natural collaboration platform that allows people to promptly exchange ideas. One kind of table-based media is tabletop AR, which by definition better supports the prospect of collaborative learning and discussion compared to VR. Several research studies have compared the effectiveness of AR and VR in collaborative environments. For example, users made significantly fewer mistakes in inspection and assembly tasks [13,14], gained stronger spatial cognition and memory [15], and thus experienced less mental workload [16] in a collaborative AR environment compared with VR. Group discussion cultivates face-to-face conversation, where there is a dynamic and easy interchange of focus between the shared workspace and the speakers' interpersonal space. The shared workspace is the common task area between collaborators, while the interpersonal space is the common communication space; the former is usually a subset of the latter [17]. Practitioners can use a



variety of non-verbal cues to quickly shift the focus of the shared workspace accordingly, and thus work more efficiently.

3. Related research in collaborative AR

The collaborative features of AR visualization have been previously explored. In this context, researchers have investigated the potential of AR-based education and training in three areas: localized collaborative AR, remote collaborative AR, and industrial collaborative AR.

3.1. Localized collaborative AR

Some early works in localized collaborative AR are found in [17–19]. The TRANSVISION system developed by Rekimoto [18] is a pioneering work in collaborative AR, in which multiple participants use palmtop handheld displays to share computer-generated graphics on a table. Collaborative Web Space [17] is an interface for people in the same location to view and interact with virtual world wide web pages floating around them in real space. The Studierstube system [19] mainly targets presentations and education: each viewer wears a magnetically tracked see-through HMD and walks around to observe 3D scientific data. Other related works, such as collaborative AR games and task-oriented collaboration, followed this trend. The Art of Defense [20] is a typical AR board game, in which gamers use handheld devices to play social games with physical game pieces on a tabletop. Nilsson et al. [21] conducted a comparative experiment on cross-organizational collaboration in dynamic emergency response tasks; the actors held positive attitudes toward AR and were willing to use it for real tasks. Besides the traditional Head-Mounted Display (HMD) and Hand-Held Display (HHD), a number of other AR media exist (e.g., the projection table and the multi-touch table). The augmented urban planning workbench [22] is a multi-layered luminous table for hybrid presentations, with 2D drawings, 3D physical models, and digital simulations overlaid onto the table; the system was used in a graduate course supporting the urban design process.
Multi-Touch Mixed Reality [23] allows designers to interact with 2D models on a multi-touch tabletop interface, while 3D models are projected onto their 2D counterparts.

3.2. Remote collaborative AR

In remote collaborative AR systems, avatars are essential elements of the visualization environment. WearCom [17] enables a user to see remote collaborators as virtual avatars in multiparty face-to-face AR conferencing [24]. The system recreates each participant's facial appearance in real time, and represents each participant's upper body and hands above the table as a deformed billboard [25]. A related interaction metaphor, termed "god-like," improves the communication of situational and navigational information between outdoor and indoor AR users: the gestures of indoor users are captured by video-based tracking and shown as "god-like" guidance to the outdoor users.

3.3. Industrial collaborative AR

Industrial collaborative AR is mainly used in product design and factory planning. The MagicMeeting system [26] was used in concrete test cases in which experts from the automotive industry met to discuss the design of car parts; the collaboration is powered by a tangible AR interface. Fata Morgana [27] also demonstrates car design, but uses a life-sized model in a BMW showroom. At Siemens Corporate Research, a fully implemented system called CyliCon [28] enables users to move around the environment and visualize as-built reconstructions on real sites and in industrial drawings. The Roivis project, at Volkswagen Group Research, is another successful example of factory design and planning [29]; this project puts strict demands on system accuracy (e.g., interfering edge analysis and aggregation verification). AR has also been widely studied in construction, in areas including but not limited to operations visualization, computer-aided operations, project schedule supervision, and component inspection. However, there are few examples in the collaborative AR domain. For instance, Wang and Dunston [30] developed an AR face-to-face design review prototype and conducted test cases for collaboratively performing error detection. Hammad et al. [31] applied distributed AR to visualize collaborative construction tasks (e.g., crane operations) and to check spatial and engineering constraints on outdoor jobsites.

3.4. Scientific and technical challenges

To the authors' best knowledge, none of the previous work in this domain allows users to validate simulated processes and learn from the results by collaboratively observing dynamic operations animations. Several scientific and technical challenges must be addressed to realize such collaborative validation:

(1) Since VITASCOPE and ARVita share the same purpose of validating simulated processes, it is plausible to loosely couple the visualization of simulated processes and the tracking of planar markers, so that ARVita can reuse the visualization functionality powered by VITASCOPE and focus on building robust tracking libraries and interaction features.
(2) Given the variety of existing planar marker tracking libraries, ARVita should be scalable enough to accommodate different schools of tracking libraries, and make changes in the tracking libraries transparent to the rest of the software structure.
(3) Despite their fast tracking speed, fiducial markers' sensitivity to video coverage gives rise to the necessity of building a natural marker tracking library that is robust to partial occlusion.
(4) In the presence of critical events happening in visual simulations, ARVita should grant users the ability to switch to a synchronized workspace to avoid any misunderstanding in communication caused by the distortion of time or viewpoint.

The rest of this paper describes and discusses the design requirements and technical implementation of ARVita, and is organized as follows: (i) the software scheme and its realization in OpenSceneGraph (OSG), reflecting the design requirements of modularity (loose coupling between the visualization module and the tracking module) and scalability (tracking libraries) (Section 4); (ii) synchronization amongst multiple views for analyzing critical events (Section 5); and (iii) the tracking libraries, which include both fiducial and natural planar markers (Section 6).

4. Technical implementation of ARVita

4.1. Model-view-controller software architecture of ARVita

The software architecture of ARVita conforms to the classical Model-View-Controller (MVC) pattern (Fig. 3), which helps to separate the visualization module powered by VITASCOPE, the tracking libraries, and the interaction functionality. The VITASCOPE visualization engine exposes a list of APIs (Application Programming Interfaces) that grant developers full control of the underlying animation process (e.g., opening and closing files, starting and pausing the animation, and altering the animation speed and timestamp). The VITASCOPE APIs are wrapped up as a scene node and affiliated with the Model component, which is responsible for initializing, archiving, and updating the VITASCOPE scene node.



Fig. 3. The software architecture of ARVita conforms to the model-view-controller pattern. The arrow indicates a ‘belongs to’ relationship.

A Controller enables interaction by communicating users' input commands to the VITASCOPE scene node affiliated with the Model. The communication channel is powered by FLTK (acronym for Fast Light Toolkit), which translates and dispatches mouse/key messages to the Model and the View. The View contains the tracking libraries, and is also responsible for correctly setting up the projection and ModelView matrices of the OpenGL cameras. It can thus align the updated Model content with the planar marker laid on the table. First, the camera projection matrix is populated at the start of the program based on the camera calibration result; this ensures that the OpenGL virtual camera and the real camera share a consistent view volume. Second, the ModelView matrix—the pose of the OpenGL camera—is updated every frame based on the marker tracking results, so that CAD models are transformed to the correct stance relative to the camera.

4.2. Implementation of model-view-controller using OpenSceneGraph

OSG is chosen for the implementation of the MVC pattern described above. OSG uses a directed acyclic graph (tree) to express the scene, holding geometry, state, and transformation nodes. The graph is traversed at each frame for updating, drawing, and culling purposes [32]. More importantly, its update and event callback mechanism makes it convenient for driving the simulated construction operations and performing the tracking procedure (Fig. 5). For example, tracking and transformation logic can be predefined as callbacks and then executed upon tree traversal at every frame.

4.2.1. Model

The VITASCOPE scene node—the core logic of the Model—resides at the bottom of the tree (Fig. 4). The vitaProcessTraceFile() function is called every frame to update the animation logic. Above the scene node is a coordinate transformation node. Since all of the tracking algorithms used in ARVita assume a Z-axis-up, right-handed coordinate system, this transformation converts VITASCOPE's Y-axis-up, right-handed coordinate system to ARVita's default system, so that the jobsite model lies horizontally above the marker.

4.2.2. View

The core of the View is the FLTK_OSGViewer node, which inherits methods from both the FLTK window class and the osgViewer class,

and thus functions as the glue between FLTK and OSG. Under its hood are the ModelView transformation node and the video stream display nodes. The ModelView matrix is updated from frame to frame by the tracking event callbacks. This approach follows the example of osgART (OSG ARToolkit), which uses the 'Tracker and Marker' updating mechanism to bundle a tracking procedure (e.g., ARToolkit) and OSG together. Both the Tracker and the Marker are attached as event callbacks to their respective nodes in the graph (Fig. 5). The Tracker reads updated video frames and stores the detected physical marker descriptor in the Marker. The Marker then calculates the camera's pose in the world coordinate system based on the descriptor, and updates the ModelView transformation node. ARVita complies with this 'Tracker and Marker' mechanism because it is an abstract layer that separates the tracking and rendering logic. For example, as will be shown later, this mechanism is versatile in accommodating new tracking procedures, making changes in the Tracker transparent to the rest of the software structure. The video resource is pasted as a dynamic texture on the background and in the eagle window. Despite the stability of the trackers used in ARVita, all of them require the majority of the marker, or even the entire marker, to be visible in the video frame. For example, with ARToolkit, the CAD models can disappear as soon as a tiny corner of the marker is lost. This limitation is much more severe when the animated jobsite covers the majority of the screen, which makes it very difficult to keep the marker within the camera's view volume. The eagle window is thus valuable for mitigating these flaws, and it can be toggled on and off by the user. When the user moves the camera to look for a vantage point, the eagle window can be toggled on so that the user is aware of the visibility of the marker.
When the camera is static and the user is paying attention to the animation, the eagle window can be toggled off so that it does not affect the observation.

4.2.3. Controller

FLTK plays the role of the Controller in ARVita: it translates and dispatches a user's interaction/input messages to the system. The motivation behind ARVita is to allow multiple users to observe the animation from different perspectives and promptly shift the focus of the shared working space in a natural way. These natural interactions include rotating the marker to find vantage points and pointing at the model to attract others' attention (Fig. 6). Given that the scale of the model may prevent users from getting close



Fig. 4. The realization of the model-view-controller model with OpenSceneGraph.

enough to interesting regions, ARVita provides users with basic zooming and panning functionalities. The focus of the shared working space can be shifted not only spatially but also temporally: users can choose to observe the animation at different speeds, or jump instantaneously along the timeline (Fig. 7). The ARVita Controller wraps the existing VITASCOPE APIs, such as vitaExecuteViewRatioChange() and vitaExecuteTimeJump(), in a user-friendly interface similar to most media players, with fast-forward buttons, a progress bar, etc.

5. Synchronized view mode in ARVita

5.1. Implementation of synchronized view

ARVita allows users to launch several instances simultaneously, and each user can thus individually explore, interact with, and validate a visual simulation. However, if users want to collaborate on validating a visual simulation, a synchronized view mode should be turned on. This can happen, for example, when a user identifies a critical event happening during a certain time period and draws the attention of partners by switching to the synchronized view mode and asking everyone to look at the same part of the model. This ensures that all of the participants share a consistent view of the ongoing visual simulation, and eliminates any potential misunderstanding caused by chronological or spatial misalignment. In the synchronized mode, when one user interacts with the model—rotating the marker, zooming, or dragging the progress bar—all of these spatial or temporal updates are reflected in all of the other users' augmented spaces, so that a consistent dynamic model is shared among all users. The OSGCompositeViewer class is the key to upgrading the individual view mode to the synchronized view mode. A composite viewer is a container of multiple views; it keeps the views correctly synchronized and thread-safe. Each view plays the same role as the FLTK_OSGViewer does in Fig. 4, and independently maintains its own video, tracker, and marker resources and ModelView matrix. However, these views share the same VITASCOPE scene node (Fig. 8) for two reasons: (1) to synchronize animation across different views, and (2) to save memory space by maintaining only one copy of each scene node.

5.2. Limitations of synchronized views on a single computer

The current version of ARVita supports running multiple synchronized views only on a single computer, which indirectly limits the maximum number of participants. As more "viewers" join, the computer quickly becomes overloaded by maintaining too many video resources and tracking procedures. The authors are currently pursuing an alternative distributed computing approach to overcome this limitation. As a generic architecture for distributed computer simulation systems, HLA (acronym for High Level Architecture) can not only integrate heterogeneous simulation software and data sources, but also communicate between computers and even across platforms. HLA thus presents itself as a promising solution for a distributed ARVita.


Fig. 5. osgART’s tracker and marker updating mechanism.

Fig. 6. Two users are observing the animation lying on the table.

Fig. 7. Steel erection activity at different timestamps.




Fig. 8. All of the views possess their own video, tracker, and marker objects, but point to the same VITASCOPE scene node.

Fig. 9. The taxonomy of tracking methods.

6. Planar tracking methods for collaborative AR

As noted earlier, the 'Tracker and Marker' mechanism makes ARVita versatile in accommodating different tracking procedures. It currently comes with two available trackers. The first, ARToolkit [33], is a widely used fiducial marker tracking library. The second, KEG [34], developed at the University of Michigan, is a natural marker tracking library. This section articulates the importance of tracking in AR, describes mainstream tracking approaches and those used in ARVita, and highlights the KEG tracking algorithm.

6.1. Principle of tracking

Tracking, which is also referred to as registration in the AR community, is the procedure of aligning real and virtual objects properly in an augmented world to create the illusion that they coexist across time and space. To be more specific, since real objects are shown by the physical camera and virtual objects are shown by the OpenGL virtual camera, the coexistence of heterogeneous objects implies that these two cameras share the same pose. Pose is

the translation and rotation of the camera relative to the origin of the world coordinate system. A physical camera’s pose needs to be tracked continuously and used to alter the ModelView matrix of the OpenGL camera accordingly. Tracking is acknowledged as the fundamental challenge in the AR community, and as such, has been well studied for decades [12].

6.2. Taxonomy of tracking methods

There are a variety of tracking methodologies, depending on the application. For example, in outdoor environments, a GPS and an electronic compass are usually employed together to track a camera's position and orientation [35]. In indoor environments, however, where GPS signals are blocked, visual tracking methods are usually preferred (Fig. 9). Based on their assumptions about the environment, visual tracking libraries can be classified into known-environment and unknown-environment tracking methods. SLAM (simultaneous localization and mapping) [36] is representative of unknown-environment tracking methods, which impose few assumptions about the environment. The other school of tracking methods assumes a known environment, and can be further divided into non-planar and planar environment methods. The former works with 3D structures that have known visual features (usually CAD models) [37]. Although both SLAM and non-planar methods require a high computational budget, they are promising candidates for future versions of ARVita, as their loose restrictions on the environment could grant users more flexibility when observing animations. The planar marker tracking method is simpler than the previous tracking methods, but is sufficient given the application context of ARVita, where the working space is usually a large flat table laid with markers, and users can seat themselves around the table to search for vantage points. The planar marker branch consists of fiducial marker and natural marker tracking methods, and ARVita includes trackers of both kinds.

6.3. Trackers available in ARVita

6.3.1. The fiducial marker tracking method: ARToolkit as an example

The fiducial marker tracking method is efficient and fast, because a fiducial marker is usually an artificial picture that contains easy-to-extract visual features, such as a set of black and white patterns. The extraction of these patterns—straight lines, sharp corners, and circles, for example—is fast and reliable. ARToolkit is one of the oldest fiducial marker tracking methods, and is widely considered a benchmark in the AR community. ARToolkit's fiducial marker is a logo bounded by a thick black frame (Fig. 4). The four corners of the frame are used to compute the camera pose, and the center logo is used to interpret the identity of the marker. Because of its simplicity and fast tracking speed, ARToolkit has been popular for over a decade in numerous applications. However, it also suffers from the common shortcomings of fiducial markers, and requires all four of its corners to be visible to the camera so that the camera pose can be computed.
This can cause frustration when a user navigates around the animated jobsite, only to find the animation graphics blinking on and off due to loss of tracking. This disadvantage motivated the authors to look into natural markers, which are more flexible about how much of the marker must be visible.
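ARToolkit itself identifies a marker by correlating the interior logo against trained templates; the flavor of fiducial identification can nonetheless be sketched with a simplified binary-grid code, matched under all four 90° rotations since the camera may view the marker at any orientation. The two patterns below are hypothetical, not real ARToolkit markers:

```python
import numpy as np

# Hypothetical 4x4 bit codes standing in for a trained marker dictionary.
DICTIONARY = {
    "crane": np.array([[1, 0, 1, 1],
                       [0, 0, 1, 0],
                       [1, 1, 0, 0],
                       [0, 1, 0, 1]]),
    "truck": np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 1, 1],
                       [0, 0, 1, 1]]),
}

def identify_marker(bits):
    """Match a binarized, fronto-rectified marker interior against the
    dictionary, trying all four 90-degree rotations."""
    for name, pattern in DICTIONARY.items():
        for k in range(4):
            if np.array_equal(np.rot90(bits, k), pattern):
                return name, 90 * k
    return None, None
```

Real fiducial systems additionally choose rotationally asymmetric codes and add error-correcting bits; AprilTag, discussed later, is a well-engineered example.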


6.3.2. The natural marker tracking method, KEG as an example

Besides tolerating partial coverage of the marker (Fig. 10), the natural marker offers the advantage of not depending on special predefined visual features like those found in the fiducial marker. In other words, the features can be points, corners, edges, and blobs that appear in a natural image. The extraction of these features is vital in establishing correspondence between the image observed by the camera and the marker image, and in estimating the camera's pose. Robust estimation usually requires the establishment of ample correspondences, which is a challenging issue.

There are two schools of methods for tracking natural markers, depending on whether they treat observed images independently or consecutively. The former is referred to as a detection-based method, such as SIFT (Scale Invariant Feature Transform) [38] or FERNs [39]. The latter is referred to as a tracking-based method, such as Kanade–Lucas–Tomasi (KLT) [40,41]. Each school has its own pros and cons. A robust detection-based method often demands a high computational budget, and can hardly meet real-time frame rates. On the other hand, accumulated errors in a tracking-based method are carried forward along consecutive frames, and thus compromise the tracking quality. The KEG tracker used in ARVita inherits merits from both the detection-based and tracking-based methods. The following sections briefly cover these two methods, and how they are combined by the proposed global appearance and geometric constraints.

Detection-based method. The correspondence relation between the marker image and the captured image can be expressed as a transformation matrix, called a homography, that maps points on the observed image to their corresponding points on the marker image:

s · p′ = H · p    (1)

where H is the 3 × 3 homography matrix, s is a scale factor, and p′(x′, y′) and p(x, y) are a pair of corresponding points on the marker and observed images, written in homogeneous coordinates. Furthermore, H encodes the camera's pose, i.e. rotation and translation, because (x, y) can also be transformed to (x′, y′) through the rotation (R), translation (T), and camera calibration (K) matrices. In other words, if K can be found through camera calibration, R and T can be decomposed from H. Notice that Eq. (1) can be converted to p′ᵢ × Hpᵢ = 0, and thus H can be solved given multiple matching pairs <p′ᵢ, pᵢ>. Because most existing algorithms for finding matching points, like SIFT and FERNs, inspect only the local appearance, mismatching is inevitable in some cases. To account for the existence of outliers/mismatches, RANSAC (RANdom SAmple Consensus) [42] is used to estimate the parameters of H. In the KEG tracker, one of RANSAC's variants [43] is used.

Fig. 10. The natural marker tracking library is flexible on marker visibility.

Tracking-based method. A tracking-based method is based on the premise that the difference between two consecutive observed images is small, and thus the feature points can be tracked by checking the points' local neighborhood patches. This approach costs less than the detection-based method, in which prior knowledge is discarded and each captured image is compared against the marker image from scratch. Once the correspondence is established at the initial stage by one of the detection methods described above, those feature points can be tracked in the following observed images, and the initial detection cost is amortized over the tracking stage. The tracking library used in KEG is Kanade–Lucas–Tomasi (KLT).

Global appearance and geometric constraints. Table 1 summarizes the profiles of the detection-based and tracking-based methods.

Table 1. Comparison between two natural marker approaches.

  Approach:
    Detection-based: identify matching feature points on each new frame.
    Tracking-based: follow the existing feature points from frame to frame.
  Relation between consecutive frames:
    Detection-based: independent.
    Tracking-based: current frame is correlated with the previous one.
  Pros:
    Detection-based: failure of estimation in one frame does not affect the next one at all.
    Tracking-based: low computational cost.
  Cons:
    Detection-based: time-consuming.
    Tracking-based: error of estimation in one frame is carried forward; the accumulated error eventually leads to loss of track.

The framework of the KEG tracker integrates these two methods and enhances the overall performance using global appearance and geometric constraints. In Fig. 11, black boxes indicate that feature points on the marker image are identified only once, when the program starts; grey boxes and dashed lines indicate that the detection-based method is employed only at the initial phase or when tracking is lost. Once matching feature points are found, the path shifts to the tracking-based method (solid lines), where the known feature points are tracked by KLT.

Fig. 11. The algorithm flowchart of the KEG tracker.
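The detection-based pipeline above, solving Eq. (1) from point pairs and rejecting mismatches with RANSAC, can be sketched in numpy as follows. This is an illustrative direct linear transform (DLT) with textbook RANSAC, not KEG's actual implementation (which uses a RANSAC variant [43]):

```python
import numpy as np

def dlt_homography(p, q):
    """Estimate H such that s * q_i = H @ [p_i, 1] (Eq. (1)) from >= 4
    point pairs, via the direct linear transform: stack two linear
    constraints per pair and take the SVD null vector."""
    A = []
    for (x, y), (u, v) in zip(p, q):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # fix the scale ambiguity

def ransac_homography(p, q, iters=500, tol=2.0, seed=0):
    """Robustly estimate H in the presence of mismatches: repeatedly fit
    H to a random minimal sample of 4 pairs, count inliers by
    reprojection error, and refit on the best consensus set."""
    rng = np.random.default_rng(seed)
    p, q = np.asarray(p, float), np.asarray(q, float)
    best_inliers = np.zeros(len(p), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(p), 4, replace=False)
        H = dlt_homography(p[idx], q[idx])
        ph = np.hstack([p, np.ones((len(p), 1))]) @ H.T
        proj = ph[:, :2] / ph[:, 2:3]
        inliers = np.linalg.norm(proj - q, axis=1) < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return dlt_homography(p[best_inliers], q[best_inliers]), best_inliers
```

On synthetic data with a handful of gross mismatches, the consensus set recovers the true correspondences and the refit H closely matches the ground truth.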
The homography estimated using RANSAC is referred to as the coarse H, and may still contain systematic and random errors.
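The small-motion premise behind KLT can be illustrated with a single translation-only Lucas–Kanade step: linearize the image around the current estimate and solve a 2 × 2 least-squares system for the displacement. This is a deliberately simplified sketch (the real KLT tracks many small windows, typically with image pyramids and iterative refinement):

```python
import numpy as np

def lk_step(template, image, h):
    """One translation-only Lucas-Kanade step. Model: image(x) ~=
    template(x - d), so template - image ~= grad(template) . d, which
    gives 2x2 normal equations for the displacement d = (dx, dy).
    `h` is the grid spacing used for the finite-difference gradients."""
    gy, gx = np.gradient(template, h)  # axis 0 is y, axis 1 is x
    diff = template - image
    A = np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                  [np.sum(gx * gy), np.sum(gy * gy)]])
    b = np.array([np.sum(gx * diff), np.sum(gy * diff)])
    return np.linalg.solve(A, b)       # estimated (dx, dy)
```

For a smooth image patch and a sub-pixel shift, one such step already recovers the displacement to within a few hundredths of a pixel.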

The first enhancement introduced by KEG is called the global appearance constraint, which refines the coarse H using ESM [44], a second-order approximation method. The resulting refined H can rectify the captured image so that it closely resembles the marker image. The second enhancement is called the global geometric constraint. Since the refined H is assumed to be theoretically free of systematic and random errors, the updated feature points x_updated in Eq. (2), to be tracked at the next frame, do not inherit the accumulated errors:

x_updated = H_refined⁻¹ · x_marker    (2)

where x_marker refers to the feature points found on the marker image. These two enhancements not only boost the tracking accuracy and stability, but also increase the tracking speed, because the global refinement reduces the number of iterations required by the KLT algorithm.

The introduction of AprilTag. Besides tracking the pose of the camera, another necessity is recognizing the identity of the natural marker. Currently, KEG takes advantage of the coding system in the AprilTag [45] tracking library to identify the information associated with a certain marker. Even though AprilTag itself belongs to the fiducial tracking family, it does not need to be fully visible to the camera once the identity of the marker is confirmed (Fig. 10).

7. Conclusion and future work

3D visualization is one of the most effective tools for exposing engineering students to the latest trends in emerging technology in the classroom. Statistics also indicate that students learn better when their instructors use 3D representations or visualization to teach engineering concepts and theories. Even though 3D visualization methods in the form of 3D/4D CAD models have existed for decades, they have not yet largely replaced traditional media and tools. One reason for this is that traditional visualization tools offer little opportunity for collaboration [19].
In an effort to enable collaborative learning through 3D visualization, we introduced a software program termed ARVita for collaboratively visualizing dynamic 3D simulations of construction operations. Users sitting across a table can have a face-to-face discussion about


3D animations "laid on" the table surface. Interaction functionalities are designed to assist users in smoothly shifting the focus of the shared workspace. A live demo of ARVita can be found on the authors' website. The next generation of ARVita will focus on offering students more flexibility and space when observing a 3D animation model. Two thrusts of improvement can be made. The first is to enable the tracking library to function in an unknown environment (i.e., SLAM), so that a user's observation range is no longer limited by the visibility of the marker; in other words, any flat table with an ample number of features could serve as the tracking plane. The second relates to ongoing efforts to make ARVita comply with the rules of the High Level Architecture (HLA). This compliance will allow ARVita to be distributed and synchronized across computers, so that students can run multiple instances of ARVita on their own machines while collaborating on a synchronized model. The current version of the ARVita software and its source code can be found on the authors' website.

References

[1] Halpin DW, Riggs LS. Planning and analysis of construction operations; 1992.
[2] Kamat VR, Martinez JC. Visualizing simulated construction operations in 3D. J Comput Civil Eng 2001;15:329–37.
[3] Martinez JC. STROBOSCOPE: state and resource based simulation of construction processes. Ann Arbor: University of Michigan; 1996.
[4] Law AM, Kelton WD. Simulation modeling and analysis; 2000.
[5] Rohrer MW. Seeing is believing: the importance of visualization in manufacturing simulation. In: Proceedings of the 2000 winter simulation conference, San Diego; 2000. p. 1211–6.
[6] Henriksen JO. General-purpose concurrent and post-processed animation with Proof™. In: Proceedings of the 1999 winter simulation conference; 1999. p. 176–81.
[7] Huang R-Y, Grigoriadis AM, Halpin DW. Simulation of cable-stayed bridges using DISCO. In: Proceedings of the 1994 winter simulation conference; 1994. p. 1130–6.
[8] Tucker SN, Lawrence PJ, Rahilly M. Discrete-event simulation in analysis of construction processes. CIDAC simulation paper; 1998.
[9] Halpin DW, Martinez LH. Real world applications of construction process simulation. In: Proceedings of the 1999 winter simulation conference. San Diego, CA: Society for Computer Simulation; 1999. p. 956–62.
[10] Kamat VR, Martinez JC. Automated generation of dynamic, operations level virtual construction scenarios. Electron J Inf Technol Construct (ITcon) 2003;8:65–84.
[11] Behzadan AH, Kamat VR. Visualization of construction graphics in outdoor augmented reality. In: Proceedings of the 2005 winter simulation conference. Piscataway, NJ: Institute of Electrical and Electronics Engineers (IEEE); 2005.
[12] Azuma R. A survey of augmented reality. Presence: Teleoper Virtual Environ 1997;6:355–85.
[13] Baird KM. Evaluating the effectiveness of augmented reality and wearable computing for a manufacturing assembly task; 1999.
[14] Wang X, Dunston PS. Comparative effectiveness of mixed reality-based virtual environments in collaborative design. IEEE Trans Syst Man Cybern 2011;41:284–96.
[15] Biocca F, Tang A, Lamas D, Gregg J, Brady R, Gai P. How do users organize virtual tools around their body in immersive virtual and augmented environments? An exploratory study of egocentric spatial mapping of virtual tools in the mobile infosphere. East Lansing: Media Interface and Network Design Labs, Michigan State University; 2001.
[16] Tang A, Owen C, Biocca F, Mou W. Comparative effectiveness of augmented reality in object assembly. In: Proceedings of the SIGCHI conference on human factors in computing systems; 2003. p. 73–80.
[17] Billinghurst M, Kato H. Collaborative mixed reality. In: Proceedings of the 1999 international symposium on mixed reality; 1999.
[18] Rekimoto J. Transvision: a hand-held augmented reality system for collaborative design. In: Proceedings of 1996 virtual systems and multimedia, Gifu, Japan; 1996.
[19] Szalavári Z, Schmalstieg D, Fuhrmann A, Gervautz M. 'Studierstube': an environment for collaboration in augmented reality; 1997.


[20] Huynh TD, Raveendran K, Xu Y, Spreen K, MacIntyre B. Art of defense: a collaborative handheld augmented reality board game. In: Proceedings of the 2009 ACM SIGGRAPH symposium on video games; 2009. p. 135–42.
[21] Nilsson S, Johansson B, Jonsson A. Using AR to support cross-organisational collaboration in dynamic tasks. In: Proceedings of the 2009 IEEE and ACM international symposium on mixed and augmented reality, Orlando, FL; 2009. p. 3–12.
[22] Ishii H, Underkoffler J, Chak D, Piper B. Augmented urban planning workbench: overlaying drawings, physical models and digital simulation. In: Proceedings of the international symposium on mixed and augmented reality; 2002.
[23] Wei D, Zhou ZS, Xie D. MTMR: a conceptual interior design framework integrating mixed reality with the multi-touch tabletop interface. In: Proceedings of the 2010 IEEE and ACM international symposium on mixed and augmented reality, Seoul, Korea; 2010. p. 279–80.
[24] Minatani S, Kitahara I, Kameda Y, Ohta Y. Face-to-face tabletop remote collaboration in mixed reality. In: Proceedings of the 2007 IEEE and ACM international symposium on mixed and augmented reality, Nara, Japan; 2007. p. 43–6.
[25] Stafford A, Piekarski W, Thomas BH. Implementation of God-like interaction techniques for supporting collaboration between outdoor AR and indoor tabletop users. In: Proceedings of the 2006 IEEE and ACM international symposium on mixed and augmented reality, Santa Barbara, CA; 2006. p. 165–72.
[26] Regenbrecht HT, Wagner MT, Baratoff G. MagicMeeting: a collaborative tangible augmented reality system. Virtual Reality: Springer; 2006.
[27] Klinker G, Dutoit AH, Bauer M. Fata Morgana – a presentation system for product design. In: Proceedings of the 2002 IEEE and ACM international symposium on mixed and augmented reality, Darmstadt, Germany; 2002. p. 76–85.
[28] Navab N. Industrial augmented reality (IAR): challenges in design and commercialization of killer apps. In: Proceedings of the 2003 IEEE and ACM international symposium on mixed and augmented reality, Tokyo, Japan; 2003. p. 2–6.
[29] Pentenrieder K, Bade C, Doil F, Meier P. Augmented reality-based factory planning – an application tailored to industrial needs. In: Proceedings of the 2007 IEEE and ACM international symposium on mixed and augmented reality, Nara, Japan; 2007. p. 1–9.
[30] Wang X, Dunston PS. User perspectives on mixed reality tabletop visualization for face-to-face collaborative design review. Autom Construct 2008;17:399–412.
[31] Hammad A, Wang H, Mudur SP. Distributed augmented reality for visualizing collaborative construction tasks. J Comput Civil Eng 2009;23:418–27.
[32] Martz P. OpenSceneGraph quick start guide; 2007.
[33] Kato H, Billinghurst M. Marker tracking and HMD calibration for a video-based augmented reality conferencing system. In: Proceedings of the 1999 IEEE and ACM international workshop on augmented reality (IWAR 99); 1999.
[34] Feng C, Kamat VR. KEG plane tracker for AEC automation applications. In: The 2012 international symposium on automation and robotics in construction, mining and petroleum; 2012.
[35] Feiner S, MacIntyre B, Hollerer T. A touring machine: prototyping 3D mobile augmented reality systems for exploring the urban environment. In: Proceedings of the 1997 international symposium on wearable computers; 1997. p. 74–81.
[36] Klein G, Murray D. Parallel tracking and mapping for small AR workspaces. In: Proceedings of the 2007 IEEE and ACM international symposium on mixed and augmented reality, Nara, Japan; 2007.
[37] Drummond T, Cipolla R. Real-time visual tracking of complex structures. IEEE Trans Pattern Anal Machine Intell 2002;24:932–46.
[38] Lowe DG. Distinctive image features from scale-invariant keypoints. Int J Comput Vision 2004;60:91–110.
[39] Ozuysal M, Fua P, Lepetit V. Fast keypoint recognition in ten lines of code. In: Proceedings of the 2007 IEEE conference on computer vision and pattern recognition, Minneapolis, MN; 2007. p. 1–8.
[40] Lucas BD, Kanade T. An iterative image registration technique with an application to stereo vision. In: Proceedings of the 7th international joint conference on artificial intelligence; 1981.
[41] Shi J, Tomasi C. Good features to track. In: Proceedings of the 1994 IEEE conference on computer vision and pattern recognition; 1994. p. 593–600.
[42] Fischler M, Bolles R. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun ACM 1981;24:381–95.
[43] Simon G, Fitzgibbon AW, Zisserman A. Markerless tracking using planar structures in the scene. In: Proceedings of the 2000 international symposium on augmented reality; 2000. p. 120–8.
[44] Benhimane S, Malis E. Real-time image-based tracking of planes using efficient second-order minimization. In: Proceedings of the 2004 IEEE/RSJ international conference on intelligent robots and systems; 2004. p. 943–8.
[45] Olson E. AprilTag: a robust and flexible visual fiducial system. In: Proceedings of the IEEE international conference on robotics and automation (ICRA); 2011.