Autonomous grasp and manipulation planning using a ToF camera

Robotics and Autonomous Systems 60 (2012) 387–395


Zhixing Xue ∗, Steffen W. Ruehl, Andreas Hermann, Thilo Kerscher, Ruediger Dillmann
FZI Forschungszentrum Informatik, Haid-und-Neu-Str. 10-14, 76137 Karlsruhe, Germany



Article history: Available online 27 August 2011
Keywords: Manipulation planning; Grasp planning; Time-of-flight sensors; Impedance control

Abstract

A time-of-flight camera can help a service robot to sense its 3D environment. In this paper, we introduce our methods for sensor calibration and 3D data segmentation, which allow the robot to automatically plan grasps and manipulation actions. Impedance control is used intensively to compensate for modeling errors and to apply the computed forces. The methods are demonstrated in three service robotic applications. Sensor-based motion planning allows the robot to move within dynamic and cluttered environments without collision. Unknown objects can be detected and grasped. In the autonomous ice cream serving scenario, the robot captures the surface of the ice cream and plans a manipulation trajectory to scoop out a portion. © 2011 Elsevier B.V. All rights reserved.

1. Introduction

A domestic service robot should help people to handle daily tasks in the household environment. Autonomous grasping and manipulation of household items are key functions that such a robot should have. To accomplish grasping and manipulation tasks, a robot needs the ability to perceive the 3D environment and the ability to interact with it adaptively. For the former, range sensors have been developed, which provide depth information at each pixel instead of the gray or color information of their 2D counterparts. For the latter, soft robots with passively or mechanically compliant joints have been developed, which cause less impact on the environment and yield better compliant behavior. In this article, we present our approach to combining these two new technologies for autonomous grasping and manipulation. We use a time-of-flight camera to capture object models and environment information. Grasp planning, motion planning and manipulation planning are involved to plan grasps, collision-free paths and manipulation actions, respectively.

1.1. Range sensors

In the past 25 years, range sensors have been intensively developed and commercialized and now enable robots to capture their 3D environment. Depending on the applied measurement principle, they can be categorized into four sensor technologies [1]. The laser scanner measures the time delay between an emitted, reflected and received laser signal to infer the distance to a target point. By mechanically deflecting the laser signal in azimuthal and longitudinal directions and sequentially acquiring

Corresponding author. E-mail address: [email protected] (Z. Xue).

0921-8890/$ – see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.robot.2011.07.012

multiple measurement points, a 3D point cloud can be obtained. Since laser scanners can measure ranges of hundreds of meters, they are widely applied in both indoor and outdoor environments, especially for navigation [2], 3D SLAM [3], indoor modeling and collision avoidance for mobile robots. The slit scanner projects a laser line onto the object and simultaneously detects the laser profile in a single video frame. It is the most widely used triangulation-based 3D laser camera because of its optical and mechanical simplicity and low cost. Instead of a single laser line, multiple stripes or patterns can also be projected onto the object; this is the pattern projection principle. These projection techniques work only at short range. Since they need multiple projections of lines or patterns to acquire the whole scene, the scan time is relatively long, which makes them unsuitable for dynamic scenes. For large structures and dynamic scenes, time-of-flight cameras are by far the preferred choice. They use the time-of-flight principle like a laser scanner, but combine a standard camera with an imager capable of measuring the time delay for each pixel, so that the entire 3D point cloud is acquired in a single shot. The camera is described by the well-understood intrinsic camera model and an extrinsic model that parametrizes the distance measurement for each pixel. Depending on whether phase- or pulse-modulated signals are used, short or long ranges can be measured. These properties enable time-of-flight cameras to quickly capture any changes in the environment. For grasping and manipulation, several disadvantages of range sensors have to be considered. The main disadvantage is that the sensor can only provide partial geometry information from a certain point of view, whereas most planning algorithms require complete geometry information. Furthermore, occlusion is a major problem for the recognition and segmentation of objects in a cluttered environment.
If the robot is mobile, it can use its mobility to capture the 3D information from different viewing points and fuse them together to acquire a whole 3D scene.



Fig. 1. An autonomous ice cream serving robot with a 3D camera.

Since most of the commercialized products are still somewhat prototypical, the measured results are inaccurate and noisy. A complex calibration procedure has to be performed to obtain usable sensor values. The range sensor used in this work is a Mesa Imaging Swiss Ranger SR4000 with a resolution of 176 × 144 pixels. The sensor emits modulated infrared light, which is reflected by objects and projected onto a custom-designed CMOS chip. Although this principle of phase-shift measurement is well understood and the distance can be determined, the error may be bigger than 10 cm. The distance errors are influenced by different factors, such as thermal noise, propagation delay in the chip's circuits, the exact form of the diode's signal, lens distortion, internal and external temperature, ambient light, reflective properties, viewing angle, complexity of the observed scene, and scattering due to objects close to the sensor. Different calibration approaches have been tested and reported in the literature. Our calibration procedure is introduced in Section 2.1. The sensor is mounted directly above the center of the robot's workspace, as shown in Fig. 1. The field of view of the camera covers the whole workspace and reduces occlusion effects.

1.2. Grasp and manipulation planning

The grasp and manipulation planning algorithms need the sensed geometrical models of the objects and the environment to plan actions for the robot. One characteristic of these planning problems is the high-dimensional configuration space. Our robot with two KUKA Lightweight Robot (LWR) arms (with 7 joints each) [4] and two DLR/HIT five-finger hands (each finger with 3 joints) [5] has a total of 44 degrees of freedom (DoF). The goal of the planning algorithms is to find suitable configurations in this high-dimensional space that enable the robot to grasp or manipulate the objects.
Because of the high DoF, the configuration space cannot be analyzed exhaustively, and most methods rely on sampling it randomly or analytically and evaluating the sampled configurations with some criterion [6,7]. Collision-freeness, for instance, is the most important criterion for motion planning problems. For grasping and manipulation, the forces and torques that the robot can apply onto an object are also of interest. Force-closure and form-closure properties [8] are well-studied criteria for grasp planning. The planning algorithms use internal (physical) models to simulate the real world. In the literature, much work has used vision information to estimate the shape of the object [9] and to use it for grasping and manipulation tasks [10–13]. To execute the planned grasps and manipulation actions, two problems are considered in this article. One major problem to overcome is the inconsistency between the internal models used for computation and the physical real world. Because of the

complexity of the grasping and manipulation problems, most of the algorithms require complex and precise models of the objects and the environment. If we only use partial, imperfect geometrical information from the range sensors to plan joint positions for the robot, the execution could fail because of measurement errors. The planning also assumes that the robot model is accurate and that the commanded position can be reached, without considering the uncertainty and inaccuracy of the real execution. Although the planned manipulation actions work perfectly in the modeled world, they could fail in the real physical world because of unforeseen physical effects occurring at execution time or deviations from the planning models. One solution to compensate for these errors and to improve the stability of the execution is impedance control. Another problem in the grasp and manipulation planning literature is that the algorithms only try to numerically optimize the forces and torques that should be applied between the robot and the object, but overlook the uncertainty during execution and ignore the implementation on a real robot.

1.3. Impedance control

The two above-mentioned problems can be tackled elegantly using compliant control, which provides a compliant behavior if the robot comes into contact with its environment. One possible approach to achieve compliant behavior is impedance control, as introduced by Hogan in his seminal work [14]. The goal of impedance control is to realize a particular desired dynamical relationship between the robot motion and the external torques. To take advantage of impedance control for a concrete application, one has to feed the planning results into an impedance control framework to achieve the desired physical and dynamical effects [15–18]. In this work, the impedance control developed at the German Aerospace Center (DLR) is used for grasping and manipulation. The robot shown in Fig. 1 has two KUKA lightweight robotic arms and two dexterous five-fingered DLR-HIT II hands [5]. The robot is modeled as a flexible joint system. Each arm joint and finger joint is equipped with a strain-gauge-based torque sensor, motor position sensing based on magnetoresistive encoders, and link-side position sensing based on potentiometers. The measurement is performed at 3 kHz in the arm joints and 1 kHz in the finger joints.

1.4. Contribution and outline

In this article, we combine two developments: range sensors and impedance-controlled soft robotics for grasping and manipulation. A time-of-flight camera is used to capture unknown objects in a partially known environment. This range information is used by different planning algorithms to generate grasping and manipulation actions for the robot. These actions contain not only position information, but also the desired force and torque information that the robot should apply. Finally, the position, force and torque information is used to command the impedance control. That way, compliant and safe execution is achieved, even for a partially known, inaccurately perceived environment, as acquired by the time-of-flight camera. To show that the proposed approach generalizes over three quite different applications, we categorize the tasks by a distance function d(q1, q2) between the robot and the manipulated object, where q1 is the local coordinate column vector of a colliding link of the robot and q2 that of the object. This general distance function is used as follows. At d(q1, q2) > 0, the robot and the object are separated. It is the goal of the motion planning to plan collision-free paths. We monitor the sensed torque values in each joint to detect


the collision during execution. At d(q1, q2) = 0, the robot and the object are touching. The grasp planning algorithm computes the forces between the robot (mostly at the fingertips) and the object that immobilize the object in the hand. These grasping forces are applied by the finger joint impedance control. At d(q1, q2) < 0, the robot and the object are in penetration. A possible application, as introduced in this article, is scooping ice cream, where the ice cream scoop is inserted into the ice cream mass. This case is not widely studied in the literature. We take advantage of the Cartesian impedance control of the arm to exert steady forces along a computed trajectory for scooping ice cream.

2. Methods

2.1. Calibration of a time-of-flight camera

The CMOS chip of the Swiss Ranger sensor provides both an intensity image and a depth image. This allows one to apply conventional, well-studied camera calibration methods to determine the intrinsic parameters of the camera. We use the calibration method from the Open Source Computer Vision library (OpenCV) [19], which computes the principal point, focal lengths and lens distortion parameters from several views of a known chessboard. After these intrinsic camera parameters are determined, the same routine is used again to determine the extrinsic parameters of the camera. The position and orientation of the camera with respect to the arm are determined by estimating the transformation between a set of points given in two different coordinate systems, one being the camera coordinate system and the other the arm coordinate system. This is done by placing a known chessboard at different poses within the arm workspace that can be well viewed by the camera. Using the calibrated intrinsic camera parameters, the corners of the chessboard can be computed. The tip of a measurement tool mounted on the arm's Tool Center Point (TCP) is moved to the position of a chosen corner.
Taking the size of the measurement tool into account, this position in the arm coordinate system can be computed by forward kinematics. In contrast to the 6D localization of the chessboard from the camera, the measurement tip method yields only a 3D position. For further calculations, we therefore drop the orientation information of the chessboard from the camera localization. This results in two clouds of three-dimensional points in which associated points correspond to the same physical point expressed in different coordinate systems. We use the method presented in [20] to calculate the transformation between the two coordinate systems; it minimizes the accumulated squared distances between the resulting point coordinates using the singular value decomposition. This yields the relative coordinate transformation between the camera coordinate system and the arm coordinate system, so that the measured point cloud from the ToF camera can be transformed into the coordinate system of the arm.

A ToF camera usually exhibits considerable variability across its depth sensing range, mainly due to varying intensity and amplitude values. As a consequence, planar surfaces usually appear vaulted. This kind of error can be reduced by a linear function for each pixel [21,22]. A lookup table was built by taking several depth maps of a planar surface with the sensor mounted at various defined distances pointing at it. At each pixel, the nominal and the actual distance are compared. These are used to estimate the offset $o_{i,j}$ and the multiplier $m_{i,j}$ of the following linear regression:

$c_{i,j} = m_{i,j} z_{i,j} + o_{i,j}$  (1)


where the indices $i, j$ specify the pixel in the depth map, $z_{i,j}$ the measured and $c_{i,j}$ the corrected distance. This calibration


procedure needs to be performed only once to build up the parameter lookup table. The depth map provided by the ToF camera at runtime is then corrected using the distance lookup table. However, this implicitly assumes that the internal and external conditions at runtime are the same as during calibration. Under different runtime conditions, such as changed temperature or surface reflectance, the distance errors remaining after the lookup table correction may be too large for grasping and manipulation. We overcome this problem by taking known environment information as ground truth and recalibrating the depth map against a planar surface in the robot's environment (such as a table) at runtime. The points of the table are segmented and approximated by a least-squares plane fitting algorithm. An optimal 4 × 4 transformation matrix $M$ between the estimated plane and a plane describing the known table can be computed. This transformation is then applied to the depth map after the lookup table correction:

$\tilde c_{i,j} = M \cdot c_{i,j}$.  (2)
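The least-squares plane fit used in this runtime recalibration can be sketched in a few lines of NumPy. The function below is illustrative: the plane normal is the right singular vector belonging to the smallest singular value of the centered point set, and the corrective transformation would then be assembled from the fitted centroid and normal together with the known table plane.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an N x 3 point set (e.g. the segmented
    table pixels). Returns (centroid, unit normal); the plane passes
    through the centroid, and the normal is the right singular vector
    belonging to the smallest singular value of the centered points."""
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    return centroid, normal / np.linalg.norm(normal)
```

From the centroid and normal of the fitted plane and of the known table plane, a rotation aligning the normals plus a translation between the centroids gives the corrective transformation applied to the depth map.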


After these steps of calibration, the distance errors of $\tilde c_{i,j}$ can be reduced to below 3 mm, which allows precise grasping and manipulation.

2.2. Segmentation of the known scene from the measurement

In grasping and manipulation tasks, parts of the environment such as tables, walls and the robot body are a-priori known. Object localization algorithms can be used to recognize objects and determine their Cartesian poses; these parts will also be detected by the ToF camera. Thus the first important step in understanding the sensed environment is to segment these known parts out of the depth information. This way, the unknown objects can be identified. The mesh of an unknown object that is generated from the depth information can be considered as an obstacle by the motion planning algorithm to avoid collision, or may be used as an object model by the grasp planning algorithm to plan feasible grasps for it. The working principle of the ToF camera (emitting modulated light to determine the distance) is very similar to the ray shooting problem in computational geometry, which calculates the distance of a ray from a starting position to the intersection with another object. The Z-buffer rendering provided by the OpenGL rendering pipeline can compute the distances of rays from a virtual camera to a given scene. This is used in our work to generate an artificial view of the scene observed by the camera. The intrinsic and extrinsic camera parameters obtained from the calibration step are used by the Z-buffer rendering. The resulting depth map is compared pixel-wise with the actual depth map sensed by the sensor. At each pixel, if the rendered point lies behind the sensed point, the known scene is occluded by an unknown object at this pixel. Otherwise, the point belongs to a known object and is eliminated. The remaining point cloud results from the unknown parts of the scene.
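The pixel-wise comparison can be sketched directly in NumPy; the tolerance below is an invented noise margin, not a value from the paper:

```python
import numpy as np

def segment_unknown(sensed, rendered, tol=0.01):
    """Boolean mask of pixels belonging to unknown objects.

    sensed:   H x W depth map measured by the ToF camera.
    rendered: H x W depth map of the known scene from the Z-buffer.
    A pixel is 'unknown' if the sensed surface lies in front of the
    rendered known geometry by more than tol (here: metres)."""
    return sensed < rendered - tol
```

The `True` pixels form the remaining point cloud that is triangulated in the next step.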
Using their coordinates from the depth map, a mesh can be generated by connecting neighboring points into triangular faces. From the calibrated depth $\tilde c_{i,j}$, the coordinates of the measured point can simply be computed using the camera's intrinsic parameters. With $F(\tilde c_{i,j}) = (x, y, z)^T$ denoting this transformation function, two faces can be built for a remaining point $\tilde c_{i,j}$: $\{F(\tilde c_{i,j}), F(\tilde c_{i,j-1}), F(\tilde c_{i-1,j})\}$ and $\{F(\tilde c_{i,j}), F(\tilde c_{i,j+1}), F(\tilde c_{i+1,j})\}$. The generated mesh may be further separated into smaller meshes that are not connected with each other. These resulting meshes represent one or several unknown objects, which cannot be segmented further using the depth information alone. We use the collision checking library PQP, which uses triangulated polygon soups to represent the geometrical models. After the segmentation and triangulation steps, the unknown objects and



Fig. 2. Segmentation of unknown objects from depth information.

a-priori known objects can be handled in the same way by the collision checking algorithms. This allows collision-free paths to be computed that also avoid collisions with the unknown objects in the scene, and it enables the computation of grasps for the unknown objects. This procedure is visualized in Fig. 2. The scene around the robot is captured by a stereo camera system. Two objects can be localized using offline-trained SIFT descriptors as described in [23]. The localization yields the object information with its 6D pose relative to the robot. These two objects together with the table model are used by the Z-buffer rendering. The unknown objects can be segmented by pixel-wise depth comparison, as shown in the rightmost picture.

2.3. Impedance control with torque sensing

Impedance control tries to achieve a desired dynamic relationship between external forces and the movements of the robot [14]. It is a very powerful tool in grasping and manipulation applications. A short introduction to joint-level and Cartesian-level impedance control is given below, as introduced for robots with flexible joints and torque sensing [16,17,5]. The assumed flexible joint model was proposed by Spong in [24]:

$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau + \tau_{ext}$  (3)

$B\ddot{\theta} + \tau = \tau_m$.  (4)

Herein, $q \in \Re^n$ represents the vector of link-side joint angles and $\theta \in \Re^n$ the vector of motor angles divided by the gear ratio. $M(q) \in \Re^{n \times n}$, $C(q,\dot{q})\dot{q}$ and $g(q) \in \Re^n$ denote the inertia matrix, the centrifugal term and the gravity term, respectively. The joint torques $\tau \in \Re^n$ are determined by the linear relationship $\tau = K(\theta - q)$. $K \in \Re^{n \times n}$ and $B \in \Re^{n \times n}$ are diagonal matrices denoting the joint stiffness and the rotor inertia. The generalized motor torque vector $\tau_m$ is the input signal for the controller, whereas $\tau_{ext}$ is the external torque vector acting on the robot. A joint torque feedback is used to scale the apparent rotor inertia from $B$ to $B_\theta$:

$\tau_m = B B_\theta^{-1} u + (I - B B_\theta^{-1})\tau$  (5)

where $u$ is a new input variable, used in the following motor-position-based PD controller:

$u = -K_\theta(\theta - \theta_s) - D_\theta\dot{\theta}$.  (6)


The desired joint-level impedance behavior is defined by a positive definite stiffness matrix $K_\theta$, a damping matrix $D_\theta$ and a desired configuration $\theta_s$. This results in the following closed-loop equations:

$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau + \tau_{ext}$  (7)

$B_\theta\ddot{\theta} + D_\theta\dot{\theta} + K_\theta(\theta - \theta_s) + \tau = 0$.  (8)
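As a sanity check of this closed loop, a single gravity-compensated flexible joint can be integrated numerically; all constants below are invented for illustration and are not parameters of the LWR:

```python
# Single flexible joint under joint-level impedance control,
# gravity-compensated (g = 0); explicit Euler integration.
M_l, B_th = 1.0, 0.5        # link inertia and scaled rotor inertia
K = 500.0                   # joint stiffness: tau = K * (theta - q)
K_th, D_th = 100.0, 30.0    # impedance stiffness and damping
theta_s = 0.3               # virtual set point [rad]

def simulate(t_end=5.0, dt=1e-4, tau_ext=0.0):
    """Integrate the closed-loop link and motor dynamics."""
    q = dq = th = dth = 0.0
    for _ in range(int(t_end / dt)):
        tau = K * (th - q)                    # elastic joint torque
        ddq = (tau + tau_ext) / M_l           # link side
        ddth = -(D_th * dth + K_th * (th - theta_s) + tau) / B_th  # motor side
        q, dq = q + dq * dt, dq + ddq * dt
        th, dth = th + dth * dt, dth + ddth * dt
    return q, th
```

In free motion both the link and motor angles settle at the set point; under a constant external torque they settle at offsets determined by the impedance stiffness $K_\theta$ and the joint stiffness $K$.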


In the case of Cartesian-level impedance control, the desired impedance behavior is defined with respect to a Cartesian coordinate $x \in \Re^m$, which describes both the position and orientation of the robot end effector. It is assumed in the following that the forward kinematic function $x_\theta = f(\theta)$ and the Jacobian matrix $J(\theta) = \partial f(\theta)/\partial\theta$ are known. The Cartesian velocity and acceleration of the end effector can then be written as

$\dot{x}_\theta = J(\theta)\dot{\theta}$  (9)

$\ddot{x}_\theta = J(\theta)\ddot{\theta} + \dot{J}(\theta)\dot{\theta}$.  (10)

The goal of the Cartesian impedance control is to alter the system dynamics (3) to achieve the following dynamic relationship between a position error $\tilde{x}(\theta)$ and the external forces and torques $F \in \Re^m$ at the end effector:

$\Lambda_d \ddot{\tilde{x}}_\theta + D_d \dot{\tilde{x}}_\theta + K_d \tilde{x}_\theta + F = 0$.  (11)


Wherein $\Lambda_d$, $D_d$ and $K_d$ are the symmetric and positive definite matrices of inertia, damping and stiffness, respectively. $\tilde{x}(\theta) = f(\theta) - x_d$ is the error between the real position $x$ and a virtual equilibrium position $x_d$. Then the feedback law

$u = -J(\theta)^T (K_d \tilde{x}(\theta) + D_d \dot{x})$  (12)


generalizes (6) to Cartesian coordinates. This, together with (3), leads to the closed-loop system:

$M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = \tau + \tau_{ext}$  (13)

$B_\theta\ddot{\theta} + J(\theta)^T (K_d \tilde{x}(\theta) + D_d \dot{x}) + \tau = 0$.  (14)
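For intuition, the Cartesian feedback law can be evaluated for a planar two-link arm; the link lengths and gains below are made-up example values, not the robot's:

```python
import numpy as np

L1, L2 = 0.4, 0.3   # link lengths of an illustrative planar 2-link arm [m]

def fk(th):
    """Forward kinematics: end-effector position for joint angles th."""
    return np.array([L1 * np.cos(th[0]) + L2 * np.cos(th[0] + th[1]),
                     L1 * np.sin(th[0]) + L2 * np.sin(th[0] + th[1])])

def jacobian(th):
    """Analytic Jacobian of fk."""
    s1, c1 = np.sin(th[0]), np.cos(th[0])
    s12, c12 = np.sin(th[0] + th[1]), np.cos(th[0] + th[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def cartesian_impedance_u(th, dth, x_d, Kd, Dd):
    """Feedback torque u = -J(th)^T (Kd * x_err + Dd * x_dot)."""
    x_err = fk(th) - x_d             # error w.r.t. the virtual equilibrium
    x_dot = jacobian(th) @ dth       # Cartesian velocity
    return -jacobian(th).T @ (Kd @ x_err + Dd @ x_dot)
```

At the equilibrium (zero error, zero velocity) the commanded torque vanishes, as expected for a virtual spring-damper attached to the end effector.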



2.4. Impedance control for grasping and manipulation

As given in (7) and (8) for joint-level impedance control and in (13) and (14) for Cartesian-level impedance control, the dynamic relationship between the robot and its environment is determined by defining the positions, forces and torques, damping factors and stiffness factors. To accomplish a specific manipulation task, the robot should apply suitable forces and torques at certain positions. In our work, the motion of the robotic arm and fingers is planned using the geometrical information sensed by the 3D camera. In the following, it is assumed that the robot is always gravity compensated, so that $\tau$ excludes torque values generated by gravity. For the motion planning problem, the planned via points along the trajectory in configuration space are interpolated and given to the robot as the desired position $\theta_s$ in (8). As the trajectory should be executed without any collision with the environment, $\tau_{ext} = 0$ should hold during the execution. The external torque vector, excluding the gravity term, is therefore monitored. If it exceeds a pre-defined threshold, this indicates an impact between the robot and the environment, and the movement is stopped. Further reaction strategies can be implemented using the torque sensing for collision detection and safe human-robot interaction, as introduced in [25]. For grasping, the forces at the contact points are optimized so that the grasp can resist maximum disturbance forces. For manipulation actions, the forces and impedance factors suitable for the task can be determined experimentally. In our grasping and manipulation applications, the arm and hands move only at low velocity when the robot is in contact with the object. In these situations, steadily maintaining the desired forces at the planned positions is more important than achieving a higher velocity. This leads to the quasi-static assumption, which linearizes the relationship between the external forces and the position error.
For joint-level impedance control in the quasi-static case, $\dot{\theta} = \ddot{\theta} = 0$ and (8) reduces to

$\theta_s = \theta + K_\theta^{-1}\tau$.  (15)
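Numerically, this quasi-static set point shift is a one-liner; the stiffnesses, angles and desired torques below are invented for illustration:

```python
import numpy as np

K_th = np.diag([100.0, 80.0, 60.0])     # joint impedance stiffness [Nm/rad]
theta = np.array([0.20, 0.35, 0.10])    # measured joint angles at contact [rad]
tau_des = np.array([0.50, 0.40, 0.20])  # desired joint torques [Nm]

# Shift the virtual set point so the controller exerts tau_des at theta
theta_s = theta + np.linalg.solve(K_th, tau_des)
```

At equilibrium the controller output $K_\theta(\theta_s - \theta)$ then equals the desired torque.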


Similarly, for Cartesian-level impedance control, (11) with $\ddot{\tilde{x}}_\theta = \dot{\tilde{x}}_\theta = 0$ reduces to

$x_d = f(\theta) + K_d^{-1}F$.  (16)


$\theta_s$ and $x_d$ are the desired set points, which in the impedance control literature are called virtual set points or equilibrium points. They are only reached in the case of free motion, i.e. when no external forces act on the robot. We use the simplified impedance control formulas (15) and (16) to execute planned quasi-static grasp and manipulation actions. Grasps can be classified into power grasps and precision grasps, as analyzed by Cutkosky and Howe [26]. The hand touches the object at multiple contact points during a power grasp, whereas a precision grasp with higher dexterity has contact points only at the fingertips. For a power grasp with optimized forces and torques $\omega$ at the contact points, the torque values $\tau$ that should be applied by the fingers can be computed using the hand Jacobian matrix, with $\theta_0$ denoting the finger joint positions at contact found by the grasp planning algorithm:

$\tau = J(\theta_0)^T\omega$.  (17)


This can be achieved using the joint-level impedance control (15) with the quasi-static assumption:

$\theta_s = \theta_0 + K_\theta^{-1}J(\theta_0)^T\omega$  (18)

wherein $\theta_s$ is the commanded target position. For a precision grasp with contact points at the fingertips, with $\omega$ given in the coordinate system of the fingertip as the end effector of the finger, Cartesian-level impedance control can also be used:

$x_d = f(\theta_0) + K_d^{-1}\omega$.  (19)



The corresponding finger joint configuration in equilibrium can be computed by solving the inverse kinematic function:

$\theta_s = f^{-1}(f(\theta_0) + K_d^{-1}\omega)$.  (20)
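Putting the power grasp case together numerically: every value below is a placeholder standing in for the grasp planner's output, including the hand Jacobian, which in reality is evaluated from the finger kinematics at the contact configuration.

```python
import numpy as np

theta0 = np.array([0.10, 0.50, 0.30])        # finger joints at contact [rad]
omega = np.array([0.0, -1.5, 0.0])           # optimized contact force at the tip [N]
J0 = np.array([[-0.070, -0.045, -0.020],     # hand Jacobian at theta0 (placeholder)
               [ 0.055,  0.030,  0.012],
               [ 0.000,  0.000,  0.000]])
K_th = np.diag([50.0, 40.0, 30.0])           # finger joint impedance stiffness

tau = J0.T @ omega                            # joint torques realizing the wrench
theta_s = theta0 + np.linalg.solve(K_th, tau) # commanded virtual set point
```

Commanding `theta_s` instead of `theta0` makes the joint impedance controller exert exactly `tau` once the finger rests at the contact configuration.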


As a finger is also a serial manipulator, (19) also formulates the impedance relationship between position, force and torque for manipulation. For redundant manipulators with multiple solutions of the inverse kinematic function, direct use of (19) may be insufficient: besides the possible singularity problem, undesired collisions may occur. Furthermore, as the controller-internal inverse kinematic function has no information about the application, an inverse kinematic function that is part of the planning algorithm should be used with (20). With (18) and (20), a planning algorithm can unify position, force and torque values for grasping and manipulation. Based on the impedance control, the computed force and torque values can be encoded as part of the commanded joint position. In the real execution, however, the robot will not stay at the planned contact position, but at $\theta = \theta_0 + \theta_e$ in the joint-level case and $x = x_0 + x_e$ in the Cartesian case, with offsets $\theta_e$ and $x_e$, respectively. The offset may be caused by the inaccuracy of the 3D camera, unknown material properties of the object, disturbance forces, etc. It is impossible to model all of the forces acting on the robot to compute this offset exactly. However, with torque sensing we can estimate these position deviations after the robot reaches equilibrium in the quasi-static case:

$\theta_e = \theta_s - \theta_0 - K_\theta^{-1}\tau$  (21)

$x_e = x_s - x_0 - K_d^{-1}F$  (22)

with $\tau$ the measured external torque values and $F$ the measured external forces and torques. With these two equations, the effect of the external forces and of the sensing inaccuracy can be estimated together and expressed in the form of a position offset, which again can easily be handled by the planning algorithm. After a manipulation action, the estimated $\theta_e$ can be included in the target set point of the next manipulation action, such that $\theta_s = \theta_0 + \theta_e + K_\theta^{-1}\tau$. To achieve stable execution and to preserve the quasi-static assumption, in the pre-impact phase the planned contact position $\theta_0$ is commanded. At this position, the robot touches the object but applies no forces or torques. $\theta_s$ is then commanded with low velocity; in the ideal case, the robot stays at $\theta_0$ and $\theta_s$ is not reached, so the computed forces and torques are exerted. In the case of manipulation, further values of $\theta_s$ may be given, e.g. to apply forces along a given trajectory. In the post-impact phase, such as when placing the object down elsewhere, $\theta_0$ is commanded again to reduce the forces while the object remains immobilized, until the robot and the object no longer touch each other.

3. Applications

The depth information acquired by the ToF camera is used in three different grasping and manipulation tasks. The first usage is sensor-based motion planning: the 3D model of the environment is converted to a collision model used by the motion planning algorithm to plan robot motions that avoid collisions. The second usage is grasping of unknown objects: the captured 3D model of an unknown object is used by a grasp planning system, which tries to find force-closure grasps for a robotic hand. The last usage we introduce here has not been investigated intensively before: manipulation of a cream-like mass. The surface of the cream is captured using the ToF camera and, for a given task, we plan manipulation actions to manipulate the mass.
In an ice cream serving scenario, for instance, the robot can compute a manipulation motion for an ice cream scoop used as a tool, to scoop out the right amount of ice cream, as shown in Figs. 1 and 6.



Fig. 3. Results of sensor-based motion planning. The table in the scene is a-priori known and segmented from the depth map sensed by the Swiss Ranger, as shown in the leftmost picture. The unknown objects are treated as obstacles and displayed in red. The arm is able to move from the same initial position to the same target without any collision with the environment. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

3.1. Sensor-based motion planning

Planning a collision-free path for the robot is one of the most important problems in robotics research, and a large number of path planning algorithms exist in the literature. The major limitation of most of these methods is the assumption that full knowledge of the world's geometry is a-priori available to the robot. Although the planned results could work perfectly in this internal world model, the execution on a real robotic system may fail due to the inconsistency between the real world and the assumed model. A ToF camera is used in this work to acquire information about the environment and overcome this problem. The motion planning algorithm used is the Probabilistic Roadmap algorithm [27]. Four types of models are used by the motion planning: the static environment model, the robot model with all arm links and finger links, the localized objects in the scene, and the unknown, segmented objects sensed by the ToF camera. The static environment model consists of static and large objects like tables and walls. The 3D models of the links are updated using forward kinematics with the actual joint positions. After the objects are localized using the stereo camera system, the previously modeled geometries of these known objects are updated after each localization routine, so that collisions between the robot and these objects can be avoided. Before the motion planning algorithm is called, the depth map from the sensor is updated and segmented. The sensed unknown objects are represented by the triangulated meshes from the depth map and used by the collision checking algorithm. Some results of the sensor-based motion planning are shown in Fig. 3. The goal of the experiments is to move the arm from the same initial position to the same target position, with different scenes on the table. The arm must not collide with the environment during its motion. Execution sequences of the three experiments are shown in the three rows.
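A minimal sketch of the Probabilistic Roadmap idea, shrunk to a 2-D point robot among disc obstacles, might look as follows. The real planner works in the arm's 7-DoF joint space and queries PQP for every collision check; the world, sample count and neighbor count below are arbitrary.

```python
import heapq
import numpy as np

OBSTACLES = [((0.5, 0.5), 0.2), ((0.2, 0.8), 0.15)]  # (centre, radius) discs

def collision_free(p):
    return all(np.hypot(p[0] - c[0], p[1] - c[1]) > r for c, r in OBSTACLES)

def edge_free(a, b, steps=20):
    return all(collision_free(a + t * (b - a)) for t in np.linspace(0.0, 1.0, steps))

def prm_path(start, goal, n_samples=250, k=10, seed=0):
    """Probabilistic roadmap: sample free configurations, connect each to
    its k nearest neighbours with collision-checked edges, run Dijkstra."""
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(start, float), np.asarray(goal, float)]
    while len(nodes) < n_samples + 2:
        p = rng.random(2)
        if collision_free(p):
            nodes.append(p)
    edges = {i: [] for i in range(len(nodes))}
    for i, p in enumerate(nodes):
        dists = [float(np.linalg.norm(p - q)) for q in nodes]
        for j in np.argsort(dists)[1:k + 1]:
            if edge_free(p, nodes[j]):
                edges[i].append((int(j), dists[j]))
                edges[int(j)].append((i, dists[j]))
    # Dijkstra from node 0 (start) to node 1 (goal)
    dist, prev, pq = {0: 0.0}, {}, [(0.0, 0)]
    while pq:
        d_u, u = heapq.heappop(pq)
        if u == 1:                       # reconstruct start -> goal path
            path = [nodes[1]]
            while u != 0:
                u = prev[u]
                path.append(nodes[u])
            return path[::-1]
        if d_u > dist.get(u, float("inf")):
            continue
        for v, w in edges[u]:
            if d_u + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d_u + w, u
                heapq.heappush(pq, (d_u + w, v))
    return None                          # start and goal not connected
```

The roadmap can be built once and reused for multiple queries; in the sensor-based setting it is rebuilt whenever the segmented obstacle meshes change.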
The leftmost pictures show the robot model and the table model used by the motion planner, with the detected and segmented unknown environment models displayed in red. In the first experiment, the arm moves above the table to avoid colliding with it. In the other two experiments, the arm is able to avoid collisions with the environment by taking the sensed environment model into account. Although the Swiss Ranger sensor can capture 3D information at video frame rate, the motion planning method reported here is only suited to a plan-then-act fashion and not to real-time applications: the triangulation of the point cloud, the detection of the known objects and the segmentation of the unknown objects, as introduced in Section 2.2, take a few seconds to compute.

3.2. Grasping of unknown objects

Automatic grasp planning systems are very important for service robots: they compute the forces that should be exerted onto the object and how those forces can be applied by robotic hands. The developed grasp planning system consists of the four parts of the grasp planning process: grasp synthesis, grasp quality measurement, grasp force optimization and grasp execution [28]. Grasp synthesis tries to find the connection between a hand configuration and the contact points on the object surface. This is done in a forward manner, where the hand movement and the finger closing process are computed in simulation. A continuous collision detection technique [29,30] is used to find the contact points efficiently. Depending on the forces and torques the fingers can apply onto the object, the stability of the grasp can be determined within the grasp wrench space. The system tries to find force-closure grasps with high grasp quality, and optimizes the grasping forces at the contact points using determinant maximization as a linear matrix inequality problem [31,32]. These forces are applied using finger-joint-based impedance control. The developed system follows the forward simulation approach used in the grasp simulator ''GraspIt!'' [33]. Required are the geometric model of the object to be grasped and its physical properties: the material for friction force estimation, and the mass and center of mass for grasp stability analysis. Although all of this information can be acquired before grasp planning is invoked, it is not available for unknown objects. A ToF camera can only capture the unoccluded part of the object surface, as shown in Fig. 2. The material can only be assumed, and the mass and center of mass of the object can only be approximated.
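The grasp wrench space test mentioned above can be illustrated with a small, self-contained sketch. It works in a planar (fx, fy, tau) wrench space, discretizes each friction cone into its two edge forces, and estimates the epsilon quality (the radius of the largest origin-centred ball inside the convex hull of the primitive wrenches) by sampling support directions instead of building the hull explicitly. The contact layout and all names are illustrative; only the friction coefficient of 0.2 matches the value assumed in the paper.

```python
import math, random

def contact_wrenches(contacts, mu):
    # Discretize each planar friction cone into its two edge forces and map
    # them to 2D wrenches (fx, fy, tau). contacts: list of (point, normal).
    wrenches = []
    half = math.atan(mu)  # half opening angle of the friction cone
    for (px, py), (nx, ny) in contacts:
        for a in (-half, half):
            ca, sa = math.cos(a), math.sin(a)
            fx, fy = ca * nx - sa * ny, sa * nx + ca * ny
            wrenches.append((fx, fy, px * fy - py * fx))
    return wrenches

def epsilon_quality(wrenches, n_dirs=2000, seed=1):
    # Sampled estimate of the epsilon quality: minimum over unit directions
    # d of the support function max_w <d, w>. A positive value indicates an
    # (approximately) force-closure grasp.
    rng = random.Random(seed)
    quality = math.inf
    for _ in range(n_dirs):
        d = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(x * x for x in d))
        d = [x / norm for x in d]
        support = max(sum(di * wi for di, wi in zip(d, w)) for w in wrenches)
        quality = min(quality, support)
    return quality
```

For example, four contacts at the face midpoints of a unit square with inward normals yield a positive quality with mu = 0.2, while two contacts on adjacent faces only do not; the sampled estimate is an upper bound on the exact epsilon value.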
The material is assumed to be metal-like, with a low friction coefficient of 0.2, for grasp planning. This way, the grasping forces contribute more than the friction forces to a force-closure grasp computed under this assumption. The mass of the object is assumed to be 100 g and the center of mass to be the geometric center of the captured object surface. At runtime, after the scene has been captured by the ToF camera and an unknown object has been selected to be grasped, the point cloud of the object is decomposed using superquadrics [34] and minimum volume bounding boxes [35] to generate the grasping directions.

Z. Xue et al. / Robotics and Autonomous Systems 60 (2012) 387–395

Fig. 4. Automatically planned grasps for unknown objects in the scene. The objects are modeled using a ToF camera.

Fig. 5. Left: Motorized ice spoon mounted on the DLR/HIT 5-finger hand. Right: Circular segment used for computation of intrusion depth.

In this grasp simulation, the modeled robotic hand is set to a preshape with the finger postures at a starting position and is moved along a grasping direction toward the object. The triangulated models from the ToF camera are converted into internal collision models for the collision checking algorithm. The other objects are treated as obstacles, so that collisions are avoided during the grasping process. The planned grasps for unknown objects are presented in Fig. 4.

3.3. Manipulation of cream-like mass

Manipulation of a cream-like mass extends the skill set of a service robot beyond the pick-and-place domain. An exemplary scenario is an autonomous ice cream serving robot, as shown in Fig. 5(a). In this task, the robot creates equally sized ice cream scoops from an ice cream pan. The ice cream scoop depicted in Fig. 5(a) has the shape of a hemisphere. It can be used to create an ice cream scoop by moving it along the surface while keeping the tool tip in the ice cream. To perform this task, we use a 3D camera to capture the surface of the ice cream and impedance control to scoop the ice cream smoothly. The main challenge is that the material to be manipulated is not rigid but deformable, with variable surface and hardness. Furthermore, the viscosity of different ice cream recipes varies over a broad range with temperature. Note that the robot does not stay at the contact point while applying forces onto the object, as would be the case with rigid bodies. Instead, the tool is inserted into the cream-like mass and applies less force.

The surface of the ice cream is segmented from the geometry data captured by the 3D camera. It is further filtered using a median filter, which smooths the observed surface and removes outliers. The resulting data can be seen in Fig. 7. The trajectory along the surface is generated for the tip of the ice cream scoop.
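The surface filtering and tip trajectory generation can be sketched on a height map grid. The grid resolution, the straight-line path along one row of the grid, and the fixed intrusion depth are simplifying assumptions for illustration, not the paper's exact pipeline.

```python
import statistics

def median_filter(height, k=1):
    # 3x3 (k=1) median filter over a height map given as a list of rows;
    # smooths the sensed surface and suppresses ToF outliers. Border cells
    # use the valid part of the window.
    rows, cols = len(height), len(height[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [height[rr][cc]
                      for rr in range(max(0, r - k), min(rows, r + k + 1))
                      for cc in range(max(0, c - k), min(cols, c + k + 1))]
            out[r][c] = statistics.median(window)
    return out

def scoop_trajectory(height, row, depth, cell=0.01):
    # Tip waypoints (x, y, z) along one row of the filtered surface,
    # keeping the tool tip `depth` metres below the sensed surface.
    # `cell` is the assumed grid resolution in metres.
    smooth = median_filter(height)
    return [(c * cell, row * cell, smooth[row][c] - depth)
            for c in range(len(height[0]))]
```

A single spurious spike in the depth map is removed by the median window, so the generated tip path follows the actual surface rather than the outlier.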
The orientation of the tool tip is set perpendicular to the trajectory, with the handle of the scoop pointing upward and the opening facing the planned direction of motion. The beginning of the trajectory has a penetration angle so that the scoop can easily be inserted into the ice cream mass, and the end of the trajectory has a retraction angle to keep the ice cream in the scoop. Cartesian impedance control is used to apply force along the trajectory over the captured surface for the manipulation of the cream-like mass. We model the volume swept by the inserted ice cream scoop as a segment of a circle (Fig. 5(b)) to determine the intrusion depth, so that the desired amount of ice cream is created after the scoop has moved along the trajectory. When planning the first execution, the forces caused by the ice cream are ignored, as denoted in (15). During the execution of the trajectory, the position offset is computed using the measured torque values, and the averaged position offset is used in the next trajectory planning, as described in (22). In this way, explicit modeling of the forces acting on the ice cream scoop is avoided: they are estimated between two trajectory executions in the form of a position offset and compensated using the impedance control.

The impedance controller should move the scoop through the ice cream on the calculated path while being compliant enough not to cause any damage when reaching the bottom or sides of the ice cream pan. Therefore, the stiffness in the vertical direction is chosen only as high as necessary to intrude into the ice cream, while the stiffness in the direction of the movement is set high. The stiffness on the axis orthogonal to the movement trajectory is also set high, so that the movement forms a straight line even when the rigidity of the ice cream is inhomogeneous. The stiffness parameter used for the ice cream serving scenario is Kd = (3000 N/m, 1000 N/m, 1000 N/m, 50 N m/rad, 300 N m/rad, 300 N m/rad)^T.

The trajectory computed from the 3D data has to be transformed into arm joint values for execution. To keep the resulting arm motion steady, linear interpolation in Cartesian space is performed to avoid large distances between the resulting via points. The arm used is redundant, so its inverse kinematics is under-determined, with infinitely many solutions. The problem of finding a continuous arm motion along this trajectory, with the scoop applying forces onto the ice cream surface, is formulated as an optimization problem: an arm motion trajectory in configuration space is searched among all sampled inverse kinematics solutions while minimizing the sum of the distances between consecutive joint configurations.
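The redundancy resolution described above can be sketched as a dynamic-programming search over the layered graph of sampled IK solutions. The candidate sets, the collision predicate and the toy joint values below are placeholders for the real IK sampler and collision checker.

```python
def config_distance(a, b):
    # Distance between two joint configurations: the maximum absolute
    # difference over all joints.
    return max(abs(x - y) for x, y in zip(a, b))

def select_arm_motion(candidates, collision_free):
    """candidates[t] is the list of sampled IK solutions for via point t.
    Returns one configuration per via point minimizing the summed
    configuration distance, considering only collision-free samples."""
    layers = [[q for q in layer if collision_free(q)] for layer in candidates]
    if any(not layer for layer in layers):
        return None  # some via point has no collision-free IK solution
    cost = [0.0] * len(layers[0])   # best cost to reach each sample so far
    back = []                       # backpointers, one list per transition
    for prev, cur in zip(layers, layers[1:]):
        new_cost, choices = [], []
        for q in cur:
            best = min(range(len(prev)),
                       key=lambda i: cost[i] + config_distance(prev[i], q))
            new_cost.append(cost[best] + config_distance(prev[best], q))
            choices.append(best)
        cost = new_cost
        back.append(choices)
    # Trace the cheapest sequence back from the final via point.
    idx = min(range(len(cost)), key=cost.__getitem__)
    path = [idx]
    for choices in reversed(back):
        path.append(choices[path[-1]])
    path.reverse()
    return [layer[i] for layer, i in zip(layers, path)]
```

Filtering colliding configurations before the optimization, as in the paper, guarantees that any returned sequence is collision-free at every via point; connectivity between consecutive configurations still has to be validated separately.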
At each joint configuration, collisions between the robot and the environment are also checked. Only if the arm is collision-free at a configuration is it considered in the optimization step. This guarantees that the whole resulting arm motion is collision-free. The distance between two configurations is defined as the maximum difference over all joints between the two configurations. This approach avoids major changes in joint values and keeps the generated motion close to the desired Cartesian trajectory. The execution of the manipulation trajectory is presented in Fig. 6. The system depicted in Fig. 1 was demonstrated at the Automatica 2010 trade fair in Munich, where approximately 250 scoops of different kinds of ice cream were served to visitors within four days.

4. Conclusion

In this article, our approaches for sensor-based motion planning, grasp planning and manipulation planning have been introduced. The sensor used is a ToF camera, mounted directly above the robot workspace, which helps the robot to sense its 3D environment. It has been shown that a calibration procedure capable of reducing the measurement errors of the sensor is necessary. An efficient segmentation step involving OpenGL rendering allows the modeled scene to be fused with the a priori known information. Impedance control is used intensively to compensate for the remaining sensor errors and to execute the computed forces for grasping and manipulation.



Fig. 6. Execution of ice cream scooping, the manipulation trajectory is planned from the sensed ice cream surface.

Fig. 7. Three sorts of ice cream, and their surfaces sensed by a ToF camera after segmentation and calibration.

Acknowledgments

The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 216239.

References

[1] F. Blais, Review of 20 years of range sensor development, Journal of Electronic Imaging 13 (1) (2004) 231–243.
[2] P. Newman, G. Sibley, M. Smith, M. Cummins, A. Harrison, C. Mei, I. Posner, R. Shade, D. Schröter, L. Murphy, W. Churchill, D. Cole, I. Reid, Navigating, recognising and describing urban spaces with vision and laser, The International Journal of Robotics Research 28 (2009).
[3] P. Newman, D. Cole, K.L. Ho, Outdoor SLAM using visual appearance and laser ranging, in: IEEE International Conference on Robotics and Automation, ICRA, Orlando, Florida, USA, 2006.
[4] R. Bischoff, J. Kurth, G. Schreiber, R. Koeppe, A. Albu-Schäffer, A. Beyer, O. Eiberger, S. Haddadin, A. Stemmer, G. Grunwald, G. Hirzinger, The KUKA-DLR lightweight robot arm: a new reference platform for robotics research and manufacturing, in: International Symposium on Robotics, ISR 2010, Munich, Germany, 2010.
[5] Z. Chen, N. Lii, T. Wimboeck, S. Fan, M. Jin, C. Borst, H. Liu, Experimental study on impedance control for the five-finger dexterous robot hand DLR-HIT II, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2010, pp. 5867–5874.
[6] C. Borst, M. Fischer, G. Hirzinger, Calculating hand configurations for precision and pinch grasps, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 2, 2002, pp. 1553–1559. doi:10.1109/IRDS.2002.1043976.
[7] K. Huebner, D. Kragic, Selection of robot pre-grasps using box-based shape approximation, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2008, pp. 1765–1770.
[8] C. Ferrari, J. Canny, Planning optimal grasps, in: IEEE International Conference on Robotics and Automation, 1992, pp. 2290–2295.
[9] C. Dune, E. Marchand, C. Collowet, C. Leroux, Active rough shape estimation of unknown objects, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, 2008, pp. 3622–3627.
[10] V. Lippiello, B. Siciliano, L. Villani, Interaction control of robot manipulators using force and vision, International Journal of Optomechatronics 2 (3) (2007).
[11] V. Lippiello, F. Ruggiero, B. Siciliano, L. Villani, Preshaped visual grasp of unknown objects with a multi-fingered hand, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE, pp. 5894–5899.
[12] T. Yoshikawa, M. Koeda, H. Fujimoto, Shape recognition and grasping by robotic hands with soft fingers and omnidirectional camera, in: IEEE International Conference on Robotics and Automation, IEEE, 2008, pp. 299–304.
[13] Y. Zhou, B. Nelson, B. Vikramaditya, Fusing force and vision feedback for micromanipulation, in: IEEE International Conference on Robotics and Automation, vol. 2, IEEE, 1998, pp. 1220–1225.

[14] N. Hogan, Impedance control: an approach to manipulation. Part I: theory; Part II: implementation; Part III: applications, ASME Transactions, Journal of Dynamic Systems, Measurement, and Control 107 (1985) 1–24.
[15] L. Biagiotti, H. Liu, G. Hirzinger, C. Melchiorri, Cartesian impedance control for dexterous manipulation, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, vol. 4, 2003, pp. 3270–3275. doi:10.1109/IROS.2003.1249660.
[16] A. Albu-Schaffer, C. Ott, G. Hirzinger, A passivity based Cartesian impedance controller for flexible joint robots, part II: full state feedback, impedance design and experiments, in: IEEE International Conference on Robotics and Automation, vol. 3, 2004, pp. 2666–2672.
[17] C. Ott, A. Albu-Schaffer, A. Kugi, S. Stramigioli, G. Hirzinger, A passivity based Cartesian impedance controller for flexible joint robots, part I: torque feedback and gravity compensation, in: IEEE International Conference on Robotics and Automation, vol. 3, 2004, pp. 2659–2665.
[18] C. Ott, Cartesian Impedance Control of Redundant and Flexible-Joint Robots, in: Springer Tracts in Advanced Robotics, vol. 49, 2008.
[19] G. Bradski, The OpenCV Library, Dr. Dobb's Journal of Software Tools.
[20] K.S. Arun, T.S. Huang, S.D. Blostein, Least-squares fitting of two 3-D point sets, IEEE Transactions on Pattern Analysis and Machine Intelligence 9 (5) (1987) 698–700.
[21] T. Kahlmann, F. Remondino, H. Ingensand, Calibration for increased accuracy of the range imaging camera SwissRanger, in: ISPRS Commission V Symposium Image Engineering and Vision Metrology, 2006.
[22] J. Kuehnle, Z. Xue, T. Grundmann, A. Verl, S. Ruehl, R. Eidenberger, J.M. Zoellner, R.D. Zoellner, R. Dillmann, 6D object localization and obstacle detection for collision-free manipulation with a mobile service robot, in: 14th International Conference on Advanced Robotics, ICAR, 2009.
[23] T. Grundmann, R. Eidenberger, R.D. Zoellner, Z. Xue, S. Ruehl, J.M. Zoellner, R. Dillmann, J. Kuehnle, A. Verl, Integration of 6D object localization and obstacle detection for collision free robotic manipulation, in: IEEE International Symposium on System Integration, SII, 2008.
[24] M. Spong, Modeling and control of elastic joint robots, Journal of Dynamic Systems, Measurement, and Control 109 (1987) 310.
[25] A.D. Luca, A. Albu-Schaffer, S. Haddadin, G. Hirzinger, Collision detection and safe reaction with the DLR-III lightweight manipulator arm, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2006, pp. 1623–1630.
[26] M.R. Cutkosky, R.D. Howe, Human Grasp Choice and Robotic Grasp Analysis, Springer-Verlag New York, Inc., New York, NY, USA, 1990, pp. 5–31.
[27] M. Saha, G. Sanchez, J. Latombe, Planning multi-goal tours for robot arms, in: IEEE International Conference on Robotics and Automation, 2003.
[28] Z. Xue, A. Kasper, J.M. Zoellner, R. Dillmann, An automatic grasp planning system for service robots, in: 14th International Conference on Advanced Robotics, ICAR, 2009.
[29] X. Zhang, S. Redon, M. Lee, Y.J. Kim, Continuous collision detection for articulated models using Taylor models and temporal culling, ACM Transactions on Graphics (Proceedings of SIGGRAPH 2007) 26 (3) (2007).
[30] Z. Xue, P. Woerner, J.M. Zoellner, R. Dillmann, Efficient grasp planning using continuous collision detection, in: IEEE International Conference on Mechatronics and Automation, ICMA, 2009, pp. 2752–2758.

[31] M. Buss, H. Hashimoto, J. Moore, Dextrous hand grasping force optimization, IEEE Transactions on Robotics and Automation 12 (1996) 406–418.
[32] L. Han, J. Trinkle, Z. Li, Grasp analysis as linear matrix inequality problems, IEEE Transactions on Robotics and Automation 16 (2000) 663–674.
[33] A. Miller, P. Allen, Graspit! A versatile simulator for robotic grasping, IEEE Robotics & Automation Magazine 11 (4) (2004) 110–122.
[34] C. Goldfeder, P. Allen, C. Lackner, R. Pelossof, Grasp planning via decomposition trees, in: IEEE International Conference on Robotics and Automation, 2007, pp. 4679–4684.
[35] K. Huebner, S. Ruthotto, D. Kragic, Minimum volume bounding box decomposition for shape approximation in robot grasping, in: IEEE International Conference on Robotics and Automation, 2008, pp. 1628–1633.

Zhixing Xue received his Computer Science degree from the University of Karlsruhe in 2005. He has been with the FZI since 2006. His main research is directed toward grasp and manipulation planning for multi-fingered robotic hands and the software development of two-armed manipulation.

Steffen W. Ruehl received his degree in Computer Science from the University of Karlsruhe in 2007. His main research area is the planning and execution of manipulation tasks for bimanual robots in variable environments.


Andreas Hermann received his degree in Computer Science from the University of Karlsruhe in 2009. His main research areas are mobile manipulation and cooperation between mobile systems.

Thilo Kerscher finished his studies in Electrical Engineering and Information Technology at the University of Karlsruhe in 2002. Since November 2002 he has been employed as a member of the scientific staff, co-leading the group Interactive Diagnosis and Service Systems (IDS) at the FZI. He specializes in the fields of feedback control, automation and autonomous service robotics.

Ruediger Dillmann is a Professor in the Department of Computer Science at the University of Karlsruhe, Germany. He received a Ph.D. in 1980 and his habilitation in 1986 at the University of Karlsruhe. Since 1987 he has been a Professor in the Department of Computer Science, and since 2001 head of the research group Humanoids and Intelligence Systems Laboratories (HIS) at the University of Karlsruhe. Since 2002 he has also been president of the Research Center for Information Technology (FZI), Karlsruhe, Germany. As leader of these two institutes, Ruediger Dillmann supervises several research groups in the areas of robotics, with special interest in intelligent, autonomous and mobile robotics, machine learning, machine vision, man–machine interaction, computer science in medicine and simulation techniques.