Trajectory features estimation for legged robots

R. Omar Chavez-Garcia, Jerome Guzzi, Alessandro Giusti, Luca Maria Gambardella

Fig. 1: Left: Simulated ANYmal platform on a challenging terrain. Circular regions represent the sampling area for the target pose input; the square area indicates the elevation patch input. Right: A sample (K, q2, f(γ)) is the outcome of a local motion from q1 to q2.

I. INTRODUCTION

Robot navigation relies on correctly estimating (through perception or proprioception) negotiable terrain, i.e. terrain that the robot can safely traverse. Data-driven approaches for local traversability estimation (seen as a local planner) have been previously studied for selecting appropriate locomotion commands to negotiate challenging trajectories [1], [2].

We explore an approach to the problem of locally estimating trajectory features by applying self-supervised machine learning to learn the mapping between the inputs of a local planner and its outputs. In our approach, inputs consist of a relative target pose and the elevation information surrounding the ANYmal legged robot, while outputs indicate whether a target pose can be successfully reached and the cost of doing so in terms of time. The mapping is learned from instances of local motions (trajectories) and their outcomes, obtained via simulation.

II. PROPOSED APPROACH

The robot moves in an environment of which it has partial knowledge K. A local motion e = (q1, q2) ∈ C² is defined by two nearby states in the robot's configuration space C. When the robot tries to move from q1 to q2 using a local planner and a controller, it follows a trajectory γ : [0, T] → C, with γ(0) = q1 and γ(T) = q2 (when it successfully reaches q2 at time T).

We introduce f(γ) = (f1(γ), . . . , fn(γ)) ∈ F, an n-dimensional description of trajectory features; f should contain any information relevant to evaluate the trajectory. For example, it may denote whether the robot has reached a target pose, the duration, the energy consumed, or a failure type.

Authors are with the Dalle Molle Institute for Artificial Intelligence (IDSIA), USI-SUPSI, Lugano, Switzerland. omar@idsia.ch

This work has been conducted as part of ANYmal Research, a community to advance legged robotics, and was supported by the National Centre of Competence in Research Robotics.

We apply a self-supervised machine learning approach to model the mapping (e, K) ↦ y ≈ E[f(γ) | e, K]. To this end, we collect many instances of local trajectories γ by sampling local motions in different environments and recording their features f(γ).

We assume that the relevant features depend only on the relative position of the target q2 with respect to the source q1 and on the local knowledge around it. Hence, we use a local coordinate frame, centred at q1, to represent q2 and K. The resulting dataset is composed of samples of the form (K, q2, f1, f2, . . . , fn) (see Figure 1-right).
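For illustration, a minimal Python sketch (not the authors' implementation) of this change of coordinates could look as follows, assuming poses are (x, y, θ) tuples in world coordinates:

import numpy as np

def to_local_frame(q1, q2):
    """Express q2 in the frame centred at q1, with the x-axis along q1's heading."""
    x1, y1, th1 = q1
    x2, y2, th2 = q2
    dx, dy = x2 - x1, y2 - y1
    c, s = np.cos(-th1), np.sin(-th1)  # rotate the world-frame offset by -th1
    return (c * dx - s * dy, s * dx + c * dy, th2 - th1)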

III. EXPERIMENTAL RESULTS

ANYmal is a state-of-the-art quadruped robot developed for autonomous operation in challenging environments by ETHZ [3], with the ability to walk on rough terrain. We use a simulation of ANYmal in Gazebo (see Figure 1-left) to learn how well the robot copes with different obstacles.

ANYmal moves on a terrain whose information K is encoded as a heightmap (with a 2 cm per pixel resolution), i.e. a regular image with an elevation value associated to each pixel; in this work we ignore any property that is not geometrical (such as the surface material). ANYmal's footprint is 80 cm × 60 cm × 70 cm. Local motion knowledge K is given by 100 px × 100 px patches centred at the robot's location and oriented along its heading.
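Extracting such a patch amounts to sampling the global heightmap on a rotated grid centred on the robot. The following sketch illustrates one way to do it; the map conventions and helper name are assumptions, not taken from the paper:

import numpy as np
from scipy.ndimage import map_coordinates

def extract_patch(heightmap, x, y, theta, size=100, resolution=0.02):
    """Sample a size x size patch centred at (x, y) [m], oriented along theta."""
    half = size / 2.0
    # Pixel offsets of the patch grid, centred on the robot.
    u, v = np.meshgrid(np.arange(size) - half, np.arange(size) - half)
    c, s = np.cos(theta), np.sin(theta)
    # Rotate the grid into the world frame and convert to map indices.
    rows = (y / resolution) + (s * u + c * v)
    cols = (x / resolution) + (c * u - s * v)
    return map_coordinates(heightmap, [rows, cols], order=1, mode="nearest")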

The simulated ANYmal uses a simple local planner and a complex closed-loop feedback controller to follow the local trajectory while compensating for terrain irregularities [4]. The local planner takes as its only input the relative target pose q2 = (x, y, θ). For simplicity, we limit our analysis to local motions that maintain the robot's orientation. For any sampled trajectory, f ∈ {0, 1} × R+ is composed of the success of reaching q2 and the duration T of the trajectory.

We collected a training dataset of around 70k samples (74% with success = 1) of the form (K, x, y, success, T) by randomly spawning the robot at stable poses on different generated maps (similarly to our previous work [5]) and randomly sampling a relative target point at a distance between 15 cm and 50 cm in any direction; on average, the robot needs 8 steps and 10 seconds per sample (see Figure 1-right). We employed 12 simulated maps (10 m × 10 m each) for data gathering, containing challenging obstacles such as slopes, bumps, holes, rails and steps.
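The following Python sketch illustrates how one such sample could be generated; spawn_robot and run_local_planner are hypothetical stand-ins for simulator-specific code, and extract_patch is the helper sketched above:

import numpy as np

def collect_sample(world, rng=np.random.default_rng()):
    q1 = spawn_robot(world)                     # random stable pose (x, y, theta)
    r = rng.uniform(0.15, 0.50)                 # target distance in metres
    phi = rng.uniform(0.0, 2.0 * np.pi)         # any direction
    x, y = r * np.cos(phi), r * np.sin(phi)     # target in the robot's local frame
    K = extract_patch(world.heightmap, *q1)     # 100 x 100 elevation patch
    success, T = run_local_planner(q1, (x, y))  # simulate the local motion
    return K, x, y, success, T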

Figure 2 shows a schematic of the proposed CNN. This model is expected to learn relevant features related to the robot's locomotion. The CNN architecture is divided into two stages: the first one processes the visual input provided by the patch sample; an additional input, the relative target point (x, y), is then added in a second stage. The output of the CNN consists of a success score y_success from a softmax layer and a duration estimation (in seconds) y_T from a densely connected layer.
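A minimal sketch of this two-headed architecture, assuming tf.keras and following the layer sizes of Figure 2 (kernel sizes, activations, dropout rate and loss weights are assumptions, not taken from the paper):

import tensorflow as tf
from tensorflow.keras import layers, Model

patch = layers.Input(shape=(100, 100, 1), name="patch_K")   # heightmap patch
target = layers.Input(shape=(2,), name="target_xy")         # relative (x, y)

# First stage: conv(16) -> conv(16) -> conv(32) -> conv(32) -> dropout -> flatten -> dense(64)
x = layers.Conv2D(16, 3, activation="relu")(patch)
x = layers.Conv2D(16, 3, activation="relu")(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Dropout(0.5)(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)

# Second stage: concatenate the relative target point.
h = layers.Concatenate()([x, target])

# Head 1: soft classification of success.
s = layers.Dense(64, activation="relu")(h)
s = layers.Dense(32, activation="relu")(s)
success = layers.Dense(2, activation="softmax", name="success")(s)

# Head 2: regression of the duration T (in seconds).
t = layers.Dense(64, activation="relu")(h)
duration = layers.Dense(1, name="T")(t)

model = Model([patch, target], [success, duration])
# The minimized loss is the weighted sum of both individual losses.
model.compile(optimizer="adam",
              loss={"success": "sparse_categorical_crossentropy", "T": "mse"},
              loss_weights={"success": 1.0, "T": 0.5})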

[Fig. 2 diagram: input patch K → conv(16) → conv(16) → conv(32) → conv(32) → dropout → flatten → dense(64) → concatenation with input target (x, y) → dense(64) → dense(32) → softmax output success; dense(64) → dense(1) output T]

Fig. 2: ANYmal convolutional neural network architecture: inputs comprise the heightmap patch K and the relative target goal (x, y); outputs consist of a soft classification of success for reaching the target pose and an estimation of the time to reach such target pose. The model is supervised via two loss functions: cross-entropy for the classifier and mean squared error for the regressor. The minimized loss of the model is the weighted sum of both individual losses.

TABLE I: ANYmal model evaluation for success prediction and duration estimation on two datasets. Metrics include the Area Under the ROC curve (AUC) for the classification of success and the Mean Squared Error (MSE) for the duration estimation.

dataset (success / failure samples)    AUC     MSE
evaluation (7300 / 1398)               0.892   1.89
rough map (18453 / 185)                0.867   2.35


Table I shows quantitative results of the model's evaluation. We evaluated the trained model on two datasets: an evaluation dataset containing samples from 8 different heightmaps (similar to those in the training dataset), and a dataset from a particularly rough terrain (10 m × 10 m) where we executed several paths while recording success and failure samples. Examples of estimated success and duration values are depicted in Figure 3. Success estimations are coherent with the elevation obstacles around the robot: in both cases, lower success values appear around the prominent obstacles, i.e. the walls and the hole edges. Duration estimation is also in agreement with the distance to a target and with the obstacles the robot may face on the way to such a target.
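The two reported metrics can be computed, for instance, with scikit-learn; in this sketch, model is the trained network, and patches, targets, success and T are placeholders for the held-out evaluation data:

from sklearn.metrics import roc_auc_score, mean_squared_error

y_success, y_T = model.predict([patches, targets])
auc = roc_auc_score(success, y_success[:, 1])  # predicted probability of success
mse = mean_squared_error(T, y_T.ravel())       # duration error, in seconds squared
print(f"AUC: {auc:.3f}  MSE: {mse:.2f}")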

IV. CONCLUSIONS AND PERSPECTIVES

We introduced a data-driven approach to estimate features of short trajectories using local knowledge. Results show the model's ability to predict the success of a trajectory and its duration. Additionally, we are exploring the estimation of energy consumption, computed as the time integral of the power of the leg motors.
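Numerically, this energy feature amounts to integrating the total instantaneous motor power over the trajectory; a short sketch, assuming logged joint torques and velocities (the logging variables are assumptions):

import numpy as np

def energy(t, torques, velocities):
    """E = integral over [0, T] of sum_j tau_j(t) * omega_j(t) dt, in joules."""
    power = np.sum(torques * velocities, axis=1)  # total power [W] at each time step
    return float(np.sum(0.5 * (power[1:] + power[:-1]) * np.diff(t)))  # trapezoidal rule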

We are currently working on a general approach for robot path planning: first, we employ the proposed local estimator to predict the outcome of short trajectories from local information; then, we use this model to discriminate between potential edges while building a dense random tree that connects source and target poses, accounting for traversability risks and for costs derived from the estimated duration and energy.
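Since this planner is ongoing work, the following is only a speculative sketch of how the learned model could score a candidate tree edge, reusing the helpers sketched above; the 0.6 threshold mirrors Figure 3:

import numpy as np

def edge_cost(heightmap, q1, q2, threshold=0.6):
    K = extract_patch(heightmap, *q1)
    x, y, _ = to_local_frame(q1, q2)
    p_success, T = model.predict([K[None, ..., None], np.array([[x, y]])])
    if p_success[0, 1] < threshold:
        return None            # edge considered too risky to traverse
    return float(T[0, 0])      # expected duration as the edge cost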

Fig. 3: Examples of estimated values on two patches (100 px × 100 px = 2 m × 2 m) from the rough-map dataset. Left: estimated success when the target pose is sampled from a circular region with a radius between 15 cm and 50 cm and a fixed heading; green indicates a high success score, red otherwise. Right: duration estimation for trajectories with a success score > 0.6; green colouring indicates short durations, pink otherwise. ANYmal's footprint appears as a blue arrow.


REFERENCES

[1] W. Gao, D. Hsu, W. S. Lee, and S. S. K. Subramanian, “Intention-Net: Integrating planning and deep learning for goal-directed autonomous navigation,” in Conference on Robot Learning, vol. 78 of Proceedings of Machine Learning Research, 2017, pp. 185–194.

[2] L. Wellhausen, A. Dosovitskiy, R. Ranftl, K. Walas, C. Cadena Lerma, and M. Hutter, “Where should I walk? Predicting terrain properties from images via self-supervised learning,” IEEE Robotics and Automation Letters, vol. 4, no. 2, pp. 1509–1516, 2019.

[3] M. Hutter, C. Gehring, D. Jud, A. Lauber, C. D. Bellicoso, V. Tsounis, J. Hwangbo, K. Bodie, P. Fankhauser, M. Bloesch, R. Diethelm, S. Bachmann, A. Melzer, and M. Hoepflinger, “ANYmal - a highly mobile and dynamic quadrupedal robot,” in IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2016, pp. 38–44.

[4] P. Fankhauser, C. D. Bellicoso, C. Gehring, R. Dube, A. Gawel, and M. Hutter, “Free Gait – An architecture for the versatile control of legged robots,” in IEEE-RAS International Conference on Humanoid Robots, 2016.

[5] R. O. Chavez-Garcia, J. Guzzi, L. M. Gambardella, and A. Giusti, “Learning ground traversability from simulations,” IEEE Robotics and Automation Letters, vol. 3, no. 3, pp. 1695–1702, July 2018.