PreprintPDF Available

Demonstration-guided Optimal Control for Long-term Non-prehensile Planar Manipulation

December 2022

December 2022

License
CC BY 4.0

Authors:

Teguh Santoso Lembono

Idiap Research Institute

Sylvain Calinon

Idiap Research Institute

Preprints and early-stage research may not have been peer reviewed yet.

Long-term non-prehensile planar manipulation is a challenging task for robot planning and feedback control. It is characterized by underactuation, hybrid control, and contact uncertainty. One main difficulty is to determine contact points and directions, which involves joint logic and geometrical reasoning in the modes of the dynamics model. To tackle this issue, we propose a demonstration-guided hierarchical optimization framework to achieve offline task and motion planning (TAMP). Our work extends the formulation of the dynamics model of the pusher-slider system to include separation mode with face switching cases, and solves a warm-started TAMP problem by exploiting human demonstrations. We show that our approach can cope well with the local minima problems currently present in the state-of-the-art solvers and determine a valid solution to the task. We validate our results in simulation and demonstrate its applicability on a pusher-slider system with real Franka Emika robot in the presence of external disturbances.

System setup where the robot pushes an object with contact switching.

…

Kinematics of pusher-slider system allowing face switching.

…

Convergence curve Fig. 4: Solving time

…

Pushing task with face switching. The manipulator starts from (a) and selects an optimal face (orange) and contact point (b) for pushing, until reaching the planned face switching point (c). Next, the manipulator changes to face (d) and touches the object again at the selected point (e), followed by the next phase of pushing (f), until reaching the final target (g). This example is for N s = 1. (d)∼(f) should be repeated if N s > 1. The colored line is used to express the current active face, and the black arrow in (d) represents the face switching process.

…

Planar pushing with face switching. Both simulation and experimental results reach final targets within tolerance.

…

Figures - available via license: Creative Commons Attribution 4.0 International

Content may be subject to copyright.

Available via license: CC BY 4.0

Content may be subject to copyright.

Demonstration-guided Optimal Control

for Long-term Non-prehensile Planar Manipulation

Teng Xue1,2, Hakan Girgin1,2, Teguh Santoso Lembono1,2, and Sylvain Calinon1,2

Abstract— Long-term non-prehensile planar manipulation is

a challenging task for robot planning and feedback control. It

is characterized by underactuation, hybrid control, and contact

uncertainty. One main difﬁculty is to determine contact points

and directions, which involves joint logic and geometrical rea-

soning in the modes of the dynamics model. To tackle this issue,

we propose a demonstration-guided hierarchical optimization

framework to achieve ofﬂine task and motion planning (TAMP).

Our work extends the formulation of the dynamics model of

the pusher-slider system to include separation mode with face

switching cases, and solves a warm-started TAMP problem by

exploiting human demonstrations. We show that our approach

can cope well with the local minima problems currently present

in the state-of-the-art solvers and determine a valid solution to

the task. We validate our results in simulation and demonstrate

its applicability on a pusher-slider system with real Franka

Emika robot in the presence of external disturbances.

I. INTRODUCTION

With the trend of labor shortage and aging population,

robots are required for increasingly more complicated tasks,

moving beyond the typical pick-and-place tasks to non-

prehensile manipulation, which refers to manipulating with-

out grasping but with mutual interaction. This requires real-

time contact identiﬁcation and adaptation. In this paper,

we focus on long-term non-prehensile planar manipulation,

which concerns achieving reliable non-prehensile planar ma-

nipulation with a long-term horizon, involving joint logistic

and geometric planning and feedback control over diverse

interaction modes and face switches. For example, to push

an object, a prerequisite is to decide how much force should

be applied, and which point to push. Moreover, in some

cases such as pushing an object with small distance but

large rotation, relying on a single ﬁxed face is not feasible.

Therefore, a sequence of face switching is required, as well

as contact mode schedule resulting from Coulomb friction.

To achieve non-prehensile planar manipulation, four main

challenges appear as follows:

1.Hybrid Dynamics. The dynamics of a pusher-slider sys-

tem depends on the current interaction mode and contact face

between the pusher and the slider. In this paper, we consider

not only the motion involving various interaction modes, e.g.,

separation, sticking, sliding up, and sliding down, but also

the switching between the contact faces, i.e., left, bottom,

right, and up (unlike previous works [1], [2] that only work

with a single face). The transitions between different faces

and the separation to the other three contacting modes pose

1Idiap Research Institute, Martigny, Switzerland (e-mail:

ﬁrstname.lastname@idiap.ch)

2´

Ecole Polytechnique F´

ed´

erale de Lausanne (EPFL), Switzerland

Fig. 1: System setup where the robot pushes an object with contact

switching.

important difﬁculty for gradient-based optimization methods.

2.Underactuation. The contact force between the pusher

and the slider is constrained within a motion cone, which

makes it impossible to exert arbitrary acceleration on the

object to achieve omnidirectional movement.

3.Long-horizon TAMP. Due to the characteristics of hybrid

dynamics and underactuation, it is important to reason over

long horizon using both logic and geometric descriptors,

where the logic variables relate to contact modes and faces

and the geometric variables relate to contact points, slider

states, switching points, and control commands.

4.Contact Uncertainty. Arising from the frictional contact

interactions between the pusher and the slider, as well as

between the slider and the table, the contact is hard to

be modeled precisely, therefore a controller enabling online

contact adaptation is required.

To address these challenges, we propose a hybrid frame-

work based on optimization and learning from human

demonstrations. An interface is ﬁrst built to collect human

demonstrations of the pushing task. Given a speciﬁc target

conﬁguration, we use k-nearest neighbor (k-NN) algorithm to

retrieve a demonstration trajectory closest to the target con-

ﬁguration, and we use it to formulate the soft constraints of

a hierarchical optimal control problem. The optimal control

problem is solved ofﬂine to produce the optimal trajectory

and the optimal control commands. A feedback controller

is then used to adapt to the contact uncertainty online by

following the ofﬂine trajectory.

The main contribution of our paper is an approach to

solve the complete long-term non-prehensile manipulation

task by covering all of the interaction modes and the face

arXiv:2212.12814v1 [cs.RO] 24 Dec 2022

switching cases, with the ability of ofﬂine planning and

online tracking. This is achieved by introducing human

demonstration into the optimal control problem. By using

demonstrated continuous variables to express the discrete

variables implicitly, we can successfully project traditional

TAMP formulation into geometric ﬁeld. A demonstration-

guided hierarchical optimization framework is then proposed,

allowing the robot to obtain (sub)optimal solutions very

quickly. A real-time feedback controller is ﬁnally proposed

to replan the trajectory, by compensating for model mismatch

and contact uncertainty when interacting with the real phys-

ical world.

II. REL ATED WORK

In this section, we discuss some related work on dynamics

modeling, TAMP, and robot learning from demonstration.

Non-prehensile manipulation has been widely studied as

a challenging task for model-based planning and control [3],

with the pusher-slider system as one of the most prominent

example. Under the assumption of quasi-static interaction [4]

and ellipsoidal limit surface [5], Lynch et al. put forward

a set of analytical methods to compute velocities based on

the motion cone [6], which was previously studied in [7]

to distinguish different sticking and sliding behaviors with a

friction cone. Recently, Hogan et al. reformulated the motion

equation as a piecewise function to describe hybrid dynamics

[1]. In [8], they instead expressed the limit surface in convex

quadratic form to simplify the previous equation as a more

general form. Nevertheless, the motion equations mentioned

above are all assuming that the pushing face is ﬁxed, which

is too much constrained for long-term manipulation, since

the pusher will lose the chance to adjust the contact point

and face during pushing. Therefore, an extended dynamics

model is required to provide the pusher with more ﬂexible

choices on the contact conﬁguration.

A hybrid framework with planning and feedback control

is important to achieve long-term non-prehensile planar ma-

nipulation. Optimally unifying hybrid interaction modes and

geometric variables is still an open challenge. For instance,

in [1], a mixed integer programming (MIP) was introduced

into a model predictive controller (MPC) to incorporate the

selection of interaction modes, i.e., sticking and sliding, into

a common optimization framework. Due to the non-convex

nature of integer variables, the computation can be time-

consuming and not feasible for high-frequency feedback

controller. An ofﬂine mode schedule learning was later

proposed in [8], using a 3-layer mode sequence classi-

ﬁer. To train the classiﬁer, a large dataset was built by

a random state generator. A drawback of this approach is

that a higher number of discrete modes and contact faces

would require the dataset volume to increase exponentially,

leading to more time-consuming data generation process.

Moreover, the works mentioned above are both based on

short-horizon MPC, which is capable of online feedback

control but shortsighted for long-term planning, which typi-

cally does not allow for separation mode and face switching

cases. Alternatively, a mathematical program with comple-

mentarity constraints (MPCC) was proposed in [2], which

was introduced into a trajectory optimization framework to

replace integer variables, showing faster convergence for

planning and recovery from external disturbance. However,

the formulation of MPCC is based on the assumption of

continuous mode transitions, which is not suitable for the

separation cases appearing in long-term manipulation, be-

cause the instantaneous conversion from separation to contact

is physically infeasible. Thus, a framework focusing on non-

prehensile manipulation with long-term TAMP fashion is

necessary.

To cope with this issue, sampling-based methods have

been used to build a search tree to optimize the symbolic

nodes with geometric variables [9], [10], [11], [12]. In the

ﬁeld of non-prehensile manipulation, a tree with predeﬁned

depth was used in [13], corresponding to the number of

hybrid switches to optimize over a ﬁxed contact modes se-

quence. A hybrid Differential Dynamic Programming (DDP)

algorithm with a linear form was simultaneously used for

geometric reasoning. However, the method was validated

only with a simpliﬁed setup, without considering sliding and

separation mode, and without considering optimal viapoints

for face switching. Hereby, with additional logic variables,

the nodes in the search tree would exponentially grow, yield-

ing exponential computation. Moreover, although expensive

sampling computation can be resolved by multiple threads

with a powerful machine, there is no guarantee that a feasible

solution can be solved in a ﬁnite time.

In contrast to such long-term TAMP formulations, humans

showcase an impressive agility at exploiting (sub)optimal

strategies, especially in terms of discrete variables, by re-

lying on past experience. In this direction, Learning from

Demonstration (LfD) has been studied in robotics for a long

time and has proven to be an efﬁcient way to solve motion

planning challenges [14], [15]. LfD aims to extract motion

features from only few demonstrations and then generalize

the learned tasks to new situations. Many algorithms have

been proposed to encode human demonstrations, such as Dy-

namic Movement Primitive (DMP) [16], Gaussian Mixture

Regression (GMR) [17] and task-parameterized probabilistic

models [18]. Among these techniques, K-nearest neighbor

(k-NN) [19] is the simplest, yet remaining a powerful tool

for discrete data classiﬁcation [20].

III. DYNAMICAL SYS TEM

In this section, the hybrid dynamics of the pusher-slider

system is described. We introduce the kinematics model (Sec.

III-A) ﬁrst, and then extend motion cone (Sec. III-B) and

motion equation (Sec. III-C) by including separation mode

and face switching mechanism.

A. Kinematics

Fig. 2 shows the kinematics of the pusher-slider system.

The red and blue circles represent the pusher at the initial

face and the face after contact switching. The pose of the

slider is deﬁned as qs= [x y θ]>w.r.t. the global frame Fg,

TABLE I: Constraints of different interaction modes

Sticking Sliding Up Sliding Down Separation

{vf∈Ω} ∩ {qp∈ψ} {Ω<vf<φ} ∩ {qp∈ψ} {φ<vf<Ω} ∩ {qp∈ψ} {vf/∈φ} ∪ {qp/∈ψ}

vt f

vn f

≤γup ,vtf

vn f

≥γdn ,vnf >0,

|px|=rs+rp,py≤rs.

vt f

vn f

>γup ,vnf >0,

|px|=rs+rp,py≤rs.

vt f

vn f

<γdn ,vnf >0,

|px|=rs+rp,py≤rs.

∨vn f <0,

∨|px|>rs+rp,

∨py>rs.

where xand yare the Cartesian coordinates of the center of

mass, and θis the rotation angle around the vertical axis.

The contact position between the pusher and the slider is

described as qp= [pxpy]>w.r.t. the current slider frame Ff

after face switching, while qf=Rθfqpis the expression of

qpin the initial slider frame Fs, and θfis the rotation angle

from Fsto Ff. The input of this system is the acceleration

of the pusher u= [ ˙vn˙vt]>, which is resolved in Fs, while

v= [vnvt]>is the pusher velocity deﬁned in frame Fs, and

vf=R>

θfv= [vn f vt f ]>is the expression of vin Ff.

Fig. 2: Kinematics of pusher-slider system allowing face switching.

B. Generalized Motion Cone

Based on the quasi-static approximation and ellipsoidal

limit surface assumption [1], [4], [5], [7], a motion cone

is introduced to determine the contact mode given by the

current pusher velocity and contact position. The two bound-

aries of the motion cone are given as vup =1fx+γup fyand

vdn =1fx+γdnfy, resolved in the current slider frame Ff,

with

γup =µpc2−pxpy+µppx2

c2+py2−µppxpy

,(1)

γdn =−µpc2−pxpy−µppx2

c2+py2+µppxpy

,(2)

where µpis the friction coefﬁcient between the pusher and

the slider, and cis the parameter connecting applied force

and the resulting velocity. More details are introduced in [1].

In the previous work [1], [6], the pusher is assumed to

remain on one face selected before and not change during the

execution. We would like to enable more complex pushing

that involves face switching. For that, we introduce another

interaction mode, which we call separation mode. Table I

lists the relationship between state constraints and interaction

modes, where Ωis the set within the boundaries of the

motion cone, φis the space where the pusher goes towards

the slider, and ψis the set where the pusher keeps touching

with the slider. rsand rpare the half length of the slider and

the radius of pusher, respectively.

C. Generalized Motion Equation

We build the generalized motion equation on the basis of

[6] and [1], by including separation mode and contact face

switching, namely

˙x=









g1(x,u),if Sticking,

g2(x,u),if Sliding Up,

g3(x,u),if Sliding Down,

g4(x,u),if Separation,

(3)

where x= [q>

sq>

fv>]>, and

gj(x,u) = 







RθfRθQP j

Rθf[djcj]>

R>

θfv





,Rθ=cosθ−sin θ

sinθcos θ,

Q=1

c2+p2

x+p2

yc2+px2pxpy

pxpyc2+p2

y,

b1=h−py

c2+p2

x+p2

c2+p2

x+p2

yi,b2=h−py+γup px

c2+p2

x+p2

0i,

b3=h−py+γdn px

c2+p2

x+p2

0i,b4= [0 0],

c1= [0 0],c2= [−γup 1],c3= [−γd n 1],c4= [0 1],

d1= [0 0],d2= [0 0],d3= [0 0],d4= [1 0],

P1=I2×2,P2=1 0

γup 0,P3=1 0

γdn 0,P4=02×2.

where j=1,2,3,4 corresponds to sticking, sliding up, sliding

down, and separation mode, respectively.

IV. MET H OD S

In this section, we ﬁrst formulate the long-term non-

prehensile planar pushing task as an optimal control problem

(Sec. IV-A), and then we propose three DDP-related methods

by modifying the initialization and cost function, namely

Demonstration-started DDP (Sec. IV-B), Demonstration-

constrained DDP (Sec. IV-C) and Warm-starting DDP (Sec.

IV-D). The combination of Demonstration-constrained DDP

and Warm-starting DDP leads to a demonstration-guided

hierarchical optimization framework. Finally, we describe the

online tracking controller (Sec. IV-E).

A. Problem formulation

A optimal control problem (OCP) can be described as

min

cT(xT) +

T−1

∑

t=1

ct(xt,ut),(4)

s.t. xt+1=f(xt,ut),(5)

where (4) is the cost function and (5) is the dynamic

equation. Practically, due to the nonlinearity and high degrees

of freedom involved in robotics, numerical optimization is

mostly used to solve this kind of problem. We use DDP

[21] in this paper, which has been shown effective for

this problem [13], [22]. It can also provide local feedback

mechanism that was used to keep the robustness of controller

in [23].

After minimizing the cost-to-go function w.r.t. ∆ut, a local

stabilizing controller can be obtained as

ut=ˆut+Kt(xt−ˆxt) + kt,(6)

where Ktis a feedback gain, and ktis a feedforward term.

Given (4) is a non-convex problem, DDP solves it by

optimizing around the current solution iteratively. The con-

vergence is very sensitive to the initial guess, which means

that it is easy to be stuck at poor local optima if the initial

guess is far away from the optimal solution.

B. Demonstration-started DDP (DS-DDP)

To solve the problem of getting stuck at poor local optima,

we introduce human demonstrations into the basic DDP

as initialization, which is a trendy way for warm-starting

OCP [24]. The demonstrations are designed as continuous

variables, which can implicitly express the discrete variables

instead of explicitly specifying the mode sequence as in

previous work [1], [2], [8]. In this way, we tactfully convert

the joint logic and geometric optimization problem to a

typical geometric optimization problem, which can be solved

much efﬁciently.

The collected human demonstrations are denoted

as [e

qs,e

qf,e

v,e

u], with e

qs= [e

qs0,e

qs1,·· · ,e

qsT],e

qf=

qf0,e

qf1,·· · ,e

qfT], where e

qst∈R3and e

qft∈R2represent the

state of the slider and the pusher at timestep t, respectively.

v= [e

v0,e

v1,·· · ,e

vT−1]∈R2and e

u= [e

u0,e

u1,·· · ,e

uT−1]∈R2

denote the velocity and acceleration at each timestep.

Given a target q∗

s, we use k-NN to select the index of j∗

with the closest demonstration e

qsTto the target in the task

space by evaluating

j∗=min

j∈Sdist(e

sT,q∗

s),(7)

where S={j:j∈ {0,1,·· · ,nd},ndis the number of

demonstrations, and

dist(x,y) = (

∑

r=1

|xr−yr|p)1/p,(8)

where d is the dimension of slider state, and p=2.

The cost function of DS-DDP is deﬁned as:

c1=cre +crg +cbd,(9)

with

cre = (µT−xT)>QT(µT−xT),crg =

T−1

∑

t=0

(u>

tRut),

cbd =

T−1

∑

t=0

(fcut(ut,ul)>Kf cut (ut,ul)),

where cre is the reaching cost, and crg,cbd are the regularizer

and boundary penalizer of control commands. ulis the

predeﬁned bounding box of u.fcut is a soft-thresholding

function. The initial guess u0=e

uis drawn from human

demonstrations directly.

Although this method seems like an effective warm-

starting method, it is restricted by the number of acquired

demonstrations.

C. Demonstration-constrained DDP (DC-DDP)

To alleviate the problem mentioned above, we propose to

use demonstration as the soft constraints of control com-

mands in OCP. It is achieved by designing the cost function

c2=cre +crg +cbd +csw +cve +cac,(10)

with

csw =

∑

n=t0

((µn−xn)>Qn(µn−xn)),

cve =

T−1

∑

t=0

((e

vt−vt)>Rdv(e

vt−vt)),

cac =

T−1

∑

t=0

((e

ut−ut)>Rdu(e

ut−ut)),

where csw,cve and cac are desinged to follow the demon-

strated face switching strategy, pusher velocity and pusher

acceleration. n= [t0,··· ,tN]is the timestep when the contact

face switches, µ= [e

v>]>is the state of the selected

demonstration, and e

vt,e

utare the demonstrated velocity and

acceleration at timestep t.

D. Warm-starting DDP (WS-DDP)

Demonstration-constrained DDP shows good performance

and can avoid poor local optima, but in order to further im-

prove its convergence properties, we propose a hierarchical

optimization framework, where the solution of DC-DDP is

used to initialize another DDP problem that we call Warm-

starting DDP (WS-DDP).

The cost function of WS-DDP is as same as DS-DDP,

allowing it to explore much freely towards the ﬁnal target.

The initial guess u0=u∗

DC is the result of previous DC-DDP.

E. Adaptation to disturbance

Similarly to basic DDP, we can see that for a track-

ing problem, the resulting optimal control policy takes the

same form as (6), characterized by a feedback gain and

a feedforward term. Typically, the optimal control policy

is used to generate control commands at each timestep

based on the current state to stabilize the motion along

the nominal trajectory. However, the frictions and other

unmodeled nonlinearities might cause undesired behaviors,

especially when the error between the current state and the

planned solution kxt−ˆxtk2is too big. To alleviate this

issue, inspired by [23], we propose to use an error ﬁltering

method based on a trust region deﬁned as a ball of radius r

as Bt(r) = {x∈Rn|kx−ˆxtk<r}. By denoting the actual

robot state as x0

t, we ﬁlter the xtin (6) as follows: if

t∈Bt(r), then we take xt=ˆxt, else we take xt=x0

This means that if the error between the actual robot state

and the planned state is small, we can use the feedforward

terms of the controller directly, and if it is big, then we

replan the trajectory using the feedback controller gains.

This allows us to fully exploit the feedback and feedforward

gains computed ofﬂine during the online execution of the

task avoiding expensive recomputations at each timestep.

V. EX PE R IM ENT S

We evaluate in this section the proposed ofﬂine program-

ming method (Sec. V-A) and the proposed online tracking

controller (Sec. V-B).

A. Ofﬂine Programming

In this work, the task space is the horizontal plane

on the table, and it is limited as T={[x,y,θ]:x∈

[

−

25cm,25cm],y∈[

−

25cm,25cm],θ∈[

−

π,π]}. We

collected 3 representative demonstrations, which are

[15cm,−10cm,

−

π/2],[0,

−

20cm,π/2],[15cm,−15cm,π/2],

corresponding to Ns=0, Ns=1 and Ns=2, respectively,

where Nsis the number of face switches during

the demonstration. The initial state is deﬁned as

[0 0 0 αpx0 0 0]>, where α=1.3, corresponding to the

separation mode in Sec. III-B, allowing the pusher to select

the contact point at the beginning. The cost function gains are

set to QT=106×diag{1,1,1,106p−5,101−6p,10−3,10−3},

Qn=106×diag{10−3,10−3,10−3,p,(1−p),0,0}, where

p=1 when θf=0 or θf=π, otherwise p=0.

R=K=diag{1,1},Rdu =Rdv =diag{100,100}.

In other works, planar pushing tasks are often formulated

as Mixed-Integer Programming (MIP) problems [1], [8],

[2]. To compare with our method, we use demonstrations

as the initialization for MIP as well, which is called as

DS-MIP. CasADi [25] with the Bonmin solver [26] and

Crocoddyl [27] with the FDDP solver are used separately

to solve DS-MIP and DS-DDP. Fig. 3 and Fig. 4 show

the convergence curve and the solving time of 10 randomly

selected target conﬁgurations. The vertical axis of Fig. 3 is

the norm of the reaching error at each iteration with respect

to the initial error. We can see in these ﬁgures that DS-MIP

can almost get the same result as DS-DDP but needs 10

times more time. This is because of the nonconvex nature of

integer variables. By implicitly expressing integer variables

as demonstrated continuous control commands, TAMP prob-

lems can be solved much more efﬁciently. Moreover, we also

ﬁnd that the proposed DC-DDP and WS-DDP can achieve

better results in relatively longer time (but still acceptable)

compared to DS-DDP. This shows that simply initializing

using the demonstration as done in DS-DDP is not enough,

and constraining the search space using demonstrations as

the soft constraints during early iterations helps to reach a

better solution.

Additionally, we tested the generalization ability of

the proposed demonstration-guided ofﬂine programming

method. A successful ofﬂine programming is deﬁned as:

{xerr <1cm,yerr <1cm,θerr <5◦}, where xerr,yerr and

θerr are the difference between ﬁnal pose and target pose.

100 targets are randomly selected in the task space Tto

test the generalization capability. The statistical results are

listed in Table II. Clearly, with demonstrations, the success

Fig. 3: Convergence curve Fig. 4: Solving time

TABLE II: Performance of DS-DDP, DC-DDP, and WS-DDP for

ofﬂine programming

Method xerr/cm yerr /cm θerr/rad succ rate

DS-DDP 0.24 ±2.22 0.96 ±4.03 0.10 ±0.34 74%

DC-DDP 0.04 ±1.33 0.85 ±1.91 0.02 ±0.08 75%

WS-DDP 0.11 ±1.01 0.56 ±1.63 0.01 ±0.07 84%

Fig. 5: Tracking performance under disturbance. The dashed lines

present the tolerance.

rate signiﬁcantly increases for random-selected targets. By

using only 3 demonstrations, DC-DDP can accomplish 75

out of 100 random targets in task space, also with low mean

and standard deviations for the errors. WS-DDP achieves a

slightly higher success rate based on the DC-DDP solution,

showing that this demonstration-guided hierarchical opti-

mization framework can generalize to unknown targets very

well. Practically, the generalization result doesn’t change a

lot as long as the selected 3 demonstrations are informative.

It would be studied in the future about how to collect

demonstrations more efﬁciently and actively.

B. Online Tracking

For online tracking, we investigated both numerical simu-

lation and real robot experiments. In simulation, we introduce

a disturbance on the state as x=x+from the beginning

to the end, where εx∼U(−xM,xM),εy∼U(−yM,yM),

εθ∼U(−θM,θM), are the components of , drawn from

a uniform distributions. Fig. 5 shows the evolution of the

errors on x,y, and θ, computed as the difference between

the ﬁnal point and the target, for increasing xM,yMand θM.

The tolerance for online tracking is set as {xerr <3cm,yerr <

3cm,θerr <5◦/0.087rad}. We can ﬁnd that the controller can

successfully resist 4cm perturbation for xand y, and 0.117rad

for θ.

(a) Initialization (b) Contact (c) Pushing (d) Face switching (e) Contact (f) Pushing (g) Reaching

Fig. 6: Pushing task with face switching. The manipulator starts from (a) and selects an optimal face (orange) and contact point (b) for

pushing, until reaching the planned face switching point (c). Next, the manipulator changes to face (d) and touches the object again at

the selected point (e), followed by the next phase of pushing (f), until reaching the ﬁnal target (g). This example is for Ns=1. (d)∼(f)

should be repeated if Ns>1. The colored line is used to express the current active face, and the black arrow in (d) represents the face

switching process.

a) Simulation results b) Experimental results

Fig. 7: Planar pushing with face switching. Both simulation and

experimental results reach ﬁnal targets within tolerance.

Then, we tested the proposed method on the real robot

setup (Fig. 1), using a 7-axis Franka Emika robot and a

RealSense D435 camera. The slider (rs=6cm) is a 3D-

printed prismatic object with PLA, lying on a ﬂat plywood

surface, with an Aruco Marker on the top face. A wooden

pusher (rp=0.5cm) is attached to the robot to move the

object. The motion of the object is tracked by the camera

at 30 HZ, and the feedback controller runs at 100 HZ,

with a low-level Cartesian impedance controller (1000 HZ)

actuating the robot.

In this experiment, we tried one line tracking task with

disturbance and two face switching tasks. Both simulated

and experimental results achieve the targets within speciﬁed

tolerance (see accompanying video). The trajectories in Fig.

7 correspond to one of the task requiring one face switching

to push the object from [0,0,0]to [20cm,

−

20cm,π/2]. The

control strategy is intuitive: pushing it from the left face to

slightly adjust the pose and then changing to the top face

for the next phase of pushing. Fig. 7-(a) is the simulation

trajectory, which is generated by using PyBullet [28], while

Fig. 7-(b) shows the real robot trajectory. It is observed that

these two trajectories are signiﬁcantly different, because of

the different friction parameters in the two different worlds.

Still, both can overcome the uncertainty to reach the ﬁnal

target. Despite the existence of unstructured elements such

as differences in the visual system and the robot controller,

several assumptions of the dynamics model, as well as

immeasurable friction, the feedback controller is able to

cope with these different mismatches and track the reference

trajectory successfully. Fig. 6 shows the keyframes of the

pusher pushing the slider toward the target, indicating that

7 steps are needed with the face switching strategy. Another

ﬁnal target conﬁguration, [5cm,

−

18cm,π/5], which requires

two face switches, is additionally shown in the video.

VI. DISCUSSION, L IMITATIONS AND CONCLUSION

In this paper, we propose to add separation modes and

face switching mechanisms to the problem of pushing ob-

jects on a planar surface. We showed that by introducing

human demonstrations, the typical TAMP problem can be

expressed as a classical geometric optimization problem,

which is much more efﬁcient to be solved. With the proposed

demonstration-guided hierarchical optimization framework,

we demonstrated signiﬁcantly better results in terms of

generalization and precision compared to the state-of-the-art

methods. With more demonstrations, the proposed approach

has the potential to precisely reach almost any point in the

task space. Additionally, we developed a feedback controller

based on DDP feedback gains to replan the trajectory for on-

line tracking. We tested the combination of these approaches

in both PyBullet simulation and in real robot application,

showing good performance to resist contact uncertainty.

Currently, we are using a feedback controller to track

the ofﬂine trajectory. If the system is subject to large

perturbation such as rotating 180◦, an online Model Pre-

dictive Control (MPC) may still be required. Nevertheless,

our demonstration-guided method is also promising as an

optimizer within MPC to avoid poor local optima.

As future work, we aim to apply our demonstration-guided

approach to a broader range of manipulation tasks (namely,

beyond pushing problems), which requires further study on

how to extract constraints from demonstration, and how to

formulate an optimal control problem based on the extracted

constraints. It would also be relevant to explore extensions

of TAMP problems by introducing similar continuous human

demonstration. Another future work consists in exploiting

human demonstrations in model-based learning strategies to

let the robot automatically reﬁne the pushing model and its

motion.

REFERENCES

[1] F. R. Hogan and A. Rodriguez, “Feedback control of the pusher-slider

system: A story of hybrid and underactuated contact dynamics,” arXiv

preprint arXiv:1611.08268, 2016.

[2] J. P. De Moura, T. Stouraitis, and S. Vijayakumar, “Non-prehensile

planar manipulation via trajectory optimization with complementarity

constraints,” in 2022 IEEE International Conference on Robotics and

Automation, 2022.

[3] M. T. Mason, “Progress in nonprehensile manipulation,” The Interna-

tional Journal of Robotics Research, vol. 18, no. 11, pp. 1129–1141,

1999.

[4] S. Goyal, A. Ruina, and J. Papadopoulos, “Limit surface and moment

function descriptions of planar sliding,” in 1989 IEEE International

Conference on Robotics and Automation. Citeseer, 1989, pp. 794–

795.

[5] S. H. Lee and M. Cutkosky, “Fixture planning with friction,” Journal

of Manufacturing Science and Engineering, Transactions of the ASME,

vol. 113, no. 3, pp. 320–327, 1991.

[6] K. M. Lynch, H. Maekawa, and K. Tanie, “Manipulation and active

sensing by pushing using tactile feedback.” in IROS, vol. 1, 1992, pp.

416–421.

[7] M. T. Mason, “Mechanics and planning of manipulator pushing

operations,” The International Journal of Robotics Research, vol. 5,

no. 3, pp. 53–71, 1986.

[8] F. R. Hogan, E. R. Grau, and A. Rodriguez, “Reactive planar manipula-

tion with convex hybrid mpc,” in 2018 IEEE International Conference

on Robotics and Automation (ICRA). IEEE, 2018, pp. 247–253.

[9] A. Simeonov, Y. Du, B. Kim, F. Hogan, J. Tenenbaum, P. Agrawal, and

A. Rodriguez, “A long horizon planning framework for manipulating

rigid pointcloud objects,” in Conference on Robot Learning. PMLR,

2021, pp. 1582–1601.

[10] M. Toussaint, “Logic-geometric programming: An optimization-based

approach to combined task and motion planning,” in Twenty-Fourth

International Joint Conference on Artiﬁcial Intelligence, 2015.

[11] T. Migimatsu and J. Bohg, “Object-centric task and motion planning

in dynamic environments,” IEEE Robotics and Automation Letters,

vol. 5, no. 2, pp. 844–851, 2020.

[12] R. E. Fikes and N. J. Nilsson, “Strips: A new approach to the appli-

cation of theorem proving to problem solving,” Artiﬁcial intelligence,

vol. 2, no. 3-4, pp. 189–208, 1971.

[13] N. Doshi, F. R. Hogan, and A. Rodriguez, “Hybrid differential

dynamic programming for planar manipulation primitives,” in 2020

IEEE International Conference on Robotics and Automation (ICRA).

IEEE, 2020, pp. 6759–6765.

[14] A. G. Billard, S. Calinon, and R. Dillmann, “Learning from humans,”

in Handbook of Robotics, B. Siciliano and O. Khatib, Eds. Secaucus,

NJ, USA: Springer, 2016, ch. 74, pp. 1995–2014, 2nd Edition.

[15] S. Calinon and D. Lee, “Learning control,” in Humanoid Robotics: a

Reference, P. Vadakkepat and A. Goswami, Eds. Springer, 2019, pp.

1261–1312.

[16] A. J. Ijspeert, J. Nakanishi, H. Hoffmann, P. Pastor, and S. Schaal,

“Dynamical movement primitives: learning attractor models for motor

behaviors,” Neural computation, vol. 25, no. 2, pp. 328–373, 2013.

[17] S. Calinon and D. Lee, “Learning control,” Humanoid Robotics: a

Reference, 2017.

[18] S. Calinon, “Robot learning with task-parameterized generative mod-

els,” in Robotics Research. Springer, 2018, pp. 111–126.

[19] L. E. Peterson, “K-nearest neighbor,” Scholarpedia, vol. 4, no. 2, p.

1883, 2009.

[20] P. Cunningham and S. J. Delany, “K-nearest neighbour classiﬁers-a

tutorial,” ACM Computing Surveys (CSUR), vol. 54, no. 6, pp. 1–25,

2021.

[21] D. Mayne, “A second-order gradient method for determining optimal

trajectories of non-linear discrete-time systems,” International Journal

of Control, vol. 3, no. 1, pp. 85–95, 1966.

[22] Y. Tassa, N. Mansard, and E. Todorov, “Control-limited differential

dynamic programming,” in 2014 IEEE International Conference on

Robotics and Automation (ICRA). IEEE, 2014, pp. 1168–1175.

[23] H. Girgin, J. Jankowski, and S. Calinon, “Reactive anticipatory robot

skills with memory,” in International Symposium on Robotics Research

(ISRR), 2022.

[24] T. S. Lembono, A. Paolillo, E. Pignat, and S. Calinon, “Memory of

motion for warm-starting trajectory optimization,” IEEE Robotics and

Automation Letters, vol. 5, no. 2, pp. 2594–2601, 2020.

[25] J. A. Andersson, J. Gillis, G. Horn, J. B. Rawlings, and M. Diehl,

“Casadi: a software framework for nonlinear optimization and optimal

control,” Mathematical Programming Computation, vol. 11, no. 1, pp.

1–36, 2019.

[26] P. Bonami, M. Kilinc¸, and J. Linderoth, “Algorithms and software for

convex mixed integer nonlinear programs,” in Mixed integer nonlinear

programming. Springer, 2012, pp. 1–39.

[27] C. Mastalli, R. Budhiraja, W. Merkt, G. Saurel, B. Hammoud,

M. Naveau, J. Carpentier, L. Righetti, S. Vijayakumar, and

N. Mansard, “Crocoddyl: An efﬁcient and versatile framework for

multi-contact optimal control,” in 2020 IEEE International Conference

on Robotics and Automation (ICRA). IEEE, 2020, pp. 2536–2542.

[28] E. Coumans and Y. Bai, “Pybullet, a python module for physics sim-

ulation for games,” Robotics and machine learning.[Google Scholar],

2016.

ResearchGate has not been able to resolve any citations for this publication.

Crocoddyl: An Efficient and Versatile Framework for Multi-Contact Optimal Control

Conference Paper

Full-text available

Mar 2020

We introduce Crocoddyl (Contact RObot COntrol by Differential DYnamic Library), an open-source framework tailored for efficient multi-contact optimal control. Crocoddyl efficiently computes the state trajectory and the control policy for a given predefined sequence of contacts. Its efficiency is due to the use of sparse analytical derivatives, exploitation of the problem structure, and data sharing. It employs differential geometry to properly describe the state of any geometrical system, e.g. floating-base systems. Additionally, we propose a novel optimal control algorithm called Feasibility-driven Differential Dynamic Programming (FDDP). Our method does not add extra decision variables which often increases the computation time per iteration due to factorization. FDDP shows a greater globalization strategy compared to classical Differential Dynamic Programming (DDP) algorithms. Concretely , we propose two modifications to the classical DDP algorithm. First, the backward pass accepts infeasible state-control trajectories. Second, the rollout keeps the gaps open during the early "exploratory" iterations (as expected in multiple-shooting methods with only equality constraints). We showcase the performance of our framework using different tasks. With our method, we can compute highly-dynamic maneuvers (e.g. jumping, front-flip) within few milliseconds.

Memory of Motion for Warm-Starting Trajectory Optimization

Article

Full-text available

Feb 2020

Trajectory optimization for motion planning requires good initial guesses to obtain good performance. In our proposed approach, we build a memory of motion based on a database of robot paths to provide good initial guesses. The memory of motion relies on function approximators and dimensionality reduction techniques to learn the mapping between the tasks and the robot paths. Three function approximators are compared: ${k}$ -Nearest Neighbor, Gaussian Process Regression, and Bayesian Gaussian Mixture Regression. In addition, we show that the memory can be used as a metric to choose between several possible goals, and using an ensemble method to combine different function approximators results in a significantly improved warm-starting performance. We demonstrate the proposed approach with motion planning examples on the dual-arm robot PR2 and the humanoid robot Atlas.

Non-prehensile Planar Manipulation via Trajectory Optimization with Complementarity Constraints

Conference Paper

May 2022

Hybrid Differential Dynamic Programming for Planar Manipulation Primitives

Conference Paper

May 2020

Feedback Control of the Pusher-Slider System: A Story of Hybrid and Underactuated Contact Dynamics

Chapter

May 2020

This paper investigates real-time control strategies for dynamical systems that involve frictional contact interactions. Hybridness and underactuation are key characteristics of these systems that complicate the design of feedback controllers. In this research, we examine and test a novel feedback controller design on a planar pushing system, where the purpose is to control the motion of a sliding object on a flat surface using a point robotic pusher. The pusher-slider is a simple dynamical system that retains many of the challenges that are typical of robotic manipulation tasks.

Object-Centric Task and Motion Planning in Dynamic Environments

Article

Jan 2020

We address the problem of applying Task and Motion Planning (TAMP) in real world environments. TAMP combines symbolic and geometric reasoning to produce sequential manipulation plans, typically specified as joint-space trajectories, which are valid only as long as the environment is static and perception and control are highly accurate. In case of any changes in the environment, slow re-planning is required. We propose a TAMP algorithm that optimizes over Cartesian frames defined relative to target objects. The resulting plan then remains valid even if the objects are moving and can be executed by reactive controllers that adapt to these changes in real time. We apply our TAMP framework to a torque-controlled robot in a pick and place setting and demonstrate its ability to adapt to changing environments, inaccurate perception, and imprecise control, both in simulation and the real world.

Reactive Planar Manipulation with Convex Hybrid MPC

Conference Paper

May 2018

CasADi: a software framework for nonlinear optimization and optimal control

Article

Jul 2018

We present CasADi, an open-source software framework for numerical optimization. CasADi is a general-purpose tool that can be used to model and solve optimization problems with a large degree of flexibility, larger than what is associated with popular algebraic modeling languages such as AMPL, GAMS, JuMP or Pyomo. Of special interest are problems constrained by differential equations, i.e. optimal control problems. CasADi is written in self-contained C++, but is most conveniently used via full-featured interfaces to Python, MATLAB or Octave. Since its inception in late 2009, it has been used successfully for academic teaching as well as in applications from multiple fields, including process control, robotics and aerospace. This article gives an up-to-date and accessible introduction to the CasADi framework, which has undergone numerous design improvements over the last 7 years.

Robot Learning with Task-Parameterized Generative Models

Chapter

Jan 2018

Sylvain Calinon

Task-parameterized models provide a representation of movement/behavior that can adapt to a set of task parameters describing the current situation encountered by the robot, such as location of objects or landmarks in its workspace. This paper gives an overview of the task-parameterized Gaussian mixture model (TP-GMM) presented in previous publications, and introduces a number of extensions and ongoing challenges required to move the approach toward unconstrained environments. In particular, it discusses its generalization capability and the handling of movements with a high number of degrees of freedom. It then shows that the method is not restricted to movements in task space, but that it can also be exploited to handle constraints in joint space, including priority constraints.

Learning from Humans

Article

Jan 2016

This chapter surveys the main approaches developed to date to endow robots with the ability to learn from human guidance. The field is best known as robot programming by demonstration, robot learning from/by demonstration, apprenticeship learning and imitation learning. We start with a brief historical overview of the field. We then summarize the various approaches taken to solve four main questions: when, what, who and when to imitate. We emphasize the importance of choosing well the interface and the channels used to convey the demonstrations, with an eye on interfaces providing force control and force feedback. We then review algorithmic approaches to model skills individually and as a compound and algorithms that combine learning from human guidance with reinforcement learning. We close with a look on the use of language to guide teaching and a list of open issues.

Demonstration-guided Optimal Control for Long-term Non-prehensile Planar Manipulation

Abstract and Figures

Recommended publications

Demonstration-guided Optimal Control for Long-term Non-prehensile Planar Manipulation

Probabilistic Iterative LQR for Short Time Horizon MPC

Projection-based first-order constrained optimization solver for robotics

Tensor Train for Global Optimization Problems in Robotics