Research Article

High-precision robotic assembly system using three-dimensional vision

Shaohua Yan¹,², Xian Tao¹,² and De Xu¹,²
Abstract
The design of a high-precision robot assembly system is a great challenge. In this article, a robotic assembly system is developed to assemble two components with six degrees of freedom in three-dimensional space. It consists of two manipulators and a structured light camera, which is mounted on the end-effector beside component A to measure the pose of component B. Firstly, the features of irregular components are extracted based on U-NET network training with few labeled images. Secondly, an algorithm is proposed to calculate the pose of component B based on the image features and the corresponding three-dimensional coordinates on its ellipse surface. Thirdly, six errors, including two position errors and one orientation error in image space, and one position error and two orientation errors in Cartesian space, are computed to control the motions of component A to align with component B. The hybrid visual servoing method is used in the control system. The experimental results verify the effectiveness of the designed system.
Keywords
3D vision, feature extraction, pose estimation, hybrid visual servoing, robotic assembly system
Date received: 15 February 2021; accepted: 01 June 2021
Topic Area: Vision Systems
Topic Editor: Antonio Fernandez-Caballero
Associate Editor: Grazia Cicirelli
Introduction
With the development of technology, the demand for high-
precision assembly in industrial manufacturing and space
exploration is increasing.
1–3
Industrial assembly devices are
generally divided into two categories. One is the specific
translation and rotation mechanism.
4,5
For example, Luo
et al.
4
used a linear drive mechanism for precision threading
operations. The translation error and rotation error of the
platform reached 3 mm and 0.005, respectively. Yu et al.
5
used the feature constraint relationship between components
to control translation and rotation devices completing com-
ponent assembly simulation. However, the working range of
specific translation and rotation mechanisms is small, and its
flexibility is low. The other is based on a general manipu-
lator.
6,7
For example, Wang et al.
6
added an elastic displace-
ment device to the manipulator to achieve peg-in-hole
assembly, which improved the success rate of each
assembly. Meng et al.
7
realized precise robot assembly for
large-scale spacecraft components based on computer-aided
design models of aircraft components and key geometric
features located by ranging sensors and binocular vision.
Generally, a manipulator has six degree-of-freedoms
(DOFs). Therefore, it is very helpful for manipulator-based
assembly systems to realize high-precision assembly with
six DOFs in three-dimensional (3D) space.
In the robotic assembly system, the target pose is usually
measured with vision-based methods.
8,9
Generally, the
1 Research Center of Precision Sensing and Control, Institute of Automation, Chinese Academy of Sciences, Beijing, China
2 School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China

Corresponding author: Xian Tao, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, 100080, China. Email: taoxian2013@ia.ac.cn
International Journal of Advanced Robotic Systems
May-June 2021: 1–12
© The Author(s) 2021
Article reuse guidelines: sagepub.com/journals-permissions
DOI: 10.1177/17298814211027029
journals.sagepub.com/home/arx
Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
Generally, the point feature, line feature, and circle feature are employed in the pose estimation methods. For example, in Liu et al.,10 the end position of the dispensing needle was obtained through the point feature and then the precision dispensing operation was completed. In Liu et al.,11 line features were used to measure the pose of a long cylindrical component. In Liu and Xu,12 a fast and effective circle detection algorithm was proposed for target position estimation. However, the above pose estimation methods require two or three cameras placed in different directions. Sun et al.13 measured the target pose with a camera based on the projection relationship between the circle and the ellipse, but the accuracy of this method relies on the ellipse fitting. Another kind of target pose measurement method, based on the structured light camera, is becoming popular.14–16 For example, Kim et al.14 accurately estimated the surface normal vector of the target based on a structured light camera and then completed the object-grasping task. In Satorres et al.,15 the relative position relationship between the manipulator and the object was obtained from the 3D data of the 3D camera. Litvak et al.16 assembled randomly distributed components based on a depth camera and a convolutional neural network, and the success rate reached 91%. Therefore, pose measurement based on a structured light camera is a better choice.
Visual servoing methods are very popular in many applications including automatic assembly systems.17,18 They are classified as image-based visual servoing,19 position-based visual servoing,20 and hybrid visual servoing methods.21 Xu et al.19 proposed an image-based visual servoing method, in which point features and line features are used for position control and attitude control, respectively. Image-based visual servoing has certain robustness to camera calibration errors and robot model errors. Through comparative experiments on position-based and image-based visual servoing systems, Peng et al.20 found that position-based visual servoing has a faster convergence speed. What's more, some advanced control methods for tracking control of mechanical servo systems help improve convergence speed. For example, Deng and Yao22 designed a high-performance tracking controller without velocity measurement for electrohydraulic servomechanisms, which achieves asymptotic tracking performance when facing time-invariant modeling uncertainties. Aiming at mechanical servo systems with mismatched uncertainties, Deng and Yao23 proposed a novel recursive robust integral of the sign of the error control method, which achieves excellent asymptotic tracking performance. Therefore, it is necessary to combine the advantages of image-based visual servoing and position-based visual servoing methods to realize the precision assembly of two components.
The purpose of this article is to achieve precise assembly of irregular components. A robotic assembly system is developed to assemble two components with six DOFs in 3D space, which consists of two manipulators and a structured light camera. Image-space information and 3D-space information acquired by the structured light camera are effectively combined to measure the pose of component B. Considering the advantages of image-based and position-based visual servoing methods, this article proposes a hybrid visual servoing method with higher convergence speed and accuracy. The manipulators can control components with different initial positions and postures for automatic assembly. The main contributions of this article are as follows:
1. A robotic assembly system with two manipulators is
developed to assemble two components with six
DOFs in 3D space. The hybrid visual servoing
method combining errors in Cartesian space and
image space is used in the control system.
2. A feature extraction algorithm for the images of
irregular components is proposed, which is based
on U-NET network training with few labeled images.
3. The pose of component B is calculated from the
image features and the corresponding 3D coordi-
nates on its ellipse surface.
The rest of this article is organized as follows. The first section describes the assembly task and system. Then, an image feature extraction and pose measurement method is proposed, followed by a hybrid visual servoing method to align the two components. The details of the automated assembly process are also introduced. The experiments and results with the proposed assembly method are given. Finally, this article is concluded.
Assembly task and system
Assembly task
The two components to be assembled are shown in Figure 1.
They are metal connectors with an outer diameter of about
43 mm, which are divided into component A and compo-
nent B. As shown in Figure 1(a), the left side is component
A and the right side is component B. There are five groove
areas on the inner side of component B, as shown in
Figure 1(b). The positions and sizes of the grooves are
unevenly distributed. Correspondingly, there are five pro-
truding areas on the upper surface of component A, as
shown in Figure 1(c).
When assembling, it is necessary to align the groove areas of component B to the protruding areas of component A with six DOFs, including the 3D position and the orientation angles about three axes. Our task is to realize the precise assembly of these two components.
Assembly system
The automated precision assembly system is designed as
given in Figure 2. Manipulator 1 is a seven-DOF robot with
a clamping device and component A connected to it. A
structured light camera is fixed at the end of manipulator 1.
Manipulator 2 is a universal robot (UR3) with a gripping
device and component B connected to it.
Manipulator 1 can translate along and rotate around the X, Y, and Z axes to align component A to component B. The
poses of manipulator 1 and manipulator 2 can be adjusted
to initialize the pose of component B in the structured light
camera. The computer can control the entire assembly pro-
cess including image capture with the camera, image pro-
cessing, feature extraction, pose estimation, and alignment
and insertion of the two components.
The coordinate frames are established as shown in Figure 2. O_R1-X_R1Y_R1Z_R1 is the base frame of manipulator 1, O_R2-X_R2Y_R2Z_R2 is the base frame of manipulator 2, O_D-X_DY_DZ_D is the end-effector frame of manipulator 2, O_C-X_CY_CZ_C is the camera frame, and O_F-X_FY_FZ_F is the end-effector frame of manipulator 1. The camera is carefully adjusted so that the axes of the camera frame are as parallel as possible to the axes of the end-effector frame of manipulator 1.
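Because the camera axes are only approximately parallel to the axes of the end-effector frame of manipulator 1, measurements expressed in the camera frame can, in general, be mapped into the end-effector frame by a fixed hand-eye rotation. The short Python sketch below illustrates this mapping with a hypothetical calibration matrix R_FC; with the careful mounting described above, this matrix is close to the identity, so the article can apply camera-frame errors almost directly.

```python
# Sketch: map a position error measured in the camera frame O_C into the
# end-effector frame O_F of manipulator 1 (hypothetical hand-eye rotation R_FC).
import numpy as np

# With the near-parallel mounting described above, R_FC is close to the identity.
R_FC = np.eye(3)  # replace with the calibrated rotation from frame C to frame F

def camera_error_to_end_effector(err_cam):
    """err_cam: 3-vector position error expressed in the camera frame (mm)."""
    return R_FC @ np.asarray(err_cam)

print(camera_error_to_end_effector([1.5, -0.8, 12.0]))
```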
Image feature extraction
Elliptic ring region extraction
Figure 3 shows the image of component B captured by the structured light camera. To get the current pose of component B in the camera frame, its inherent features such as the ring circles should be extracted. As shown in Figure 3(a), there is noise in the gray image of component B, which disturbs the edges.

Detecting the ring contour of the component through edge detection and ellipse fitting therefore produces large errors. Another option is to obtain the ring area via threshold segmentation. But the gray value of the ring area is not evenly distributed due to the influence of light, so it is difficult to accurately segment the ring area with threshold segmentation.

Therefore, this article uses data labeling and deep learning to solve the problem of inaccurate feature extraction. As shown in Figure 3(b), the elliptical ring area on the surface of component B is marked; the outside of the ring is an ellipse, and the inside is an ellipse containing the edges of the grooves. A U-NET network is designed, and its structure diagram is shown in Figure 4. It includes a contraction path for capturing semantics and an asymmetrical expansion path for precise positioning. The contraction path consists of four convolutional layers and pooling layers for down-sampling, and the expansion path consists of four deconvolutional layers and convolutional layers for up-sampling. This U-NET network is trained with the labeled data. Then it is used to segment the ring area from the image of component B. As shown in Figure 3(c), the elliptical ring area containing the groove information is accurately extracted.
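To make the segmentation step concrete, the following is a minimal sketch of a four-level U-Net-style encoder-decoder in PyTorch, in the spirit of the network in Figure 4. The channel widths, loss choice, and training details are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal U-Net-style segmentation network (sketch, not the authors' exact model).
# Assumes 512 x 512 single-channel input images and a binary ring-area mask.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two 3x3 convolutions with ReLU, as used at every U-Net level.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
    )

class RingUNet(nn.Module):
    def __init__(self, in_ch=1, out_ch=1, widths=(16, 32, 64, 128, 256)):
        super().__init__()
        # Contraction path: four down-sampling steps (conv + max-pool).
        self.enc = nn.ModuleList()
        prev = in_ch
        for w in widths[:-1]:
            self.enc.append(double_conv(prev, w))
            prev = w
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = double_conv(widths[-2], widths[-1])
        # Expansion path: four up-sampling steps (deconv + conv), with skip connections.
        self.up = nn.ModuleList()
        self.dec = nn.ModuleList()
        for w_skip, w_in in zip(reversed(widths[:-1]), reversed(widths[1:])):
            self.up.append(nn.ConvTranspose2d(w_in, w_skip, 2, stride=2))
            self.dec.append(double_conv(w_skip * 2, w_skip))
        self.head = nn.Conv2d(widths[0], out_ch, 1)  # per-pixel ring/background logit

    def forward(self, x):
        skips = []
        for enc in self.enc:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.up, self.dec, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([skip, x], dim=1))
        return self.head(x)

# Usage sketch: train on the few labeled (image, ring-mask) pairs plus augmentations.
if __name__ == "__main__":
    net = RingUNet()
    dummy = torch.randn(1, 1, 512, 512)           # resized gray image of component B
    logits = net(dummy)                           # (1, 1, 512, 512) ring-area logits
    mask = (torch.sigmoid(logits) > 0.5).float()  # segmented elliptical ring area
    print(mask.shape)
```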
Groove feature extraction
General methods cannot effectively detect the groove features on the ring. Therefore, the inner and outer ellipses are combined to detect the groove features.

As shown in Figure 5(a), the contour of the elliptical ring area is output by the U-NET network. The two contours containing the most edge points on the inner and outer sides are considered as the inner ellipse and the outer ellipse of the ring. Then the least-squares method is used to fit the inner ellipse parametric equations (1) and the outer ellipse parametric equations (2), respectively. The ellipse fitting result is shown in Figure 5(b)
Figure 1. Components: (a) components A and B, (b) component A and its surface structure, and (c) component B and its surface
structure.
Figure 2. Assembly system configuration.
u_in = u_0 + (a_in / 2) cos θ_0 cos θ − (b_in / 2) sin θ_0 sin θ
v_in = v_0 + (a_in / 2) sin θ_0 cos θ + (b_in / 2) cos θ_0 sin θ    (1)

where (u_0, v_0) is the pixel coordinate of the center point of the ellipse, (u_in, v_in) is the pixel coordinate of a point on the inner ellipse, a_in and b_in are the long and short axis lengths of the inner ellipse, θ_0 represents the initial angle of the ellipse, and θ ∈ [0, 2π) is the parameter variable.

u_out = u_0 + (a_out / 2) cos θ_0 cos θ − (b_out / 2) sin θ_0 sin θ
v_out = v_0 + (a_out / 2) sin θ_0 cos θ + (b_out / 2) cos θ_0 sin θ    (2)

where (u_out, v_out) is the pixel coordinate of a point on the outer ellipse, and a_out and b_out are the long and short axis lengths of the outer ellipse.
According to the inner and outer ellipse equations, the parametric equations (3) of a similar ellipse passing through the groove area are obtained

u_e = u_0 + ((a_in + k(a_out − a_in)) / 2) cos θ_0 cos θ − ((b_in + k(b_out − b_in)) / 2) sin θ_0 sin θ
v_e = v_0 + ((a_in + k(a_out − a_in)) / 2) sin θ_0 cos θ + ((b_in + k(b_out − b_in)) / 2) cos θ_0 sin θ    (3)
Figure 3. Image of component B: (a) original image, (b) image with manually marked ring area, and (c) image with segmented ring area.
In (b) and (c), the ring area is indicated with red color.
Figure 4. U-NET network structure diagram.
Figure 5. Groove feature extraction process: (a) contours
detection, (b) ellipse fitting, (c) groove feature extraction, and (d)
image with extracted groove features.
where (u_e, v_e) is the pixel coordinate of a point on the similar ellipse and k ∈ [0, 1) is a coefficient; the closer k is to 1, the closer the similar ellipse is to the outer ellipse.

The parameter angle θ in the similar ellipse equation (3) is gradually increased to find the continuous points along the similar ellipse whose pixel values are significantly different from those of the ring area. The corresponding parameter angle set (θ_11, θ_12, ..., θ_1k) is recorded. After traversing θ ∈ [0, 2π), we can get five parameter angle sets

{(θ_11, θ_12, ..., θ_1k1), (θ_21, θ_22, ..., θ_2k2), ..., (θ_51, θ_52, ..., θ_5k5)}    (4)

Feature extraction results via searching along similar ellipses are shown in Figure 5(c). Finally, the average (θ_1, θ_2, θ_3, θ_4, θ_5) of the five parameter angle sets is considered as the angle of each groove. The results of feature extraction on the original image are shown in Figure 5(d).
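As an illustration of the search in equations (1) to (4), the following Python sketch fits the inner and outer ellipses from the U-NET contour output and scans a similar ellipse for groove angles. The function names, the intensity threshold, and the gap used to group consecutive samples are illustrative assumptions rather than the authors' exact implementation.

```python
# Sketch of groove-angle extraction along a "similar ellipse" (assumed parameters).
import cv2
import numpy as np

def fit_ring_ellipses(ring_mask):
    """Fit inner and outer ellipses to the segmented ring area (binary mask)."""
    contours, _ = cv2.findContours(ring_mask.astype(np.uint8),
                                   cv2.RETR_CCOMP, cv2.CHAIN_APPROX_NONE)
    # The two contours with most edge points are taken as inner and outer boundaries.
    contours = sorted(contours, key=len, reverse=True)[:2]
    ellipses = [cv2.fitEllipse(c) for c in contours]   # ((u0, v0), (a, b), angle_deg)
    ellipses.sort(key=lambda e: e[1][0] + e[1][1])     # smaller axes -> inner ellipse
    return ellipses[0], ellipses[1]                    # inner, outer

def groove_angles(gray, inner, outer, k=0.25, thresh=40, n=3600):
    """Scan the similar ellipse of equation (3) and return the groove angles (deg)."""
    (u0, v0), (a_in, b_in), ang = inner
    (_, _), (a_out, b_out), _ = outer
    a = a_in + k * (a_out - a_in)                      # interpolated long axis length
    b = b_in + k * (b_out - b_in)                      # interpolated short axis length
    t0 = np.deg2rad(ang)                               # ellipse rotation (initial angle)
    theta = np.linspace(0.0, 2 * np.pi, n, endpoint=False)
    u = u0 + a / 2 * np.cos(t0) * np.cos(theta) - b / 2 * np.sin(t0) * np.sin(theta)
    v = v0 + a / 2 * np.sin(t0) * np.cos(theta) + b / 2 * np.cos(t0) * np.sin(theta)
    vals = gray[np.clip(v.astype(int), 0, gray.shape[0] - 1),
                np.clip(u.astype(int), 0, gray.shape[1] - 1)]
    ring_gray = np.median(vals)                        # reference gray level of the ring
    is_groove = np.abs(vals.astype(float) - ring_gray) > thresh
    # Group consecutive groove samples into angle sets and average each set.
    idx = np.flatnonzero(is_groove)
    sets, current = [], [idx[0]] if idx.size else []
    for i in idx[1:]:
        if i - current[-1] <= 2:
            current.append(i)
        else:
            sets.append(current); current = [i]
    if current:
        sets.append(current)
    return [float(np.rad2deg(theta[s]).mean()) for s in sets]
```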
Automatic assembly
Automatic assembly is divided into three parts, namely the
desired image capture stage, the camera alignment stage,
and the component insertion stage. The whole assembly
process is given in Figure 6.
The desired image capture stage mainly obtains the desired image and the displacement of the manipulator between the alignment and insertion positions via one manually controlled assembly. The desired image features are extracted from the desired image. During the camera alignment stage, the features of component B in image space and Cartesian space are acquired, and a hybrid visual servoing control method is designed for precise alignment. In the component insertion stage, component A is translated by the displacements D_1 and D_2, and then component A is inserted into component B.
Desired image capture stage
As shown in stage A in Figure 6, manipulator 1 is manually controlled to complete one assembly. Manipulator 1 is then translated by the given displacement D_1 along the z-axis in its end-effector frame to move component A away from component B. Manipulator 1 is translated along the x-axis in its end-effector frame until the camera can capture the image of component B. The displacement along the x-axis is recorded as D_2. This state is called the camera alignment state.
The images captured in the camera alignment state are considered as the desired image. The elliptical ring area containing the grooves of the desired image is extracted by the trained U-NET network. The image coordinates {(u_1, v_1), (u_2, v_2), ..., (u_n, v_n)} of sampled points in the elliptical ring area are obtained. Correspondingly, the 3D coordinates {(x_1, y_1, z_1), (x_2, y_2, z_2), ..., (x_n, y_n, z_n)} in the camera frame are recorded. The random sample consensus (RANSAC) algorithm is used to fit the plane (5) of the ring area of component B

a_d x + b_d y + c_d z + e_d = 0    (5)

where a_d, b_d, c_d, and e_d are the parameters of the fitting plane.
Figure 6. The program flow chart of the assembly procedure.
The desired normal vector [a_d, b_d, c_d]^T is obtained. The desired normal vector is normalized to the desired unit normal vector n_d

n_d = [a_d, b_d, c_d]^T / √(a_d² + b_d² + c_d²) = [n_dx, n_dy, n_dz]^T    (6)

The desired posture angles θ_dx and θ_dy are calculated with the desired plane unit normal vector of formula (6). Posture angle θ_mdz is an angle sequence, which contains the groove angle information. It is obtained by the above groove feature extraction algorithm

θ_dx = arcsin(n_dx)
θ_dy = arcsin(n_dy)
θ_mdz = [θ_d1, θ_d2, θ_d3, θ_d4, θ_d5]^T    (7)

The desired center point image coordinate P_d = (u_ad, v_ad) of component B is obtained through ellipse fitting. Correspondingly, the 3D coordinate P_ad = (x_ad, y_ad, z_ad) in the camera frame is read from the 3D camera.

In this way, the desired features P_d, θ_dx, and θ_dy are acquired in image space, and the desired features n_d, P_ad, and θ_mdz are acquired in Cartesian space.
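The following sketch illustrates how the desired Cartesian features of equations (5) to (7) could be computed from the sampled ring points. The RANSAC inlier tolerance and iteration count are illustrative assumptions, not values reported in the article.

```python
# Sketch: fit the ring-area plane with RANSAC and derive the desired posture angles.
import numpy as np

def ransac_plane(points, n_iter=500, inlier_tol=0.2, rng=None):
    """Fit a_d*x + b_d*y + c_d*z + e_d = 0 to Nx3 points (mm) with RANSAC."""
    rng = rng or np.random.default_rng(0)
    best_inliers = None
    for _ in range(n_iter):
        p1, p2, p3 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p2 - p1, p3 - p1)
        if np.linalg.norm(normal) < 1e-9:
            continue                                   # degenerate (collinear) sample
        normal = normal / np.linalg.norm(normal)
        dist = np.abs(points @ normal - normal @ p1)
        inliers = dist < inlier_tol
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refinement on the inliers via SVD.
    inl = points[best_inliers]
    centroid = inl.mean(axis=0)
    _, _, vt = np.linalg.svd(inl - centroid)
    n = vt[-1]
    return np.array([n[0], n[1], n[2], -n @ centroid])  # [a_d, b_d, c_d, e_d]

def desired_posture(plane):
    """Unit normal n_d and posture angles theta_dx, theta_dy (equations (6), (7))."""
    n_d = plane[:3] / np.linalg.norm(plane[:3])
    theta_dx = np.arcsin(n_d[0])
    theta_dy = np.arcsin(n_d[1])
    return n_d, np.rad2deg(theta_dx), np.rad2deg(theta_dy)

# Usage sketch with synthetic points on a slightly tilted plane.
pts = np.random.default_rng(1).uniform(-20, 20, (500, 2))
z = 180.0 + 0.05 * pts[:, 0] - 0.02 * pts[:, 1]
plane = ransac_plane(np.column_stack([pts, z]))
print(desired_posture(plane))
```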
Camera alignment stage

The current image of component B is acquired in real time. According to the above method of feature extraction, the current features P_c = (u_ac, v_ac), θ_cx, and θ_cy are acquired from the current image, and the current features n_c = [a_c, b_c, c_c]^T, P_ac = (x_ac, y_ac, z_ac), and θ_mcz = [θ_c1, θ_c2, θ_c3, θ_c4, θ_c5]^T are acquired in Cartesian space, as described in the "Desired image capture stage" section.
A hybrid visual servoing control system is designed, in which the features from image space and Cartesian space are combined to realize the alignment between component B and the camera. The block diagram of the hybrid visual servoing automatic control system is shown in Figure 7. The pose of the end-effector of manipulator 1 is adjusted in its end-effector frame according to formula (8). The features from image space are used to control the translations along the x-axis and y-axis and the rotation around the z-axis. The features from Cartesian space are used to control the translation of the end-effector along the z-axis and the rotations around the x-axis and y-axis
[Δx, Δy, Δz, Δθ_x, Δθ_y, Δθ_z]^T = [k_1(u_ac − u_ad), k_1(v_ac − v_ad), k_2(z_ac − z_ad), k_2(θ_cx − θ_dx), k_2(θ_cy − θ_dy), k_2 Δθ_mz]^T    (8)

where k_1 and k_2 are coefficients and Δθ_mz is the best angle error calculated from θ_mcz = [θ_c1, θ_c2, θ_c3, θ_c4, θ_c5]^T and θ_mdz = [θ_d1, θ_d2, θ_d3, θ_d4, θ_d5]^T.
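A minimal sketch of the control update in formula (8) follows. How Δθ_mz is chosen (here, the wrapped mean of the per-groove angle differences) is an assumed interpretation of the "best angle error", the dictionary-based interface is hypothetical, and the gains are simply the 0.6 values reported later in the experiments; the made-up current features are for illustration only.

```python
# Sketch of one hybrid visual servoing step per formula (8) (assumed conventions).
import numpy as np

def best_groove_angle_error(theta_mcz, theta_mdz):
    """Assumed interpretation of the 'best angle error' between the two groove
    angle sequences: average wrapped difference over the five grooves (degree)."""
    diff = (np.asarray(theta_mcz) - np.asarray(theta_mdz) + 180.0) % 360.0 - 180.0
    return float(diff.mean())

def hybrid_step(current, desired, k1=0.6, k2=0.6):
    """Return [dx, dy, dz, dtheta_x, dtheta_y, dtheta_z] for the end-effector of
    manipulator 1, mixing image-space and Cartesian-space errors."""
    du = current["u_ac"] - desired["u_ad"]           # image-space errors (pixel)
    dv = current["v_ac"] - desired["v_ad"]
    dz = current["z_ac"] - desired["z_ad"]           # depth error from the 3D camera (mm)
    dtx = current["theta_cx"] - desired["theta_dx"]  # orientation errors (degree)
    dty = current["theta_cy"] - desired["theta_dy"]
    dtz = best_groove_angle_error(current["theta_mcz"], desired["theta_mdz"])
    return np.array([k1 * du, k1 * dv, k2 * dz, k2 * dtx, k2 * dty, k2 * dtz])

# Usage sketch with the desired features of Table 1 and made-up current features.
desired = dict(u_ad=627.08, v_ad=1233.53, z_ad=180.04, theta_dx=2.72, theta_dy=-0.86,
               theta_mdz=[10.26, 39.06, 138.60, 174.96, 270.54])
current = dict(u_ac=650.0, v_ac=1200.0, z_ac=195.0, theta_cx=4.1, theta_cy=0.4,
               theta_mcz=[12.1, 41.0, 140.3, 176.7, 272.4])
print(hybrid_step(current, desired))
```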
As shown in stage B in Figure 6, the camera alignment state is achieved after hybrid visual servoing control. At this point, the errors between the current pose and the desired pose approach 0. The displacement between component A and component B along the z-axis in the end-effector frame of manipulator 1 is D_1. The displacement between component A and component B along the x-axis in the end-effector frame of manipulator 1 is D_2.
Component insertion stage

In the component insertion stage, component alignment and component insertion are completed. At first, as shown in stage C in Figure 6, component A is translated by the displacement (D_1 − d) along the z-axis and the displacement D_2 along the x-axis in the end-effector frame of manipulator 1, where d is a small displacement. After ensuring the safety of assembly, component A is translated by the displacement d along the z-axis in the end-effector frame of manipulator 1. Then component A is inserted into component B. The entire assembly is completed precisely and efficiently.
Figure 7. Block diagram of automatic control system.
Experiments and results
Experiment system
An experiment system was established according to the
scheme given in the “Assembly System” section, as shown
in Figure 8. In this experiment system, there were two
manipulators including one seven-DOF robotic arm and
one six-DOF manipulator. Manipulator 1 had a clamping
device and component A connected to it. Manipulator 2
was a UR3 (universal robots company) manipulator with
a gripping device and component B connected to it. A
structured light camera was fixed at the end of manipulator
1. The structured light camera was an LMI Gocator 3210 (LMI Technologies) binocular snapshot sensor. The resolution of the camera in the x-axis and y-axis directions is 60–90 μm, the field of view is 71 × 98 mm to 100 × 154 mm, and the working distance is 164 mm.
Figure 8. Experiment system.
Figure 9. Feature extraction results of images at different angles and distances: (a) image after rotating around the positive directions of
the x-axis, (b) image after rotating around the negative directions of the x-axis, (c) image after rotating around the positive directions
of the y-axis, (d) image after rotating around the negative directions of the y-axis, (e) image after translating along the positive directions
of the z-axis, and (f) image after translating along the negative directions of the z-axis.
U-Net network and feature extraction results
The training set for the U-NET network consisted of 60
images with different angles and distances and 600 images
generated by data augmentation. Each image was a gray
image obtained by the structured light camera in an actual
environment. The size of the original images was 1251 × 1925 pixels, and they were resized to 512 × 512 pixels when training the U-NET network.
New images at different angles and distances were input into the trained U-NET network for testing. The feature extraction experiments with the method described in the "Groove feature extraction" section were conducted. The
extracted features for the images captured at different angles
and distances are shown in Figure 9. Figure 9(a) and (b) are
the feature extraction results of the image after rotating
around the positive and negative directions of the x-axis,
respectively. Figure 9(c) and (d) are the feature extraction
results of the image after rotating around the positive and
negative directions of the y-axis, respectively. Figure 9(e)
and (f) are the feature extraction results of the image after
translating along the positive and negative directions of the
z-axis, respectively. It can be seen from Figure 9 that the five
grooves on the ring area are all accurately extracted.
Automatic assembly
Before the assembly experiment, the desired features of component B had been obtained by the methods in the "Groove feature extraction" and "Desired image capture stage" sections. The coefficient k of the similar ellipse was equal to 0.25. The prior information obtained in the desired image capture stage is presented in Table 1.

In the assembly experiments, the poses of component A and component B were initialized randomly within a certain range, and the structured light camera obtained the current image of component B in real time. The current features of component B were obtained by the methods in the "Groove feature extraction" and "Desired image capture stage" sections. The errors between the current features and the desired features were used as the input of the hybrid visual servoing system. The coefficients k_1 and k_2 in the hybrid visual servoing system were both set to 0.6. The error curves of component B between the current pose and the desired pose are shown in Figure 10. It can be seen that after about eight steps, the position error and orientation error approached 0.
The trajectory of component B in image space during the assembly process is shown in Figure 11(a). It can be seen that the center point image coordinates of component B gradually approached the desired center point image coordinates. The trajectory of component B in Cartesian space during the assembly process is shown in Figure 11(b). It can be seen that the center point 3D coordinates of component B gradually approached the desired center point 3D coordinates.
Table 1. Prior information.

Desired center point image coordinates P_d (pixel): (627.08, 1233.53)
3D coordinates of desired center point P_ad (mm): (0.17, 21.72, 180.04)
Desired attitude angle θ_dx (°): 2.72
Desired attitude angle θ_dy (°): −0.86
Groove angles θ_mdz (°): (10.26, 39.06, 138.60, 174.96, 270.54)
Displacement D_1 of the evacuation (mm): 180
Displacement D_2 of the captured image (mm): 135
Figure 10. Error curves with the proposed method: (a) position error of component B and (b) orientation error of component A.
The actual scenes of the desired image capture stage are shown in Figure 12. As shown in Figure 12(a), the manipulator was manually controlled to complete one assembly. Manipulator 1 was translated by the given displacement D_1 along the z-axis in its end-effector frame to move component A away from component B. Manipulator 1 was translated along the x-axis in its end-effector frame until the camera could capture the image of component B. The displacement along the x-axis was recorded as D_2.
After the desired features had been obtained, we initialized the poses of component A and component B, as shown in Figure 13(a). As shown in Figure 13(b), the camera alignment state was achieved after hybrid visual servoing control.

As shown in Figure 14(a), after component A had moved up by D_2, it was aligned with component B. Then component A was translated by the displacement (D_1 − d) along the z-axis in the end-effector frame of manipulator 1, where d was equal to 3 mm. After ensuring the safety of assembly, component A was translated by the displacement d along the z-axis in the end-effector frame of manipulator 1. Then component A was inserted into component B, as shown in Figure 14(b).
Figure 11. The trajectory of component B in assembly: (a) trajectory in image space and (b) trajectory in Cartesian space.
Figure 12. Desired image capture stage: (a) the direction of movement of the end-effector of the manipulator and (b) the displacement D_1 of the evacuation.
Figure 13. The camera alignment stage: (a) initial state and (b)
camera alignment state.
The total time cost of one assembly was about 18 s: camera alignment took about 16 s and component insertion took about 2 s. Fifty assembly experiments were conducted, and all were successful. It can be found that the alignment and insertion achieved good results.
Comparative experiments
The position-based method in ref. 20 was selected as the comparative method. The position-based visual servoing control was realized according to formula (9), and the features were all from Cartesian space
[Δx, Δy, Δz, Δθ_x, Δθ_y, Δθ_z]^T = [k_a(x_ac − x_ad), k_a(y_ac − y_ad), k_a(z_ac − z_ad), k_b(θ_cx − θ_dx), k_b(θ_cy − θ_dy), k_b Δθ_z]^T    (9)

where the difference from formula (8) was that x_ac, x_ad, y_ac, and y_ad were obtained by directly reading the 3D coordinates of the desired point and the current point in the camera, and Δθ_z was calculated from the 3D coordinates.
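For comparison with the hybrid law above, a corresponding sketch of one position-based step per formula (9) is given below; as in the previous sketch, the naming, the dictionary interface, and the plain proportional update are assumptions for illustration.

```python
# Sketch of one position-based visual servoing step per formula (9).
import numpy as np

def pbvs_step(current, desired, ka=0.6, kb=0.6):
    """All errors come from Cartesian-space (3D camera) measurements."""
    dpos = np.array([current["x_ac"] - desired["x_ad"],
                     current["y_ac"] - desired["y_ad"],
                     current["z_ac"] - desired["z_ad"]])          # mm
    dang = np.array([current["theta_cx"] - desired["theta_dx"],
                     current["theta_cy"] - desired["theta_dy"],
                     current["dtheta_z"]])                        # degree
    return np.concatenate([ka * dpos, kb * dang])
```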
The coefficients k_a and k_b in equation (9) were both set to 0.6. A series of comparative experiments were conducted. Component A was also well aligned with component B in orientation and position and was successfully inserted into component B to form an assembled component with the method in ref. 20. In one experiment with the comparative method, the error curves of component B between the current pose and the desired pose are shown in Figure 15. It can be seen that after about 10 steps, the position error and orientation error approached 0. The error curves of the comparative method oscillate more times, and our method has a faster convergence speed.

The errors and steps of eight groups of comparative experiments in orientation alignment and position alignment are listed in Table 2. It can be found that the errors of our method are in a smaller range. Because the method in ref. 20 can suddenly produce a large error in a certain dimension, our proposed method is steadier.
Conclusions
A robotic assembly system with two manipulators is
designed to assemble two components with six DOFs in
3D space. A feature extraction algorithm for the images of
components is designed with the U-NET network. A hybrid
visual servoing method combining the errors in image
Figure 14. The component insertion stage: (a) translation D_2 and (b) component insertion state.
Figure 15. Error curves with the comparative method: (a) position error of component B and (b) orientation error of component A.
space and Cartesian space is proposed. Three DOFs are
controlled in image space, which are the center’s position
on the image plane and the rotation of component B around
the z-axis. The other three DOFs are controlled in Cartesian
space, which are the depth and the rotations around the x-
axis and y-axis.
A series of complete assembly experiments have been conducted in a real environment. The pose error is reduced to a small range in a few steps, and the success rate in 50 assembly experiments is 100%. Subsequently, a series of comparative experiments comparing the proposed method with the method in ref. 20 were conducted. The error curves of the method in ref. 20 oscillate more times, and our method has a faster convergence speed. The errors of our method are in a smaller range. Our method can improve the steadiness and efficiency of the alignment process. The alignment of component A to component B converges quickly and accurately with our method.
In the future, we will pay more attention to more intelligent assembly control methods.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with
respect to the research, authorship, and/or publication of this
article.
Funding
The author(s) disclosed receipt of the following financial support
for the research, authorship, and/or publication of this article: This
work was supported in part by the National Key Research and
Development Program of China under Grant 2018AAA0103004,
the National Natural Science Foundation of China under Grant
61873266, the Beijing Municipal Natural Science Foundation
under Grant 4212044, and the Science and Technology Program
of Beijing Municipal Science and Technology Commission under
Grant Z191100008019004.
ORCID iDs
Xian Tao https://orcid.org/0000-0001-5834-5181
De Xu https://orcid.org/0000-0002-7221-1654
References
1. Tsenev V. Robot assembly with flexible automatic control
according to INDUSTRY 4.0. In: IEEE XXVIII international
scientific conference electronics (ET), Sozopol, Bulgaria, 12–
14 September 2019, pp. 1–4. DOI: 10.1109/ET.2019.
8878551.
2. Zeng F, Xiao J, and Liu H. Force/torque sensorless compliant
control strategy for assembly tasks using a 6-DOF collabora-
tive robot. IEEE Access 2019; 7: 108795–108805.
3. Yu Y, Xu Z, Lv Y, et al. Design and analysis of space docking
mechanism for on-orbit assembly with application to space
telescopes. In: IEEE international conference on mechatro-
nics and automation (ICMA), Changchun, China, 5–8 August
2018, pp. 1867–1871. DOI: 10.1109/ICMA.2018.8484668.
4. Luo Y, Chen M, Wang X, et al. Precision assembly system
based on position-orientation decoupling design. In: 2nd
world conference on mechanical engineering and intelligent
manufacturing (WCMEIM), Shanghai, China, 22–24 Novem-
ber 2019, pp. 685–688. DOI: 10.1109/WCMEIM48965.2019.
00145.
5. Yu H, Ma T, Wang M, et al. Feature-based pose optimization
method for large component alignment. In: 4th international
conference on control, robotics and cybernetics (CRC),
Tokyo, Japan, 27–30 September 2019, pp. 152–156. DOI:
10.1109/CRC.2019.00039.
Table 2. The errors and steps in camera alignment. Each cell gives (Δx, Δy, Δz) in mm followed by (Δθ_x, Δθ_y, Δθ_z) in degrees; steps are listed as proposed method / method in ref. 20.

No. | Initial | Proposed method (after alignment) | Method in ref. 20 (after alignment) | Steps
1 | (19.41, 10.42, 28.62), (3.96, 0.96, 20.65) | (0.15, 0.16, 0.15), (0.05, 0.06, 0.73) | (0.23, 0.17, 0.21), (0.06, 0.04, 0.89) | 7 / 9
2 | (−21.72, 26.43, 42.40), (1.03, 0.66, 10.23) | (0.17, 0.05, 0.26), (0.05, 0.06, 0.23) | (0.21, 0.07, 0.15), (0.06, 0.01, 1.52) | 8 / 10
3 | (30.07, 8.59, 7.68), (3.12, 1.36, 5.03) | (0.15, 0.14, 0.23), (0.01, 0.04, 0.41) | (0.11, 0.18, 0.06), (0.02, 0.02, 0.07) | 7 / 7
4 | (−5.05, 4.26, 40.56), (1.64, 1.87, 9.74) | (0.05, 0.23, 0.26), (0.01, 0.01, 0.68) | (0.01, 0.10, 0.16), (0.01, 0.01, 0.03) | 6 / 7
5 | (−12.11, 10.72, 7.80), (3.86, 0.42, 30.89) | (0.03, 0.06, 0.29), (0.01, 0.01, 0.03) | (−0.05, −0.09, −0.16), (−0.01, −0.02, −0.19) | 6 / 8
6 | (1.21, 17.32, 60.63), (6.46, 0.71, 10.51) | (0.11, 0.02, 0.04), (0.02, 0.02, 0.51) | (0.13, 0.07, 0.14), (0.01, 0.01, 0.29) | 6 / 7
7 | (−9.15, 26.29, 92.58), (9.87, 0.03, 4.62) | (0.04, 0.08, 0.05), (0.01, 0.01, 0.13) | (0.13, 0.14, 0.17), (0.02, 0.02, 0.25) | 7 / 7
8 | (−10.08, 10.53, 76.86), (8.87, 2.15, 10.09) | (0.04, 0.12, 0.13), (0.01, 0.01, 0.48) | (0.25, 0.14, 0.23), (0.02, 0.01, 0.53) | 8 / 9
6. Wang S, Chen G, Xu H, et al. A robotic peg-in-hole assembly
strategy based on variable compliance center. IEEE Access
2019; 7: 167534–167546.
7. Meng S, Ruiqin H, Lijian Z, et al. Precise robot assembly for
large-scale spacecraft components with a multi-sensor sys-
tem. In: 5th international conference on mechanical, auto-
motive and materials engineering (CMAME), Guangzhou,
China, 1–3 August 2017, pp. 254–258. DOI: 10.1109/
CMAME.2017.8540181.
8. Lei Y, Xu J, Zhou W, et al. Vision-based position/impedance
control for robotic assembly task. In: Chinese control confer-
ence (CCC), Guangzhou, China, 27–30 July 2019, pp.
4620–4625. DOI: 10.23919/ChiCC.2019.8865406.
9. Taptimtong P, Mitsantisuk C, Sripattanaon K, et al. Multi-
objects detection and classification using vision builder for
autonomous assembly. In: 10th International conference of
information and communication technology for embedded
systems (IC-ICTES), Bangkok, Thailand, 25–27 March
2019, pp. 1–4. DOI: 10.1109/ICTEmSys.2019.8695970.
10. Liu S, Xu D, Li Y, et al. Nanoliter fluid dispensing based on
microscopic vision and laser range sensor. IEEE Trans Ind
Electron 2017; 64(2): 1292–1302.
11. Liu S, Xu D, Liu F, et al. Relative pose estimation for alignment
of long cylindrical components based on microscopic vision.
IEEE/ASME Trans Mechatron 2016; 21(3): 1388–1398.
12. Liu S and Xu D. Fast and accurate circle detection algorithm
for porous components. J Electr Eng Electron Technol 2014;
03(01): 1–8.
13. Sun S, Yin Y, Wang X, et al. Robust landmark detection and
position measurement based on monocular vision for auton-
omous aerial refueling of UAVs. IEEE Trans Cybern 2019;
49(12): 4167–4179.
14. Kim J, Nguyen H, Lee Y, et al. Structured light camera base
3D visual perception and tracking application system with
robot grasping task. In: IEEE international symposium on
assembly and manufacturing (ISAM),Xian,China,30
July–2 August 2013, pp. 187–192. DOI: 10.1109/ISAM.
2013.6643524.
15. Satorres M, G´omez O, G ´amez G, et al. Visual predictive
control of robot manipulators using a 3D ToF camera. In:
IEEE international conference on systems, man, and cyber-
netics, Manchester, UK, 13–16 October 2013, pp. 3657–3662.
DOI: 10.1109/SMC.2013.623.
16. Litvak Y, Biess A, and Bar-Hillel A. Learning pose estima-
tion for high-precision robotic assembly using simulated
depth images. In: International conference on robotics and
automation (ICRA), Montreal, QC, Canada, 20–24 May 2019,
pp. 3521–3527. DOI: 10.1109/ICRA.2019.8794226.
17. Chaumette F and Hutchinson S. Visual servo control, part I:
basic approaches. IEEE Robot Autom Mag 2006; 13: 82–90.
18. Chaumette F and Hutchinson S. Visual servo control, part II:
advanced approaches. IEEE Robot Autom Mag 2007; 14:
109–118.
19. Xu D, Lu J, Wang P, et al. Partially decoupled image-based
visual servoing using different sensitive features. IEEE Trans
Syst Man Cybern Syst 2017; 47(8): 2233–2243.
20. Peng Y, Jivani D, Radke RJ, et al. Comparing position- and
image-based visual servoing for robotic assembly of large
structures. In: IEEE 16th international conference on auto-
mation science and engineering (CASE), Hong Kong, China,
20–21 August 2020, pp. 1608–1613. DOI: 10.1109/
CASE48305.2020.9217028.
21. Corke P and Hutchinson SA. A new hybrid image-based
visual servo control scheme. In: Proceedings of the 39th
IEEE conference on decision and control (Cat. No.
00CH37187), Sydney, NSW, 12–15 December 2000, pp.
2521–2526, vol. 3. DOI: 10.1109/CDC.2000.914182.
22. Deng W and Yao J. Extended-state-observer-based adaptive
control of electrohydraulic servomechanisms without velo-
city measurement. IEEE/ASME Trans Mechatron 2020;
25(3): 1151–1161.
23. Deng W and Yao J. Asymptotic tracking control of mechan-
ical servosystems with mismatched uncertainties. IEEE/
ASME Trans Mechatron 1–1. DOI: 10.1109/TMECH.2020.
3034923.