
Embedded-Based Object Matching and Robot Arm Control

Authors:
  • Minh-Tri Le, Chih-Hung G. Li, Shu-Mei Guo, and Jenn-Jier James Lien

Minh-Tri Le is working toward the Ph.D. degree at the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701 Taiwan R.O.C. (e-mail: n28057023@mail.ncku.edu.tw). Professor Chih-Hung G. Li is with the Graduate Institute of Manufacturing Technology, National Taipei University of Technology, Taipei, 10608 Taiwan R.O.C. (e-mail: cl4e@ntut.edu.tw). Professor Shu-Mei Guo is with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701 Taiwan R.O.C. (e-mail: guosm@mail.ncku.edu.tw). Professor Jenn-Jier James Lien is with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701 Taiwan R.O.C. (e-mail: jjlien@csie.ncku.edu.tw).
Abstract—We present an embedded-based robot arm grasp detection system. The system has two subsystems: an embedded vision subsystem and an embedded robot arm control subsystem. In the former, a template matching algorithm runs on an Nvidia Jetson TX2 developer kit to detect objects; the detection results are then used to control a robot arm to grasp and place them. Although embedded systems offer benefits in cost, weight, size and power consumption, their slow processing speed is a significant drawback. To deal with this problem, we propose methods to reduce the number of calculations required for the similarity measurement. After testing on 40 templates with 200 test images, the results show that the average execution time is up to 10x faster than the original algorithm. The average execution time on medium-size templates, (100~200) x (100~200) pixels, is 0.176 s. In addition, the angle of objects is determined with a small angle interval of 1 degree.

Index Terms—Embedded vision system, template matching, robotic grasp detection, fast matching.
I. INTRODUCTION
In recent years, embedded systems have increasingly been used in manufacturing systems to address the needs of mobile robots. Embedded platforms are well suited to compact, movable and low-power systems. Throughout the history of automatic system
development, in robotic grasp detection systems, the
controller of the robot arm has usually been connected to a
computer platform. The main reason is that the computer platform is convenient and powerful. However, high cost and bulkiness are two major weaknesses. As an effective solution to these problems, embedded systems are increasingly being applied in automatic grasp-and-place systems, especially in mobile robot systems [1].
Vision algorithms are usually integrated into control
systems to detect targets for the grasping of a robot arm [2].
Among them, template matching is one of the most widely used techniques because it is simple, sufficient and precise. Template matching uses similarity
measure methods, such as Normalized Cross-Correlation
(NCC), to match a template with targets on test images. It
plays a crucial role for pattern recognition and object
detection systems. A matching process is often used to tackle
the problems of rotation, scaling and translation (RST). When
it comes to these properties, the matching method of Kim and de Araújo [3] is usually adopted, since it is robust and invariant to RST transformations. However, RST matching consumes too much in terms of computational cost: the higher the required accuracy of location, angle and scale, the longer the matching takes.
In our paper, an embedded-based robot arm grasp
detection system is built (see Fig. 1). The major contributions
in our system are: (1) We propose a method to reduce the computational cost of the similarity measurement in the rotation matching process by replacing the NCC formula with a dot product. (2) We use a coarse-to-fine approach to obtain high matching performance, in particular a matching angle interval of 1 degree, without increasing the processing time. (3) We use the embedded vision subsystem to control a robot arm via TCP/IP.
The rest of this paper is organized as follows. In Section
II, we discuss related work. We present our system framework
in Section III. Section IV shows the proposed methods to
accelerate the RST algorithm. Then, Section V gives
information about experimental results and discussion.
Finally, we conclude in Section VI.
II. RELATED WORK
In automatic processing, vision systems have had many
remarkable breakthroughs. In particular, template matching
has played a critical role in robotic applications such as:
detecting objects to grasp and place, pattern recognition, and measuring the similarity between a template and targets in images or videos.

Fig. 1. The hardware of the embedded-based robot arm grasp detection system: 1) Nvidia Jetson TX2 developer kit; 2) monitor; 3) 2D RGB camera; 4) 6-DOF Yaskawa robot arm; 5) objects for detection and grasping.

The processing time, however, has been a
significant challenge. The key part of template matching is the correlation measurement method, such as the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD), and Normalized Cross-Correlation (NCC). SAD and SSD are fast to compute, but they are sensitive to brightness and contrast changes. Although NCC is the most widespread similarity measure, it is computationally expensive.
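For reference, the standard definitions of SAD and SSD over a matching region R, with template pixels $x_i$ and window pixels $y_i$ (the notation also used in Section IV), are

$$\mathrm{SAD}=\sum_{i\in R}\lvert x_i-y_i\rvert,\qquad \mathrm{SSD}=\sum_{i\in R}\left(x_i-y_i\right)^2.$$

NCC additionally subtracts the window means and normalizes by the standard deviations (its full expression is given later as Eq. (1)), which makes it invariant to linear brightness and contrast changes at the price of extra computation.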
Several previous works have sought to reduce the NCC computation time. In Lewis [4], the NCC computational cost is reduced by computing the denominator with summed-area tables and the numerator with the Fast Fourier Transform (FFT). The drawback of this method is that the FFT calculation does not give a stable result when applied to rotation and scale matching, and the integral-image method is not effective with rotated targets. Wu and Toet [5] also use the integral-image approach to accelerate template matching; in that case, the integral image is used to calculate weak-classifier image blocks on the template and test images, and the difference between the weak image and the template is compared with a limit parameter to decide whether the image window is a candidate or not. As pointed out in [4], the calculation of the integral image is susceptible to rotation and scale changes.
In [6], a bound on the NCC formula derived from the Cauchy-Schwarz inequality is used to reject inadequate candidates quickly: instead of calculating the entire NCC formula, the similarity of partial blocks is first evaluated against the upper bound. In ring projection transform (RPT) methods [7], 2D features are converted to 1D circular projections, and the computation is then moved to the frequency domain to reduce the cost.
Deep features have also been applied to image matching. In Kong et al. [8], a Siamese network is used to extract features before measuring the similarity. The training process produces deep features that help to improve the accuracy of the matching process. However, the need for a training dataset and the training time are notable disadvantages for mobile or embedded systems.
III. THE EMBEDDED-BASED ROBOT ARM GRASP DETECTION
SYSTEM
Fig. 2 shows the global framework of our system. The
hardware has two parts: An Embedded vision subsystem and
an Embedded robot arm control subsystem. The former is
composed of an Nvidia Jetson TX2 board and a Flea3 1.3 MP Color USB 3.0 Point Grey camera. The Nvidia Jetson TX2 developer kit has the following hardware specification: a quad-core 2.0 GHz 64-bit ARMv8 Cortex-A57 CPU, a dual-core 2.0 GHz ARMv8 Denver CPU, a 256-CUDA-core 1.3 GHz Nvidia Pascal GPU, and 8 GB of memory. The latter consists of a 6-DOF Yaskawa
Robot Arm and its controller. The embedded board is
connected to the controller of the robot arm via TCP/IP
protocol. Camera images (I) are processed by the RST
algorithm to determine the 2D coordinates and angles of
targets. After that, the coordinates are converted to 3D
coordinates and sent to the robot arm.
A. Nvidia Jetson TX2-Based Template Matching Using the RST Algorithm
To match a template (T) and targets on test images (I), we
build our work on the RST algorithm [3]. In this paper, this algorithm has two steps for training and four steps for the test process. Before being passed to the RST algorithm, both the template
and the test image are resized by using the pyramid technique:
down-sampling template (T’) and down-sampling test image
(I’). The level of pyramid (NP) depends on the size of
template. At the training process (see Fig. 3), we want to
extract the features of the template. The circular features are
derived on Step 1 of training. The template is scaled with NS
different sizes. On each scale template, we locate pixels on
concentric circles. The number of concentric circles (NC) depends
on the size of the template. We use concentric circles because they are invariant to rotation; in particular, we do not know the rotation angle of the targets on
the test image at the beginning.

Fig. 2. The global framework of the embedded-based robot arm grasp detection system. (1) The image (I) captured from the camera is processed to detect targets. (2) The positions (u', v') and the angle (θ) of the targets are transformed into a 3D robot arm pose (xB, yB, zB, θ). (3)(4) The pose is sent to the robot arm controller via the TCP/IP protocol. (5) Based on the pose, the controller sends commands to control the grasping.

After that, we calculate the average grayscale pixel values on each circle. Those values
are the circular features (Cq). When collecting the pixel
values, the coordinates of pixels are stored into a Look-up
table (LUT_S). This is done so that, when we want to extract circular features on the test image, those coordinates can be used to locate the pixels rapidly. At Step 2, we want to
find the radial features. On the largest scale template, Nl radial
lines are created. Then, we calculate the average grayscale
pixel values on each radial line. Those are the radial features
(Rq). The angle interval (α) between two radial lines is equal to 360°/Nl. Similar to Step 1, the coordinates of pixels are also
stored into a Look-up table (LUT_R) which is used for the test
process.
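As an illustration of this training stage, a minimal NumPy sketch might look as follows; the function names (train_circular_features, train_radial_features) and sampling details (e.g., the number of points per circle) are our own assumptions, not taken from the paper.

import numpy as np

def circle_coords(center, radius, n_samples=64):
    """Integer pixel coordinates sampled on one concentric circle."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    ys = np.round(center[0] + radius * np.sin(angles)).astype(int)
    xs = np.round(center[1] + radius * np.cos(angles)).astype(int)
    return ys, xs

def train_circular_features(template, n_circles):
    """Training Step 1: average grayscale value on NC concentric circles,
    plus the look-up table (LUT_S) of the sampled pixel coordinates."""
    h, w = template.shape
    center = (h // 2, w // 2)
    max_radius = min(center) - 1
    lut_s, feats = [], []
    for c in range(1, n_circles + 1):
        ys, xs = circle_coords(center, max_radius * c / n_circles)
        lut_s.append((ys, xs))                   # reused on the test image
        feats.append(template[ys, xs].mean())    # circular feature Cq
    return np.array(feats), lut_s

def train_radial_features(template, n_lines):
    """Training Step 2: average grayscale value on Nl radial lines of the
    largest-scale template, plus the look-up table (LUT_R)."""
    h, w = template.shape
    center = (h // 2, w // 2)
    radii = np.arange(1, min(center) - 1)
    lut_r, feats = [], []
    for k in range(n_lines):
        theta = 2.0 * np.pi * k / n_lines        # angle interval = 360/Nl
        ys = np.round(center[0] + radii * np.sin(theta)).astype(int)
        xs = np.round(center[1] + radii * np.cos(theta)).astype(int)
        lut_r.append((ys, xs))
        feats.append(template[ys, xs].mean())    # radial feature Rq
    return np.array(feats), lut_r

In the proposed pipeline this would be repeated for each of the NS scaled copies of the template, so that the test process can match features per scale.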
In Fig. 4, the test process is generally split into four major steps: 1) Scaling quantization by the circular sampling filter; 2) Rotation quantization by the radial sampling filter; 3) Affine template matching filter; and 4) Robust matching filter.

Fig. 4. The test framework of the RST template-matching algorithm.
In the scaling quantization step, a search window will scan
from the top-left to the bottom-right of an image. At each
position, we extract the circular features (Ca) on the search
window by calculating the average grayscale values on
concentric circles. The coordinates of pixels are retrieved
from the LUT_S. From the features of NS scale template and
the features on the search window, we use the NCC formula
to measure the similarity and obtain the highest score. If the score is greater than a scale threshold (t1), that scale and location become a candidate of Step 1.
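A rough sketch of this scaling quantization scan is shown below; it assumes the scale-template features and the look-up table LUT_S were produced as in the training sketch above, and the helper names and defaults are illustrative.

import numpy as np

def ncc_1d(a, b):
    """Normalized cross-correlation between two 1-D feature vectors."""
    a0, b0 = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a0) * np.linalg.norm(b0)
    return float(a0 @ b0 / denom) if denom > 0 else 0.0

def scaling_quantization(image, scale_feats, lut_s, win, t1=0.84, stride=1):
    """Test Step 1: scan a search window over the down-sampled image, extract
    circular features (Ca) through LUT_S, and keep every (y, x, scale) whose
    best NCC score against the NS scale-template features exceeds t1."""
    h, w = image.shape
    candidates = []
    for y in range(0, h - win, stride):
        for x in range(0, w - win, stride):
            patch = image[y:y + win, x:x + win]
            ca = np.array([patch[ys, xs].mean() for ys, xs in lut_s])
            scores = [ncc_1d(ca, cq) for cq in scale_feats]  # one per scale
            s = int(np.argmax(scores))
            if scores[s] > t1:
                candidates.append((y, x, s, scores[s]))
    return candidates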
The first candidates give information about locations and
scales. Those are supplied to the rotation quantization step.
At each location of candidates, a search window with the size
equals to the largest scale template is used. The radial lines
are also generated similar to Step 2 of the training process.
Based on the coordinates from LUT_R, the radial features (Ra)
are calculated by averaging pixel values on those radial lines.
To find the angle candidates, we rotate Rq step by step and measure the NCC between the rotated Rq and Ra, then select the maximum NCC score. If that
maximum value is greater than a radial threshold (t2), that
angle will be a second candidate.
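For comparison with the acceleration proposed in Section IV, this baseline rotation quantization can be sketched as follows, where the full NCC (including means and norms) is recomputed for every rotation step; names and defaults are illustrative.

import numpy as np

def ncc_1d(a, b):
    """Full NCC between two 1-D feature vectors."""
    a0, b0 = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a0) * np.linalg.norm(b0)
    return float(a0 @ b0 / denom) if denom > 0 else 0.0

def rotation_quantization(rq, ra, t2=0.7):
    """Baseline Step 2: rotate the radial features Rq by every angle step
    (360/Nl degrees), recompute the full NCC against Ra each time, and keep
    the best angle if its score exceeds the radial threshold t2."""
    n = len(rq)
    scores = [ncc_1d(np.roll(rq, k), ra) for k in range(n)]
    k_best = int(np.argmax(scores))
    if scores[k_best] > t2:
        return k_best * 360.0 / n, scores[k_best]   # angle of the second candidate
    return None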
At Step 3, the affine template matching filter is applied. From the
second candidates, we have parameters: location, scale and
angle. The main goal of this step is to check the similarity
between T’ and the target candidates qualified from Step 1
and Step 2. The target candidate is the set of pixels which are
inside a boundary on the test image I’. That boundary is built
from the angle, scale and location factors from Step 2. Step 3
will help to reject most of the unsatisfactory candidates. To
do that, we apply an affine transform to T' and then measure the similarity between the affined template and the target candidates. If the similarity score is greater than a
threshold (t3), that candidate will be the third candidate.
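A hedged OpenCV sketch of this affine matching filter follows; it assumes 8-bit grayscale arrays and candidates carrying (y, x, scale, angle) from Steps 1 and 2, and it simplifies the border handling of the warped template.

import cv2
import numpy as np

def affine_matching_filter(image, template, candidates, t3=0.7):
    """Test Step 3: warp the down-sampled template T' to each candidate's
    angle and scale, then keep candidates whose NCC against the target region
    exceeds t3."""
    kept = []
    h, w = template.shape
    for (y, x, scale, angle) in candidates:          # from Steps 1 and 2
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
        warped = cv2.warpAffine(template, M, (w, h))
        roi = image[y:y + h, x:x + w]
        if roi.shape != warped.shape:
            continue                                  # candidate too close to the border
        score = cv2.matchTemplate(roi, warped, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > t3:
            kept.append((y, x, scale, angle, float(score)))
    return kept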
Finally, we apply the robust matching filter. From Step 1 to Step 3, the matching process is performed on the down-sampled template and test image, and the angle interval of the rotation matching is 10 degrees. Note that although the down-sampling technique helps to reduce the dimension of the matching data, it also decreases the precision. At this step, we measure the correlation between the original template (T) and the candidates from Step 3 on the test image (I); that is, we consider the similarity in the up-sampled condition. When we adopt the up-sampling technique, we need to find the accurate location, scale and angle factors from the factors of Step 3. First of all, the location candidates from Step 3 need to be compared with the pixels around them; in that way, we can obtain the correct locations of the target. For the robust rotation matching, we take the angle tolerance into consideration; with an angle interval of 10 degrees, that tolerance should be ±5 degrees. For robust scale matching, while refining the translation we also test all values within the scale range. After the robust matching filter, the candidates from Step 4 are merged together if they overlap.
B. Embedded Robot Arm Control
The outputs of the matching process are the locations,
scales and angles of targets. The locations of targets are 2D
coordinates. In our study, we use the calibration method of Zhang [9] to find the intrinsic matrix (M) and the camera-to-robot transformation matrix. The 2D coordinates (u', v') are converted to 3D coordinates (xB, yB, zB) using these matrices. After the conversion, the 3D coordinates and the angles (θ) of the targets are sent from the Nvidia Jetson TX2 board to the controller of the robot arm via the TCP/IP protocol to control the grasping.
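A minimal sketch of this conversion and hand-off is given below. It assumes the objects lie on a calibrated plane at a known camera-frame depth and that the controller accepts a simple text message over TCP/IP; the message format, function names and the planar assumption are ours, since the paper does not specify them.

import socket
import numpy as np

def pixel_to_base(u, v, intrinsic, cam_to_base, z_plane):
    """Back-project pixel (u, v) with the intrinsic matrix onto a working
    plane at depth z_plane in the camera frame, then map the 3-D point into
    the robot base frame with a 4x4 homogeneous transform."""
    ray = np.linalg.inv(intrinsic) @ np.array([u, v, 1.0])
    p_cam = ray * (z_plane / ray[2])                 # 3-D point in the camera frame
    p_base = cam_to_base @ np.append(p_cam, 1.0)     # homogeneous transform
    return p_base[:3]                                # (xB, yB, zB)

def send_pose(host, port, xyz, angle_deg):
    """Send one grasp pose to the robot arm controller over TCP/IP.
    The plain-text message format is only a placeholder."""
    msg = "POSE {:.2f} {:.2f} {:.2f} {:.2f}\n".format(
        xyz[0], xyz[1], xyz[2], angle_deg)
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(msg.encode("ascii"))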
IV. A PROPOSED FAST MATCHING METHOD
A. Fast Rotation Matching by Converting the Traditional NCC Formula to a Dot Product
At Step 2 of the test process (the rotation quantization step), we present a way to cut down the calculation of the NCC formula. The traditional NCC formula for the measurement of similarity is

$$\mathrm{NCC}=\frac{\sum_{i\in R}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i\in R}\left(x_i-\bar{x}\right)^2}\,\sqrt{\sum_{i\in R}\left(y_i-\bar{y}\right)^2}} \qquad (1)$$

where NCC is the correlation score; $x_i$ and $y_i$ are the pixel values of the template and the test image, respectively; $R$ is the total number of elements in the matching region; and $\bar{x}$, $\bar{y}$ are their mean values, respectively.

Expanding the NCC formula [10]:

$$\mathrm{NCC}=\frac{\sum_{i\in R} x_i y_i - R\,\bar{x}\,\bar{y}}{\sqrt{\sum_{i\in R} x_i^2 - R\,\bar{x}^2}\,\sqrt{\sum_{i\in R} y_i^2 - R\,\bar{y}^2}} \qquad (2)$$
When it comes to the similarity measure on the radial
features, it is easy to recognize that pixels on the radial lines
are located inside a circle with a radius equal to the radius of
the template. When we measure the similarity of rotation, we
need to rotate the radial lines on the search window. In this
case, pixels on the radial lines only rotate their positions, do
not change their values (see Fig. 5). With pixels outside the
circle, we do not take into consideration. In detail, R,
and
the standard deviations: 󰇛
󰇜
 , 󰇛
󰇜
 are
also invariable with the pixels inside that circle. Therefore, we
only need to calculate them once when computing the NCC
formula.
Hence, instead of evaluating the full NCC formula for every rotation, we only need to find the maximum value of the sum of products (the dot product $\mathbf{x}\cdot\mathbf{y}$) of the two feature vectors and then divide that value by the standard-deviation terms once. In this way, we save a significant number of operations in the NCC formula. For example, with n candidates from Step 1, we can remove (n−1) divisions from the normalization (dividing by the standard-deviation terms) and $N_l\,(n-1)$ subtractions from the deviation calculations:

$$\mathrm{NCC}_{\max}=\frac{\max_{k}\left(\mathbf{x}^{(k)}\cdot\mathbf{y}\right) - R\,\bar{x}\,\bar{y}}{\sqrt{\sum_{i\in R}\left(x_i-\bar{x}\right)^2}\,\sqrt{\sum_{i\in R}\left(y_i-\bar{y}\right)^2}} \qquad (3)$$

where $\mathbf{x}^{(k)}$ denotes the radial feature vector $\mathbf{x}$ cyclically rotated by $k$ angle steps.
We also apply a coarse-to-fine approach to reduce the calculation. Instead of rotating the feature Rq and measuring it against the feature Ra, we duplicate the elements of Rq (doubling its length) and slide the elements of Ra over it. In total, we need Nl scans (the same as the number of radial lines), and one stride corresponds to a rotation of 10 degrees (the angle interval). To reduce the calculation, the coarse-to-fine method diminishes the number of scans: we first shift with a stride of 2 and, after finding the maximum score (a local maximum), perform a fine scan with a stride of 1 around it to locate the global maximum point.

Fig. 5. The template and the rotating targets on the test image. (a) Template. (b) Test image with non-rotated and rotated targets.

Fig. 6. Illustration of the expanded pixels after up-sampling with NP = 2. (a) The candidate pixel (pixel 1) and the 15 expanded pixels after up-sampling. (b) First, the similarity between pixel 1 and pixels 2, 3, 4 is measured, and the maximum score is chosen. (c) The pixel with the maximum score is then measured against its neighbors.
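The dot-product shortcut of (3) and the coarse-to-fine rotation scan can be combined roughly as in the sketch below; it centres the feature vectors up front (equivalent to subtracting the $R\,\bar{x}\,\bar{y}$ term in (3)) and uses illustrative names and defaults.

import numpy as np

def rotation_match_fast(rq, ra, t2=0.7):
    """Fast rotation quantization: the means and standard deviations of Rq and
    Ra do not change under a cyclic rotation of the radial features, so they
    are computed once; each rotation step then costs only one dot product.
    The rotation is scanned coarse-to-fine over a doubled copy of Rq."""
    n = len(rq)
    rq0 = rq - rq.mean()
    ra0 = ra - ra.mean()
    denom = np.linalg.norm(rq0) * np.linalg.norm(ra0)   # computed once
    if denom == 0:
        return None
    rq2 = np.concatenate([rq0, rq0])                    # doubled feature vector

    def dot(shift):
        # one stride corresponds to one angle interval (360/Nl degrees)
        return rq2[shift:shift + n] @ ra0

    coarse = {k: dot(k) for k in range(0, n, 2)}        # coarse pass, stride 2
    k0 = max(coarse, key=coarse.get)
    fine = {k % n: dot(k % n) for k in (k0 - 1, k0, k0 + 1)}   # fine pass, stride 1
    k_best = max(fine, key=fine.get)

    score = fine[k_best] / denom                        # single normalization
    if score > t2:
        return k_best * 360.0 / n, float(score)
    return None

Compared with the baseline sketch in Section III, the per-rotation work drops from a full NCC (two means, two norms, one division) to a single dot product, which is where the reported savings come from.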
B. Fast and Robust Matching Using a Coarse-to-Fine Approach
At Step 4, we apply the coarse-to-fine approach for robust matching. First of all, we consider the location
robustness. When we do the up-sampling technique on the test
image, as an example in Fig. 6, corresponding to the location
candidates (from step 3), with NP = 2, there are 15 expanded
pixels. We need to find the correct candidate among them. In
that example, pixel 1 is the center of the target candidate.
Three pixels 2, 3, 4 are three centers of search windows which
will be measured for similarity with the original template (T).
The center with the maximum score will continue to be
compared with three neighbors. In that way, instead of having
16 NCC computations, it will be reduced to 7 NCC
computations. Secondly, we also apply the coarse-to-fine approach to the robust rotation matching. The angle candidate is compared with a pair of angles around it; the angle with the maximum score becomes the new angle candidate, and the winner continues to be measured with further pairs, with the tolerance decreased gradually after each comparison. For example, with an angle candidate of 10°, the first measurement is between 7°, 10° and 13°. Assuming that 7° is the winner, the next measurement is between 5°, 7° and 9°. The final winner is the angle of the target on the test image. In total, the number of NCC computations is reduced by the method described above. As a case in point, if NP = 2, NS = 3 and α = 10°, then with the original method we need 480 NCC computations for the robustness matching of one candidate from Step 3, while the maximum number is only 17 with the proposed method.
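A rough sketch of these two refinement searches follows; ncc_at and ncc_of_angle stand in for full-resolution NCC evaluations against the original template T, and the exact tolerance schedule is illustrative.

def refine_location(y, x, ncc_at, np_level=2):
    """Coarse-to-fine location refinement after up-sampling (cf. Fig. 6):
    compare the current best pixel with three offset pixels, keep the winner,
    then halve the offset. For np_level = 2 this needs 1 + 3 + 3 = 7 NCC
    evaluations instead of 16."""
    step = 2 ** np_level // 2
    best_score, by, bx = ncc_at(y, x), y, x
    while step >= 1:
        for dy, dx in ((0, step), (step, 0), (step, step)):
            s = ncc_at(by + dy, bx + dx)
            if s > best_score:
                best_score, by, bx = s, by + dy, bx + dx
        step //= 2
    return by, bx, best_score

def refine_angle(theta, ncc_of_angle, tolerances=(3.0, 2.0, 1.0)):
    """Coarse-to-fine angle refinement: compare the candidate with a pair of
    angles +/- tol around it and keep the winner while the tolerance shrinks
    (e.g. 10 deg -> compare 7/10/13 deg, then 5/7/9 deg if 7 deg wins)."""
    best_score, best_theta = ncc_of_angle(theta), theta
    for tol in tolerances:
        for a in (best_theta - tol, best_theta + tol):
            s = ncc_of_angle(a)
            if s > best_score:
                best_score, best_theta = s, a
    return best_theta, best_score

The actual number of NCC evaluations depends on NP, NS and the tolerance schedule, which is how the 480-versus-17 comparison above arises.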
V. EXPERIMENTAL RESULTS AND DISCUSSION
A. Data Collections
Offline experiment. In this experiment, we test the matching
on a database. The catalogue of templates consists of 6 types:
Food, Manufacturing, Metal, PCB, Printing, and Logo. The
size of templates can be separated into three kinds: small, (40~100) x (40~100) pixels; medium, (100~200) x (100~200) pixels; and large, (>200) x (>200) pixels. This
kind of experiment is implemented on 40 templates and 200
test images (see Table I). Each template is tested by 5 test
images with different angles, scales and locations. Each test
image only has one target. The parameters used for the training and test processes are as follows: NS = 0.9~1.1, NC = 10, α = 10°, and thresholds t1 = 0.84, t2 = 0.7, t3 = 0.7. If we want to match targets over a much wider scale range, we can change the parameter NS. The test process is carried out in two ways: using the embedded-based original RST [3] (e-RST) and using the proposed method.
Online experiment. The Nvidia Jetson TX2 board is
connected to a camera with resolution of 752x580 pixels. That
camera captures objects on a conveyor belt. The online testing
is performed on 10 samples (see Fig. 7) with two kinds of
materials: wood and metal. Each sample is tested once. The
3D coordinates of samples after conversion are sent to a robot
arm to grasp and place.
B. Experimental Results
Offline experiment. The test-process times are compared in Table II. In terms of processing time, the table shows that the proposed method is most effective on the medium templates (0.176 s), while the accuracy is highest on the large size (94.3%). The rotation matching results are shown for five test images rotated by 2°, 1°, 0°, 359° and 358° (see Fig. 9); those results show that the system can operate with a small angle interval of 1 degree. Moreover, our algorithm can also deal with complicated cases, as shown in Fig. 8.
Fig. 7. The online experiment is carried out on 10 samples with different sizes and shapes, and two kinds of materials: wood and metal.
Fig. 8. The matching results on complicated images. (a) Small targets, I(646x492), T(44x42); (b) large target, I(646x492), T(363x170); (c) easily confused targets, I(640x480), T(120x120); (d) multi-rotation-angle targets, I(768x576), T(71x73); (e) blurred image, I(640x512), T(142x170); (f) noisy image, I(768x576), T(92x87). (I: image size (width x height); T: template size (width x height)).
TABLE I. THREE KINDS OF TEMPLATE SIZES USED FOR THE OFFLINE TEST

Template size         Small: (40~100) x (40~100) px   Medium: (100~200) x (100~200) px   Large: (>200) x (>200) px
No. of templates      6                               17                                 7
No. of test images    30                              85                                 35
TABLE II. AVERAGE EXECUTION TIME AND ACCURACY OF THE TEST PROCESS ON THREE KINDS OF TEMPLATE SIZES

Size                                 Small    Medium   Large
Embedded-based original RST (e-RST)
  Average time (s)                   1.113    1.739    19.940
  Accuracy (%)                       90.0     88.8     91.4
Proposed method
  Average time (s)                   0.302    0.176    2.204
  Accuracy (%)                       93.3     89.4     94.3
Online experiment. Table III shows the matching times for the 10 samples. In general, all samples are matched for grasping, with an average execution time of 0.303 s. SP5 gives the fastest time at 0.074 s, whilst SP6 is the slowest at 1.060 s.
C. Discussion
Table II shows the time and accuracy of the offline test on
two methods: e-RST and the proposed method. Generally
speaking, the latter is faster and more accurate than the
former. Specifically, with the medium and large templates the
proposed method is faster by nearly ten times (x10) than the
e-RST, while with the small ones, the improvement is around
four times. Based on a detailed evaluation of the proposed
method with the medium templates, the average execution
time is the shortest: 0.176s, while the large ones consume too
much time: more than 2 s. The speed-up on the small templates is smaller because the level of down-sampling in that case is only 1; if higher levels were applied, the accuracy would decrease. The information from the table shows
that the accuracies are nearly the same. The accuracy of the proposed method fluctuates within 89%~94%. The large templates have the highest accuracy at 94.3%, while the medium ones reach 89.4%. Regarding the
angle test (see Fig. 9), although there is one wrong result with a small, acceptable error (in the 2° test case), the results show that the proposed method can approach a high angle precision with an interval of 1 degree.
In addition, the online testing demonstrates processes of
matching on real objects. The 3D coordinates, converted from
the 2D coordinates of targets, are compatible with the
coordinates of the robot arm. The processing time varies
around 0.1 s~0.3 s (except for SP6). The test result on SP6 also indicates that the RST matching is more sensitive to elongated objects.
VI. CONCLUSION
In this study, we proposed a fast and highly accurate method for template matching on an embedded system. The average execution time is cut down notably, by up to 10x. The angle matching with a small interval of 1 degree is useful for grasping systems, and connecting the robot arm to an embedded system that runs the vision algorithm is a practical option for mobile systems. The online testing also shows that the accuracy and processing time of the proposed method are adequate for robotic grasping systems.
ACKNOWLEDGMENT
This work was supported by the Ministry of Science and
Technology (MOST), Taiwan, R.O.C., grant MOST 107-
2221-E-006-218, Tongtai machine & tool Co., Ltd. and
Contrel technology Co., Ltd.
REFERENCES
[1] C. Hu, F. Arvin, C. Xiong, and S. Yue, "Bio-inspired embedded vision
system for autonomous micro-robots: the LGMD case," in IEEE Trans.
Cog. Dev Systems, vol. 9, no. 3, pp. 241-254, 2017.
[2] W.C. Tan, P.C. Goh, A. Causo, I.-M. Chen, and H.K. Tan, "Automated
vision based detection of blistering on metal surface: For robot," in 13th
IEEE Conference on Automation Science and Engineering (CASE),
2017: IEEE, pp. 74-79, 2017.
[3] H.Y. Kim and S.A. De Araújo, "Grayscale template-matching invariant
to rotation, scale, translation, brightness and contrast," in Pacific-Rim
Symposium on Image and Video Technology, pp. 100-113, 2007.
[4] J.P. Lewis, "Fast template matching," in Vision interface, vol. 95, no.
120123, pp. 15-19, 1995.
[5] T. Wu and A. Toet, "Speed-up template matching through integral
image based weak classifiers," in J. Pattern Recognition Reseach,vol.
1, pp. 1-12, 2014.
[6] L. Di Stefano and S. Mattoccia, "A sufficient condition based on the
Cauchy-Schwarz inequality for efficient template matching," in
Proceedings 2003 International Conference on Image Processing (Cat.
No. 03CH37429), vol. 1, pp. 1-269, 2003.
[7] H.Y. Kim, "Rotation-discriminating template matching based on
Fourier coefficients of radial projections with robustness to scaling and
partial occlusion," in Pattern Recognition, vol. 43, no. 3, pp. 859-872,
2010.
[8] B. Kong, J. Supancic, D. Ramanan, and C. Fowlkes, "Cross-domain
image matching with deep feature maps," in I.J. Comput Vision, 2018.
[9] Z. Zhang, "A flexible new technique for camera calibration," in IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no.
11, pp. 1330-1334, 2000.
[10] J.J. Lien, "Automatic recognition of facial expressions using hidden
Markov models and estimation of expression intensity", Ph.D thesis.
Carnegie Mellon University, 1998.
TABLE III. THE MATCHING TIME OF THE ONLINE TEST PROCESS ON 10 SAMPLES

Sample   Template size (pixels)   Test time (s)
SP1      80 x 86                  0.169
SP2      86 x 89                  0.227
SP3      88 x 98                  0.136
SP4      140 x 138                0.263
SP5      98 x 86                  0.074
SP6      148 x 151                1.060
SP7      152 x 143                0.309
SP8      130 x 122                0.558
SP9      119 x 122                0.123
SP10     134 x 130                0.110
Fig. 9. The offline experiment on 5 different rotation test images; template size is 331 x 330 pixels.