
Embedded-Based Object Matching and Robot Arm Control

Authors:
  • Minh-Tri Le, Chih-Hung G. Li, Shu-Mei Guo, and Jenn-Jier James Lien

Minh-Tri Le is working toward the Ph.D. degree at the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701 Taiwan R.O.C. (e-mail: n28057023@mail.ncku.edu.tw). Professor Chih-Hung G. Li is with the Graduate Institute of Manufacturing Technology, National Taipei University of Technology, Taipei, 10608 Taiwan R.O.C. (e-mail: cl4e@ntut.edu.tw). Professor Shu-Mei Guo is with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701 Taiwan R.O.C. (e-mail: guosm@mail.ncku.edu.tw). Professor Jenn-Jier James Lien is with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, 701 Taiwan R.O.C. (e-mail: jjlien@csie.ncku.edu.tw).
Abstract—We present an embedded-based robot arm grasp detection system. The system has two subsystems: an embedded vision subsystem and an embedded robot arm control subsystem. In the former, a template matching algorithm runs on an Nvidia Jetson TX2 developer kit to detect objects; the detection results are then used to control a robot arm to grasp and place them. Although embedded systems offer benefits in cost, weight, size and power consumption, their slow processing speed is a significant drawback. To deal with this problem, we propose methods to reduce the number of calculations required for the similarity measurement. After testing on 40 templates with 200 test images, the results show that the average execution time is up to 10x faster than the original algorithm. The average execution time on medium-size templates, (100~200) x (100~200) pixels, is 0.176 s. In addition, the angle of objects is determined with a small angle interval of 1 degree.

Index Terms—Embedded vision system, template matching, robotic grasp detection, fast matching.
I. INTRODUCTION
In recent years, embedded systems have increasingly been used in manufacturing systems to address the needs of mobile robots. Embedded platforms are well suited to compact, movable and low-power systems. Throughout the history of automatic system
development, in robotic grasp detection systems, the
controller of the robot arm has usually been connected to a
computer platform. The main reason is that the computer platform is convenient and powerful. However, high cost and bulkiness are two major weaknesses. As an effective solution to these problems, embedded systems are increasingly being applied in automatic grasp-and-place systems, especially in mobile robot systems [1].
Vision algorithms are usually integrated into control
systems to detect targets for the grasping of a robot arm [2].
Among them, template matching is one of the most widely used techniques because it is simple, sufficient and precise. Template matching uses similarity
measure methods, such as Normalized Cross-Correlation
(NCC), to match a template with targets on test images. It
plays a crucial role for pattern recognition and object
detection systems. A matching process is often used to tackle
the problems of rotation, scaling and translation (RST). When
it comes to these properties, the matching method of Kim and de Araújo [3] is usually adopted, since it is robust and invariant to RST transformations. However, RST matching consumes too much in terms of computational cost: the higher the required accuracy of location, angle and scale, the longer the matching takes.
In our paper, an embedded-based robot arm grasp
detection system is built (see Fig. 1). The major contributions
in our system are: (1) We propose a method to reduce the computational cost of the similarity measurement in the rotation matching process by replacing the NCC formula with a dot product. (2) We use a coarse-to-fine approach to obtain high matching performance, in particular a matching angle interval of 1 degree, without increasing the processing time. (3) We use the embedded vision subsystem to control a robot arm via TCP/IP.
The rest of this paper is organized as follows. In Section
II, we discuss related work. We present our system framework
in Section III. Section IV shows the proposed methods to
accelerate the RST algorithm. Then, Section V gives
information about experimental results and discussion.
Finally, we conclude in Section VI.
II. RELATED WORK
In automatic processing, vision systems have had many
remarkable breakthroughs. In particular, template matching
has played a critical role in robotic applications such as:
detecting objects to grasp and place, pattern recognition, and measuring the similarity between a template and targets in images or videos.

Fig. 1. The hardware of the embedded-based robot arm grasp detection system: 1) Nvidia Jetson TX2 developer kit; 2) monitor; 3) 2D RGB camera; 4) 6-DOF Yaskawa robot arm; 5) objects for detection and grasping.

The processing time, however, has been a
significant challenge. The key part of template matching is the correlation measurement method, such as the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD), and Normalized Cross-Correlation (NCC). SAD and SSD are fast to compute, but they are sensitive to brightness and contrast changes. Although NCC is the most widespread similarity measure, it is computationally expensive.
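For reference, the standard definitions of SAD and SSD over a matching region R, with template pixels $x_i$ and window pixels $y_i$ (the notation also used in Section IV), are

$$\mathrm{SAD}=\sum_{i\in R}\lvert x_i-y_i\rvert,\qquad \mathrm{SSD}=\sum_{i\in R}\left(x_i-y_i\right)^2.$$

NCC additionally subtracts the window means and normalizes by the standard deviations (its full expression is given later as Eq. (1)), which makes it invariant to linear brightness and contrast changes at the price of extra computation.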
Several previous works have sought to reduce the NCC computation time. In Lewis [4], the NCC computational cost is reduced by computing the denominator with summed-area tables and the numerator with the Fast Fourier Transform (FFT). The drawback of this method is that the FFT calculation does not give a stable result when applied to rotation and scale matching, and the integral-image method is not effective with rotated targets. Wu and Toet [5] also use the integral-image approach to accelerate template matching; in that case, the integral image is used to calculate weak-classifier image blocks on the template and test images, and the difference between the weak image and the template is compared with a limit parameter to decide whether the image window is a candidate or not. As pointed out in [4], the calculation of the integral image is susceptible to rotation and scale changes.
In [6], a bound on the NCC formula derived from the Cauchy-Schwarz inequality is used to reject inadequate candidates quickly: instead of calculating the entire NCC formula, the similarity of partial blocks is first evaluated against the upper bound. In ring projection transform (RPT) methods [7], 2D features are converted to 1D circular projections, and the computation is then moved to the frequency domain to reduce the cost.
Deep features have also been applied to image matching. In Kong et al. [8], a Siamese network is used to extract features before measuring the similarity. The training process produces deep features that help to improve the accuracy of the matching process. However, the need for a training dataset and the training time are notable disadvantages for mobile or embedded systems.
III. THE EMBEDDED-BASED ROBOT ARM GRASP DETECTION
SYSTEM
Fig. 2 shows the global framework of our system. The
hardware has two parts: An Embedded vision subsystem and
an Embedded robot arm control subsystem. The former is
composed of an Nvidia Jetson TX2 board and a Flea3 1.3 MP Color USB 3.0 Point Grey camera. The Nvidia Jetson TX2 developer kit has the following hardware specification: a quad-core 2.0 GHz 64-bit ARMv8 Cortex-A57 CPU, a dual-core 2.0 GHz ARMv8 Denver CPU, a 256-CUDA-core 1.3 GHz Nvidia Pascal GPU, and 8 GB of memory. The latter consists of a 6-DOF Yaskawa
Robot Arm and its controller. The embedded board is
connected to the controller of the robot arm via TCP/IP
protocol. Camera images (I) are processed by the RST
algorithm to determine the 2D coordinates and angles of
targets. After that, the coordinates are converted to 3D
coordinates and sent to the robot arm.
A. Nvidia Jetson TX2-Based Template Matching Using the RST Algorithm
To match a template (T) and targets on test images (I), we
build our work on the RST algorithm [3]. In this paper, this algorithm has two steps for training and four steps for the test process. Before being passed to the RST algorithm, both the template
and the test image are resized by using the pyramid technique:
down-sampling template (T’) and down-sampling test image
(I’). The level of pyramid (NP) depends on the size of
template. At the training process (see Fig. 3), we want to
extract the features of the template. The circular features are
derived on Step 1 of training. The template is scaled with NS
different sizes. On each scale template, we locate pixels on
concentric circles. The number of concentric circles (NC) depends
on the size of the template. We use concentric circles because they are invariant to rotation; in particular, we do not know the rotation angle of the targets on
the test image at the beginning.

Fig. 2. The global framework of the embedded-based robot arm grasp detection system. (1) The image (I) captured from the camera is processed to detect targets. (2) The positions (u', v') and the angle (θ) of the targets are transformed into a 3D robot arm pose (xB, yB, zB, θ). (3)(4) The pose is sent to the robot arm controller via the TCP/IP protocol. (5) Based on the pose, the controller sends commands to control the grasping.

After that, we calculate the average grayscale pixel values on each circle. Those values
are the circular features (Cq). When collecting the pixel
values, the coordinates of pixels are stored into a Look-up
table (LUT_S). This is done so that, when we want to extract circular features on the test image, those coordinates can be used to locate the pixels rapidly. At Step 2, we want to
find the radial features. On the largest scale template, Nl radial
lines are created. Then, we calculate the average grayscale
pixel values on each radial line. Those are the radial features
(Rq). The angle interval (α) between two radial lines is equal to 360°/Nl. Similar to Step 1, the coordinates of pixels are also
stored into a Look-up table (LUT_R) which is used for the test
process.
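As an illustration of this training stage, a minimal NumPy sketch might look as follows; the function names (train_circular_features, train_radial_features) and sampling details (e.g., the number of points per circle) are our own assumptions, not taken from the paper.

import numpy as np

def circle_coords(center, radius, n_samples=64):
    """Integer pixel coordinates sampled on one concentric circle."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    ys = np.round(center[0] + radius * np.sin(angles)).astype(int)
    xs = np.round(center[1] + radius * np.cos(angles)).astype(int)
    return ys, xs

def train_circular_features(template, n_circles):
    """Training Step 1: average grayscale value on NC concentric circles,
    plus the look-up table (LUT_S) of the sampled pixel coordinates."""
    h, w = template.shape
    center = (h // 2, w // 2)
    max_radius = min(center) - 1
    lut_s, feats = [], []
    for c in range(1, n_circles + 1):
        ys, xs = circle_coords(center, max_radius * c / n_circles)
        lut_s.append((ys, xs))                   # reused on the test image
        feats.append(template[ys, xs].mean())    # circular feature Cq
    return np.array(feats), lut_s

def train_radial_features(template, n_lines):
    """Training Step 2: average grayscale value on Nl radial lines of the
    largest-scale template, plus the look-up table (LUT_R)."""
    h, w = template.shape
    center = (h // 2, w // 2)
    radii = np.arange(1, min(center) - 1)
    lut_r, feats = [], []
    for k in range(n_lines):
        theta = 2.0 * np.pi * k / n_lines        # angle interval = 360/Nl
        ys = np.round(center[0] + radii * np.sin(theta)).astype(int)
        xs = np.round(center[1] + radii * np.cos(theta)).astype(int)
        lut_r.append((ys, xs))
        feats.append(template[ys, xs].mean())    # radial feature Rq
    return np.array(feats), lut_r

In the proposed pipeline this would be repeated for each of the NS scaled copies of the template, so that the test process can match features per scale.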
In Fig. 4, the test process is generally split into four major steps: 1) Scaling quantization by the circular sampling filter; 2) Rotation quantization by the radial sampling filter; 3) Affine template matching filter; and 4) Robust matching filter.

Fig. 4. The test framework of the RST template-matching algorithm.
In the scaling quantization step, a search window will scan
from the top-left to the bottom-right of an image. At each
position, we extract the circular features (Ca) on the search
window by calculating the average grayscale values on
concentric circles. The coordinates of pixels are retrieved
from the LUT_S. From the features of NS scale template and
the features on the search window, we use the NCC formula
to measure the similarity and obtain the highest score. If the score is greater than a scale threshold (t1), that scale and location become a candidate of Step 1.
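A rough sketch of this scaling quantization scan is shown below; it assumes the scale-template features and the look-up table LUT_S were produced as in the training sketch above, and the helper names and defaults are illustrative.

import numpy as np

def ncc_1d(a, b):
    """Normalized cross-correlation between two 1-D feature vectors."""
    a0, b0 = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a0) * np.linalg.norm(b0)
    return float(a0 @ b0 / denom) if denom > 0 else 0.0

def scaling_quantization(image, scale_feats, lut_s, win, t1=0.84, stride=1):
    """Test Step 1: scan a search window over the down-sampled image, extract
    circular features (Ca) through LUT_S, and keep every (y, x, scale) whose
    best NCC score against the NS scale-template features exceeds t1."""
    h, w = image.shape
    candidates = []
    for y in range(0, h - win, stride):
        for x in range(0, w - win, stride):
            patch = image[y:y + win, x:x + win]
            ca = np.array([patch[ys, xs].mean() for ys, xs in lut_s])
            scores = [ncc_1d(ca, cq) for cq in scale_feats]  # one per scale
            s = int(np.argmax(scores))
            if scores[s] > t1:
                candidates.append((y, x, s, scores[s]))
    return candidates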
The first candidates give information about locations and
scales. Those are supplied to the rotation quantization step.
At each location of candidates, a search window with the size
equals to the largest scale template is used. The radial lines
are also generated similar to Step 2 of the training process.
Based on the coordinates from LUT_R, the radial features (Ra)
are calculated by averaging pixel values on those radial lines.
To find the angle candidates, we rotate Rq step by step and measure the NCC between the rotated Rq and Ra, then select the maximum NCC score. If that
maximum value is greater than a radial threshold (t2), that
angle will be a second candidate.
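For comparison with the acceleration proposed in Section IV, this baseline rotation quantization can be sketched as follows, where the full NCC (including means and norms) is recomputed for every rotation step; names and defaults are illustrative.

import numpy as np

def ncc_1d(a, b):
    """Full NCC between two 1-D feature vectors."""
    a0, b0 = a - a.mean(), b - b.mean()
    denom = np.linalg.norm(a0) * np.linalg.norm(b0)
    return float(a0 @ b0 / denom) if denom > 0 else 0.0

def rotation_quantization(rq, ra, t2=0.7):
    """Baseline Step 2: rotate the radial features Rq by every angle step
    (360/Nl degrees), recompute the full NCC against Ra each time, and keep
    the best angle if its score exceeds the radial threshold t2."""
    n = len(rq)
    scores = [ncc_1d(np.roll(rq, k), ra) for k in range(n)]
    k_best = int(np.argmax(scores))
    if scores[k_best] > t2:
        return k_best * 360.0 / n, scores[k_best]   # angle of the second candidate
    return None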
At Step 3, the affine template matching filter is applied. From the
second candidates, we have parameters: location, scale and
angle. The main goal of this step is to check the similarity
between T’ and the target candidates qualified from Step 1
and Step 2. The target candidate is the set of pixels which are
inside a boundary on the test image I’. That boundary is built
from the angle, scale and location factors from Step 2. Step 3
will help to reject most of the unsatisfactory candidates. To
do that, we apply an affine transform to T' and then measure the similarity between the affined template and the target candidates. If the similarity score is greater than a
threshold (t3), that candidate will be the third candidate.
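A hedged OpenCV sketch of this affine matching filter follows; it assumes 8-bit grayscale arrays and candidates carrying (y, x, scale, angle) from Steps 1 and 2, and it simplifies the border handling of the warped template.

import cv2
import numpy as np

def affine_matching_filter(image, template, candidates, t3=0.7):
    """Test Step 3: warp the down-sampled template T' to each candidate's
    angle and scale, then keep candidates whose NCC against the target region
    exceeds t3."""
    kept = []
    h, w = template.shape
    for (y, x, scale, angle) in candidates:          # from Steps 1 and 2
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, scale)
        warped = cv2.warpAffine(template, M, (w, h))
        roi = image[y:y + h, x:x + w]
        if roi.shape != warped.shape:
            continue                                  # candidate too close to the border
        score = cv2.matchTemplate(roi, warped, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > t3:
            kept.append((y, x, scale, angle, float(score)))
    return kept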
Finally, we apply the robust matching filter. From Step 1 to Step 3, the matching process is performed on the down-sampled template and test image, and the angle interval of the rotation matching is 10 degrees. Note that although the down-sampling technique helps to reduce the dimension of the matching data, it also decreases the precision. At this step, we measure the correlation between the original template (T) and the candidates from Step 3 on the test image (I); that is, we consider the similarity in the up-sampled condition. When we adopt the up-sampling technique, we need to find the accurate location, scale and angle factors from the factors of Step 3. First of all, the location candidates from Step 3 need to be compared with the pixels around them; in that way, we can obtain the correct locations of the target. For the robust rotation matching, we take the angle tolerance into consideration; with an angle interval of 10 degrees, that tolerance should be ±5 degrees. For robust scale matching, while refining the translation we also test all values within the scale range. After the robust matching filter, the candidates from Step 4 are merged together if they overlap.
B. Embedded Robot Arm Control
The outputs of the matching process are the locations,
scales and angles of targets. The locations of targets are 2D
coordinates. In our study, we use the calibration method of Zhang [9] to find the intrinsic matrix (M) and the camera-to-robot transformation matrix. The 2D coordinates (u', v') are converted to 3D coordinates (xB, yB, zB) using these matrices. After the conversion, the 3D coordinates and the angles (θ) of the targets are sent from the Nvidia Jetson TX2 board to the controller of the robot arm via the TCP/IP protocol to control the grasping.
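A minimal sketch of this conversion and hand-off is given below. It assumes the objects lie on a calibrated plane at a known camera-frame depth and that the controller accepts a simple text message over TCP/IP; the message format, function names and the planar assumption are ours, since the paper does not specify them.

import socket
import numpy as np

def pixel_to_base(u, v, intrinsic, cam_to_base, z_plane):
    """Back-project pixel (u, v) with the intrinsic matrix onto a working
    plane at depth z_plane in the camera frame, then map the 3-D point into
    the robot base frame with a 4x4 homogeneous transform."""
    ray = np.linalg.inv(intrinsic) @ np.array([u, v, 1.0])
    p_cam = ray * (z_plane / ray[2])                 # 3-D point in the camera frame
    p_base = cam_to_base @ np.append(p_cam, 1.0)     # homogeneous transform
    return p_base[:3]                                # (xB, yB, zB)

def send_pose(host, port, xyz, angle_deg):
    """Send one grasp pose to the robot arm controller over TCP/IP.
    The plain-text message format is only a placeholder."""
    msg = "POSE {:.2f} {:.2f} {:.2f} {:.2f}\n".format(
        xyz[0], xyz[1], xyz[2], angle_deg)
    with socket.create_connection((host, port), timeout=2.0) as sock:
        sock.sendall(msg.encode("ascii"))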
IV. A PROPOSED FAST MATCHING METHOD
A. Fast Rotation Matching by Converting the Traditional NCC Formula to a Dot Product
At Step 2 of the test process (the rotation quantization step), we present a way to cut down the calculation of the NCC formula. The traditional NCC formula for the measurement of similarity is

$$\mathrm{NCC}=\frac{\sum_{i\in R}\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\sqrt{\sum_{i\in R}\left(x_i-\bar{x}\right)^2}\,\sqrt{\sum_{i\in R}\left(y_i-\bar{y}\right)^2}} \qquad (1)$$

where NCC is the correlation score; $x_i$ and $y_i$ are the pixel values of the template and the test image, respectively; $R$ is the total number of elements in the matching region; and $\bar{x}$, $\bar{y}$ are their mean values, respectively.

Expanding the NCC formula [10]:

$$\mathrm{NCC}=\frac{\sum_{i\in R} x_i y_i - R\,\bar{x}\,\bar{y}}{\sqrt{\sum_{i\in R} x_i^2 - R\,\bar{x}^2}\,\sqrt{\sum_{i\in R} y_i^2 - R\,\bar{y}^2}} \qquad (2)$$
When it comes to the similarity measure on the radial
features, it is easy to recognize that pixels on the radial lines
are located inside a circle with a radius equal to the radius of
the template. When we measure the similarity of rotation, we
need to rotate the radial lines on the search window. In this
case, pixels on the radial lines only rotate their positions, do
not change their values (see Fig. 5). With pixels outside the
circle, we do not take into consideration. In detail, R,
and
the standard deviations: 󰇛
󰇜
 , 󰇛
󰇜
 are
also invariable with the pixels inside that circle. Therefore, we
only need to calculate them once when computing the NCC
formula.
Hence, instead of evaluating the full NCC formula for every rotation, we only need to find the maximum value of the sum of products (the dot product $\mathbf{x}\cdot\mathbf{y}$) of the two feature vectors and then divide that value by the standard-deviation terms once. In this way, we save a significant number of operations in the NCC formula. For example, with n candidates from Step 1, we can remove (n−1) divisions from the normalization (dividing by the standard-deviation terms) and $N_l\,(n-1)$ subtractions from the deviation calculations:

$$\mathrm{NCC}_{\max}=\frac{\max_{k}\left(\mathbf{x}^{(k)}\cdot\mathbf{y}\right) - R\,\bar{x}\,\bar{y}}{\sqrt{\sum_{i\in R}\left(x_i-\bar{x}\right)^2}\,\sqrt{\sum_{i\in R}\left(y_i-\bar{y}\right)^2}} \qquad (3)$$

where $\mathbf{x}^{(k)}$ denotes the radial feature vector $\mathbf{x}$ cyclically rotated by $k$ angle steps.
We also apply a coarse-to-fine approach to reduce the calculation. Instead of rotating the feature Rq and measuring it against the feature Ra, we duplicate the elements of Rq (doubling its length) and slide the elements of Ra over it. In total, we need Nl scans (the same as the number of radial lines), and one stride corresponds to a rotation of 10 degrees (the angle interval). To reduce the calculation, the coarse-to-fine method diminishes the number of scans: we first shift with a stride of 2 and, after finding the maximum score (a local maximum), perform a fine scan with a stride of 1 around it to locate the global maximum point.

Fig. 5. The template and the rotating targets on the test image. (a) Template. (b) Test image with non-rotated and rotated targets.

Fig. 6. Illustration of the expanded pixels after up-sampling with NP = 2. (a) The candidate pixel (pixel 1) and the 15 expanded pixels after up-sampling. (b) First, the similarity between pixel 1 and pixels 2, 3, 4 is measured, and the maximum score is chosen. (c) The pixel with the maximum score is then measured against its neighbors.
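The dot-product shortcut of (3) and the coarse-to-fine rotation scan can be combined roughly as in the sketch below; it centres the feature vectors up front (equivalent to subtracting the $R\,\bar{x}\,\bar{y}$ term in (3)) and uses illustrative names and defaults.

import numpy as np

def rotation_match_fast(rq, ra, t2=0.7):
    """Fast rotation quantization: the means and standard deviations of Rq and
    Ra do not change under a cyclic rotation of the radial features, so they
    are computed once; each rotation step then costs only one dot product.
    The rotation is scanned coarse-to-fine over a doubled copy of Rq."""
    n = len(rq)
    rq0 = rq - rq.mean()
    ra0 = ra - ra.mean()
    denom = np.linalg.norm(rq0) * np.linalg.norm(ra0)   # computed once
    if denom == 0:
        return None
    rq2 = np.concatenate([rq0, rq0])                    # doubled feature vector

    def dot(shift):
        # one stride corresponds to one angle interval (360/Nl degrees)
        return rq2[shift:shift + n] @ ra0

    coarse = {k: dot(k) for k in range(0, n, 2)}        # coarse pass, stride 2
    k0 = max(coarse, key=coarse.get)
    fine = {k % n: dot(k % n) for k in (k0 - 1, k0, k0 + 1)}   # fine pass, stride 1
    k_best = max(fine, key=fine.get)

    score = fine[k_best] / denom                        # single normalization
    if score > t2:
        return k_best * 360.0 / n, float(score)
    return None

Compared with the baseline sketch in Section III, the per-rotation work drops from a full NCC (two means, two norms, one division) to a single dot product, which is where the reported savings come from.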
B. Fast and Robust Matching Using a Coarse-to-Fine Approach
At Step 4, we apply the coarse-to-fine approach for robust matching. First of all, we consider the location
robustness. When we do the up-sampling technique on the test
image, as an example in Fig. 6, corresponding to the location
candidates (from step 3), with NP = 2, there are 15 expanded
pixels. We need to find the correct candidate among them. In
that example, pixel 1 is the center of the target candidate.
Three pixels 2, 3, 4 are three centers of search windows which
will be measured for similarity with the original template (T).
The center with the maximum score will continue to be
compared with three neighbors. In that way, instead of having
16 NCC computations, it will be reduced to 7 NCC
computations. Secondly, we also apply the coarse-to-fine approach to the robust rotation matching. The angle candidate is compared with a pair of angles around it; the angle with the maximum score becomes the new angle candidate, and the winner continues to be measured with further pairs, with the tolerance decreased gradually after each comparison. For example, with an angle candidate of 10°, the first measurement is between 7°, 10° and 13°. Assuming that 7° is the winner, the next measurement is between 5°, 7° and 9°. The final winner is the angle of the target on the test image. In total, the number of NCC computations is reduced by the method described above. As a case in point, if NP = 2, NS = 3 and α = 10°, then with the original method we need 480 NCC computations for the robustness matching of one candidate from Step 3, while the maximum number is only 17 with the proposed method.
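A rough sketch of these two refinement searches follows; ncc_at and ncc_of_angle stand in for full-resolution NCC evaluations against the original template T, and the exact tolerance schedule is illustrative.

def refine_location(y, x, ncc_at, np_level=2):
    """Coarse-to-fine location refinement after up-sampling (cf. Fig. 6):
    compare the current best pixel with three offset pixels, keep the winner,
    then halve the offset. For np_level = 2 this needs 1 + 3 + 3 = 7 NCC
    evaluations instead of 16."""
    step = 2 ** np_level // 2
    best_score, by, bx = ncc_at(y, x), y, x
    while step >= 1:
        for dy, dx in ((0, step), (step, 0), (step, step)):
            s = ncc_at(by + dy, bx + dx)
            if s > best_score:
                best_score, by, bx = s, by + dy, bx + dx
        step //= 2
    return by, bx, best_score

def refine_angle(theta, ncc_of_angle, tolerances=(3.0, 2.0, 1.0)):
    """Coarse-to-fine angle refinement: compare the candidate with a pair of
    angles +/- tol around it and keep the winner while the tolerance shrinks
    (e.g. 10 deg -> compare 7/10/13 deg, then 5/7/9 deg if 7 deg wins)."""
    best_score, best_theta = ncc_of_angle(theta), theta
    for tol in tolerances:
        for a in (best_theta - tol, best_theta + tol):
            s = ncc_of_angle(a)
            if s > best_score:
                best_score, best_theta = s, a
    return best_theta, best_score

The actual number of NCC evaluations depends on NP, NS and the tolerance schedule, which is how the 480-versus-17 comparison above arises.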
V. EXPERIMENTAL RESULTS AND DISCUSSION
A. Data Collections
Offline experiment. In this experiment, we test the matching
on a database. The catalogue of templates consists of 6 types:
Food, Manufacturing, Metal, PCB, Printing, and Logo. The
size of templates can be separated into three kinds: small, (40~100) x (40~100) pixels; medium, (100~200) x (100~200) pixels; and large, (>200) x (>200) pixels. This
kind of experiment is implemented on 40 templates and 200
test images (see Table I). Each template is tested by 5 test
images with different angles, scales and locations. Each test
image only has one target. The parameters used for the training and test processes are as follows: NS = 0.9~1.1, NC = 10, α = 10°, and thresholds t1 = 0.84, t2 = 0.7, t3 = 0.7. If we want to match targets over a much wider scale range, we can change the parameter NS. The test process is carried out in two ways: using the embedded-based original RST [3] (e-RST) and using the proposed method.
Online experiment. The Nvidia Jetson TX2 board is
connected to a camera with resolution of 752x580 pixels. That
camera captures objects on a conveyor belt. The online testing
is performed on 10 samples (see Fig. 7) with two kinds of
materials: wood and metal. Each sample is tested once. The
3D coordinates of samples after conversion are sent to a robot
arm to grasp and place.
B. Experimental Results
Offline experiment. The test-process times are compared in Table II. In terms of processing time, the table shows that the proposed method is most effective on the medium templates (0.176 s), while the accuracy is highest on the large size (94.3%). The rotation matching results are shown for five test images rotated by 2°, 1°, 0°, 359° and 358° (see Fig. 9); those results show that the system can operate with a small angle interval of 1 degree. Moreover, our algorithm can also deal with complicated cases, as shown in Fig. 8.
Fig. 7. The online experiment is carried out on 10 samples with different sizes and shapes, and two kinds of materials: wood and metal.
Fig. 8. The matching results on complicated images. (a) Small targets, I(646x492), T(44x42); (b) large target, I(646x492), T(363x170); (c) easily confused targets, I(640x480), T(120x120); (d) multi-rotation-angle targets, I(768x576), T(71x73); (e) blurred image, I(640x512), T(142x170); (f) noisy image, I(768x576), T(92x87). (I: image size (width x height); T: template size (width x height)).
TABLE I. THREE KINDS OF TEMPLATE SIZES USED FOR THE OFFLINE TEST

Template size         Small: (40~100) x (40~100) px   Medium: (100~200) x (100~200) px   Large: (>200) x (>200) px
No. of templates      6                               17                                 7
No. of test images    30                              85                                 35
TABLE II. AVERAGE EXECUTION TIME AND ACCURACY OF THE TEST PROCESS ON THREE KINDS OF TEMPLATE SIZES

Size                                 Small    Medium   Large
Embedded-based original RST (e-RST)
  Average time (s)                   1.113    1.739    19.940
  Accuracy (%)                       90.0     88.8     91.4
Proposed method
  Average time (s)                   0.302    0.176    2.204
  Accuracy (%)                       93.3     89.4     94.3
Online experiment. Table III shows the matching times for the 10 samples. In general, all samples are matched for grasping, with an average execution time of 0.303 s. SP5 gives the fastest time at 0.074 s, whilst SP6 is the slowest at 1.060 s.
C. Discussion
Table II shows the time and accuracy of the offline test on
two methods: e-RST and the proposed method. Generally
speaking, the latter is faster and more accurate than the
former. Specifically, with the medium and large templates the
proposed method is faster by nearly ten times (x10) than the
e-RST, while with the small ones, the improvement is around
four times. Based on a detailed evaluation of the proposed
method with the medium templates, the average execution
time is the shortest: 0.176s, while the large ones consume too
much time: more than 2 s. The speed-up on the small templates is smaller because the level of down-sampling in that case is only 1; if higher levels were applied, the accuracy would decrease. The information from the table shows
that the accuracies are nearly the same. The accuracy of the proposed method fluctuates within 89%~94%. The large templates have the highest accuracy at 94.3%, while the medium ones reach 89.4%. Regarding the
angle test (see Fig. 9), although there is one wrong result with a small, acceptable error (in the 2° test case), the results show that the proposed method can approach a high angle precision with an interval of 1 degree.
In addition, the online testing demonstrates processes of
matching on real objects. The 3D coordinates, converted from
the 2D coordinates of targets, are compatible with the
coordinates of the robot arm. The processing time varies
around 0.1 s~0.3 s (except for SP6). The test result on SP6 also indicates that the RST matching is more sensitive to elongated objects.
VI. CONCLUSION
In this study, we proposed a fast and highly accurate method for template matching on an embedded system. The average execution time is cut down notably, by up to 10x. The angle matching with a small interval of 1 degree is useful for grasping systems, and connecting the robot arm to an embedded system that runs the vision algorithm is a practical option for mobile systems. The online testing also shows that the accuracy and processing time of the proposed method are adequate for robotic grasping systems.
ACKNOWLEDGMENT
This work was supported by the Ministry of Science and
Technology (MOST), Taiwan, R.O.C., grant MOST 107-
2221-E-006-218, Tongtai machine & tool Co., Ltd. and
Contrel technology Co., Ltd.
REFERENCES
[1] C. Hu, F. Arvin, C. Xiong, and S. Yue, "Bio-inspired embedded vision
system for autonomous micro-robots: the LGMD case," in IEEE Trans.
Cog. Dev Systems, vol. 9, no. 3, pp. 241-254, 2017.
[2] W.C. Tan, P.C. Goh, A. Causo, I.-M. Chen, and H.K. Tan, "Automated
vision based detection of blistering on metal surface: For robot," in 13th
IEEE Conference on Automation Science and Engineering (CASE),
2017: IEEE, pp. 74-79, 2017.
[3] H.Y. Kim and S.A. De Araújo, "Grayscale template-matching invariant
to rotation, scale, translation, brightness and contrast," in Pacific-Rim
Symposium on Image and Video Technology, pp. 100-113, 2007.
[4] J.P. Lewis, "Fast template matching," in Vision interface, vol. 95, no.
120123, pp. 15-19, 1995.
[5] T. Wu and A. Toet, "Speed-up template matching through integral
image based weak classifiers," in J. Pattern Recognition Reseach,vol.
1, pp. 1-12, 2014.
[6] L. Di Stefano and S. Mattoccia, "A sufficient condition based on the
Cauchy-Schwarz inequality for efficient template matching," in
Proceedings 2003 International Conference on Image Processing (Cat.
No. 03CH37429), vol. 1, pp. 1-269, 2003.
[7] H.Y. Kim, "Rotation-discriminating template matching based on
Fourier coefficients of radial projections with robustness to scaling and
partial occlusion," in Pattern Recognition, vol. 43, no. 3, pp. 859-872,
2010.
[8] B. Kong, J. Supancic, D. Ramanan, and C. Fowlkes, "Cross-domain
image matching with deep feature maps," in I.J. Comput Vision, 2018.
[9] Z. Zhang, "A flexible new technique for camera calibration," in IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no.
11, pp. 1330-1334, 2000.
[10] J.J. Lien, "Automatic recognition of facial expressions using hidden
Markov models and estimation of expression intensity", Ph.D thesis.
Carnegie Mellon University, 1998.
TABLE III. THE MATCHING TIME OF THE ONLINE TEST PROCESS ON 10 SAMPLES

Sample   Template size (pixels)   Test time (s)
SP1      80 x 86                  0.169
SP2      86 x 89                  0.227
SP3      88 x 98                  0.136
SP4      140 x 138                0.263
SP5      98 x 86                  0.074
SP6      148 x 151                1.060
SP7      152 x 143                0.309
SP8      130 x 122                0.558
SP9      119 x 122                0.123
SP10     134 x 130                0.110
Fig. 9. The offline experiment on 5 different rotation test images; template size is 331 x 330 pixels.