Fig 1 - uploaded by Yulia Sandamirskaya
Content may be subject to copyright.
Schematic of the depth from focus system. A vision chip images the scene through a focus tunable lens, with rapidly changing optical power. Within a 'focus sweep', the vision chip analyzes the sharpness of each pixel in N images. This yields a depth map with N depth levels. 

Schematic of the depth from focus system. A vision chip images the scene through a focus tunable lens, with rapidly changing optical power. Within a 'focus sweep', the vision chip analyzes the sharpness of each pixel in N images. This yields a depth map with N depth levels. 

Source publication
Article
Full-text available
Visual input can be used to recover the 3-D structure of a scene by estimating distances (depth) to the observer. Depth estimation is performed in various applications, such as robotics, autonomous driving, or surveillance. We present a low-power, compact, passive, and static imaging system that computes a semi-dense depth map in real time for a wi...

Citations

... The optimal focused image is extracted using the focus measure method based on the Canny sharpness detector [28]. The adjustment of the focal sweep range is achieved by measuring the depth of the object through the Depth-from-Focus (DFF) technique [29,30]. This results in a narrowed focal sweep range around the object's true focus, leading to stable 50 fps well-focused images. ...
Article
Full-text available
Active vision systems (AVSs) have been widely used to obtain high-resolution images of objects of interest. However, tracking small objects in high-magnification scenes is challenging due to shallow depth of field (DoF) and narrow field of view (FoV). To address this, we introduce a novel high-speed AVS with a continuous autofocus (C-AF) approach based on dynamic-range focal sweep and a high-frame-rate (HFR) frame-by-frame tracking pipeline. Our AVS leverages an ultra-fast pan-tilt mechanism based on a Galvano mirror, enabling high-frequency view direction adjustment. Specifically, the proposed C-AF approach uses a 500 fps high-speed camera and a focus-tunable liquid lens operating at a sine wave, providing a 50 Hz focal sweep around the object’s optimal focus. During each focal sweep, 10 images with varying focuses are captured, and the one with the highest focus value is selected, resulting in a stable output of well-focused images at 50 fps. Simultaneously, the object’s depth is measured using the depth-from-focus (DFF) technique, allowing dynamic adjustment of the focal sweep range. Importantly, because the remaining images are only slightly less focused, all 500 fps images can be utilized for object tracking. The proposed tracking pipeline combines deep-learning-based object detection, K-means color clustering, and HFR tracking based on color filtering, achieving 500 fps frame-by-frame tracking. Experimental results demonstrate the effectiveness of the proposed C-AF approach and the advanced capabilities of the high-speed AVS for magnified object tracking.
... Despite all these constraints, several computer vision algorithms have been implemented on SCAMP-5 FPSP. Examples include FAST keypoint detection (Chen et al., 2017a) (Greatwood et al. 2017;Liu et al. 2021), and depth estimation (Martel et al., 2017). However, the design and implementation of complex algorithm for FPSP, such as 6 DoF VO/SLAM, remains as challenging and open problems. ...
Article
Full-text available
Robotics faces a long-standing obstacle in which the speed of the vision system’s scene understanding is insufficient, impeding the robot’s ability to perform agile tasks. Consequently, robots must often rely on interpolation and extrapolation of the vision data to accomplish tasks in a timely and effective manner. One of the primary reasons for these delays is the analog-to-digital conversion that occurs on a per-pixel basis across the image sensor, along with the transfer of pixel-intensity information to the host device. This results in significant delays and power consumption in modern visual processing pipelines. The SCAMP-5—a general-purpose Focal-plane Sensor-processor array (FPSP)—used in this research performs computations in the analog domain prior to analog-to-digital conversion. By extracting features from the image on the focal plane, the amount of data that needs to be digitised and transferred is reduced. This allows for a high frame rate and low energy consumption for the SCAMP-5. The focus of our work is on localising the camera within the scene, which is crucial for scene understanding and for any downstream robotics tasks. We present a localisation system that utilise the FPSP in two parts. First, a 6-DoF odometry system is introduced, which efficiently estimates its position against a known marker at over 400 FPS. Second, our work is extended to implement BIT-VO—6-DoF visual odometry system which operates under an unknown natural environment at 300 FPS.
... These readout modes allow operation at very high frame rates, e.g. over 1000 frames per second (fps), with heavy-lifting done by the PPA array, and the rest of the control algorithm executed on the MCU. (68); (D) Depth from focus, subframes are acquired by sweeping the focus of a liquid lens at 60 Hz, up to 128 sub-frames are processed to determine a combined depth map at 60 fps; in the image, red is near, blue is far; this requires spatial contrast maximisation algorithm running at 7000 fps (67); (E) High dynamic range imaging using tone mapping by combining hundreds of frames; the image is progressively exposed to achieve locally balanced image intensity at each pixel (66). Figure 6 illustrates several basic algorithms, executed by the SCAMP-5 system, and their execution time. ...
... For instance, high-dynamic range (HDR) imaging can be based on acquiring hundreds of images, to generate one combined tone-mapped image frame (66). Monocular depth mapping can be achieved, using a lens with fast-sweeping focus, using the PPA to compute when objects come in and out of focus (67). The locations of corner features can be extracted at a rate of several thousand frames per second, outputting only coordinates of extracted keypoints (68). ...
Article
Vision processing for control of agile autonomous robots requires low-latency computation, within a limited power and space budget. This is challenging for conventional computing hardware. Parallel processor arrays (PPAs) are a new class of vision sensor devices that exploit advances in semiconductor technology, embedding a processor within each pixel of the image sensor array. Sensed pixel data are processed on the focal plane, and only a small amount of relevant information is transmitted out of the vision sensor. This tight integration of sensing, processing, and memory within a massively parallel computing architecture leads to an interesting trade-off between high performance, low latency, low power, low cost, and versatility in a machine vision system. Here, we review the history of image sensing and processing hardware from the perspective of in-pixel computing and outline the key features of a state-of-the-art smart camera system based on a PPA device, through the description of the SCAMP-5 system. We describe several robotic applications for agile ground and aerial vehicles, demonstrating PPA sensing functionalities including high-speed odometry, target tracking, obstacle detection, and avoidance. In the conclusions, we provide some insight and perspective on the future development of PPA devices, including their application and benefits within agile, robust, adaptable, and lightweight robotics.
... Our algorithm is capable of extracting the distances from the captured frames, estimating distances with high accuracy, and dealing with the low frequencies problem by using the input color information, a common issue in DFF algorithms [22]. ...
... There are many different algorithms proposed in the state of the art to approximate the depth estimation from focal stack capture [19,22,[34][35][36][37]. The discussion is presented by choosing three, with the first chosen due to the similarity of the capture system [34], Hui et al. presents a camera with a liquid lens which obtains the different focus positions varying the voltage like the arrangement proposed in this document. ...
Article
Full-text available
This work introduces a real-time full-resolution depth estimation device, which allows integral displays to be fed with a real-time light-field. The core principle of the technique is a high-speed focal stack acquisition method combined with an efficient implementation of the depth estimation algorithm, allowing the generation of real time, high resolution depth maps. As the procedure does not depend on any custom hardware, if the requirements are met, the described method can turn any high speed camera into a 3D camera with true depth output. The concept was tested with an experimental setup consisting of an electronically variable focus lens, a high-speed camera, and a GPU for processing, plus a control board for lens and image sensor synchronization. The comparison with other state of the art algorithms shows our advantages in computational time and precision.
... HERE is much research among the all-in-focus, focused, and depth images. Getting a clear image from a blurred image is called image deconvolution [1], [2], [3], [4], [5]; getting a depth image from a clear image or a focused image is called depth estimation [6], [7], [8]; the study from a series of focused images to a depth map is called shape from focus/defocus (SFF/ SFDF) [9], [10]. However, the relationship among these three is hard to describe intuitively. ...
... Although there has been lots of research on the PSF [1], [11], [12], [13], due to the some optical parameters cannot be measured directly, such as the f-number, the physical size of pixels, etc. and some data being hard to acquire, such as the all-in-focused image, focused image with corresponding focus depth, and depth map in the same view, the precise PSF is hard to get. To complement and advance previous work [10], [11], [13], we aim to obtain a precise PSF model of the camera. ...
... In other words, we solve: where represent the number of focused images. is the optical parameter defined in (6) and the mechanical parameter is defined in (10). Then we use the twodimensional exhaustive search to solve the 2 parameters in the model. ...
Preprint
Full-text available
Point spread function (PSF) plays a crucial role in many fields, such as shape from focus/defocus, depth estimation, and imaging process in fluorescence microscopy. However, the mathematical model of the defocus process is still unclear because several variables in the point spread function are hard to measure accurately, such as the f-number of cameras, the physical size of a pixel, the focus depth, etc. In this work, we develop a precise mathematical model of the camera's point spread function to describe the defocus process. We first derive the mathematical algorithm for the PSF and extract two parameters A and e. A is the composite of camera's f-number, pixel-size, output scale, and scaling factor of the circle of confusion; e is the deviation of the focus depth. We design a novel metric based on the defocus histogram to evaluate the difference between the simulated focused image and the actual focused image to obtain optimal A and e. We also construct a hardware system consisting of a focusing system and a structured light system to acquire the all-in-focus image, the focused image with corresponding focus depth, and the depth map in the same view. The three types of images, as a dataset, are used to obtain the precise PSF. Our experiments on standard planes and actual objects show that the proposed algorithm can accurately describe the defocus process. The accuracy of our algorithm is further proved by evaluating the difference among the actual focused images, the focused image generated by our algorithm, the focused image generated by others. The results show that the loss of our algorithm is 40% less than others on average. The dataset, code, and model are available on GitHub: https://github.com/cubhe/ precise-point-spread-function-estimation.
... Alternatively, a passive illumination method based on multifocus sensing [13][14][15] relies on the capture of a scene with the system focusing at different distances from a camera with the possibility of retrieving 3D information and novel image synthesis [16] from the acquired stack. In [15] an optical system incorporating an electrically focus-tunable lens is used to sweep a scene in depth while Laplacian of Gaussian filtering is used as a focus measure to compute depth from focus. ...
... Alternatively, a passive illumination method based on multifocus sensing [13][14][15] relies on the capture of a scene with the system focusing at different distances from a camera with the possibility of retrieving 3D information and novel image synthesis [16] from the acquired stack. In [15] an optical system incorporating an electrically focus-tunable lens is used to sweep a scene in depth while Laplacian of Gaussian filtering is used as a focus measure to compute depth from focus. Alternatively, other focus measures like those related to other high-frequency content of the scene [13] can be applied to retrieve depth information from a multifocus stack. ...
Article
Full-text available
In augmented reality displays, digital information can be integrated with real-world scenes. We present an augmented reality-based approach for three-dimensional optical visualization and depth map retrieval of a scene using multifocus sensing. From a sequence of images captured with different focusing distances, all-in-focus image reconstruction can be performed along with different point of view synthesis. By means of an algorithm that compares the all-in-focus image reconstruction with each image of the z-stack, the depth map of the scene can also be retrieved. Once the three-dimensional reconstructed scene for different points of view along with its depth map is obtained, it can be optically displayed in smart glasses allowing the user to visualize the real three-dimensional scene along with synthesized perspectives of it and provide information such as depth maps of the scene, which are not possible with conventional augmented reality devices. To the best of our knowledge, this is the first report on combining multifocus sensing and three-dimensional visualization and depth retrieval for applications to augmented reality.
... where f represents focal length, distance between far focus point in DoF space to the lens is (D lo + ∆D 0l ) ( point A in Fig.3(b)) and distance between far focus point in DoFo space to the sensor as (d ls − ∆d 0l ) (point A in Fig.3(b)). Now, we recall the simplified relationship for far point A using the intercept theorem by J.N.P. Martel et al. [34] in-terms of intrinsic and extrinsic parameters as, ...
... Further, we would like to mention that in J.N.P. Martel et al. [34] the real-time depth estimation is carried out by using a focal plane processor array with a tunable focus lens. Naturally, this method is not directly applicable to use the conventional camera because it involves changing both the image sensor and lens. ...
Article
Full-text available
A computational 3D image generation using a single view with multi-color filter aperture (MCA) and multi-plane representation is a cost-effective approach and most useful when there is no option to acquire either stereo or multi-views with orientation at all. Although this approach generates 3D perception image that includes multiple objects with both similar and dissimilar colors having occluded by each other, it may be insufficient for virtual/augmented reality applications due to inaccurate depth. In this article, we obtain a more accurate geometric depth estimation by formulating a suitable relationship between inter-objects depth of the 3D scene in the depth-of-field (DoF) zone and its corresponding inter-image plane depths of a 3D perception image in depth-of-focus (DoFo) zone of a given camera under shallow DoF zone constraint. But, this shallow depth zone is configured to be dependent only on the focal distance between the lens and object while the remaining parameters such as aperture diameter, focal length, and sensor sensitivity are held at constant values. All-in-focus 3D perception image is synthesized from multi-plane images (MPIs) by utilizing the inter-image plane depths computed from the disparities caused across the boundaries and its smooth surface from image textures inside the respective boundaries of the 2D MCA image. The 2.1D sketch is used as a semantic segmentation technique to determine the number of objects in the 3D scene as one in-focus region and the rest as out-of-focus regions due to the circle of confusions (CoCs) on the fixed image sensor plane. The same enables both ordering of the image regions and identifying occlusion wherever applicable. An accurate depth 3D image is synthesized, replacing accurate inter-depths in place of inter-depth between MPIs used for 3D perception image. In the end, the paper summarizes few experimental validations for the proposed approach with some salient examples having depth gaps between 0.5cm to 10.5cm.
... e essence of genetic algorithm (GA) [18][19][20] is to simulate the evolution process of nature. In the evolution process of nature, species must compete with other species in order to survive and reproduce, and excellent populations will survive to adapt to the changes of environment. ...
Article
Full-text available
Microscope vision analysis is applied in many fields. The traditional way is to use the human eye to observe and manually focus to obtain the image of the observed object. However, with the observation object becoming more and more subtle, the magnification of the microscope is required to be larger and larger. The method of manual focusing cannot guarantee the best focusing position of the microscope in use. Therefore, in this paper, we are studying the existing autofocusing technology and the autofocusing method of microscope based on image processing, which is different from the traditional manual focusing method. The autofocusing method of microscope based on image processing does not need the information such as the target position and the focal length of optical system, to directly focus the collected image. First of all, in order to solve the problem of large computation and difficult real time of traditional wavelet based image sharpness evaluation algorithm, this paper proposes an improved wavelet based image sharpness evaluation algorithm; secondly, in view of the situation that the window selected by traditional focusing window selection method is fixed, this paper adopts an adaptive focusing window selection method to increase the focusing window. Finally, this paper studies the extremum search strategy. In order to avoid the interference of the local extremum in the focusing curve, this paper proposes an improved hill-climbing algorithm to achieve the accuracy of focusing search. The simulation results show that the improved wavelet transform image definition evaluation algorithm can improve the definition evaluation performance, and the improved mountain climbing algorithm can reduce the impact of local extremum and improve the accuracy of the search algorithm. All in all, it can be concluded that the method based on image processing proposed in this paper has a good focusing effect, which can meet the needs of anti-interference and extreme value search of microscope autofocus. 1. Introduction Optical microscope [1, 2] plays an important role in human observation and understanding of the micro world. In recent years, with the continuous improvement of computer processing ability and the development of microimaging technology, the computer vision automatic detection system of optical microscope, which is built by computer, micro camera, and optical microscope, has been widely used in various fields, including military, medicine, and biology. The development of this automatic detection system can greatly reduce the workload, improve the detection efficiency and accuracy, and save a lot of time. In the automatic detection system of computer vision, it is one of the key technologies to realize the automatic focusing of microscope [3]. Autofocus refers to the process in which the system automatically adjusts the mechanical structure (image distance or object distance) to make the image clear again when the image is blurred due to defocusing. It is a common task for all the subjects applying computer vision automatic detection system to realize the good autofocusing performance of optical microscope. The traditional autofocus technology is studied from the perspective of focal length measurement, and the new computer-controlled microscope autofocus based on image processing is a multidisciplinary comprehensive application combining machine vision, image processing, optimization theory, and electromechanical technology. Automatic focusing method based on image technology directly applies related image processing technology to the focusing image captured by the camera, analyzes the image quality, obtains the current focusing state of the system, and then drives the mechanism to adjust the focal length to achieve automatic focusing [4]. With the development of camera, autofocus has appeared formally. It has a history of hundreds of years. It is widely used in traditional cameras, but now it is more popular in digital cameras, and autofocus technology is more widely used. Because digital cameras are almost all digital, especially the use of high-performance microprocessors, the autofocus function is easier to be embedded, which makes the development of autofocus technology more rapid. On cameras with a higher level, autofocus has become a technical indicator. The traditional autofocus technology includes trigonometric ranging method [5], infrared ranging method [6], and ultrasonic ranging method [7]. The main function of the traditional autofocus method is to measure the distance between the object to be measured and the sensitive element. Then, the measured distance is substituted into the optical Gauss formula to calculate the object distance, and then the lens is quickly moved to the best object distance according to the calculated object distance. With the development of technology, ranging focusing mode is not suitable for short-range focusing because of the large instruments. Since the 1990s, with the rapid development of image sensor technology such as CCD and CMOS and image processing technology [8], the focusing technology based on image processing has developed rapidly. Although the research time of focusing technology based on image processing is relatively short, with the expansion of the application scope of focusing technology based on image processing, great progress has been made in the assembly of micro parts, cell operation, integrated circuit assembly, etc. Automatic focusing based on image processing is to process the sequence image collected by image sensor in real time, obtain the evaluation value that can represent the image clarity, judge whether the image is clear according to the evaluation value, and give the corresponding feedback signal to drive the motor to control the lens movement until the clearest image is obtained and complete the automatic focusing. It can be seen from the above discussion that the key of autofocus technology based on image processing is to obtain the evaluation value of image clarity, so the definition evaluation function is the key research object of autofocus technology based on image processing [9]. The selection of focusing window and focusing search strategy are also important factors that affect the effectiveness of autofocusing. People have done a lot of research on these two aspects. Autofocusing based on image processing has its own characteristics: First, the selection of focusing criteria is flexible and diverse. In digital image processing, there are many methods to describe and extract image features, so autofocusing based on image processing can select different focusing criteria according to the needs of the imaging system, which is conducive to improving the intelligence of focusing. Second, the driving circuit and moving mechanism of the autofocus system are greatly simplified, which is easy to control the speed of focus and is conducive to the improvement of real-time performance. In addition, because the automatic focusing based on image processing is judged according to the characteristics of the acquired image, compared with the traditional focusing mode, without additional auxiliary equipment, it can reduce the cost, more conducive to the integration and miniaturization of the module, and has a wide application prospect. In this paper, the microscope autofocus method based on image processing is of great significance to the development of microscope and medicine. Based on the above background, this article uses image processing technology to achieve automatic focusing of the microscope. Unlike traditional methods, microscope autofocus technology based on image processing can directly collect images for autofocusing. The selected focusing standards are flexible and diverse, simplifying the structure of the focusing system and improving the real-time focusing of the microscope. In this method, we improved and optimized the image sharpness evaluation algorithm and the autofocus extreme value search algorithm, which are the most important part of the microscope autofocus technology based on image processing, and designed a more effective microscope-based. The autofocus method is in image processing. Aiming at the shortcomings of traditional wavelet transform image definition evaluation algorithms, this algorithm proposes an improved wavelet transform image definition evaluation algorithm, which can well improve the disadvantages of traditional methods that are large in calculation and difficult to realize. In addition, in view of the fact that the window selected by the traditional focus window selection method is fixed, this paper adopts the adaptive focus window selection method to increase the versatility of the focus window selection. In order to avoid the interference of the local extremum on the focus curve on the autofocus, an improved climbing algorithm is proposed to realize the precise search of the microscope autofocus. The simulation results show that this method has a good effect in the field of microscope autofocus and can meet the needs of microscope autofocus [10]. 2. Principle and Method of Autofocus 2.1. Basic Principle of Autofocus In the early imaging systems, most of them use manual way to complete the focusing process. Manual focusing depends on the subjective judgment of human beings, so it cannot guarantee the accuracy of focusing, and the efficiency of focusing is low, which cannot meet the requirements of optical imaging system to capture instant pictures. In the late 1960s, with the rapid development of microelectronics technology, focusing can be automatically completed by computer or intelligent chip control, which is called autofocusing. Good autofocusing control strategy can meet the requirements of modern optical imaging system for focusing accuracy and speed. Simply put, the principle of autofocusing is that the image detector receives the reflected light from the object and converts it into the corresponding signal and then processes it through the intelligent chip or computer according to a certain algorithm to drive the focusing device to adjust the optical system to complete the focusing process. An autofocus system mainly has two function modules, analysis module and control module. The analysis module determines whether to focus by analyzing the input image. If it is defocused, the defocusing degree of the image is calculated, and the relevant information is provided for the control module. The control module adjusts the lens through the driving device, so that the image is in the focusing state. This article looks at the principle of autofocus to see if it can be used on a microscope. 2.2. Autofocusing Method Based on Image Processing 2.2.1. Depth from Defocus Depth from Defocus (DFD) method [11–13] is a method to obtain the depth information of the focus target through the defocused image so as to complete the autofocus. DFD method needs to obtain 2-3 frames of images with different defocusing degrees. Through the analysis and processing of the local area of the image, the fuzzy degree and defocusing depth information of the image is obtained, and then the focus position is determined, the lens motion is continuously driven, and the autofocus is completed. There are two main DFD methods: one is based on image restoration. In this method, the point spread function of the imaging system is estimated by some information, which can represent the important features of the image, and then the original image of the image is restored by the inversion of the image degradation model. This method needs to get representative information in the image, which is not effective for any target, and has limitations. The other is based on fuzzy analysis, which is obtained by analyzing the size of fuzzy circle. This method analyzes and processes 2-3 frames of images obtained under different lens imaging parameters and determines the relationship between the size of blur circle and lens imaging parameters according to the principle of geometric optics and finds the focus position. In this method, the size of fuzzy circle is in pixels, and rounding error will be introduced. DFD method only needs 2-3 frames of images with different defocusing degrees to complete autofocusing, greatly reducing the amount of image acquisition and the time required to drive the motor, so it has a faster focusing speed. In order to use DFD method to focus, it is necessary to know all kinds of parameters of the imaging system and establish the mathematical model of the imaging system in advance, so as to calculate the defocusing depth and judge the focusing position according to a small number of images. However, in practical application, the mathematical model of the imaging system cannot be accurately determined in theory, and a small number of images are used with less information, so DFD method may lead to large focusing error and its accuracy and stability. At present, DFD method is still in theoretical research and experimental application. 2.2.2. Depth from Focus Depth from Focus (DFF) [14–16] method is an autofocusing method based on focusing search mechanism. DFF method uses a certain definition evaluation function to calculate the definition evaluation function values (evaluation values) of images with different defocusing degrees. According to the characteristics of the maximum image evaluation value when focusing accurately, the focusing search algorithm is used to control the lens to move towards the direction with the maximum evaluation value until the accurate focusing position is found. The theoretical premise of DFF method is that the curve of autofocusing evaluation function is unimodal, and the focusing curve is strictly monotonous on both sides of the peak value. The position where the peak value is obtained is the focusing position, and the image of this position is the clearest. In order to obtain the focusing position, the focusing curve should be unimodal and monotonic, so the interference of noise should be minimized; otherwise the local extremum caused by noise will affect the reliability of focusing and even lead to focusing failure. The above method provides a theoretical basis for the autofocus method in this article. The analysis module analyzes and evaluates the defocusing degree of the target image, and the control module adjusts the lens position according to the relevant information provided by the analysis and evaluation module until the target image on the image detector is the clearest and finally stores and outputs the clear image. 3. Method of Autofocusing Microscope 3.1. Image Definition Evaluation Method In the research of image definition evaluation method, this paper designs image definition evaluation method based on lifting wavelet transform, aiming at the problems of traditional wavelet transform, such as large computation and being difficult to realize. Compared with the traditional wavelet transform, the lifting wavelet transform has the following advantages: (1) It has all the advantages of the first generation of wavelet. (2) It completely gets rid of the Fourier transform, abandons the conditions of binary translation and expansion, and completes the construction of wavelet function in the time domain. (3) It improves the speed of wavelet transform, and the operation speed tends to be twice that of traditional wavelet transform. (4) Standard operation, which takes up less memory, can save storage resources [17]. The wavelet lifting scheme realizes the separation of high- and low-frequency signals through three steps: splitting, prediction, and updating.(1)Split In the splitting process, the original signal is divided into two subsets, which are usually split by parity. The even set contains the values of all even positions of the original signal, and the odd set contains the values of all odd positions of the original signal, as shown in(2)Predict Because of the correlation between the data, the even part can be used to predict the odd part. In the prediction process, is used as the prediction value of odd sequence, and then the difference between the actual value and the prediction value of odd sequence is calculated to get the residual signal , which corresponds to the high-frequency part of the original signal after wavelet transform, as shown in the following formula:(3)Update Some properties of the split subset are different from the original signal, so it needs to be updated. In the update process, the update operator acts on the residual signal obtained in the prediction process to generate a subset consistent with the original data, as shown in the following formula: In the formula, corresponds to the low-frequency part after wavelet transform; is the update operator. If multilevel wavelet transform is needed, repeat the above steps for . The process of improving wavelet transform is as follows: Firstly, line transform is carried out, the image is temporarily divided into two bands, the left band is low-frequency band, and the right band is high-frequency band. Then, the two frequency bands are transformed in columns, and the image is divided into four frequency bands: one low-frequency subband (upper left corner) and three high-frequency subbands (upper right corner), (lower left corner), and (lower right corner). , , and , respectively, correspond to the high-frequency components in horizontal, vertical and diagonal directions after the first level lifting wavelet decomposition. Repeat the above process for to realize the two-level decomposition of lifting wavelet transform. One low-frequency subband () and three high-frequency subbands (, , and ) can be obtained. Using the energy of high-frequency coefficients of wavelet transform to construct image definition evaluation function is one of the commonly used methods. From this point of view, this paper designs the definition evaluation function based on lifting wavelet transform. It can be seen from the above introduction that the larger the defocusing amount, the more blurred the image, the more serious the loss of image details, and the less the corresponding high-frequency components. Because of the energy invariability of orthogonal wavelet transform, the energy of the high-frequency coefficient corresponding to the focus image is large, and the energy of the high-frequency coefficient corresponding to the defocused image is small, while the energy of the low-frequency coefficient is opposite. Therefore, the definition evaluation function combining the characteristics of the two changes is proposed as follows: In the above formula, and are the high-frequency subbands after level wavelet decomposition and represents the low-frequency subbands after level wavelet decomposition. 3.2. Selection Method of the Focusing Window In order to adaptively select the target image as the focusing window, the best segmentation threshold of the target and background is obtained according to some image segmentation algorithm, and the binary image is obtained by using the threshold segmentation; then the edge is extracted and the “center of gravity” of the edge image is calculated, and the area with rich edge is obtained as the focusing window. In this paper, the criterion of maximum variance between classes is used to get the best segmentation threshold of target and background, and the fast searching ability of genetic algorithm is used to search the global best segmentation threshold. Using these two methods can reduce the error. 3.2.1. Maximum Variance between Classes’ Criteria The total number of pixels of is , the gray level is , the total number of pixels with the gray value of , and the probability of their occurrence is and , respectively, Suppose that the gray value is the threshold value of target and background segmentation; then the gray value range of background area is , and the gray value range of target area is , and the proportion of the two is, respectively, If the average gray value of the image is , then can be obtained from the average gray value of the background area and the average gray value of the target area, that is, . According to the definition of variance, the variance between the two groups is obtained as follows: with the maximum value of is the best segmentation threshold of target and background. 3.2.2. Genetic Algorithm The essence of genetic algorithm (GA) [18–20] is to simulate the evolution process of nature. In the evolution process of nature, species must compete with other species in order to survive and reproduce, and excellent populations will survive to adapt to the changes of environment. GA combines the survival rules of the fittest in nature with the random information exchange system of the chromosome in the population and gives full play to the advantages of the survival of the fittest. It is a highly parallel, random, and adaptive global optimization search algorithm. It has strong robustness and global search ability, can improve the image search ability of autofocus, and is easy to combine other algorithms. In addition, it also includes the most important concept: fitness function. The fitness function is determined by the goal of the problem. The fitness value represents the reproduction probability of each individual. The larger its value is, the better the individual is and, on the contrary, the worse the individual is. The main process is as follows:(1)Chromosome Coding. The genetic object of genetic algorithm is gene string, so it is necessary to encode the data of problem decision space into gene string structure data. For individuals, the coding can be binary or real. Binary encoding is used in GA.(2)Generation of Initial Population. The operation of genetic algorithm is population operation, which needs to generate initial population randomly before genetic operation. According to the coding mode of chromosomes, chromosomes are randomly generated, each chromosome represents an individual, and GA iteratively evolves with the initial data of this individual.(3)Fitness Evaluation. Fitness is an important data to evaluate the quality of individuals and guide the genetic operation smoothly.(4)Genetic Manipulation, Including Selection, Crossover, and Variation. Among them, the selection operation is replication, according to the size of fitness value to select the individuals who enter the cross-mutation operation. The crossover operation will make two individuals selected from the population exchange part of genes with a larger crossover probability according to single-point consistent crossover or other crossover strategies and produces offspring; the mutation operation will make some genes of individuals mutate according to a certain mutation probability (generally small), and the gene in GA will mutate into its allele.(5)Termination Conditions. The condition of the end of genetic operation cycle in GA is to set the maximum evolution algebra or to converge to the required precision. 3.2.3. Adaptive Focusing Window Selection (1)Set the relevant parameters of genetic algorithm, and take the maximum as the fitness function.(2)The evaluation fitness function values are calculated and sorted.(3)After the cross mutation, the historical optimal individuals and the current optimal individuals of the population are preserved, and the population is updated.(4)Termination algorithm: when the maximum number of iterations or the optimal individual remains unchanged for a long time, it indicates that the optimal individual has been found, and stop searching; otherwise return to step (2).(5)According to the best segmentation threshold obtained by searching, the original image is segmented to get the binary image, and then the edge image is obtained by edge extraction. The “center of gravity” of the edge image is calculated by equation (8), and the size area is taken as the focusing window with this point as the center. In this paper, , , and and represent the numbers of rows and columns of the image, respectively, and the symbol represents rounding.When the image contains noise or the gray edge of the image is not uniform, the location of the “center of gravity” of the image will deviate. Therefore, before calculating the “center of gravity” of the image, the edge image can be divided into several size areas. If the number of edge points in the area is less than the number of global average edge points, the gray value of the pixels in the area is set to 0 to eliminate the influence of individual edge points or noise points on the “center of gravity” of the calculated image. 3.3. Extreme Search Strategy 3.3.1. Mountain Climbing Search Algorithm The ideal focusing function curve is approximately parabola shape, the peak value corresponds to the best imaging position, and the curve on both sides of the peak point is monotonous. The specific process of mountain climbing search algorithm is as follows. The stepper motor drives the lens to move forward in equal steps from left to right. In each step, one frame of image is collected and the sharpness of the image is calculated. If the sharpness of the current image is greater than that of the previous frame, it is considered that the lens has been moving towards the focus direction. When the sharpness of the image decreases for the first time, it is considered that the lens has passed the focus position. At this time, the motor reverses, the step size becomes smaller, and the lens is driven to move to the left. Similarly, when the image clarity is reduced again, it means that the lens can cross the peak position. When the motor reverses at this time, the step size decreases compared with the last time, and the driving lens moves to the right so repeatedly until the step size is less than a certain limit value. In theory, the mountain climbing search algorithm is an ideal extremum search strategy. However, in practical application, the focus function curve changes smoothly in the area far away from the quasi-focus position, and the motor will run for a long time in these areas. Moreover, only through the comparison of the clarity of the two images before and after can it judge the focus direction, which is easy to be interfered by the local extremum, causing misfocus. In addition, in the hill climbing search algorithm, the motor frequently changes direction, which affects the service life of the focusing mechanism. 3.3.2. The Improvement of Mountain Climbing Search Algorithm The main factors that affect the search process are image data acquisition, processing, and motor operation. In contrast, the operation time of motor is much longer than that of image acquisition and processing. Therefore, a good search algorithm should minimize redundant motor motion. Considering the focusing speed and accuracy of the system, this paper explores an optimized hill-climbing search algorithm. Firstly, the algorithm traverses the whole focusing stroke and determines the fine-tuning interval. Then, the hill-climbing algorithm is used to search for the best imaging position in the fine-tuning interval.(1)Large step traversal: set a large step as the driving wheel rotates a circle, the motor drives the lens to start from the starting position, and the large step traverses the whole focusing stroke. In each step of the lens, a frame of image is collected to calculate the definition of the current image. After traversing the whole process, the maximum value of focusing evaluation function and lens position can be obtained.(2)The motor drives the lens to move in reverse direction and brings the lens back to the previous sampling point at the maximum value of the focusing evaluation function, which is the starting point of the fine-tuning interval.(3)Small step climbing search in fine-tuning section: at the starting point of fine-tuning section, the motor driving lens rotates a tooth with the minimum step, that is, the driving wheel, and the climbing method is used to search the quasi-focus position. When four evaluation function values decrease continuously, the curve is determined to be the descending direction. It indicates that the focus position has been crossed. At this time, the motor reverses and drives the lens to move to the position of the first point among the four consecutive descent points, which is the best focus position. 4. Results and Discussion In the research of automatic focusing method of microscope based on image processing, the three key technologies are image definition evaluation method, focusing window selection, and search strategy. Therefore, this paper puts forward corresponding improvements for the problems of the three technologies. Firstly, in the image definition evaluation method, aiming at the problem that the traditional image definition evaluation algorithm based on wavelet transform has a large amount of computation and is difficult to realize, this paper proposes an improved image definition evaluation algorithm based on wavelet transform. In order to verify the performance of the articulation evaluation algorithm proposed in this paper, the present algorithm of focusing image sharpness evaluation is compared with that of the algorithm. The algorithm of image sharpness evaluation includes the algorithm of image sharpness evaluation based on wavelet transform, Brenner evaluation function, and Tenengrad evaluation function. Four evaluation algorithms are compared and analyzed under the background of noise, and the comparison results are shown in Figure 1. It can be seen from Figure 1 that the curve of the image definition evaluation algorithm proposed in this paper is the sharpest. The sharpness can reflect the sharpness of the image, and the side lobe value is far lower than the other three methods, which shows that the algorithm in this paper has the best sharpness, which shows that, under the background of noise, the algorithm in this paper has better definition evaluation performance and better anti-noise performance and sensitivity.
... Initially we had ten packaged dies: a couple were faulty, we broke a couple during the tests, kept a couple in our older test PCBs, and used the rest to build a few functional camera systems, with an integrated controller, USB interface, etc. These systems were then used to explore potential applications in our lab, and by a few collaborators on various projects [36], [37], [38]. These collaborations were very successful, and we started to get more people interested in using the system, so we decided to package all the remaining twenty bare dies from the MPW run, that we still had in the cupboard, to build more systems for people to use. ...
... Different from a conventional image sensor where images are read out and then processed externally to the sensor, the SCAMP-5 features on-board parallel processing, outputting computation results directly to a high-level controller. This on-board processing enables a range of potential applications, such as visual odometry [3], mobile robot tracking [17], proximity estimation [9], real-time depth estimation [30] and CNN inference [4]. Figure 1 illustrates the main hardware components within the SCAMP-5 system. ...
Conference Paper
Full-text available
Performance, storage, and power consumption are three major factors that restrict the use of machine learning algorithms on embedded systems. However, new hardware architectures designed with visual computation in mind may hold the key to solving these bottlenecks. This work makes use of a novel visual device: the pixel processor array (PPA), to embed a convolutional neural network (CNN) onto the focal plane. We present a new high-speed implementation of strided convolutions using binary weights for the CNN on PPA devices, allowing all multiplications to be replaced by more efficient addition/subtraction operations. Image convolutions, ReLU activation functions, max-pooling and a fully-connected layer are all performed directly on the PPA's imaging plane, exploiting its massive parallel computing capabilities. We demonstrate CNN inference across 4 different applications, running between 2,000 and 17,500 fps with power consumption lower than 1.5W. These tasks include identifying 8 classes of plankton, hand gesture classification and digit recognition.