
Deep learning for smart fish farming: applications, opportunities and challenges

Xinting Yang1,2,3, Song Zhang1,2,3,5, Jintao Liu1,2,3,6, Qinfeng Gao4, Shuanglin Dong4, Chao Zhou1,2,3*
1. Beijing Research Center for Information Technology in Agriculture, Beijing 100097, China
2. National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
3. National Engineering Laboratory for Agri-product Quality Traceability, Beijing, 100097, China
4. Key Laboratory of Mariculture, Ministry of Education, Ocean University of China, Qingdao, Shandong Province, 266100, China
5. Tianjin University of Science and Technology, Tianjin 300222, China
6. Department of Computer Science, University of Almeria, Almeria, 04120, Spain
*Corresponding author: Chao Zhou. E-mail address: supperchao@hotmail.com, zhouc@nercita.org.cn
DOI: https://doi.org/10.1111/raq.12464
Abstract
The rapid emergence of deep learning (DL) technology has resulted in its successful use in various
fields, including aquaculture. DL creates both new opportunities and a series of challenges for
information and data processing in smart fish farming. This paper focuses on applications of DL in
aquaculture, including live fish identification, species classification, behavioral analysis, feeding
decisions, size or biomass estimation, and water quality prediction. The technical details of DL
methods applied to smart fish farming are also analyzed, including data, algorithms, and performance.
The review results show that the most significant contribution of DL is its ability to automatically
extract features. However, challenges still exist; DL is still in a weak artificial intelligence stage and
requires large amounts of labeled data for training, which has become a bottleneck that restricts further
DL applications in aquaculture. Nevertheless, DL still offers breakthroughs for addressing complex
data in aquaculture. In brief, our purpose is to provide researchers and practitioners with a better
understanding of the current state of the art of DL in aquaculture, which can provide strong support
for implementing smart fish farming applications.
Keywords: Deep learning; Smart fish farming; Advanced analytics; Aquaculture
Contents
1. Introduction
2. Concepts of deep learning
2.1 Terms and definitions of deep learning
2.2 Learning tasks and models
3. Applications of deep learning in smart fish farming
3.1 Live fish identification
3.2 Species classification
3.3 Behavioral analysis
3.4 Size or biomass estimation
3.5 Feeding decision-making
3.6 Water quality prediction
4. Technical details and overall performance
4.1 Data
4.2 Algorithms
4.3 Performance evaluation indexes and overall performance
5. Discussion
5.1 Advantages of deep learning
5.2 Disadvantages and limitations of deep learning
5.3 Future technical trends of deep learning in smart fish farming
6. Conclusion
Appendix A: Public dataset containing fish
1. Introduction
In 2016, the global fishery output reached a record high of 171 million tons. Of this output, 88% is
consumed directly by human beings and is essential for achieving the Food and Agriculture
Organization of the United Nations (FAO)'s goal of building a world free from hunger and malnutrition
(FAO, 2018). However, as the population continues to grow, the pressure on the world’s fisheries will
continue to increase (Merino et al., 2012; Clavelle et al., 2019).
Smart fish farming refers to a new scientific field whose objective is to optimize the efficient use
of resources and promote sustainable development in aquaculture through deeply integrating the
Internet of Things (IoT), big data, cloud computing, artificial intelligence and other modern
information technologies. Furthermore, real-time data collection, quantitative decision-making,
intelligent control, precise investment and personalized service have been achieved, ultimately forming a
new fishery production mode (Figure 1).
Figure 1. The role of deep learning and big data in smart fish farming
In smart fish farming, data and information are the core elements. The aggregation and advanced
analytics of all or part of the data will lead to the ability to make scientifically based decisions.
However, the massive amount of data in smart fish farming imposes a variety of challenges, such as
multiple sources, multiple formats and complex data. Multiple sources include information regarding
the equipment, the fish, the environment, the breeding process and people. The multiple formats
include text, image and audio. The data complexities stem from different cultured species, modes and
stages. Addressing the above high-dimensional, nonlinear and massive data is an extremely
challenging task.
More attention is being paid to data and intelligence in current fish farming than ever before. As
shown in Figure 1, data-driven intelligence methods, including artificial intelligence and big data, have
begun to transform these data into operable information for smart fish farming (Olyaie et al., 2017;
Shahriar & McCulluch, 2014). Artificial intelligence, especially machine learning and computer vision
applications, is the next frontier technology of fishery data systems (Bradley et al., 2019). Traditional
machine learning methods, such as the support vector machine (SVM) (Cortes & Vapnik, 1995),
artificial neural networks (ANN) (Hassoun, 1996), decision trees (Quinlan, 1986), and principal
component analysis (Jolliffe, 1987), have achieved satisfactory performances in a variety of
applications (Wang et al., 2018). However, the conventional machine learning algorithms rely heavily
on features manually designed by human engineers (Goodfellow, 2016), and it is still difficult to
determine which features are most suitable for a given task (Min et al., 2017).
As a breakthrough in artificial intelligence (AI), deep learning (DL) has overcome these
limitations. DL methods have demonstrated outstanding performance in many fields, such as
agriculture (Yang et al., 2018; Gouiaa & Meunier, 2017), natural language processing (Li, 2018),
medicine (Gulshan et al., 2016), meteorology (Mao et al., 2019), bioinformatics (Min et al., 2017),
and security monitoring (Dhiman & Vishwakarma, 2019). DL belongs to the field of machine learning,
but it improves data processing by automatically extracting highly nonlinear and complex features
through a sequence of layers, rather than requiring handcrafted feature representations designed for a
particular type of data from domain knowledge (LeCun et al., 2015; Goodfellow, 2016). With
its automatic feature learning and high-volume modeling capabilities, DL provides advanced analytical
tools for revealing, quantifying and understanding the enormous amounts of information in big data to
support smart fish farming (Liu et al., 2019). DL techniques can be used to solve the problems of
limited intelligence and poor performance in the analysis of massive, multisource and heterogeneous
big data in aquaculture. By combining the IoT, cloud computing and other technologies, it is possible
to achieve intelligent data processing and analysis, intelligent optimization and decision-making
control functions in smart fish farming.
This paper provides a comprehensive review of DL and its applications in smart fish farming.
First, the various DL applications related to aquaculture are outlined to highlight the latest advances in
relevant areas, and the technical details are briefly introduced. Then, the challenges and future trends
of DL in smart fish farming are discussed. The remainder of this paper is organized as follows: After
the Introduction, Section 2 introduces basic background knowledge such as DL terminology,
definitions, and the most popular learning models and algorithms. Section 3 describes the main
applications of DL in aquaculture, and Section 4 provides technical details. Section 5 discusses the
advantages, disadvantages and future trends of DL in smart fish farming, and Section 6 concludes the
paper.
2. Concepts of deep learning
2.1 Terms and definitions of deep learning
Machine learning (ML), which emerged together with big data and high-performance computing,
has created new opportunities to unravel, quantify, and understand data-intensive processes. ML is
defined as a scientific field that seeks to give machines the ability to learn without being explicitly
programmed (Samuel, 1959; Liakos et al., 2018). Deep learning is a branch of machine learning and
is a type of representation learning algorithm based on artificial neural networks (Deng & Yu, 2014).
Specifically, DL is a type of machine learning that can be used for many (but not all) AI tasks
(Goodfellow, 2016; Saufi et al., 2019).
DL enables computers to build complex concepts from simpler concepts, thus solving the core
problem of representation learning (Bronstein et al., 2017; LeCun et al., 2015). Figure 2 shows an
example of how a DL system might represent the concept of a fish in an image by combining simpler
concepts. It is difficult for computers to directly understand the meaning contained in raw sensory
input data, such as an image represented as a set of pixels. The functions that map a set of pixels to an
object are highly complex. It seems impossible to learn or evaluate such a mapping through direct
programming. To solve this problem, DL decomposes this complex mapping into a nested series of
simpler mappings. For example, an image is input in the visible layer, followed by a series of hidden
layers that extract increasingly abstract features from the image. Given a pixel, by comparing the
brightness of adjacent pixels, the first layer could easily identify whether this pixel represents an edge.
Then, the second hidden layer searches for sets of edges that can be recognized as angles and extended
contours. The third hidden layer can then find a specific set of contours and corners that represent an
entire portion of a particular object. Finally, the various objects existing in the image can be identified
(Goodfellow, 2016; Zeiler & Fergus, 2014).
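Figure 2. An example of a DL model: visible layer (pixels), 1st hidden layer (edges), 2nd hidden layer (corners and contours), 3rd hidden layer (object parts), and output (object class; e.g., Fish 0.98, Shrimp 0.01, Weeds 0.01)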
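The edge-finding step described above can be illustrated with a minimal sketch. It assumes a grayscale image stored as a nested list and a hand-picked brightness threshold; the function name `edge_map`, the threshold and the sample patch are all hypothetical. A trained network learns its first-layer filters from data rather than applying a fixed rule such as this one.

```python
def edge_map(image, threshold=50):
    """Mark a pixel as an edge when its brightness differs strongly from
    its right or lower neighbor (a crude stand-in for a learned filter)."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Compare with the right-hand and lower neighbors when they exist.
            right = abs(image[r][c] - image[r][c + 1]) if c + 1 < cols else 0
            below = abs(image[r][c] - image[r + 1][c]) if r + 1 < rows else 0
            if max(right, below) > threshold:
                edges[r][c] = 1
    return edges


# Hypothetical 4x4 patch: dark water on the left, a bright object on the right.
patch = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = edge_map(patch)
```

In this toy patch only the column where the brightness jumps is marked, which is the kind of low-level cue that later layers would combine into corners, contours and object parts.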
2.2 Learning tasks and models
In general, a DL method involves a learning process whose purpose is to gain "experience" from
samples to support task execution. DL methods can be divided into two categories: supervised learning
and unsupervised learning (Goodfellow, 2016). In supervised learning, data are presented as labeled
samples consisting of inputs and corresponding outputs. The goal is to construct mapping rules from
the input to the output. The convolutional neural network (CNN) and the recurrent neural network (RNN)
are two popular model architectures. Inspired by the human visual nervous system, CNNs excel
at image processing (Ravì et al., 2016; Saufi et al., 2019; Litjens et al., 2017), while RNNs
process sequential data effectively. In unsupervised learning, the data are not labeled; instead, the model
seeks previously undetected patterns in the dataset with minimal human
supervision (Geoffrey E Hinton, 1999). The generative adversarial network (GAN) is one of the most
promising unsupervised learning approaches. A GAN produces good output through the adversarial, game-like
training of (at least) two modules in the framework: a generative model and a discriminative model.
Many modified or improved models have been derived based on these original DL models, such as the
region convolutional neural network (R-CNN) and long short-term memory (LSTM) models.
Figure 3 shows a comparison of traditional machine learning and DL. In DL, feature learning and
model construction are integrated into a single model via end-to-end optimization. In traditional
machine learning, feature extraction and model construction are performed separately, and each
module is constructed in a step-by-step manner.
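Figure 3. Comparison of DL and machine learning. (a) Machine learning: input, feature selection and manual extraction (hand-designed features), classifier with shallow structure, output. (b) Deep learning: input, feature learning and classifier (end-to-end learning), output.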
Compared with the shallow structure of traditional machine learning, the deep hierarchical
structure used in DL makes it easier to model nonlinear relationships through combinations of
functions (Liakos et al., 2018; Wang et al., 2018). The advantages of DL are especially obvious
when the amount of data to be processed is large. More specifically, the hierarchical learning and
extraction of different levels of complex data abstractions in DL simplifies big data analytics
tasks to a certain degree, especially when analyzing massive volumes of data, tagging data,
retrieving information, or conducting discriminative tasks such as
classification and prediction (Najafabadi et al., 2015). Hierarchical learning architectures have
achieved superior performance in several engineering applications (Poggio & Smale, 2003;
Mhaskar & Poggio, 2016).
The overall structure, process and principles of applying deep learning to fishery management are
depicted in Figure 4. After the data are collected and transmitted, deep learning performs inductive
analysis, learns experience or knowledge from the samples, and finally formulates rules to guide
management decisions.
Figure 4. Deep-learning-enabled advanced analytics for smart fish farming
However, the most serious issue when applying deep learning is hallucination, in which a network
produces plausible-looking output that is not supported by the input data. Another
failure mode of neural networks is overlearning, or overfitting, in which a model fits its training
samples closely but generalizes poorly to new data. In addition, neural networks can be
tricked into producing completely different outputs after imperceptible perturbations are applied to
their inputs (Belthangady & Royer, 2019; Moosavi-Dezfooli et al., 2016).
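The sensitivity to small input perturbations can be illustrated on a toy linear classifier, which is an assumption made here for simplicity and not a model from the reviewed papers. For a linear decision function the smallest boundary-crossing perturbation has a closed form, and DeepFool (Moosavi-Dezfooli et al., 2016) applies a linearized version of this step iteratively to deep networks. The weights, the input and the names `score` and `minimal_flip` are hypothetical.

```python
def score(w, b, x):
    """Linear decision function: positive means 'fish', negative 'not fish'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def minimal_flip(w, b, x, overshoot=0.02):
    """Smallest L2 change that pushes x just across the boundary w.x + b = 0."""
    f = score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    # Move along -w (or +w) by slightly more than the distance to the boundary.
    step = -(1 + overshoot) * f / norm_sq
    return [xi + step * wi for wi, xi in zip(w, x)]


w, b = [0.5, -0.25, 1.0], -0.1   # hypothetical classifier weights and bias
x = [0.2, 0.4, 0.3]              # hypothetical input features
x_adv = minimal_flip(w, b, x)
max_change = max(abs(a - o) for a, o in zip(x_adv, x))
```

Here the predicted class flips although no feature changes by more than about 0.16. With only three features this change is not imperceptible; in high-dimensional image inputs the same total change is spread over many pixels, which is how perturbations can become imperceptible.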
3. Applications of deep learning in smart fish farming
This review discusses 41 papers related to DL and smart fish farming. The relevant applications
can be divided into 6 categories: live fish identification, species classification, behavioral analysis,
feeding decisions, size or biomass estimation, and water quality prediction. Figure 5 shows the number
of papers related to each application. The most popular fields are live fish identification and species
classification. Notably, all these papers were published in 2016 or later, including 3 in 2016, 3 in 2017,
12 in 2018, 15 in 2019, and 8 in 2020 (through May 2020), indicating that DL has developed rapidly
since 2016. Except for the papers on water quality prediction and sound recognition, most papers involve image
processing. Moreover, while most of the papers focus on fish, a few works consider lobsters or
other aquatic animals.
Figure 5. Numbers of papers addressing different application scenarios
3.1 Live fish identification
Accurate and automatic live fish identification can provide data support for subsequent
production management; thus, fish identification is an important factor in the development of
intelligent breeding management equipment or systems. Machine vision has the advantages of enabling
long-term, nondestructive, noncontact observation at low cost (Zhou et al., 2018b; Hartill et al.,
2020). However, the scenes encountered in aquaculture present numerous challenges for image and
video analysis. First, the image quality is easily affected by light, noise, and water turbidity, resulting
in relatively low resolution and contrast (Zhou et al., 2017a). Second, because fish swim freely and
are uncontrolled targets, their behavior may cause distortions, deformations, occlusion, overlapping
and other disadvantageous phenomena (Zhou et al., 2017b). Most current image analysis methods are
adversely affected by these difficulties (Qin et al., 2016; Sun et al., 2018).
While many studies have been conducted to investigate the above issues, most emphasized the
extraction of conventional low-level features, which usually involve small details in an image such as
feature points, colors, textures, contours, and shapes of interest (White et al., 2006; Yao & Odobez,
2007). In practical applications, the results of methods based on such features are often unsatisfactory.
DL involves multilevel data representations, from low to high levels, in which high-level features are
built on the low-level features and carry rich semantic information that can be used to recognize and
detect targets or objects in the image. Generally, both types of features are used in convolutional neural
networks: the first few layers learn the low-level features, and the last few layers learn the
high-level features. This approach has the potential to solve the problems listed above (Sun et al., 2018;
Zheng et al., 2017).
Table 1 shows the details of live fish identification using DL. CNNs can be used to extract features
from fish or shrimp images (Hu et al., 2020). Trained on a public dataset of real images, a CNN model
improved identification accuracy by 15% and 10% over SVM and Softmax classifiers,
respectively, making automatic recognition more accurate (Qin et al., 2016). Although the
aforementioned CNN architecture shows good performance, a CNN detects features using a sliding
window, which can waste computation on windows that contain no target. To overcome this limitation, a region-based CNN (R-CNN)
can be used to detect freely moving fish in an unconstrained underwater environment. An R-CNN
judges object locations by extracting multiple region proposals and then applying a CNN to only the
best candidate regions, which improves model efficiency (Girshick et al., 2014). The candidate
fish-containing regions can be generated from both fish motion information and the raw image (Salman
et al., 2019). The R-CNN improves the accuracy by at least 16% over a Gaussian
mixture model (GMM) on the FCS dataset.
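The cost of exhaustive scanning can be made concrete with a short worked calculation. The frame size, window size and stride below are hypothetical and are not taken from the reviewed papers; the budget of roughly 2,000 proposals per image is the figure reported for the original R-CNN (Girshick et al., 2014).

```python
def sliding_window_count(img_w, img_h, win_w, win_h, stride):
    """Number of window positions visited by an exhaustive single-scale scan."""
    if win_w > img_w or win_h > img_h:
        return 0
    across = (img_w - win_w) // stride + 1
    down = (img_h - win_h) // stride + 1
    return across * down


# Hypothetical setting: 640x480 frame, 64x64 window, 8-pixel stride, one scale.
windows = sliding_window_count(640, 480, 64, 64, 8)
proposals = 2000  # approximate per-image proposal budget of the original R-CNN
```

Under these assumptions a single-scale scan already visits 3,869 windows, and every additional scale or aspect ratio adds a comparable number, whereas the proposal budget stays fixed.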
Because classical CNNs are trained through supervised learning, their recognition capability
depends primarily on the quality of the training samples and their annotations (LeCun et al., 2015). A
semisupervised DL model can learn not only from labeled samples but also from unlabeled data. Thus,
a GAN can somewhat alleviate the challenges posed by a lack of labeled training data in practical
applications (Zhao et al., 2018b). Using a synthetic dataset, Mahmood et al. (2019) trained the You
Only Look Once (YOLO) v3 object detector to detect lobsters in challenging underwater images, thus
addressing a problem involving complex body shapes, partially accessible local environments, and
limited training data. In some cases, even when insufficient training data are available, a transfer
framework can be used to effectively learn the characteristics of underwater targets with the help of
data enhancement. Data enhancement either improves data quality by adjusting the contrast, entropy, and
other factors in images, or expands the number of samples via operations such as flipping, translation
or rotation. The increased variety and number of samples allow models to achieve higher accuracy
(Sun et al., 2018).
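The sample-expanding operations named above (flipping, translation and rotation) can be sketched on an image stored as a nested list. This is a minimal illustration with hypothetical function names; real pipelines usually rely on an image library and apply the transformations randomly during training.

```python
def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]


def translate_right(img, shift, fill=0):
    """Shift pixels right by `shift` columns, padding the left edge with `fill`."""
    return [[fill] * shift + row[:len(row) - shift] for row in img]


def augment(img):
    """Return the original image together with three transformed copies."""
    return [img, flip_horizontal(img), rotate_90(img), translate_right(img, 1)]


sample = [[1, 2, 3],
          [4, 5, 6]]
variants = augment(sample)
```

One labeled image yields four training samples here, and the label is reused for every copy because these operations do not change which species is shown.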
To meet the needs of some embedded systems, such as underwater drones, the real-time performance
of DL models is the key to their practicability. Experiments have shown that, using an
unmanned aerial vehicle (UAV)-type system to observe objects on the sea surface, a CNN can
effectively recognize a swarm of jellyfish and achieve reasonable performance levels (80%
accuracy) for real-world applications (Kim et al., 2016). After DL model training is complete, such
models can show excellent speed for live fish identification purposes. For example, one model required
only 6 s to identify 115 images (Meng et al., 2018); the average time to detect lionfish in each frame
was only 0.097 s (Naddaf-Sh et al., 2018). Therefore, under the premise of reasonable accuracy, a DL
model's recognition speed can satisfy real-time requirements (Villon et al., 2018). Hence, DL can be
effectively applied to identify fish while meeting the rapid response and real-time requirements of
embedded systems.
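The reported timings can be converted to frame rates with simple arithmetic. The sketch below uses only the two timings quoted above; the target frame rates of 10 and 25 frames per second are hypothetical, since whether a model is "real time" depends on the frame rate an application requires, which is not specified here.

```python
def frames_per_second(seconds_per_frame):
    """Convert a per-frame processing time into a throughput."""
    return 1.0 / seconds_per_frame


def meets_budget(seconds_per_frame, target_fps):
    """True if one frame is processed within the time slot of the target rate."""
    return seconds_per_frame <= 1.0 / target_fps


meng_spf = 6 / 115   # Meng et al. (2018): 115 images identified in 6 s
naddaf_spf = 0.097   # Naddaf-Sh et al. (2018): average time per frame

meng_fps = round(frames_per_second(meng_spf), 1)      # about 19.2 frames/s
naddaf_fps = round(frames_per_second(naddaf_spf), 1)  # about 10.3 frames/s
```

Both models would keep up with a 10 frames/s stream, while neither would keep up with a 25 frames/s stream without skipping frames.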
For identifying live fish, DL is mainly used to solve the problem of whether a given object is a
fish (Ahmad et al., 2016). In this era, where large amounts of visual data can be collected easily, DL
can be a practical machine vision solution. Therefore, it is worth studying the performance levels that
can be achieved by combining DL and machine vision to explore fast and accurate methods. The main
disadvantage of DL is that it requires a large amount of labeled training data, and obtaining and
annotating sufficiently large numbers of images is time-consuming and laborious. Moreover,
recognition performance depends on the quality of the training samples and annotations.
Table 1 Live fish identification
No. | Reference | Model | Framework | Data | Preprocessing/augmentation | Transfer learning | Evaluation index | Results | Comparisons with other methods
1 | Qin et al. (2016) | CNN | Caffe | Fish4Knowledge (F4K) dataset | Resize; rotation | N | Accuracy | Accuracy: 98.64% | LDA+SVM: 80.14%; Raw-pixel Softmax: 87.56%; VLFeat Dense-SIFT: 93.56%
2 | Zhao et al. (2018b) | DCGAN | TensorFlow | F4K dataset, Croatian fish dataset | Image segmentation and enhancement | N | Accuracy | Accuracy: 83.07% | Accuracy: CNN: 72.09%; GAN: 75.35%
3 | Sun et al. (2018) | CNN | Caffe | F4K dataset | Horizontal mirroring, crop | Y | Precision (P), recall (R) | P: 99.68%; R: 99.45% | P: Gabor: 58.55%; Dsift-Fisher: 83.37%; LDA: 80.14%; DeepFish: 90.10%; RGB-Alex-SVM: 99.68%
4 | Meng et al. (2018) | CNN | NA | 4 kinds of fish and 100 images of every kind selected from Google | Blur, rotation | N | Accuracy, speed | Accuracy: 87%; Speed: 115 f/6 s | Accuracy: AlexNet: 87%; GoogLeNet: 85%; LeNet: 67%
5 | Naddaf-Sh et al. (2018) | CNN | NA | Videos collected with an ROV camera; 1,500 images were gathered from online resources such as ImageNet, Google and YouTube | Resize | N | True positive, false positive, speed | TPR: 93%; FPR: 4%; Speed: 0.097 s/f | NA
6 | Villon et al. (2018) | CNN | Caffe | 5 frames per second were extracted, leading to a database of 450,000 frames | NA | N | Accuracy, speed | Accuracy: 94.9%; each identification took 0.06 s | Average success rate: Humans: 89.3%
7 | Kim et al. (2016) | CNN | NA | The image set was obtained using a UAV | NA | N | TPR, FPR | TPR: 0.80; FPR: 0.04 | NA
8 | Salman et al. (2019) | CNN | TensorFlow | F4K dataset; LCF-15 dataset | NA | Y | Accuracy | F4K: 87.44%; LCF-15: 80.02% | GMM: 71.01%; Optical flow: 56.13%; R-CNN: 64.99%
9 | Labao and Naval (2019) | R-CNN | NA | 10 underwater video sequences for a total of 300 training frames | NA | N | Precision, recall, F-score | Accuracy increased by correction mechanism | NA
10 | Mahmood et al. (2019) | YOLO | Darknet | The dataset was generated and synthesized by using the ImageNet dataset | NA | N | Mean average precision | The synthetic data can achieve higher performance than the baseline | NA
11 | Guo et al. (2019) | DRN | PyTorch | The dataset was composed of 908 negative and 907 positive samples | Resize | N | Accuracy | Higher than 82% |
12 | Hu et al. (2020) | CNN | Keras | 16,138 samples were collected from Google and self-shot videos | Resized, grayscale | N | Accuracy | 95.48% | NA
13 | Cao et al. (2020) | CNN | TensorFlow | The video was acquired from a crab-breeding operation in Jiangsu province | Image denoising and enhancement | N | Precision (AP) | AP: 99.01%; F1: 98.74% | AP: YOLOv3: 93.73%, Faster RCNN: 99.05%; F1: YOLOv3: 92.47%, Faster RCNN: 98.56%; HOG + SVM: 73.18%
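The evaluation indexes listed in Table 1 (accuracy, precision, recall or true positive rate, false positive rate, and F1 score) follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts are hypothetical and were chosen only so that the resulting TPR and FPR equal the 0.80 and 0.04 reported for Kim et al. (2016), whose actual counts are not given here.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard binary evaluation indexes from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # identical to the true positive rate
    fpr = fp / (fp + tn)                 # false positive rate
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "tpr": recall, "fpr": fpr,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts: 100 frames with a target and 100 frames without one.
m = detection_metrics(tp=80, fp=4, fn=20, tn=96)
```

With these counts the accuracy is 0.88 while the TPR is 0.80, which shows that results reported under different indexes in Table 1 are not directly comparable.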
3.2 Species classification
Fish are diverse, with more than 33,000 species (Oosting et al., 2019). In aquaculture, species
classification is helpful for yield prediction, production management, and ecosystem monitoring
(Alcaraz et al., 2015; dos Santos & Gonçalves, 2019). Fish species can usually be distinguished by
visual features such as size, shape, and color (dos Santos & Gonçalves, 2019; Hu et al., 2012).
However, due to changes in light intensity and fish motion as well as similarities in the shapes and
patterns among different species, accurate fish species classification is challenging.
DL models can learn unique visual characteristics of species that are not sensitive to
environmental changes and variations. Table 2 shows the details of species classification using DL. Taking a given
underwater video as an example (Figure 6), an object detection module first generates a series of patch
proposals for each frame F. Each patch is then used as an input to the classifier, and a label distribution
vector is obtained. The label with the highest probability is assigned to each patch (Sun
et al., 2018).
Figure 6. An illustration of the fish classification process
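The final labeling step of this process can be sketched as follows. The sketch assumes that the classifier returns one raw score per class for every patch and that a softmax converts the scores into the label distribution vector; the scores and the function names are hypothetical, and a third "background" class is added for illustration alongside the feed and fish classes shown in Figure 6.

```python
import math


def softmax(scores):
    """Convert raw class scores into a probability distribution."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def label_patches(patch_scores, class_names):
    """Assign each patch the class with the highest probability."""
    labels = []
    for scores in patch_scores:
        probs = softmax(scores)
        best = max(range(len(probs)), key=probs.__getitem__)
        labels.append((class_names[best], probs[best]))
    return labels


classes = ["feed", "fish", "background"]
# Hypothetical raw classifier scores for two patch proposals from one frame.
frame_scores = [[4.0, 0.5, -1.0], [0.2, 5.0, -0.5]]
labels = label_patches(frame_scores, classes)
```

Each patch receives exactly one label together with its probability, so a frame containing several proposals yields several labeled patches.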
A DL model can better distinguish differences in characteristics, categories, and the environment,
and can therefore extract the features of target fish from an image collected in an unconstrained
underwater environment. Fish species can be classified by identifying several basic morphological features
(i.e., the head region, body shape, and scales) (Rauf et al., 2019). Most of the DL models show better
results than the traditional approaches, reaching classification accuracies above 90% on the
LifeCLEF 14 and LifeCLEF 15 benchmark fish datasets (Ahmad et al., 2016). To avoid the need for
large amounts of annotated data, general deep structures must be fine-tuned to improve the
effectiveness with which they can identify the pertinent information in the feature space of interest.
Accordingly, various DL models for identifying fish species have been developed using pretrained
networks, an approach called transfer learning (Siddiqui et al., 2017; Lu et al., 2019; Allken et al., 2019). By
fine-tuning pretrained models to perform fish classification using small-scale datasets, these
approaches enable the network to learn the features of a target dataset accurately and comprehensively
(Qiu et al., 2018) and achieve sufficiently high accuracy to serve as economical and effective
alternatives to manual classification.
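The idea of reusing a pretrained model on a small target dataset can be sketched with a toy example. Everything in it is an assumption made for illustration: the "backbone" is a fixed hand-written function standing in for a network pretrained on a large dataset, the six samples are invented, and only a small logistic-regression head is trained. This freezes the whole backbone, which is the simplest variant; fine-tuning in practice often also updates some of the pretrained layers.

```python
import math


def pretrained_features(sample):
    """Stand-in for a frozen, pretrained backbone (hypothetical fixed mapping)."""
    length, height = sample
    return [length / height, length * height]  # elongation and area


def predict(w, b, sample):
    """Probability that the sample belongs to class 1."""
    feats = pretrained_features(sample)
    z = sum(wi * fi for wi, fi in zip(w, feats)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, epochs=500, lr=0.1):
    """Fit only the logistic-regression head; the backbone is never updated."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for sample, y in zip(samples, labels):
            error = predict(w, b, sample) - y  # gradient of the log loss w.r.t. z
            feats = pretrained_features(sample)
            w = [wi - lr * error * fi for wi, fi in zip(w, feats)]
            b -= lr * error
    return w, b


# Hypothetical small target dataset: (body length, body height), arbitrary units.
samples = [(0.9, 0.2), (0.8, 0.2), (1.0, 0.25), (0.5, 0.4), (0.6, 0.5), (0.4, 0.4)]
labels = [1, 1, 1, 0, 0, 0]  # 1 = elongated species, 0 = deep-bodied species
w, b = train_head(samples, labels)
predictions = [round(predict(w, b, s)) for s in samples]
```

Because only three parameters are trained, six labeled samples are enough to separate the two classes in this toy setting, which mirrors why transfer learning reduces the amount of annotated data that is needed.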
In addition to visual characteristics, different species of grouper produce different sound
frequencies that can be used to distinguish these species. For example, CNN and LSTM models were
used to classify sounds produced by four species of grouper; the resulting classification accuracy was
significantly better than the previous weighted mel-frequency cepstral coefficients (WMFCCs) method
(Ibrahim et al., 2018).
Nevertheless, due to the influence of various interferences and the small sets of available samples,
the accuracy of same-species classification still has considerable room to improve. Most current fish
classification methods are designed to distinguish fish with significant differences in body size or shape;
thus, the classification of similar fish and fish of the same species is still challenging (dos Santos &
Gonçalves, 2019).
Table 2 Species classification
No. | Reference | Model | Framework | Data | Preprocessing/augmentation | Transfer learning | Evaluation index | Results | Comparisons with other methods
1 | Siddiqui et al. (2017) | CNN | MatConvNet | Videos were collected from several baited remote underwater video sampling programs during 2011–2013 | Resized | Y | Accuracy | 94.3% | SRC: 65.42%; CNN: 87.46%
2 | Ahmad et al. (2016) | CNN | NA | LifeCLEF14 and LifeCLEF15 datasets | Resized and converted to grayscale | N | Precision and recall | AC > 90%; each fish image takes approximately 1 ms for classification | SVM, KNN, SRC, PCA-SVM, PCA-KNN, CNN-SVM, CNN-KNN
3 | Ibrahim et al. (2018) | LSTM and CNN | NA | The dataset contains 60,000 files, and the audio duration of each file is 20 s at a sampling rate of 10 kHz | NA | N | Accuracy | 90% | WMFCC < 90%
4 | Qiu et al. (2018) | CNN | NA | ImageNet dataset, F4K dataset, a small-scale fine-grained dataset (i.e., Croatian or QUT fish dataset) | Super resolution; flip and rotation | Y | Accuracy | 83.92% | B-CNNs: 83.52%; B-CNNs + SE blocks: 83.78%
5 | Allken et al. (2019) | CNN | TensorFlow | ImageNet classification dataset and the images collected by the Deep Vision system; a total of 1,216,914 stereo image pairs from 63 h 19 min of data collection | Resized; rotation, translation, shearing, flipping, and zooming | Y | Accuracy | 94% | NA
6 | Rauf et al. (2019) | CNN | NA | Fish-Pak | Resize; image background transparent | Y | Accuracy, precision, recall, F1-score | The proposed method achieves state-of-the-art performance and outperforms existing methods | VGG-16, one-block VGG, two-block VGG, three-block VGG, LeNet-5, AlexNet, GoogleNet, and ResNet-50
7 | Lu et al. (2019) | CNN | NA | A total of 16,517 fish catching images were provided by Fishery Agency, Council of Agriculture (Taiwan) | Resize; horizontal flipping, vertical flipping, width shifting, height shifting, rotation, shearing, zoom-in, and zoom-out | Y | Accuracy | > 96.24% | NA
8 | Jalal et al. (2020) | YOLO, CNN | TensorFlow | LCF15 dataset and UWA dataset | NA | N | Accuracy | LCF15: 91.64%; UWA: 79.8% |
3.3 Behavioral analysis
Fish are sensitive to environmental changes, and they respond to changes in
environmental factors through changes in behavior (Saberioon et al., 2017 ; Mahesh et al., 2008). In
addition, behavior serves as an effective reference indicator for fish welfare and harvesting (Zion,
2012). Relevant behavior monitoring, especially for unusual behaviors, can provide a nondestructive
understanding and an early warning of fish status (Rillahan et al., 2011). Real-time monitoring of fish
behavior is essential for understanding their status and for supporting capture and feeding decisions
(Papadakis et al., 2012).
Fish display behavior through a series of actions that are continuous and correlated in time.
Methods that identify an action from a single image lose the relationship with the images
acquired before and after the action. Therefore, it is desirable to use time-series information related to
the prior and subsequent frames in a video to capture action relevance. DL methods have shown strong
ability to recognize visual patterns (Wang et al., 2017). Table 3 shows the details of the behavioral
analysis using DL. In particular, due to their powerful modeling capabilities for sequential data, RNNs
have the potential to address the above problem effectively (Schmidhuber, 2015). Zhao et al. (2018a)
proposed a novel method based on a modified motion influence map and an RNN to systematically
detect, localize and recognize unusual local behaviors of a fish school in intensive aquaculture.
Tracking individuals in a fish school is a challenging task that involves complex nonrigid
deformations, similar appearances, and frequent occlusions. Fish heads have relatively fixed shapes
and colors that can be used to track individual fish (Butail & Paley, 2011 ; Wang et al., 2012). Thus,
data associations can be achieved across frames, and as a result, behavior trajectory tracking can be
implemented without being affected by frequent occlusions (Wang et al., 2017). In addition, data
enhancement and iterative training methods can be used to optimize the accuracy of classification tasks
for identifying behaviors that cannot be distinguished by the human eye (Xu & Cheng, 2017). Finally,
idTracker and further developments in identification algorithms for unmarked animals have been
successful for small groups of 2 to 15 individuals (Pérez-Escudero et al., 2014). An improved algorithm
called idtracker.ai has also been proposed. Using two different CNNs, idtracker.ai can track all the
individuals in both small and large groups (up to 100 individuals) with a recognition accuracy that
typically exceeds 99.9% (Romero-Ferrero et al., 2019).
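The cross-frame data association mentioned above can be sketched as follows. This is a minimal illustration that assumes head positions have already been detected in each frame; the greedy nearest-neighbor rule, the `max_dist` gate, and all names are assumptions for illustration and not the association scheme of the cited trackers.

```python
import math

def associate(prev_heads, curr_heads, max_dist=20.0):
    """Greedily match tracked head positions to detections in the next frame.

    prev_heads: {track_id: (x, y)} from the previous frame.
    curr_heads: list of (x, y) detections in the current frame.
    Returns {track_id: index into curr_heads}; unmatched tracks are omitted.
    """
    # All candidate pairs, closest first.
    pairs = sorted(
        (math.dist(p, c), tid, j)
        for tid, p in prev_heads.items()
        for j, c in enumerate(curr_heads)
    )
    matches, used = {}, set()
    for d, tid, j in pairs:
        if d > max_dist:
            break  # remaining pairs are even farther apart
        if tid in matches or j in used:
            continue  # each track and each detection is used at most once
        matches[tid] = j
        used.add(j)
    return matches

prev_heads = {1: (10.0, 10.0), 2: (50.0, 40.0)}
curr_heads = [(52.0, 41.0), (12.0, 11.0), (200.0, 200.0)]
matches = associate(prev_heads, curr_heads)
print(matches)
```

The third detection is left unmatched because it lies beyond the distance gate; in a full tracker it would typically start a new trajectory.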
When using deep learning to classify fish behavior, crossing, overlapping and blocking caused by
free-swimming fish (Zhao et al., 2018a ; Romero-Ferrero et al., 2019) and low-quality
environmental images (Zhou et al., 2019) form the main challenges to behavior analysis; thus, these
problems need to be solved in the future.
Table 3 Behavior analysis

| No. | Reference | Model | Framework | Data | Preprocessing and augmentation | Transfer learning | Evaluation index | Results | Comparisons with other methods |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Xu and Cheng (2017) | CNN | MatConvNet | The head feature maps stored in the segment in the trajectory along with the trajectory ID form the initial training dataset. | Shifting, horizontal and vertical rotation | N | Precision, Recall, F1-measure, MT, ML, Fragments, ID Switch | The proposed method performs significantly well on all metrics. | NA |
| 2 | Zhao et al. (2018a) | RNN | TensorFlow | The behavior dataset was made manually following All Occurrences Sampling. | NA | N | Accuracy | Detection, localization and recognition: 98.91%, 91.67% and 89.89% | Accuracy of OMIM and OMIM less than 82.45% |
| 3 | Wang et al. (2017) | CNN | MatConvNet | Randomly selected 300 frames from each of the 5 datasets and manually annotated the head point in each frame. | Rotated | N | IR, Miss ratio, Error ratio, Precision, Recall, MT, ML, Frag, IDS | The proposed method outperforms two state-of-the-art fish tracking methods in terms of 7 performance metrics. | idTracker |
| 4 | Romero-Ferrero et al. (2019) | CNN | NA | 184 juvenile zebrafish; the dataset comprised 3,312,000 uncompressed, grayscale, labeled images. | Extracts 'blobs', and then oriented | Y | Accuracy | 99.95% | NA |
| 5 | Li et al. (2020) | CNN | TensorFlow | The image was collected from a glass aquarium. | Cut and synthesis | N | Accuracy, precision and recall | Accuracy: 99.93%; precision: 100%; recall: 99.86% | |
3.4 Size or biomass estimation
It is essential to continuously observe fish parameters such as abundance, quantity, size, and
weight when managing a fish farm (França Albuquerque et al., 2019). Quantitative estimation of fish
biomass forms the basis of scientific fishery management and conservation strategies for sustainable
fish production (Zion, 2012 ; Li et al., 2019 ; Saberioon & Císař, 2018 ; Lorenzen et al., 2016 ;
Melnychuk et al., 2017). However, it is difficult to estimate fish biomass without human intervention
because fish are sensitive and move freely within an environment where visibility, lighting and stability
are typically uncontrollable (Li et al., 2019).
Recent applications of DL to fishery science offer promising opportunities for massive sampling
in smart fish farming. Machine vision combined with DL can enable more accurate estimation of fish
morphological characteristics such as length, width, weight, and area. Most reported applications have
been either semisupervised or supervised (Marini et al., 2018 ; Díaz-Gil et al., 2017). For example,
the Mask R-CNN architecture was used to estimate the size of saithe (Pollachius virens), blue whiting
(Micromesistius poutassou), redfish (Sebastes spp.), Atlantic mackerel (Scomber scombrus), velvet
belly lanternshark (Etmopterus spinax), Norway pout (Trisopterus esmarkii), Atlantic herring (Clupea
harengus) (Garcia et al., 2019) and European hake (Álvarez-Ellacuría et al., 2019). Another method
for indirectly estimating fish size is to first detect the head and tail of fish with a DL model and then
calculate the length of fish on that basis. Although this approach increases the workload, it is suitable
for more complex images (Tseng et al., 2020). The structural characteristics and computational
capabilities of DL models can be fully exploited (Hu et al., 2014) to achieve superior performances
compared with other models. In addition, DL-based methods can eliminate the influence of fish
overlap during length estimation.
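The indirect approach described above (detect head and tail, then compute length) reduces to a distance calculation once the two keypoints are available. The sketch below assumes the keypoints come from some upstream detector, that the fish lies roughly straight and parallel to the image plane, and that a pixel-to-centimeter calibration factor is known; the function name and the example values are hypothetical.

```python
import math

def fish_length_cm(head_px, tail_px, cm_per_pixel):
    """Estimate body length from head and tail pixel coordinates.

    head_px, tail_px: (x, y) keypoints in pixels (assumed detector outputs).
    cm_per_pixel: calibration factor (assumed known, e.g. from a reference object).
    """
    return math.dist(head_px, tail_px) * cm_per_pixel

# Hypothetical keypoints: 300 px apart horizontally, 400 px vertically.
length = fish_length_cm((120, 80), (420, 480), cm_per_pixel=0.1)
print(f"{length:.1f} cm")
```

A curved or tilted fish would be underestimated by this straight-line measure, which is one reason the calibration and pose assumptions matter in practice.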
The number of fish in a shoal can also provide valuable input for the development of intelligent
systems. DL has shown clear advantages in animal counting. To achieve automatic
counting of fish groups under high density and frequent occlusion characteristics, a fish distribution
map can be constructed using DL; then, the fish distribution, density and quantity can be obtained.
These values can indirectly reflect fish conditions such as starvation, abnormalities and other states,
thereby providing an important reference for feeding or harvest decisions (Zhang et al., 2020).
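A common way to turn a distribution map into a count, used widely in crowd counting, is to have the network predict a density map whose values sum to the number of objects. The sketch below assumes such a map has already been produced; whether the cited study uses exactly this convention is not stated in this section, so the example is illustrative only and the numbers are made up.

```python
def count_from_density(density_map):
    """Estimated count is the sum over all cells of the density map."""
    return sum(sum(row) for row in density_map)

def densest_cell(density_map):
    """Return (row, col) of the cell with the highest predicted density."""
    best = max(
        (value, r, c)
        for r, row in enumerate(density_map)
        for c, value in enumerate(row)
    )
    return best[1], best[2]

# Hypothetical 3x4 density map (fish per cell).
density_map = [
    [0.0, 0.1, 0.4, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.0, 0.1, 0.0, 1.0],
]
estimated = count_from_density(density_map)
print(round(estimated, 2), densest_cell(density_map))
```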
The age structure of a fish school is another important input to fishery assessment models. The
current method for determining fish school age structure relies on manual assessments of otolith age,
which is a labor-intensive and expertise-dependent process. Using a DL approach, target recognition
can instead be performed by using a pretrained CNN to estimate fish ages from otolith images. The
accuracy is comparable to that achieved by human experts, and the estimation is considerably faster (Moen et al., 2018).
Optical imaging and sonar are often used to monitor fish biomass. A DL algorithm can be applied
to automatically learn the conversion relationship between sonar images and optical images, thus
allowing a "daytime" image to be generated from a sonar image and a corresponding night vision
camera image. This approach can be effectively used to count fish, among other applications
(Terayama et al., 2019).
Table 4 Size or biomass estimation

| No. | Reference | Model | Framework | Data | Preprocessing and augmentation | Transfer learning | Evaluation index | Results | Comparisons with other methods |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Levy et al. (2018) | CNN | Keras | ILSVRC12 (ImageNet) dataset | NA | Y | Accuracy | The method is robust and can handle different types of data, and copes well with the unique challenges of marine images. | YOLO network topology |
| 2 | Terayama et al. (2019) | GAN | NA | 1,334 camera and sonar image pairs from 10 min of data acquired at 3 fps | Resized; normalized; flipped | N | NA | The proposed model successfully generates realistic daytime images from sonar and night camera images. | NA |
| 3 | Moen et al. (2018) | CNN | TensorFlow | The dataset comprises 4,109 images of otolith pairs and 657 images of single otoliths, totaling 8,875 otoliths. | Rotated and normalized | N | MSE, MCV | Mean CV: 8.89%; lowest MSE value: 2.65 | Comparing accuracy to human experts, mean CV of 8.89% |
| 4 | Álvarez-Ellacuría et al. (2019) | R-CNN | NA | COCO dataset; photos were obtained with a single webcam, resolution: 1,280×760. | NA | Y | Root-mean-square deviation | 1.9 cm | NA |
| 5 | Zhang et al. (2020) | CNN | Keras | Data were collected from the "Deep Blue No. 1" net cage. The resolution is 1,920×1,080 and the frame rate is 60 fps. | Resized and enhanced; Gaussian noise and salt-and-pepper noise were added | N | Accuracy | Accuracy: 95.06% | CNN: 89.61%; MCNN: 91.18% |
| 6 | Tseng et al. (2020) | CNN | Keras | 9,000 fish images were provided by the Fisheries Agency, Council of Agriculture (Taiwan). Another dataset of 154 fish images was acquired at Nan-Fang-Ao fishing harbor (Yilan, Taiwan). | Resized; rotation, horizontal and vertical shifting, horizontal and vertical flipping, and scaling | N | Accuracy | Accuracy: 98.78% | NA |
| 7 | Fernandes et al. (2020) | CNN | | The dataset with 1,653 fish images was acquired using a Sony DSCWX220 digital camera. | NA | | R2 | R2: BW: 0.96, CW: 0.95 | NA |
3.5 Feeding decision-making
In intensive aquaculture, the feeding level of fish directly determines the production efficiency
and breeding cost (Chen et al., 2019). In actual production, the feed cost for some varieties of fish
accounts for more than 60% of the total cost (de Verdal et al., 2017 ; Føre et al., 2016 ; Wu et al.,
2015). Thus, unreasonable feeding will reduce production efficiency, while insufficient feeding will
affect fish growth. Excessive feeding also reduces the feed conversion efficiency, and the residual bait
will pollute the environment (Zhou et al., 2018a). Therefore, large economic benefits can be gained by
optimizing the feeding process (Zhou et al., 2018c). However, many factors affect fish feeding,
including physiological, nutritional, environmental and husbandry factors; consequently it is difficult
to detect the real needs of fish (Sun et al., 2016).
Traditionally, feeding decisions depend primarily on experience and simple timing controls (Liu
et al., 2014b). At present, most research on DL-based feeding decisions has focused on
image analysis. By using machine vision, an improved feeding strategy can be developed in
accordance with fish behavior. Such a system can terminate the feeding process at more appropriate
times, thereby reducing unnecessary labor and improving fish welfare (Zhou et al., 2018a). The feeding
intensity of fish can also be roughly graded and used to guide feeding. A combination of CNN and
machine vision has proved to be an effective way to assess fish feeding intensity characteristics (Zhou
et al., 2019); the trained model accuracy was superior to that of two manually extracted feature
indicators: flocking index of fish feeding behavior (FIFFB) and snatch intensity of fish feeding
behavior (SIFFB) (Zhou et al., 2017b ; Chen et al., 2017). This method can be used to detect and
evaluate fish appetite to guide production practices. Due to recent advances in CNNs, it would be
interesting to consider the use of newer neural network frameworks for both spatial and motion feature
extraction. When combined with time-series information, such models may enable better feeding
decisions. Based on this idea, Måløy et al. (2019) considered both temporal and spatial flow by
combining a three-dimensional CNN (3D-CNN) and an RNN to form a new dual deep neural network.
The 3D-CNN and RNN were used to capture spatial and temporal sequence information, respectively,
thereby achieving recognition of both feeding and nonfeeding behaviors. A comparison showed that
the recognition results achieved with this dual-flow structure were better than those of either individual
CNN or RNN models.
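As a sketch of how graded feeding intensity could guide a feeder, the rule below stops feeding once the most recent graded frames all fall at or below a low-intensity threshold. The integer grading scale (0 for none up to 3 for strong), the threshold, and the `patience` window are assumptions made for illustration; the cited studies do not prescribe this particular rule.

```python
def should_stop_feeding(grades, threshold=1, patience=3):
    """Decide whether to stop feeding from a history of intensity grades.

    grades: per-frame feeding intensity grades, oldest first
            (assumed scale: 0 = none, 1 = weak, 2 = medium, 3 = strong).
    Returns True when the last `patience` grades are all <= threshold.
    """
    if len(grades) < patience:
        return False  # not enough evidence yet
    return all(g <= threshold for g in grades[-patience:])

print(should_stop_feeding([3, 3, 2, 1, 1, 0]))  # appetite has faded
print(should_stop_feeding([3, 2, 1, 3]))        # fish are still feeding strongly
```

Requiring several consecutive low grades, rather than reacting to a single frame, is a simple guard against misclassified frames.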
The studies discussed above focused primarily on images. However, many factors affect fish
feeding (Sun et al., 2016); consequently, considering only images is insufficient. In the future,
additional data, such as environmental measurements and fish physiological data, will need to be
incorporated to achieve more reasonable feeding decisions.
Table 5 Feeding decisions

| No. | Reference | Model | Framework | Data | Preprocessing and augmentation | Transfer learning | Results | Performance comparison |
|---|---|---|---|---|---|---|---|---|
| 1 | Måløy et al. (2019) | RNN | TensorFlow | 76 videos taken at a resolution of 224×224 pixels with RGB color channels and at 24 f/sec. | NA | N | Accuracy: 80% | NA |
| 2 | Zhou et al. (2019) | CNN | NA | Image was collected from a laboratory at 1 f/sec. | RST | N | Accuracy: 90% | SVM: 73.75%; BPNN: 81.25%; FIFFB: 86.25%; SIFFB: 83.75% |
3.6 Water quality prediction
It is essential to be able to predict changes in water quality parameters to identify abnormal
phenomena, prevent disease, and reduce the corresponding risks to fish (Hu et al., 2015). In real-world
aquaculture, the water environment is characterized by many parameters that affect each other, causing
considerable inconvenience in the prediction process (Liu et al., 2014a). The traditional machine-
learning-based prediction models lack robustness when applied to big data, resulting in a general lack
of long-term modeling capability and generalizability, and they cannot fully reflect the essential
characteristics of the data (Liu et al., 2019 ; Ta & Wei, 2018). In contrast, DL offers good capabilities
in terms of nonlinear approximation, self-learning, and generalization. In recent years, prediction
methods based on DL have been widely used (Roux & Bengio, 2008).
Dissolved oxygen is one of the most important water quality parameters and plays a key role in intelligent
management and control in smart fish farming (Rahman et al., 2019). Due to the time lag between the
implementation of control measures for dissolved oxygen and their regulation effects, it is necessary
to predict future changes in dissolved oxygen to maintain a stable water quality (Ta & Wei, 2018). DL-
based models such as a CNN or a deep belief network (DBN) can extract the relationships between
quantitative water characteristics and water quality variables (Lin et al., 2018). Such models have been
used to predict water quality parameters for the intensive culturing of fish or shrimp. The results show
that the accuracy and stability of such models are sufficient to meet actual production needs (Ta & Wei,
2018).
However, most current methods have achieved good results only for short-term water quality
predictions. In recent years, scholars have paid increasing attention to longer-term predictions. The
key to long-term prediction is to extract the spatiotemporal relationships between water quality and
external factors. Therefore, spatiotemporal models such as LSTM networks and RNNs are quite
popular (Hu et al., 2019). For example, an attention-based RNN model can achieve a clear and
effective representation of time-space relationships and its learning ability is superior to that of other
methods for both short- and long-term predictions of dissolved oxygen (Liu et al., 2019). These models
can be continuously optimized during the prediction process to improve their prediction accuracies
(Deng et al., 2019).
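Whatever sequence model is used, the time series first has to be cut into input windows and future targets. The sketch below shows one common way to do this for a single dissolved oxygen series; the `lookback` and `horizon` parameters and the sample readings are assumptions for illustration, and a larger `horizon` corresponds to longer-term prediction.

```python
def make_windows(series, lookback, horizon):
    """Build (input window, future target) pairs for sequence forecasting.

    lookback: number of past readings fed to the model.
    horizon:  how many steps ahead the target lies (1 = next reading).
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        inputs.append(series[start:start + lookback])
        targets.append(series[start + lookback + horizon - 1])
    return inputs, targets

# Hypothetical dissolved oxygen readings in mg/L at a fixed sampling interval.
do_mg_per_l = [6.8, 6.5, 6.1, 5.9, 6.0, 6.4, 6.9]
X, y = make_windows(do_mg_per_l, lookback=3, horizon=2)
for window, target in zip(X, y):
    print(window, "->", target)
```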
The prediction of dissolved oxygen and other water quality parameters is closely related to time.
Attention-based models, LSTMs, DBNs, and other DL models are able to mine time-sequence
information well and achieve satisfactory results. Going forward, how to use DL models to avoid or reduce
the negative impact of uncertainty factors on prediction results will be an important development
direction in water quality prediction tasks.
Table 6 Water quality prediction

| No. | Reference | Model | Framework | Data | Preprocessing and augmentation | Transfer learning | Evaluation index | Results | Comparisons with other methods |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Ta and Wei (2018) | CNN, LSTM | TensorFlow | 4,500 samples were collected from Mingbo Aquatic Products Co. Ltd. | NA | N | MSE | The accuracy and stability are sufficient to meet actual demands. | BP (traditional BP, MSE = 0.04, Holt-Winters α = 0.4, MSE = 0.06) |
| 2 | Liu et al. (2019) | RNN | PyTorch | A total of 5,006 sets were collected from a pond. | NA | N | RMSE, MAPE, MAE | The attention-based RNN can achieve more accurate prediction. | SVR-linear, SVR-rbf, MLP, LSTM, Encoder-decoder, Input-Attn, DARNN, GeoMAN, Temporal-Attn, Spatiotemporal |
| 3 | Lin et al. (2018) | DBN | NA | 708 water samples were collected in twelve shrimp culture ponds. | NA | N | RMSE, WQI | Accuracy of the model is satisfactory. | NA |
| 4 | Hu et al. (2019) | LSTM | TensorFlow | Data collection was achieved by deploying sensor devices in a cage. | Data filling and correction | N | Accuracy, time cost | Prediction accuracy: pH: 95.76%; temperature: 96.88% | The proposed method can achieve a higher prediction accuracy and lower time cost than the RNN-based prediction model. |
| 5 | Deng et al. (2019) | LSTM | NA | The data are from three representative shrimp ponds. | Data normalization | N | Accuracy | DopLstm achieves the highest accuracy. | CF, AR, NN, SVM, and GM |
| 6 | Ren et al. (2020) | DBN | NA | Sensors were set up to collect data every 10 min, resulting in 12,700 instances of data. | VMD algorithm | N | R2 | 0.9336 | Bagging: 0.9014; Adaboost: 0.9262; Decision tree: 0.9189; CNN: 0.8811 |
4. Technical details and overall performance
Data and algorithms are the two main elements of AI (Thrall et al., 2018). Both
are necessary conditions for AI to achieve success.
4.1 Data
In DL, an annotated dataset is critical to ensure a model’s performance (Zhuang et al., 2019).
However, in practice, dataset construction is often affected by issues related to both quantity and
quality. Before any images or specific features can be used as the input to a DL model, some effort is
usually necessary to prepare the images through preprocessing and/or augmentation. The most
common preprocessing procedure is to adjust the image size to meet the requirements of the DL model
being applied (Sun et al., 2018 ; Siddiqui et al., 2017). In addition, the learning process can be
facilitated by highlighting the regions of interest (Wang et al., 2017 ; Zhao et al., 2018b), or by
performing background subtraction, foreground pixel extraction, image denoising enhancement (Qin
et al., 2016 ; Zhao et al., 2018b ; Siddiqui et al., 2017) and other steps to simplify image annotation.
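Resizing, the most common preprocessing step mentioned above, can be sketched with nearest-neighbor sampling on an image stored as a nested list. This is only an illustration of the idea; the reviewed studies would normally rely on an image library, and the function name is hypothetical.

```python
def resize_nearest(img, new_h, new_w):
    """Resize a 2-D image (list of rows) with nearest-neighbor sampling."""
    old_h, old_w = len(img), len(img[0])
    return [
        # Map each output pixel back to its source pixel by integer scaling.
        [img[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]

small = [[1, 2],
         [3, 4]]
large = resize_nearest(small, 4, 4)   # upscale to the size a model expects
back = resize_nearest(large, 2, 2)    # downscale again
print(large)
print(back)
```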
Additionally, some related studies have applied data augmentation techniques to artificially
increase the number of training samples. Data augmentation can be used to generate new labeled data
from existing labeled data through rotation, translation, transposition, and other methods (Meng et al.,
2018 ; Xu & Cheng, 2017). These additional data can help to improve the overall learning process;
and such data augmentation is particularly important for training DL models on datasets that contain
only small numbers of images (Kamilaris & Prenafeta-Boldú, 2018).
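The label-preserving transformations mentioned above can be sketched on a tiny image stored as a nested list. The choice of transformations (two flips and a 90-degree clockwise rotation) and the label string are assumptions for illustration; real pipelines usually apply randomized versions of these through an image library.

```python
def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def vflip(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]

def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def augment(img, label):
    """Return the original plus three transformed copies, all sharing one label."""
    return [(f(img), label) for f in (lambda x: x, hflip, vflip, rot90)]

image = [[1, 2],
         [3, 4]]
samples = augment(image, "fish")  # hypothetical class label
for augmented, lab in samples:
    print(augmented, lab)
```

One labeled image becomes four training samples here, which is the sense in which augmentation "increases" a small dataset without new annotation work.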
In addition, to avoid being constrained by the limited availability of annotation data, some
scholars have directly used pretrained DL models to conduct fish classification, thus avoiding the need
to acquire a large volume of annotated data (Ahmad et al., 2016). However, this approach has many
limitations, such as negative transfer (Pan & Yang, 2010) and learning or not learning from holistic images
(Sun et al., 2019), and it is consequently difficult to implement satisfactorily for specific applications;
hence, it is typically suitable only for theoretical algorithm research.
4.2 Algorithms
(1) Models. From a technical point of view, various CNN models are still the most popular (29
papers, 71%). However, 2 of the papers reviewed here use a GAN, 3 use an RNN, 2 use an LSTM, 2
use both an LSTM and a CNN, and 2 papers use a DBN and YOLO, respectively. Some CNN models
are combined with output-layer classifiers, such as SVM and Softmax (Qin et al., 2016 ; Sun et al.,
2018) or Softmax (Zhao et al., 2018b ; Naddaf-Sh et al., 2018) classifiers.
(2) Frameworks. Caffe and TensorFlow are the most popular frameworks. One possible reason
for the widespread use of Caffe is that it includes a pretrained model that is easy to fine-tune using
transfer learning (Bahrampour et al., 2015). Whether used for specific commercial applications or
experimental research, the combination of DL and transfer learning helps to reduce the need for a large
amount of data while saving significant training time (Erickson et al., 2017). In addition, a variety of
other DL frameworks and datasets exist that users can use easily. In particular, because of its strong
support for graphics processing units (GPUs), the PyTorch framework has been used extensively in
relatively recent literature (Ketkar, 2017 ; Liu et al., 2019).
In fact, much of the research reviewed here (9/41) uses transfer learning (Siddiqui et al., 2017 ;
Levy et al., 2018 ; Sun et al., 2018), which involves using existing knowledge from related tasks or
fields to improve model learning efficiency. The most common transfer learning technique is to use
pretrained DL models that have been trained on related datasets with different categories. These models
are then adapted to the specific challenges and datasets (Lu et al., 2015). Figure 7 shows a typical
example of transfer learning. First, the network is trained on the source task with the labeled dataset.
Then, the trained parameters of the model are transferred to the target tasks (Sun et al., 2018 ; Oquab
et al., 2014).
Figure 7. Typical example of transfer learning
(3) Model inputs. Although some studies use fish audio and water quality data, most of the model
inputs are images (34, 83%). This situation reflects the significant advantage offered by DL in data
processing, especially image processing. The inputs include public datasets such as the ImageNet
dataset, the Fish4Knowledge (F4K) dataset, and the Croatian and Queensland University of
Technology (QUT) fish datasets. Other datasets include data collected and produced in the field or
obtained through Internet search engines, such as Google (Meng et al., 2018 ; Naddaf-Sh et al., 2018).
Combining optical sensors and machine vision with DL systems provides possibilities for developing
faster, cheaper and noninvasive methods for in situ monitoring and post-harvesting quality monitoring
in aquaculture (Saberioon et al., 2017). However, whether these datasets consist of text, audio, or
image/video data, they typically hold large volumes of data. Such large amounts of data are particularly
important when the problem to be solved is complex or when the difference between adjacent classes
is small.
(4) Model outputs. Among the models used for classification, the outputs range from 4 to 16
classes. For example, one study considers images of 16 species of fish, and another considers 4 types
of fish sound files. Among the other papers, 13 targeted live fish recognition where the outputs were
fish and nonfish; 7 were size or biomass estimations; 2 were quantifications of fish feeding intensity;
6 were water quality predictions; and 5 were behavior analyses. However, from a technical point of
view, the boundaries for identification, classification, and biomass estimation based on these
classification models are quite vague. In these papers, the output and input classes for each model are
the same. Each output consists of a set of probabilities that each input belongs to each class, and the
model finally selects the class with the highest output probability for each input as the predicted class
of that input.
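The output stage described in item (4) can be sketched as a softmax over raw class scores followed by selection of the most probable class. The scores and species names below are hypothetical, and the sketch covers only this final step, not any particular network.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_class(logits, class_names):
    """Return the class with the highest probability, plus all probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs

# Hypothetical scores for a 4-class problem.
names = ["species_a", "species_b", "species_c", "species_d"]
label, probs = predict_class([2.0, 0.5, -1.0, 0.1], names)
print(label, [round(p, 3) for p in probs])
```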
4.3 Performance evaluation indexes and overall performance
4.3.1 Performance evaluation indexes
A variety of model performance evaluation indexes used in the literature are listed in Table 7.
Most recognition and classification studies use common machine learning evaluation indicators such
as accuracy and precision (Siddiqui et al., 2017 ; Qin et al., 2016). In behavior trajectory tracking,
indicators such as the miss ratio (MR) are used (Wang et al., 2017 ; Xu & Cheng, 2017). When water
quality prediction is performed, additional indicators such as the mean absolute percentage error
(MAPE) and root mean square error (RMSE) are used (Liu et al., 2019). Moreover, a program's
running speed is also an important performance indicator, especially when high real-time performance
is required (Villon et al., 2018 ; Zhou et al., 2017a).
Because of the differences in the models, raw data, hardware operating environments, and
parameters used in different studies, it is unscientific to compare different models based on only one
parameter (Tripathi & Maktedar, 2019). However, in general, most of the studies in which the accuracy
is used as a performance evaluation index report values above 90%, and some even reach almost 100%
(Banan et al., 2020 ; Romero-Ferrero et al., 2019), indicating that these methods perform well. Among
the papers using precision and recall as evaluation indexes, the highest results to date are 99.68% and
99.45%, respectively, which illustrates the advantages of DL models.
Table 7 Performance evaluation indexes for DL models

| Performance evaluation index | Description |
|---|---|
| Accuracy | The ratio of the number of correctly predicted fish to the total number of predicted samples. |
| Precision | The ratio of correctly identified fish to the total identified objects. |
| Recall | The ratio of correctly identified fish to the ground truth. |
| Speed | The running time of the algorithm. |
| Intersection-over-Union (IOU) | The overlap rate between the candidate area and the ground truth area. The ideal scenario is complete overlap (i.e., the ratio is unity). |
| False positive rate (FPR) | The proportion of negative instances classified as positive among all negative instances. |
| Mean Squared Error (MSE) | The expected value of the square of the difference between the parameter estimate and the true value. |
| Mean Coefficient of Variation (MCV) | The ratio of the standard deviation to the mean. The MCV reflects the degree of dispersion of two sets of data. |
| Mostly Tracked Trajectories (MT) | Percentage of ground truth trajectories that are correctly tracked for more than 80% of their length. Larger values are better. |
| Mostly Lost Trajectories (ML) | Percentage of ground truth trajectories that are correctly tracked for less than 20% of their length. Smaller values are better. |
| Fragments (Frag) | Percentage of trajectories that are correctly tracked for less than 80% but more than 20% of their length. |
| ID Switch | Average total number of times that a resulting trajectory switches its matched ground truth identity with another trajectory. Smaller values are better. |
| Miss ratio (MR) | Percentage of fish that are undetected in all frames. |
| Error ratio | Percentage of wrongly detected fish in all frames. |
| Root Mean Square Error (RMSE) | The square root of the MSE. |
| F1-measure | The harmonic mean of precision and recall. |
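Several of these indexes are short formulas, so a compact sketch may help make them concrete. The code uses the standard confusion-matrix definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the counts, boxes, and readings in the usage lines are made-up values for illustration.

```python
import math

def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F1 and FPR from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),
    }

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

m = classification_metrics(tp=90, fp=10, fn=30, tn=70)
print({k: round(v, 4) for k, v in m.items()})
print(round(iou((0, 0, 2, 2), (1, 1, 3, 3)), 4))
print(round(rmse([1, 2, 3], [1, 2, 5]), 4), round(mape([5.0, 10.0], [4.0, 11.0]), 2))
```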
4.3.2 Performance comparisons with other approaches
An important aspect of this review is to consider comparisons between DL and other existing
approaches. Although most DL methods are related to image analysis, 7 DL models have been
proposed based on water quality and audio data. These studies show that DL can handle a variety of
data types in smart fish farming rather than only images. In general, a DL model can be considered
better than other compared models only with regard to the same dataset and the same task.
When performing fish identification tasks, CNN models show an accuracy 18.5% higher than that
of SVM models (Qin et al., 2016), a precision 41.13% higher than that of Gabor filters and other
similar feature extraction methods, and a precision 19.54% higher than that of linear discriminant
analysis (LDA) and manual extraction (Sun et al., 2018). In addition, a CNN model has been shown
to be superior to idTracker (Wang et al., 2017). Compared with the accuracy achievable by human
experts (89.3%), the accuracy of a CNN model has been shown to be superior (95.7%) (Villon et al.,
2018). When estimating the age of the fish population in Moen et al. (2018), a CNN also showed better
performance than human experts. The achieved mean coefficient of variation (CV) was 8.89%, which
is considerably lower than the reported mean CV of human readings. This may be due to the
availability of datasets in these areas, as well as to the unique characteristics of fish and other
background features.
Compared with a backpropagation neural network (BPNN), a CNN model was measured to be
6.25% more accurate in feed intensity classification. The model evaluation index of this CNN model
also improved compared with those of traditional manual feature extraction methods, such as FIFFB
and SIFFB (Zhou et al., 2017b ; Zhou et al., 2019). Furthermore, the results of water quality
prediction indicate that LSTM and attention-based RNN models achieve higher accuracy than has been
achieved with a BPNN model, Holt-Winters forecasting, or a support vector regression (SVR) model
based on either a linear function kernel (SVR-linear) or a radial basis function kernel (SVR-RBF) (Ta
& Wei, 2018 ; Liu et al., 2019).
In addition, GAN models typically achieve better overall performances in fish recognition
compared with a CNN (Zhao et al., 2018b).
5. Discussion
5.1. Advantages of deep learning
The key advantage of DL in aquaculture is that DL models perform better than do the traditional
methods. This may be because traditional machine learning algorithms require the manual feature
extraction from images. Manually selecting features is a laborious, heuristic approach, and the effect
is highly dependent on both luck and experience (Mohanty et al., 2016). In contrast, a DL algorithm
can automatically learn and extract the essential features from images in a sample dataset. Such
algorithms offer high accuracy and strong stability for irregular target recognition in complex
environments (Daoliang & Jianhua, 2018), and they can effectively learn mappings and correlations
between a sample and objects from that sample. In addition, useful features can be learned
automatically using a general-purpose learning procedure (LeCun et al., 2015).
For example, in fish recognition, a DL model can effectively extract essential fish features. Such
models have shown strong stability under challenging conditions such as low light and high noise, and
they perform better than traditional manual feature extraction methods (Sun et al., 2018). In
behavioral analysis research, a DL model can effectively address problems related to occlusion (Wang
et al., 2017). In addition, a DL model can be used not only to monitor unknown objects or anomalies
but also to predict parameters such as water quality.
Although DL models require more computing power and longer training times than do traditional
methods (such as the SVM and random forest methods), after training is complete, the trained DL
models are highly efficient at performing test tasks. For example, in a fish recognition study (Villon et
al., 2018) that used 900,000 images, the training process lasted 8 days on a computer with 64 GB of
RAM, an i7 CPU @3.50 GHz, and a Titan X GPU card. However, after training was complete, the
recognition time for each frame was only 0.06 s. In a study by Ahmad et al. (2016), training the CNN
model required 5-6 h without a GPU implementation. However, during testing, each fish image
required only approximately 1 ms for classification, making this model fully compatible with real-time
processing needs.
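As a rough check on the real-time claim, the per-frame latencies quoted above can be converted into frame rates. The sketch below is illustrative only: the function names and the 25 frames-per-second camera rate are assumptions, not values taken from the cited studies.

```python
def frames_per_second(seconds_per_frame: float) -> float:
    """Convert a per-frame inference latency into a sustained frame rate."""
    if seconds_per_frame <= 0:
        raise ValueError("latency must be positive")
    return 1.0 / seconds_per_frame


def keeps_up(seconds_per_frame: float, camera_fps: float) -> bool:
    """Return True if frames can be processed at least as fast as they arrive."""
    return frames_per_second(seconds_per_frame) >= camera_fps


# Latencies reported in the text; the camera rate is a hypothetical value.
ASSUMED_CAMERA_FPS = 25.0
villon_fps = frames_per_second(0.06)   # 0.06 s per frame (Villon et al., 2018)
ahmad_fps = frames_per_second(0.001)   # about 1 ms per image (Ahmad et al., 2016)

print(f"0.06 s/frame -> {villon_fps:.1f} fps, "
      f"keeps up: {keeps_up(0.06, ASSUMED_CAMERA_FPS)}")
print(f"1 ms/image   -> {ahmad_fps:.0f} fps, "
      f"keeps up: {keeps_up(0.001, ASSUMED_CAMERA_FPS)}")
```

Under these assumptions, 0.06 s per frame corresponds to roughly 16.7 frames per second, so a 25 fps stream would require frame skipping, whereas 1 ms per image leaves ample headroom.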
5.2. Disadvantages and limitations of deep learning
At present, DL technology is still in a weak AI stage (Lu et al., 2018). Although weak AI systems
can simulate the functions of the mind through a computational system, they cannot yet
artificially recreate a mind (Di Nucci & McHugh, 2006). The ability of DL models to constantly learn
and improve is still very weak. In smart fish farming, DL models are used only as “black boxes”.
Because DL models are excessively reliant on sample data and have low interpretability, they can
typically gain experience only from a specific dataset. Moreover, when faced with unbalanced training
data, most models will tend to ignore some important features (Zhang & Zhu, 2018).
(1) Incomplete data. One of the most significant drawbacks of DL is the large amount of data
required during training. For example, when using DL to identify fry size, there are not only many
kinds of fish, but the body shape and posture of each kind also differ considerably between growth
stages, which places high demands on data collection and DL training. However, in traditional fisheries,
no such datasets exist, or the available datasets are not sufficiently comprehensive. Thus, in this initial
stage, much basic data collection work remains to be done.
Although data augmentation technology can be used to add some labeled samples to an existing
dataset, when dealing with complex problems (e.g., multiclass problems with high precision
requirements), more diversified training data are needed to improve accuracy (Patrício & Rieder, 2018).
Data annotation is a necessary operation in most cases, and some complex tasks require experts to
annotate the data. Even such expert annotators are prone to mistakes, especially
for challenging tasks such as fish species identification (Hanbury, 2008; Bhagat & Choudhary, 2018).
Furthermore, data preprocessing is often a necessary and time-consuming task in DL, whether for
image or text data (Choi et al., 2018). In addition, some existing datasets do not fully represent the
problems toward which they are oriented. Finally, in the field of smart fish farming, researchers do not
have access to many publicly available datasets; thus, in many cases, they need to develop custom
image sets, which can take hours or days of work.
(2) High cost. Whether the people involved are AI technicians or farmers, a large number of
sensors need to be deployed to collect data, which makes the up-front capital investment in the early
stage substantial. Another limitation is that DL models demand high levels of computing power; in
fact, commonly available CPUs are typically unable to meet the requirements of DL (Shi et al., 2016).
Instead, GPUs and tensor processing units (TPUs) are the mainstream sources of computing power
suitable for DL; consequently, the hardware requirements and the associated costs are high
(Wei & Brooks, 2019). In the absence of expected results, it is difficult to persuade farmers to
invest in the intelligent breeding industry; this is true in every country.
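The data augmentation mentioned under point (1) can be illustrated with a few label-preserving transformations. This is a minimal sketch, assuming grayscale images stored as nested lists of 0-255 values; the function names and the brightness range of 0.7 to 1.3 are hypothetical choices, and a real pipeline would normally rely on an image library.

```python
import random

Image = list[list[int]]  # grayscale image: rows of 0-255 pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the image clockwise by 90 degrees."""
    return [list(col) for col in zip(*img[::-1])]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale pixel values by `factor`, clipping to the 0-255 range."""
    return [[min(255, max(0, round(p * factor))) for p in row] for row in img]


def augment(img: Image, label: str, rng: random.Random) -> list[tuple[Image, str]]:
    """Return the original sample plus three variants that keep the same label."""
    factor = rng.uniform(0.7, 1.3)  # assumed brightness range
    return [
        (img, label),
        (flip_horizontal(img), label),
        (rotate_90(img), label),
        (adjust_brightness(img, factor), label),
    ]


# Toy 2x2 "image" with a hypothetical species label.
samples = augment([[10, 20], [30, 40]], "carp", random.Random(0))
print(len(samples), "labeled samples from one annotated image")
```

Each annotated image yields several training samples without new labeling effort, but the variants remain correlated with the original, which is why augmentation alone does not replace more diverse data.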
5.3. Future technical trends of deep learning in smart fish farming
(1) The applications of DL in aquaculture will continue to expand, and new ones will emerge. Various existing
applications of DL in smart fish farming are covered in this review. The current application fields
include live fish identification, species classification, behavioral analysis, feeding decisions, size or
biomass estimation, and water quality prediction. Other possible application areas with great potential
include fish disease diagnosis and aquatic product quality and safety control and traceability, although no
relevant research has been reported to date. For example, tools that automatically diagnose fish
diseases and provide reasonable suggestions for managing identified diseases are expected to be an
important application area, especially in relation to image processing.
(2) Available datasets are becoming increasingly important. Datasets are an increasingly dominant
concern in DL and are sometimes even more important than algorithms. With the improved transparency of
aquaculture information and the establishment of open fishery databases, researchers will be able to
access a broader variety of sample data more easily. Although the number of publicly available datasets
is still small, Appendix A lists some datasets that are freely available for download. Researchers can
use these datasets to test their DL models or to pretrain DL models and then adapt them to more specific
future challenges. Because dataset availability is limited and real data are difficult to collect,
methods of improving the recognition rate from small numbers of samples represent an inevitable
direction for future research. Transfer learning can be used to ameliorate the problem of insufficient
sample data. Additionally, the necessary preprocessing and augmentation of datasets will become
increasingly important.
(3) More advanced and complex models will continue to improve the performance of deep
learning tasks. A combination model can be used to solve many of the problems faced by single models;
as a result, more complex architectures will emerge. All types of DL models and classifiers as well as
handcrafted features can be combined to improve the overall results. CNNs are widely used, but they
consider each frame independently and ignore the time correlations between adjacent frames.
Therefore, it is necessary to consider models that can account for spatiotemporal sequences. It is
expected that in the future, more methods similar to LSTM networks or other RNN models will be
adopted to achieve higher classification or prediction performance by capitalizing on the time
dimension. Examples of such applications include estimating fish growth based on previous
continuous observations, assessing fish water demands, developing measures to avoid disease, and fish
behavior analysis. Such models can also be applied in environmental studies to predict changes in
water quality. Finally, some of the solutions discussed in this paper may become commercially
available soon.
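To make the idea of exploiting the time dimension concrete, the sketch below shows one common way of preparing sequential measurements (for example, dissolved oxygen readings) for an LSTM or other RNN model: a sliding window that pairs each run of past observations with a future target. The function name, window length, and sample values are assumptions for illustration, and the model itself is omitted.

```python
def make_windows(
    series: list[float], window: int, horizon: int = 1
) -> list[tuple[list[float], float]]:
    """Split a time series into (past window, future target) training pairs.

    `window` is the number of past observations fed to the model and
    `horizon` is how many steps ahead the target lies.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be at least 1")
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        past = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((past, target))
    return pairs


# Hypothetical dissolved oxygen readings (mg/L) at regular intervals.
readings = [6.8, 6.7, 6.5, 6.4, 6.6, 6.9]
for past, target in make_windows(readings, window=3, horizon=1):
    print(past, "->", target)
# The first pair is ([6.8, 6.7, 6.5], 6.4).
```

Unlike a per-frame CNN, a model trained on such pairs sees the ordering of observations, which is what allows it to capture temporal correlations.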
6. Conclusion
This paper conducted a deep and comprehensive investigation of the current applications of deep
learning (DL) for smart fish farming. Based on a review of the recent literature, the current applications
can be divided into six categories: live fish identification, species classification, behavioral analysis,
feeding decisions, size or biomass estimation, and water quality prediction. The technical details of the
reported methods were comprehensively analyzed in accordance with the key elements of artificial
intelligence (AI): data and algorithms. Performance comparisons with traditional methods based on
manually extracted features indicate that the greatest contribution of DL is its ability to automatically
extract features. Moreover, DL can also output high-precision processing results. However, at present,
DL technology is still in a weak AI stage and requires a large amount of labeled data for training. This
requirement has become a bottleneck restricting further applications of DL in smart fish farming.
Nevertheless, DL still offers breakthroughs for processing text, images, video, sound and other data,
all of which can provide strong support for the implementation of smart fish farming. In the future, DL
is also expected to expand into new application areas, such as fish disease diagnosis; data will become
increasingly important; and composite models and models that consider spatiotemporal sequences will
represent the main research direction. In brief, our purpose in writing this review was to provide
researchers and practitioners with a better understanding of the current applications of DL in smart fish
farming and to facilitate the application of DL technology to solve practical problems in aquaculture.
Acknowledgments
The research was supported by the National Key Technology R&D Program of China
(2019YFD0901004), the Youth Research Fund of Beijing Academy of Agricultural and Forestry
Sciences (QNJJ202014), and the Beijing Excellent Talents Development Project
(2017000057592G125).
References
Ahmad S, Ahsan J, Faisal S, Ajmal M, Mark S, James S, Euan H (2016) Fish species classification in unconstrained underwater environments based on deep learning. Limnol. Oceanogr. Methods, 14, 570-585.
Alcaraz C, Gholami Z, Esmaeili HR, García-Berthou E (2015) Herbivory and seasonal changes in diet of a highly endemic cyprinodontid fish (Aphanius farsicus). Environ. Biol. Fishes, 98, 1541-1554.
Allken V, Handegard NO, Rosen S, Schreyeck T, Mahiout T, Malde K (2019) Fish species identification using a convolutional neural network trained on synthetic data. ICES J. Mar. Sci., 76, 342-349.
Álvarez-Ellacuría A, Palmer M, Catalán IA, Lisani J-L (2019) Image-based, unsupervised estimation of fish size from commercial landings using deep learning. ICES J. Mar. Sci.
Bahrampour S, Ramakrishnan N, Schott L, Shah MJCS (2015) Comparative Study of Deep Learning Software Frameworks. arXiv preprint arXiv:1511.06435.
Banan A, Nasiri A, Taheri-Garavand A (2020) Deep learning-based appearance features extraction for automated carp species identification. Aquacult. Eng., 89, 102053.
Belthangady C, Royer LA (2019) Applications, promises, and pitfalls of deep learning for fluorescence image reconstruction. Nat. Methods, 16, 1215-1225.
Bhagat PK, Choudhary P (2018) Image annotation: Then and now. Image Vision Comput., 80, 1-23.
Boom BJ, Huang X, He J, Fisher RB (2012) Supporting Ground-Truth annotation of image datasets using clustering. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). IEEE, Tsukuba, Japan, pp. 1542-1545.
Bradley D, Merrifield M, Miller KM, Lomonico S, Wilson JR, Gleason MG (2019) Opportunities to improve fisheries management through innovative technology and advanced data systems. Fish Fish., 20, 564-583.
Bronstein MM, Bruna J, LeCun Y, Szlam A, Vandergheynst P (2017) Geometric deep learning: going beyond euclidean data. IEEE Signal Process. Mag., 34, 18-42.
Butail S, Paley DA (2011) Three-dimensional reconstruction of the fast-start swimming kinematics of densely schooling fish. J. R. Soc. Interface, 9, 77-88.
Cao S, Zhao D, Liu X, Sun Y (2020) Real-time robust detector for underwater live crabs based on deep learning. Comput. Electron. Agric., 172, 105339.
Chen C, Du Y, Zhou C, Sun C (2017) Evaluation of feeding activity of fishes based on image texture. Transactions of the Chinese Society of Agricultural Engineering, 33, 232-237.
Chen L, Yang X, Sun C, Wang Y, Xu D, Zhou C (2019) Feed intake prediction model for group fish using the MEA-BP neural network in intensive aquaculture. Information Processing in Agriculture.
Choi K, Fazekas G, Sandler M, Cho K (2018) A comparison of audio signal preprocessing methods for deep neural networks on music tagging. In: 2018 26th European Signal Processing Conference (EUSIPCO). IEEE, Rome, Italy, pp. 1870-1874.
Clavelle T, Lester SE, Gentry R, Froehlich HE (2019) Interactions and management for the future of marine aquaculture and capture fisheries. Fish Fish., 20, 368-388.
Cortes C, Vapnik V (1995) Support-vector networks. Mach. Learn., 20, 273-297.
Daoliang L, Jianhua B (2018) Research progress on key technologies of underwater operation robot for aquaculture. Transactions of the Chinese Society of Agricultural Engineering, 36, 1-9.
de Verdal H, Komen H, Quillet E, Chatain B, Allal F, Benzie JAH, Vandeputte M (2017) Improving feed efficiency in fish using selective breeding: a review. Reviews in Aquaculture, 10, 833-851.
Deng H, Peng L, Zhang J, Tang C, Fang H, Liu H (2019) An intelligent aerator algorithm inspired-by deep learning. Mathematical Biosciences and Engineering, 16, 2990-3002.
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, Miami, FL, USA, pp. 248-255.
Deng L, Yu D (2014) Deep learning: methods and applications. Foundations and Trends® in Signal Processing, 7, 197-387.
Dhiman C, Vishwakarma DK (2019) A review of state-of-the-art techniques for abnormal human activity recognition. Eng. Appl. Artif. Intell., 77, 21-45.
Di Nucci E, McHugh C (2006) Content, Consciousness, and Perception: Essays in Contemporary Philosophy of Mind, Cambridge Scholars Press.
Díaz-Gil C, Smee SL, Cotgrove L, Follana-Berná G, Hinz H, Marti-Puig P, Grau A, Palmer M, Catalán IA (2017) Using stereoscopic video cameras to evaluate seagrass meadows nursery function in the Mediterranean. Mar. Biol., 164, 137.
dos Santos AA, Gonçalves WN (2019) Improving Pantanal fish species recognition through taxonomic ranks in convolutional neural networks. Ecol. Inform., 53, 100977.
Erickson BJ, Korfiatis P, Akkus Z, Kline T, Philbrick K (2017) Toolkits and Libraries for Deep Learning. J. Digit. Imaging, 30, 400-405.
FAO (2018) The State of World Fisheries and Aquaculture 2018: Meeting the sustainable development goals. FAO, Rome, Italy.
Fernandes AFA, Turra EM, de Alvarenga ÉR, Passafaro TL, Lopes FB, Alves GFO, Singh V, Rosa GJM (2020) Deep Learning image segmentation for extraction of fish body measurements and prediction of body weight and carcass traits in Nile tilapia. Comput. Electron. Agric., 170, 105274.
Føre M, Alver M, Alfredsen JA, Marafioti G, Senneset G, Birkevold J, Willumsen FV, Lange G, Espmark Å, Terjesen BF (2016) Modelling growth performance and feeding behaviour of Atlantic salmon (Salmo salar L.) in commercial-size aquaculture net pens: Model details and validation through full-scale experiments. Aquaculture, 464, 268-278.
França Albuquerque PL, Garcia V, da Silva Oliveira A, Lewandowski T, Detweiler C, Gonçalves AB, Costa CS, Naka MH, Pistori H (2019) Automatic live fingerlings counting using computer vision. Comput. Electron. Agric., 167, 105015.
Garcia R, Prados R, Quintana J, Tempelaar A, Gracias N, Rosen S, Vågstøl H, Løvall K (2019) Automatic segmentation of fish using deep learning with application to fish size measurement. ICES J. Mar. Sci.
Hinton GE, Sejnowski TJ (1999) Unsupervised Learning: Foundations of Neural Computation.
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580-587.
Goodfellow I, Bengio Y, Courville A (2016) Deep learning, MIT Press.
Gouiaa R, Meunier J (2017) Learning cast shadow appearance for human posture recognition. Pattern Recog. Lett., 97, 54-60.
Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster R (2016) Development and Validation of a Deep Learning Algorithm for Detection of Diabetic Retinopathy in Retinal Fundus Photographs. JAMA-J. Am. Med. Assoc., 316, 2402-2410.
Guo X, Zhao X, Liu Y, Li D (2019) Underwater sea cucumber identification via deep residual networks. Information Processing in Agriculture, 6, 307-315.
Hanbury A (2008) A survey of methods for image annotation. J. Vis. Lang. Comput., 19, 617-627.
Hartill BW, Taylor SM, Keller K, Weltersbach MS (2020) Digital camera monitoring of recreational fishing effort: Applications and challenges. Fish Fish., 21, 204-215.
Hassoun MH (1996) Fundamentals of Artificial Neural Networks. Proc. IEEE, 10, 906.
Hu H, Wen Y, Chua T, Li X (2014) Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. IEEE Access, 2, 652-687.
Hu J, Li D, Duan Q, Han Y, Chen G, Si X (2012) Fish species classification by color, texture and multi-class support vector machine using computer vision. Comput. Electron. Agric., 88, 133-140.
Hu J, Wang J, Zhang X, Fu Z (2015) Research status and development trends of information technologies in aquacultures. Transactions of the Chinese Society for Agricultural Machinery, 46, 251-263.
Hu WC, Wu HT, Zhang YF, Zhang SH, Lo CH (2020) Shrimp recognition using ShrimpNet based on convolutional neural network. J. Ambient Intell. Humaniz. Comput., 8.
Hu Z, Zhang Y, Zhao Y, Xie M, Zhong J, Tu Z, Liu J (2019) A Water Quality Prediction Method Based on the Deep LSTM Network Considering Correlation in Smart Mariculture. Sensors, 19.
Ibrahim AK, Zhuang HQ, Cherubin LM, Scharer-Umpierre MT, Erdol N (2018) Automatic classification of grouper species by their sounds using deep neural networks. J. Acoust. Soc. Am., 144, EL196-EL202.
Jäger J, Simon M, Denzler J, Wolff V (2015) Croatian Fish Dataset: Fine-grained classification of fish species in their natural habitat. In: Machine Vision of Animals and their Behaviour (MVAB 2015), pp. 6.1-6.7.
Jalal A, Salman A, Mian A, Shortis M, Shafait F (2020) Fish detection and species classification in underwater environments using deep learning with temporal information. Ecol. Inform., 57, 101088.
Jolliffe I (1987) Principal component analysis. Chemometrics Intellig. Lab. Syst., 2, 37-52.
Kamilaris A, Prenafeta-Boldú FX (2018) Deep learning in agriculture: A survey. Comput. Electron. Agric., 147, 70-90.
Ketkar N (2017) Introduction to PyTorch. In: Deep Learning with Python: A Hands-on Introduction. Apress, Berkeley, CA, pp. 195-208.
Kim H, Koo J, Kim D, Jung S, Shin J-U, Lee S, Myung H (2016) Image-Based Monitoring of Jellyfish Using Deep Learning Architecture. IEEE Sens. J., 16, 2215-2216.
Labao AB, Naval PC (2019) Cascaded deep network systems with linked ensemble components for underwater fish detection in the wild. Ecol. Inform., 52, 103-121.
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature, 521, 436.
Levy D, Belfer Y, Osherov E, Bigal E, Scheinin AP, Nativ H, Tchernov D, Treibitz T (2018) Automated Analysis of Marine Video With Limited Data. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1385-1393.
Li D, Hao Y, Duan Y (2019) Nonintrusive methods for biomass estimation in aquaculture with emphasis on fish: a review.
Li H (2018) Deep learning for natural language processing: advantages and challenges. National Science Review, 5, 24-26.
Li J, Xu C, Jiang LX, Xiao Y, Deng LM, Han ZZ (2020) Detection and Analysis of Behavior Trajectory for Sea Cucumbers Based on Deep Learning. IEEE Access, 8, 18832-18840.
Liakos KG, Busato P, Moshou D, Pearson S, Bochtis D (2018) Machine Learning in Agriculture: A Review. Sensors, 18, 2674.
Lin Q, Yang W, Zheng C, Lu KH, Zheng ZM, Wang JP, Zhu JY (2018) Deep-learning based approach for forecast of water quality in intensive shrimp ponds. Indian J. Fish., 65, 75-80.
Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI (2017) A survey on deep learning in medical image analysis. Med. Image Anal., 42, 60-88.
Liu S, Xu L, Jiang Y, Li D, Chen Y, Li Z (2014a) A hybrid WACPSO-LSSVR model for dissolved oxygen content prediction in crab culture. Eng. Appl. Artif. Intell., 29, 114-124.
Liu Y, Zhang Q, Song L, Chen Y (2019) Attention-based recurrent neural networks for accurate short-term and long-term dissolved oxygen prediction. Comput. Electron. Agric., 165, 104964.
Liu Z, Li X, Fan L, Lu H, Liu L, Liu Y (2014b) Measuring feeding activity of fish in RAS using computer vision. Aquacult. Eng., 60, 20-27.
Lorenzen K, Cowx IG, Entsua-Mensah R, Lester NP, Koehn J, Randall R, So N, Bonar SA, Bunnell DB, Venturelli P, Bower SD, Cooke SJ (2016) Stock assessment in inland fisheries: a foundation for sustainable use and conservation. Rev. Fish Biol. Fish., 26, 405-440.
Lu H, Li Y, Chen M, Kim H, Serikawa S (2018) Brain Intelligence: Go beyond Artificial Intelligence. Mobile Networks and Applications, 23, 368-375.
Lu J, Behbood V, Hao P, Zuo H, Xue S, Zhang GQ (2015) Transfer learning using computational intelligence: A survey. Knowledge-Based Syst., 80, 14-23.
Lu Y, Tung C, Kuo Y (2019) Identifying the species of harvested tuna and billfish using deep convolutional neural networks. ICES J. Mar. Sci.
Mahesh S, Manickavasagan A, Jayas DS, Paliwal J, White NDG (2008) Feasibility of near-infrared hyperspectral imaging to differentiate Canadian wheat classes. Biosys. Eng., 101, 50-57.
Mahmood A, Bennamoun M, An S, Sohel F, Boussaid F, Hovey R, Kendrick G (2019) Automatic detection of Western rock lobster using synthetic data. ICES J. Mar. Sci.
Måløy H, Aamodt A, Misimi E (2019) A spatio-temporal recurrent network for salmon feeding action recognition from underwater videos in aquaculture. Comput. Electron. Agric., 105087.
Mao B, Han LG, Feng Q, Yin YC (2019) Subsurface velocity inversion from deep learning-based data assimilation. J. Appl. Geophys., 167, 172-179.
Marini S, Fanelli E, Sbragaglia V, Azzurro E, Del Rio Fernandez J, Aguzzi J (2018) Tracking Fish Abundance by Underwater Image Recognition. Sci. Rep., 8, 13748.
Melnychuk MC, Peterson E, Elliott M, Hilborn R (2017) Fisheries management impacts on target species status. Proceedings of the National Academy of Sciences, 114, 178-183.
Meng L, Hirayama T, Oyanagi S (2018) Underwater-Drone With Panoramic Camera for Automatic Fish Recognition Based on Deep Learning. IEEE Access, 6, 17880-17886.
Merino G, Barange M, Blanchard JL, Harle J, Holmes R, Allen I, Allison EH, Badjeck MC, Dulvy NK, Holt J (2012) Can marine fisheries and aquaculture meet fish demand from a growing human population in a changing climate? Global Environ. Change, 22, 795-806.
Mhaskar HN, Poggio T (2016) Deep vs. shallow networks: An approximation theory perspective. Analysis and Applications, 14, 829-848.
Min S, Lee B, Yoon S (2017) Deep learning in bioinformatics. Brief. Bioinform., 18, 851-869.
Moen E, Handegard NO, Allken V, Albert OT, Harbitz A, Malde K (2018) Automatic interpretation of otoliths using deep learning. PLoS ONE, 13, 14.
Mohanty SP, Hughes DP, Salathé M (2016) Using Deep Learning for Image-Based Plant Disease Detection. Frontiers in Plant Science, 7.
Moosavi-Dezfooli S-M, Fawzi A, Frossard P (2016) Deepfool: a simple and accurate method to fool deep neural networks. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2574-2582.
Naddaf-Sh MM, Myler H, Zargarzadeh H (2018) Design and Implementation of an Assistive Real-Time Red Lionfish Detection System for AUV/ROVs. Complexity, 10.
Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E (2015) Deep learning applications and challenges in big data analytics. Journal of Big Data, 2, 1.
Olyaie E, Abyaneh HZ, Mehr AD (2017) A comparative analysis among computational intelligence techniques for dissolved oxygen prediction in Delaware River. Geoscience Frontiers, 8, 517-527.
Oosting T, Star B, Barrett JH, Wellenreuther M, Ritchie PA, Rawlence NJ (2019) Unlocking the potential of ancient fish DNA in the genomic era. Evolutionary Applications, 12, 1513-1522.
Oquab M, Bottou L, Laptev I, Sivic J (2014) Learning and transferring mid-level image representations using convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1717-1724.
Pan SJ, Yang Q (2010) A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, 1345-1359.
Papadakis VM, Papadakis IE, Lamprianidou F, Glaropoulos A, Kentouri M (2012) A computer-vision system and methodology for the analysis of fish behavior. Aquacult. Eng., 46, 53-59.
Patrício DI, Rieder R (2018) Computer vision and artificial intelligence in precision agriculture for grain crops: A systematic review. Comput. Electron. Agric., 153, 69-81.
Pérez-Escudero A, Vicente-Page J, Hinz RC, Arganda S, de Polavieja GG (2014) idTracker: tracking individuals in a group by automatic identification of unmarked animals. Nat. Methods, 11, 743-748.
Poggio T, Smale S (2003) The mathematics of learning: Dealing with data. 2005 International Conference on Neural Networks and Brain, 50, 537-544.
Qin H, Li X, Liang J, Peng Y, Zhang C (2016) DeepFish: Accurate underwater live fish recognition with a deep architecture. Neurocomputing, 187, 49-58.
Qiu C, Zhang S, Wang C, Yu Z, Zheng H, Zheng B (2018) Improving Transfer Learning and Squeeze-and-Excitation Networks for Small-Scale Fine-Grained Fish Image Classification. IEEE Access, 6, 78503-78512.
Quinlan JR (1986) Induction of decision trees. Machine Learning, 1, 81-106.
Rahman A, Dabrowski J, McCulloch J (2019) Dissolved oxygen prediction in prawn ponds from a group of one step predictors. Information Processing in Agriculture.
Rauf HT, Lali MIU, Zahoor S, Shah SZH, Rehman AU, Bukhari SAC (2019) Visual features based automated identification of fish species using deep convolutional neural networks. Comput. Electron. Agric., 105075.
Ravì D, Wong C, Deligianni F, Berthelot M, Andreu-Perez J, Lo B, Yang G-Z (2016) Deep learning for health informatics. IEEE Journal of Biomedical and Health Informatics, 21, 4-21.
Ren Q, Wang X, Li W, Wei Y, An D (2020) Research of dissolved oxygen prediction in recirculating aquaculture systems based on deep belief network. Aquacult. Eng., 90, 102085.
Rillahan C, Chambers MD, Howell WH, Watson WH (2011) The behavior of cod (Gadus morhua) in an offshore aquaculture net pen. Aquaculture, 310, 361-368.
Romero-Ferrero F, Bergomi MG, Hinz RC, Heras FJH, de Polavieja GG (2019) idtracker.ai: tracking all individuals in small or large collectives of unmarked animals. Nat. Methods, 16, 179-182.
Roux NL, Bengio Y (2008) Representational power of restricted boltzmann machines and deep belief networks. Neural Comput., 20, 1631-1649.
Saberioon M, Císař P (2018) Automated within tank fish mass estimation using infrared reflection system. Comput. Electron. Agric., 150, 484-492.
Saberioon M, Gholizadeh A, Cisar P, Pautsina A, Urban J (2017) Application of machine vision systems in aquaculture with emphasis on fish: state-of-the-art and key issues. Reviews in Aquaculture, 9, 369-387.
Salman A, Siddiqui SA, Shafait F, Mian A, Shortis MR, Khurshid K, Ulges A, Schwanecke U (2019) Automatic fish detection in underwater videos by a deep neural network-based hybrid motion learning system. ICES J. Mar. Sci.
Samuel AL (1959) Some Studies in Machine Learning Using the Game of Checkers. IBM J. Res. Dev., 3, 210-229.
Saufi SR, Ahmad ZAB, Leong MS, Lim MH (2019) Challenges and Opportunities of Deep Learning Models for Machinery Fault Detection and Diagnosis: A Review. IEEE Access, 7, 122644-122662.
Schmidhuber J (2015) Deep learning in neural networks: An overview. Neural Networks, 61, 85-117.
Shahriar MS, McCulluch J (2014) A dynamic data-driven decision support for aquaculture farm closure. Procedia Computer Science, 29, 1236-1245.
Shi S, Wang Q, Xu P, Chu X (2016) Benchmarking state-of-the-art deep learning software tools. In: 2016 7th International Conference on Cloud Computing and Big Data (CCBD). IEEE, pp. 99-104.
Siddiqui SA, Salman A, Malik MI, Shafait F, Mian A, Shortis MR, Harvey ES (2017) Automatic fish species classification in underwater videos: exploiting pre-trained deep neural network models to compensate for limited labelled data. ICES J. Mar. Sci., 75, 374-389.
Sun M, Hassan SG, Li D (2016) Models for estimating feed intake in aquaculture: A review. Comput. Electron. Agric., 127, 425-438.
Sun R, Zhu X, Wu C, Huang C, Shi J, Ma L (2019) Not All Areas Are Equal: Transfer Learning for Semantic Segmentation via Hierarchical Region Selection. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4355-4364.
Sun X, Shi J, Liu L, Dong J, Plant C, Wang X, Zhou H (2018) Transferring deep knowledge for object recognition in Low-quality underwater videos. Neurocomputing, 275, 897-908.
Ta X, Wei Y (2018) Research on a dissolved oxygen prediction method for recirculating aquaculture systems based on a convolution neural network. Comput. Electron. Agric., 145, 302-310.
Terayama K, Shin K, Mizuno K, Tsuda K (2019) Integration of sonar and optical camera images using deep neural network for fish monitoring. Aquacult. Eng., 86, 102000.
Thrall JH, Li X, Li Q, Cruz C, Do S, Dreyer K, Brink J (2018) Artificial intelligence and machine learning in radiology: opportunities, challenges, pitfalls, and criteria for success. Journal of the American College of Radiology, 15, 504-508.
Tripathi MK, Maktedar DD (2019) A role of computer vision in fruits and vegetables among various horticulture products of agriculture fields: A survey. Information Processing in Agriculture.
Tseng C-H, Hsieh C-L, Kuo Y-F (2020) Automatic measurement of the body length of harvested fish using convolutional neural networks. Biosys. Eng., 189, 36-47.
Villon S, Mouillot D, Chaumont M, Darling ES, Subsol G, Claverie T, Villeger S (2018) A Deep learning method for accurate and fast identification of coral reef fishes in underwater images. Ecol. Inform., 48, 238-244.
Wang J, Ma Y, Zhang L, Gao RX, Wu D (2018) Deep learning for smart manufacturing: Methods and applications. Journal of Manufacturing Systems, 48, 144-156.
Wang SH, Zhao JW, Chen YQ (2017) Robust tracking of fish schools using CNN for head identification. Multimedia Tools and Applications, 76, 23679-23697.
Wang T, Wu DJ, Coates A, Ng AY (2012) End-to-end text recognition with convolutional neural networks. In: Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012). IEEE, pp. 3304-3308.
Wei G-Y, Brooks D (2019) Benchmarking tpu, gpu, and cpu platforms for deep learning. arXiv preprint arXiv:1907.10701.
White DJ, Svellingen C, Strachan NJC (2006) Automated measurement of species and length of fish by computer vision. Fisheries Research, 80, 203-210.
Wu T-H, Huang Y-I, Chen J-M (2015) Development of an adaptive neural-based fuzzy inference system for feeding decision-making assessment in silver perch (Bidyanus bidyanus) culture. Aquacult. Eng., 66, 41-51.
Xu Z, Cheng XE (2017) Zebrafish tracking using convolutional neural networks. Sci. Rep., 7, 42815.
Yang Q, Xiao D, Lin S (2018) Feeding behavior recognition for group-housed pigs with the Faster R-CNN. Comput. Electron. Agric., 155, 453-460.
Yao J, Odobez J (2007) Multi-Layer Background Subtraction Based on Color and Texture. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8.
Zeiler MD, Fergus R (2014) Visualizing and Understanding Convolutional Networks. Springer International Publishing, Cham, pp. 818-833.
Zhang QS, Zhu SC (2018) Visual interpretability for deep learning: a survey. Front. Inform. Technol. Elect. Eng., 19, 27-39.
Zhang S, Yang X, Wang Y, Zhao Z, Liu J, Liu Y, Sun C, Zhou C (2020) Automatic fish population counting by machine vision and a hybrid deep neural network model. Animals, 10, 364.
Zhao J, Bao W, Zhang F, Zhu S, Liu Y, Lu H, Shen M, Ye Z (2018a) Modified motion influence map and recurrent neural network-based monitoring of the local unusual behaviors for fish school in intensive aquaculture. Aquaculture, 493, 165-175.
Zhao J, Li Y, Zhang F, Zhu S, Liu Y, Lu H, Ye Z (2018b) Semi-Supervised Learning-Based Live Fish Identification in Aquaculture Using Modified Deep Convolutional Generative Adversarial Networks. Transactions of the ASABE, 61, 699-710.
Zheng L, Yang Y, Tian Q (2017) SIFT meets CNN: A decade survey of instance retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 1224-1244.
Zhou C, Lin K, Xu D, Chen L, Guo Q, Sun C, Yang X (2018a) Near infrared computer vision and neuro-fuzzy model-based feeding decision system for fish in aquaculture. Comput. Electron. Agric., 146, 114-124.
Zhou C, Sun C, Lin K, Xu D, Guo Q, Chen L, Yang X (2018b) Handling Water Reflections for Computer Vision in Aquaculture. Transactions of the ASABE, 61, 469-479.
Zhou C, Xu D, Chen L, Zhang S, Sun C, Yang X, Wang Y (2019) Evaluation of fish feeding intensity in aquaculture using a convolutional neural network and machine vision. Aquaculture, 507, 457-465.
Zhou C, Xu D, Lin K, Sun C, Yang X (2018c) Intelligent feeding control methods in aquaculture with an emphasis on fish: a review. Reviews in Aquaculture, 10, 975-993.
Zhou C, Yang X, Zhang B, Lin K, Xu D, Guo Q, Sun C (2017a) An adaptive image enhancement method for a recirculating aquaculture system. Sci. Rep., 7, 6243.
Zhou C, Zhang B, Lin K, Xu D, Chen C, Yang X, Sun C (2017b) Near-infrared imaging to quantify the feeding behavior of fish in aquaculture. Comput. Electron. Agric., 135, 233-241.
Zhuang F, Qi Z, Duan K, Xi D, Zhu Y, Zhu H, Xiong H, He Q (2019) A Comprehensive Survey on Transfer Learning. arXiv preprint arXiv:1911.02685.
Zion B (2012) The use of computer vision technologies in aquaculture - A review. Comput. Electron. Agric., 88, 125-132.
Appendix A: Public datasets containing fish

| No. | Dataset | URL | Description | References |
|---|---|---|---|---|
| 1 | Fish4Knowledge | http://groups.inf.ed.ac.uk/f4k/index.html | This underwater live fish dataset was acquired from a live video dataset captured in the open sea. It contains a total of 27,370 verified fish images in 23 clusters. Each cluster represents a single species. | Boom et al. (2012) |
| 2 | Croatian fish dataset | http://www.inf-cv.uni-jena.de/fine_grained_recognition.html#datasets | This dataset contains 794 images of 12 different fish species collected in the Adriatic Sea in Croatia. All the images show fish in real-world situations recorded by high-definition cameras. | Jäger et al. (2015) |
| 3 | LifeCLEF14 and LifeCLEF15 dataset | http://www.imageclef.org/ | The LCF-14 dataset for fish contains approximately 1,000 videos. Labels are provided for approximately 20,000 detected fish in the videos. A total of 10 different fish species are included in this dataset. LifeCLEF 2015 (LCF-15) was taken from Fish4Knowledge. LCF-15 consists of 93 underwater videos covering 15 species and provides 9,000 annotations with species labels. | Ahmad et al. (2016) |
| 4 | Fish-Pak | https://doi.org/10.17632/n3ydw29sbz.3#folder-6b024354-bae3-460aa758-352685ba0e38 | This dataset consists of images of 6 different fish species, i.e., Catla (Thala), Hypophthalmichthys molitrix (Silver carp), Labeo rohita (Rohu), Cirrhinus mrigala (Mori), Cyprinus carpio (Common carp) and Ctenopharyngodon idella (Grass carp). | Rauf et al. (2019) |
| 5 | ImageNet | http://www.image-net.org/ | ImageNet is an image database organized according to the WordNet hierarchy (currently only nouns), in which each node in the hierarchy is associated with hundreds or thousands of images. ImageNet currently has an average of over five hundred images per node. | Deng et al. (2009) |