Transactions of the Chinese Society of Agricultural Engineering, Vol. 35, No. 23, Dec. 2019, p. 192
Recognition method for aggressive behavior of group pigs based on deep learning
Gao Yun1,2, Chen Bin1, Liao Huimin1, Lei Minggang2,3, Li Xuan1,2, Li Jing1, Luo Junjie1
(1. College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; 2. Cooperative Innovation Center for Sustainable Pig Production, Wuhan 430070, China; 3. College of Animal Science and Technology, College of Animal Medicine, Huazhong Agricultural University, Wuhan 430070, China)
Abstract: To address the low effectiveness and accuracy of recognizing aggressive behavior in group-housed pigs, caused by the limitations of traditional machine vision and image processing methods as well as by complex pig postures and pen environments, this paper proposes a deep learning algorithm built with 3D convolution kernels (3D CONV), named 3D ConvNet, for recognizing aggressive behavior in group-housed pigs. Video images of 18 Large White nursery pigs weighing about 9.6 kg were collected in three batches. From the first batch, 740 video segments (27 114 frames) covering all time periods within 28 d were selected, containing four major categories of aggressive behavior (biting, knocking, chasing, and treading) with seven sub-categories (ear biting, tail biting, body biting, head-to-head knocking, head-to-body knocking, chasing, and treading), as well as non-aggressive behaviors such as feeding, drinking, and resting; these were used as the training and validation sets at a ratio of 3:1. The results show that the 3D ConvNet model achieved a recognition accuracy of 96.78% on the training set and 95.70% on the validation set. The model still recognizes aggressive behavior accurately for pigs from batches other than the training batch and under poor illumination, indicating good generalization performance. Compared with the C3D model, the proposed network improves accuracy by 43.47 percentage points, and the processing time for a single frame is 0.50 s, which meets the requirement of real-time detection. The results provide a reference for detecting aggressive behavior of pigs in farm environments.
Keywords: convolutional neural network; machine vision; models; behavior recognition; aggressive behavior; deep learning; group pigs
doi: 10.11975/j.issn.1002-6819.2019.23.024
CLC number: TP391.41    Document code: A    Article ID: 1002-6819(2019)-23-0192-09
Citation: Gao Yun, Chen Bin, Liao Huimin, Lei Minggang, Li Xuan, Li Jing, Luo Junjie. Recognition method for aggressive behavior of group pigs based on deep learning[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(23): 192-200. (in Chinese with English abstract) doi: 10.11975/j.issn.1002-6819.2019.23.024  http://www.tcsae.org
0 Introduction
In intensive pig farming, aggressive behaviors (including fighting and chasing) frequently occur within pig groups. Aggression easily injures pigs, can lead to infection in poor housing environments, and in severe cases causes death, resulting in losses for pig farms [1-3]. Aggressive behavior also disturbs feed allocation in the pen: pigs at the lower ranks of the group hierarchy are deprived of feed and water, which slows their growth, impairs sow fertility, and causes serious economic losses [4-7]. At present, monitoring and recognizing aggressive behavior on pig farms relies mainly on manual observation and recording, which incurs high labor costs on intensive farms, inevitably misses many aggressive events, and cannot guarantee real-time, accurate, and efficient monitoring. Automatic detection and recognition of aggressive behavior in group-housed pigs under intensive conditions is therefore an important foundation for improving pig welfare and economic returns [8-11].
Many researchers at home and abroad have studied the detection and recognition of aggressive behavior in pigs. Oczak et al. used image processing and an artificial neural network to classify high- and medium-intensity aggressive behavior in pig houses; however, the method has to compute pixel changes between pairs of frames to extract features, which is computationally expensive and does not remain efficient when processing large volumes of data [12]. Viazzi et al. proposed classifying features extracted from motion history images of pigs with linear discriminant analysis (LDA) to recognize aggression. This method first requires the motion history images of all pigs and then extracts motion intensity features from them, which likewise incurs a heavy computational cost when the sample size is large. Moreover, LDA relies heavily on mean information for classification, while in real pig houses many low-intensity aggressive behaviors occur, such as tail biting and ear biting, so the generalization ability of that method remains to be verified [13]. Chen et al. used hierarchical clustering to extract acceleration features of pigs for recognizing aggressive behavior. Pig acceleration is an important feature of aggression, but the method only considers the pig that first initiates aggression in a video frame and discards the remaining, non-aggressive pigs, so the information about those pigs, which may also become aggressive, is completely lost [14]. Jonguk et al. used a support vector machine (SVM) on five velocity-related features of moving pigs to recognize whether aggressive behavior occurred. Although this study achieved high recognition accuracy, it only covered two aggressive behaviors, chasing and knocking, and extracting the velocity features requires additional computation, which again incurs a computational cost and makes real-time detection difficult [15]. The studies above all rely on image processing to extract a specific feature from pig images and then process that feature with machine learning methods. In practical applications, however, traditional image processing requires extra feature extraction and suffers from low efficiency and a heavy workload. Because individual differences among pig breeds are large, and the non-rigid bodies of pigs change as their weight increases, the extracted features may not generalize. In addition, because of pig adhesion and occlusion in the pen, poor illumination, and the complexity of aggressive behavior, traditional methods struggle to achieve real-time and efficient detection of aggressive behavior on intensive pig farms.
Received date: 2019-07-10    Revised date: 2019-10-29
Foundation items: National Key Research and Development Program of China during the 13th Five-Year Plan (2016YFD0500506); Fundamental Research Funds for the Central Universities (2662018JC003, 2662018JC010, 2662017JC028)
Biography: Gao Yun, associate professor, PhD, research interests: intelligent detection and control in agriculture. Email: angelclouder@mail.hzau.edu.cn
In recent years, deep learning has demonstrated strong advantages over traditional methods in image and vision tasks. By learning to extract features from low-level to high-level representations, deep learning can detect and recognize a wide variety of tasks in most scenarios [16-20]. Because of this strong learning and generalization ability shown in other fields, deep learning has also been widely applied to pig behavior detection. Yang et al. detected the feeding behavior of pigs based on Faster R-CNN [21]. Yang et al. used a fully convolutional network to segment sows and piglets, computed the udder zone from the geometric features of the sow and the dynamics of the piglets, extracted the corresponding spatial information, and then extracted motion intensity and occupation indices from video frames to recognize maternal nursing behavior of sows [22]. Yang Qiumei et al. used a convolutional neural network to study the drinking behavior of individual pigs [23]. Zheng et al. used Faster R-CNN to recognize sow postures such as standing and lying [24]. Deep learning has shown excellent performance on simple pig behaviors, but studies on higher-level aggressive behavior, which involves interaction among multiple pigs, are still rare [8].
In this paper, a 3D convolutional neural network model is built using deep learning and applied to recognizing aggressive behavior in group-housed pigs, avoiding the complex and tedious feature selection and processing required by traditional image processing methods. By training the network, an end-to-end model that effectively recognizes aggressive behavior is obtained, and experiments on pigs from different batches, videos of different durations, and poor illumination conditions verify the generalization ability and feasibility of the algorithm.
1 Materials and methods
1.1 Definition of aggressive behavior in pigs
Aggressive behavior in pigs involves interaction among multiple pigs in a group and is a complex, progressive behavior. At the beginning of an aggressive episode, pigs probe each other by sniffing and nudging with the snout; the aggression then gradually intensifies and is often accompanied by more violent pressing, biting, and knocking [2]. At the peak of aggression, ear biting and body biting occur. Biting usually lasts a long time and often causes skin lesions and wounds [1,6,22]. Therefore, the aggressive behaviors defined in this study are biting, knocking, chasing, and treading, as shown in Table 1.
Table 1 Definition of aggressive behaviors
Type of aggression | Name of aggressive behavior | Behavior description
Bite | Bite the ear | Biting the ear of another pig
Bite | Bite the tail | Biting the tail of another pig
Bite | Bite the body | Biting body parts of another pig, including the mouth, neck, etc.
Knock | Head-to-head knock | A pig knocks the head of another pig with its head
Knock | Head-to-body knock | A pig knocks the body of another pig with its head
Chase | Chase | Chasing that follows any aggressive event
Tread | Tread | A pig treads on the head, body, etc. of another pig with its feet
1.2 Data acquisition
1.2.1 Experimental conditions
Data were collected in three batches at the experimental pig farm of Huazhong Agricultural University in Wuhan, Hubei Province, from March 12 to April 9, 2018, from April 19 to May 16, 2018, and from June 9 to July 6, 2018. The NH3 concentrations of the three batches were <3.80, 15.18, and 37.95 mg/m3, respectively. The <3.80 mg/m3 group simulated the NH3 concentration of a well-ventilated nursery house; because manure is always present in a pig house and produces some NH3, the environment controlled to <3.80 mg/m3 served as the control. The experimental animals were 18 Large White nursery pigs weighing about 9.6 kg each, with uniform coat color. The 18 pigs were housed in the multi-variable environment-controlled breeding chamber designed in our group's previous work [25]; the internal length, width, and height of the chamber are 2, 1.5, and 2 m, respectively, with a 0.45 m deep manure pit at the bottom, and the chamber is equipped with a feeding trough and a drinking trough. During the experiments, the temperature was controlled at 27.0-27.4 ℃ and the relative humidity at 50%-70%. To maintain the pigs' normal habits, they were fed twice a day, at 8:00 and 17:00, with the same feeding regime. The chamber was illuminated by automatically controlled LED lights from 7:00 to 18:00; the lights were off at all other times.
1.2.2 Video acquisition
RGB video was collected with a Kinect V2 camera mounted at the top of the breeding chamber. The lens was about 1.8 m above the chamber floor and used a top-view angle, which captured all six pigs in the chamber without missing any animal. The camera was connected to a portable laptop, and the top-view color video of the six pigs was stored on a SEAGATE external hard drive. To limit storage costs, video was recorded at 5 frames/s with a resolution of 1 920×1 080 pixels and saved in AVI format. The data acquisition platform and breeding chamber are shown in Fig.1.
Fig.1 Data acquisition platform
1.2.3 Video preprocessing and annotation
To train and evaluate the aggressive behavior recognition model, the collected data had to be annotated. About 900 h of recorded video were reviewed manually and divided into two categories, aggressive and non-aggressive behavior, with the clip length determined by how long the aggressive behavior lasted. According to the definition of aggressive behavior above, and to distinguish aggressive from non-aggressive behavior, only aggressive events lasting at least 5 frames were annotated. If another aggressive event occurred within 5 s after one aggressive event, it was counted as the same aggressive episode; that is, the separation interval between aggressive episodes was set to 5 s. Video segments with dropped frames were discarded. The minimum duration of an aggressive behavior clip was finally defined as 3 s.
The videos were annotated with program code written in Python 3.6. The annotated dataset was randomly divided into training, validation, and test sets, and the corresponding file lists for the three sets were generated; the network model reads the dataset through these lists, as illustrated by the sketch below.
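The following Python sketch is only a rough illustration of the annotation rules described above (5-frame minimum at 5 frames/s, 5 s merging interval, 3 s minimum clip length) and of the random split into list files; the function names, file names, and event values are hypothetical and are not the authors' actual annotation tooling.

```python
import random

FPS = 5                 # recording frame rate (frames/s)
MIN_FRAMES = 5          # an event must last at least 5 frames
MERGE_GAP_S = 5         # events separated by <=5 s belong to the same episode
MIN_CLIP_S = 3          # a final aggressive clip must be at least 3 s long


def merge_events(events):
    """Merge manually marked (start_frame, end_frame) events into episodes."""
    episodes = []
    for start, end in sorted(events):
        if end - start + 1 < MIN_FRAMES:
            continue                                        # too short to count
        if episodes and start - episodes[-1][1] <= MERGE_GAP_S * FPS:
            episodes[-1][1] = max(episodes[-1][1], end)     # same episode
        else:
            episodes.append([start, end])
    # keep only episodes that satisfy the 3 s minimum duration
    return [(s, e) for s, e in episodes if (e - s + 1) >= MIN_CLIP_S * FPS]


def split_clips(clips, ratios=(0.6, 0.2, 0.2), seed=0):
    """Randomly split annotated clips into train/validation/test lists."""
    random.Random(seed).shuffle(clips)
    n = len(clips)
    n_train, n_val = int(n * ratios[0]), int(n * ratios[1])
    return clips[:n_train], clips[n_train:n_train + n_val], clips[n_train + n_val:]


if __name__ == "__main__":
    events = [(10, 20), (40, 60), (400, 430)]               # hypothetical annotations
    clips = [f"aggressive_{i:04d}.avi" for i, _ in enumerate(merge_events(events))]
    train, val, test = split_clips(clips)
    for name, subset in zip(("train", "val", "test"), (train, val, test)):
        with open(f"{name}_list.txt", "w") as f:            # list files read by the model
            f.write("\n".join(subset))
```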
1.2.4 Dataset description
According to the aggressive behaviors defined in Table 1, the videos were clipped into segments. During aggressive episodes among group-housed pigs, several types of aggressive behavior often occur simultaneously, or one behavior is followed by another after it ends, so each clip contains at least one of the defined aggressive behaviors. After careful classification, the counts of each aggressive behavior in the three batches of data are given in Table 2.
Table 2 Counts of the various aggressive behaviors
Behavior class | Number of times in the first batch | Number of times in the second batch | Number of times in the third batch
Bite ear | 364 | 91 | 146
Bite tail | 120 | 37 | 32
Bite body | 268 | 76 | 107
Head-to-head hit | 161 | 64 | 79
Head-to-body hit | 149 | 58 | 68
Chase | 104 | 37 | 53
Tread | 50 | 32 | 29
Table 2 shows that ear biting and body biting were the most frequent of all behavior classes. After each of the three experiments, wounds were found on the ears of most pigs and on the bodies of some pigs, so aggressive behavior clearly has a serious impact on pig health and welfare.
To train and evaluate the network, the first batch of experimental data, collected from March 12 to April 9, 2018 at an NH3 concentration of 37.95 mg/m3, was divided into training, validation, and test sets at a ratio of 60%, 20%, and 20%. From the latter two batches (April 19 to May 16, 2018 at 15.18 mg/m3, and June 9 to July 6, 2018 at <3.80 mg/m3), a portion of the aggressive and non-aggressive clips was selected to build additional test sets. The details of the datasets are given in Table 3.
Table 3 Dataset partition
Dataset category | Behavior class | Number (clips/frames) | Acquisition time
Training set | Aggressive | 285/10 556 |
Training set | Non-aggressive | 270/9 364 |
Validation set | Aggressive | 95/3 518 |
Validation set | Non-aggressive | 90/3 676 |
Test set 1 | Aggressive | 175/6 230 | Mar 12 - Apr 9, 2018
Test set 1 | Non-aggressive | 170/6 770 | Mar 12 - Apr 9, 2018
Test set 2 | Aggressive | 161/4 862 | Apr 19 - May 16, 2018
Test set 2 | Non-aggressive | 160/4 860 | Apr 19 - May 16, 2018
Test set 3 | Aggressive | 220/6 727 | Jun 9 - Jul 6, 2018
Test set 3 | Non-aggressive | 180/6 098 | Jun 9 - Jul 6, 2018
1.3 Aggressive behavior recognition algorithm
1.3.1 2D and 3D CONV network models
Existing deep learning studies on pig group behavior usually build convolutional neural networks with conventional 2D convolution kernels (2D CONV). A 2D CONV convolves a single image and extracts the spatial features of that image, and it has achieved good results in pig segmentation, recognition, and behavior detection [21-24,26]. For recognizing aggressive behavior, however, judging from a single image is not accurate. Aggressive behavior is a complete behavior that unfolds over time; if the judgment is made on a single frame, the motion information of the aggressive behavior along the temporal dimension is lost, leading to a high misclassification rate and making an effective judgment difficult. Recognizing aggressive behavior in pigs therefore requires combining information from both the temporal and spatial dimensions.
A 3D convolution kernel (3D CONV) adds temporal information to a 2D CONV. Fig.2 shows the operation of a 3D CONV on video frames. During the convolution that learns image features, a 3D CONV additionally operates along the temporal dimension, with a kernel size of d×d×k: while convolving the current frame, the kernel also convolves the following k-1 frames along the temporal dimension, i.e., it extracts information from k consecutive frames in the sequence; d and k are determined by the kernel size defined in the network. A 3D CONV thus fuses information from the neighboring temporal window into the extracted features, preserving motion information and laying the foundation for the model to extract both temporal and spatial features. A convolutional neural network built with 3D CONVs still has the advantages of 2D CONVs, such as local connectivity, weight sharing, and a hierarchical structure, which gives it strong learning ability [27].
Note: The points and lines represent the calculation process in the convolution operation.
Fig.2 Process of a 3D convolution operation with a temporal dimension of three
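As a minimal, hedged illustration of the d×d×k kernel described above (not the authors' code), the following Keras snippet applies a single 3D convolution with a 3×3×3 kernel to a clip of consecutive frames; the clip size and number of filters are assumptions chosen only to show how the temporal dimension is convolved together with the spatial dimensions.

```python
import numpy as np
import tensorflow as tf

# A toy clip: 1 sample, 16 frames, 112x112 pixels, 3 color channels
clip = np.random.rand(1, 16, 112, 112, 3).astype("float32")

# One 3D convolution with a 3x3x3 kernel (d=3 in space, k=3 in time, with the
# temporal axis first in Keras order): each output voxel mixes information
# from 3 consecutive frames and a 3x3 spatial window.
conv3d = tf.keras.layers.Conv3D(filters=16, kernel_size=(3, 3, 3),
                                strides=(1, 1, 1), padding="same",
                                activation="relu")

features = conv3d(clip)
print(features.shape)  # (1, 16, 112, 112, 16): motion information is preserved
```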
1.3.2 3D CONV network model for aggressive behavior recognition
Based on the C3D [28] network structure, this paper rebuilds and optimizes the C3D architecture; by comparing the effect of different numbers of layers and kernel sizes on model accuracy, the optimal network parameters and model for recognizing aggressive behavior in group-housed pigs are determined.
As shown in Fig.3, the final proposed model, 3D ConvNet, makes four improvements on C3D:
1) The C3D network has only 8 layers. For a convolutional neural network, a deeper network can extract more features, and the learned features become increasingly abstract. The aggressive behavior recognition task in this paper requires combining low-level and high-level abstract features to make accurate judgments, so, after fully weighing computational cost against model performance, the number of layers was increased to 19.
2) Compared with a 2D CONV, a 3D CONV adds a convolution of depth k along the temporal dimension, which sharply increases the computational load, so a large amount of data is needed to train a good model. When data are insufficient, the network is at risk of overfitting. Dropout [29] layers were therefore added to the network structure, which not only effectively prevents overfitting but also significantly reduces the computational cost, makes it easier to add convolutional layers for learning more meaningful features, and improves the robustness of the network.
3) Batch Normalization (BN) layers [30] were added to the network so that the data distribution in each layer does not change drastically as the convolutions proceed; within a stable data distribution the network learns useful features more easily. In addition, BN effectively avoids the vanishing gradient problem that can occur in deep convolutional networks and speeds up training.
4) Multi-scale feature fusion was adopted in the network. Multi-scale feature fusion is used in recent object detection algorithms such as SSD and YOLO v3 [31-33], which are among the most advanced object detectors. Fusing features of different scales is an important means of improving model performance: the fusion allows the model to make full use of the temporal and spatial features extracted at every stage, so that low-level information is still incorporated while learning more abstract, higher-level features. This exploits the fact that the kernels of different layers emphasize different aspects of the features. Introducing multi-scale feature fusion lets the network fuse more low-level information, which is critical for a recognition task that relies on the correlation between low-level and high-level features.
As shown in Fig.3a, the proposed aggressive behavior recognition network consists of three parts: a pre-feature extraction network, an intermediate feature fusion and extraction network, and a final output network.
The pre-feature extraction network consists of three convolution blocks, whose structure is shown in Fig.3b. Before further feature extraction and fusion, this network extracts a set of useful features, reducing possible noise and the influence of irrelevant information on model performance. The kernel size of the first convolution block is 3×3×1, the second and third blocks use 3×3×3 kernels, and the numbers of output channels increase gradually to 16, 32, and 64, respectively. To retain more useful feature information before fusion, the first convolution does not collect motion information along the temporal sequence but makes more use of the current frame, and the Max-pooling stride in the pre-feature extraction network is (2, 2, 1). This lets the network retain more of the currently extracted feature maps rather than fusing them with later temporal features, so more features of the current frame are preserved. Each convolutional layer is followed by a Batch Normalization layer, a ReLU activation layer, and a Max-pooling layer.
Multi-scale feature fusion is carried out by the feature fusion and extraction network shown in Fig.3c. In this stage, three convolution stages are set on the backbone, and the network continues to extract deeper features. To avoid repeatedly computing uninformative features, which would increase the computational cost and degrade model performance, feature fusion is performed only after a convolution stage has been completed. Each convolution stage contains five convolutional layers, all with 3×3×3 kernels and a stride of (1, 1, 1); the number of channels increases successively from 64 to 128, 256, and 512. In the skip fusion connections, the features cannot be fused directly because their channel numbers differ, so 1×1×1 convolution kernels are placed on the connections to keep the channel numbers consistent during fusion.
a. Network composition   b. Pre-feature extraction network   c. Feature fusion and extraction network   d. Behavior prediction output network
Fig.3 Network structure
The output network is shown in Fig.3d. Before the final prediction, the network applies one more convolution to the fused features with a 1×1×1 kernel and a stride of (1, 1, 1) and raises the number of channels to 1 024, which helps the network fully integrate the temporal and spatial information. The feature maps are average-pooled before the fully connected layer, and the output of the fully connected layer is fed into a Softmax to complete the final class prediction and confidence computation.
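The exact layer arrangement is only available from Fig.3, so the following Keras sketch should be read as an approximation of the structure described in the text (three pre-extraction blocks of 16/32/64 channels, three five-layer 3×3×3 stages with 1×1×1 projection skips, and a 1×1×1, 1 024-channel output head with average pooling and Softmax); the input clip size, pooling placement, and stage widths marked below are assumptions rather than the published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers as L


def conv_bn(x, filters, kernel, pool_stride=None):
    """Conv3D + BN + ReLU (+ optional max pooling), as in the pre-network blocks."""
    x = L.Conv3D(filters, kernel, padding="same")(x)
    x = L.BatchNormalization()(x)
    x = L.ReLU()(x)
    if pool_stride is not None:
        x = L.MaxPool3D(pool_size=(2, 2, 2), strides=pool_stride, padding="same")(x)
    return x


def stage(x, filters, n_layers=5):
    """One five-layer 3x3x3 stage followed by a 1x1x1 projection skip fusion."""
    skip = x
    for _ in range(n_layers):
        x = conv_bn(x, filters, (3, 3, 3))
    skip = L.Conv3D(filters, (1, 1, 1), padding="same")(skip)   # match channel count
    x = L.Add()([x, skip])                                      # multi-scale fusion
    return L.MaxPool3D(pool_size=2, strides=2, padding="same")(x)


inputs = L.Input((16, 112, 112, 3))                             # assumed clip size (T, H, W, C)
x = conv_bn(inputs, 16, (1, 3, 3), pool_stride=(1, 2, 2))       # spatial-only first block (3x3x1)
x = conv_bn(x, 32, (3, 3, 3), pool_stride=(1, 2, 2))            # pooling stride (2,2,1) in the
x = conv_bn(x, 64, (3, 3, 3), pool_stride=(1, 2, 2))            # paper's (H, W, T) notation
for f in (128, 256, 512):                                       # assumed stage widths
    x = stage(x, f)
x = conv_bn(x, 1024, (1, 1, 1))                                 # output head, 1x1x1 conv
x = L.GlobalAveragePooling3D()(x)
x = L.Dropout(0.5)(x)
outputs = L.Dense(2, activation="softmax")(x)                   # aggressive vs. non-aggressive
model = tf.keras.Model(inputs, outputs)
```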
The task of the network is to distinguish aggressive from non-aggressive behavior, which is a binary classification problem. The model is updated and optimized by back-propagation with the Adam gradient descent method, and the loss function is the categorical cross-entropy loss shown in Eq.(1).

L = -(y lg ŷ + (1 - y) lg(1 - ŷ))    (1)

where L is the loss function, y is the true sample label (0 or 1), and ŷ is the label predicted by the model.
1.3.3 Training parameter settings
All activation functions in the network are rectified linear units (ReLU), and the optimizer is the Adam gradient descent method. The batch_size is set to 32, the momentum to 0.9, the number of iterations (epochs) to 20, the base learning rate to 0.005, and the Dropout rate to 0.5; L2 regularization is used with a weight decay coefficient weight_decay of 0.005.
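A hedged sketch of how these hyper-parameters could be wired together in Keras is given below. The optimizer, loss, and regularization values follow the settings listed above; the tiny stand-in model and the random toy data are placeholders so the snippet runs on its own, and they do not represent the actual 3D ConvNet or the clip loader used by the authors.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers as L, regularizers

# Hyper-parameters reported in Section 1.3.3
BATCH_SIZE = 32
EPOCHS = 20
BASE_LR = 0.005
WEIGHT_DECAY = 0.005       # L2 regularization coefficient
DROPOUT_RATE = 0.5

# Stand-in model: in practice this would be the 3D ConvNet sketched earlier,
# with kernel_regularizer attached to every Conv3D/Dense layer.
model = tf.keras.Sequential([
    L.Conv3D(16, 3, padding="same", activation="relu",
             input_shape=(16, 112, 112, 3),
             kernel_regularizer=regularizers.l2(WEIGHT_DECAY)),   # L2 weight decay
    L.GlobalAveragePooling3D(),
    L.Dropout(DROPOUT_RATE),
    L.Dense(2, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=BASE_LR,
                                                 beta_1=0.9),     # momentum term
              loss="categorical_crossentropy",                    # Eq.(1)
              metrics=["accuracy"])

# Toy random data standing in for the clips listed in the train/validation files
x = np.random.rand(8, 16, 112, 112, 3).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 2, 8), 2)
model.fit(x, y, batch_size=BATCH_SIZE, epochs=EPOCHS, validation_split=0.25)
```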
1.4 Evaluation metrics
To evaluate the model comprehensively and reasonably, four metrics are used: accuracy, precision, recall, and the F1 score, as shown in Eqs.(2)-(5).

Accuracy = (TP + TN) / (TP + FP + FN + TN)    (2)
Precision = TP / (TP + FP)    (3)
Recall = TP / (TP + FN)    (4)
F1 = 2TP / (N + TP - TN)    (5)

where TP is the number of aggressive behaviors correctly recognized, TN is the number of non-aggressive behaviors correctly recognized, FP is the number of non-aggressive behaviors recognized as aggressive, FN is the number of aggressive behaviors recognized as non-aggressive, and N = TP + TN + FP + FN is the total number of samples (so Eq.(5) is equivalent to F1 = 2TP/(2TP + FP + FN)).
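For readers who want to reproduce the figures reported later (e.g., 981 of the 1 066 test clips correct), a small Python sketch of Eqs.(2)-(5) follows; the confusion-matrix counts used in the example are taken from the totals row of Table 4.

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 as defined in Eqs.(2)-(5)."""
    n = tp + tn + fp + fn                       # total number of samples
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (n + tp - tn)                 # equivalent to 2TP/(2TP+FP+FN)
    return accuracy, precision, recall, f1


# Totals from Table 4: 498 aggressive clips correct, 58 missed,
# 483 non-aggressive correct, 27 misclassified as aggressive.
acc, p, r, f1 = evaluate(tp=498, tn=483, fp=27, fn=58)
print(f"accuracy={acc:.4f} precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
# ~0.9203, 0.9486, 0.8957, 0.9214, matching the values reported in Section 2.3.1
```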
1.5 Experimental procedure
The procedure for recognizing aggressive behavior in group-housed pigs is as follows: 1) extract video clips containing aggressive behavior from the collected data and build the training, validation, and test sets; 2) build the aggressive behavior recognition network; 3) train the network with the prepared training set as input; 4) test the network on the validation set to obtain the loss and the recognition accuracy; 5) using the recognition accuracy as the evaluation criterion, adjust network parameters such as the learning rate, Batch_size, and weight_decay; 6) retrain the network with the adjusted parameters, repeating steps 3)-5) until the loss curve converges and the training and validation accuracies are close.
2 Results and analysis
2.1 Model performance analysis
On the experimental platform, the constructed 3D ConvNet was trained on the training set for 20 epochs, and the validation set was used to make a preliminary assessment of the training process. The training and validation accuracies and the loss curves of the network on the dataset containing 380 aggressive clips (14 074 frames) and 360 non-aggressive clips (13 040 frames) are shown in Fig.4.
a. Model accuracy curve   b. Model loss curve
Fig.4 Model training curves
Fig.4 shows a clear optimization process. As the number of epochs increases, the classification loss curves of the training and validation sets gradually decrease, and by epoch 15 the loss curve is close to convergence. The model reaches an accuracy of 96.78% on the training set, and the validation accuracy rises gradually from an initial 37% to 95.70%. After the 10th epoch, the gap between training and validation accuracy gradually narrows, and the final difference is within a good range; after 20 epochs, the loss and accuracy hardly change any more. The accuracy and loss curves show the training process of a well-behaved deep learning model: the model gradually learns the features needed to correctly recognize aggressive behavior in group-housed pigs, achieves a good training result, and does not fall into overfitting or a local optimum.
The model recognizes aggressive behavior in group-housed pigs well. Fig.5 shows randomly selected recognition results for aggressive and non-aggressive behavior in image frames. In Fig.5a the pigs are feeding, and the system judges the behavior as non-aggressive (Nor) with a confidence of 0.967; in Fig.5b, biting occurs in addition to feeding, and the system judges it as aggressive (Attack) with a confidence of 0.821.
The model also performs well on aggressive behaviors such as chasing, knocking, treading, and body biting, and it is robust to aggression occurring under different conditions. For example, when aggression occurs between only 2 of the 6 group-housed pigs while the other 4 are feeding, drinking, or otherwise non-aggressive, the model can accurately recognize the aggression; when aggression occurs again a few seconds after a previous aggressive event, the model can also recognize it accurately; and aggression involving multiple pigs can likewise be recognized. In addition, the model outputs a confidence score along with its judgment of the current behavior, which provides evidence and interpretability for deciding whether a behavior is aggressive.
a. Model recognition of non-aggressive behavior   b. Model recognition of aggressive behavior
Fig.5 Model recognition results
2.3 Verification of practical performance
To further verify model performance, the effectiveness and generalization of the algorithm need to be examined under realistic pig house conditions. The experiments in this section evaluate the model with respect to different pigs reared in the same environment, different clip durations, and poor illumination.
2.3.1 Effect of different pig batches on model performance
In real group-housing environments, pigs of the same breed from different batches show no obvious overall differences, but individual pigs still differ to some extent under the influence of the environment and other external factors, for example in body shape, body-part contours, and behavioral habits. To assess the effect of pig batch, the model was evaluated on test sets 1, 2, and 3; Table 4 gives the confusion matrices of the model's results on the three test sets.
Table 4 Confusion matrices of the model's results on the test sets
Dataset | Behavior class | Classified as aggressive | Classified as non-aggressive | Total
Test set 1 | Aggressive | 165 | 10 | 175
Test set 1 | Non-aggressive | 13 | 157 | 170
Test set 2 | Aggressive | 144 | 17 | 161
Test set 2 | Non-aggressive | 9 | 151 | 160
Test set 3 | Aggressive | 189 | 31 | 220
Test set 3 | Non-aggressive | 5 | 175 | 180
Total | Aggressive | 498 | 58 | 556
Total | Non-aggressive | 27 | 483 | 510
From Table 4, of the 1 066 video clips in all test sets, a total of 981 clips (aggressive + non-aggressive) were classified correctly and 85 clips were misclassified, giving an accuracy of 92.03%; the precision for aggressive behavior is 94.86%, the recall for aggressive behavior is 89.57%, and the F1 score, which balances recall and precision, is 92.14%. Test set 1 achieved the best performance of the three test sets, with an aggressive behavior recognition accuracy of 94.29%. This is because test set 1 comes from the same batch of pigs as the training set, so the model fits the data of this batch better. Although test sets 2 and 3 come from batches of pigs different from the training set, their accuracies are 89.44% and 85.91%, respectively. This fully demonstrates the good generalization performance of the model: for different pigs in the same rearing environment, the model can still recognize whether aggressive behavior occurs with relatively high accuracy.
2.3.2 Effect of video duration on model performance
Because aggressive behaviors differ in their triggers, types, and the number of pigs involved, their durations vary. To analyze how the duration of aggressive behavior affects recognition performance, the aggressive clips in test sets 2 and 3 were grouped by duration; the distribution of durations and the classification results are shown in Fig.6.
Fig.6 Test results for aggressive behaviors of different durations
Statistics from Fig.6 show that clip durations are concentrated in the 4-7 s range (220 clips). Clips in this range account for 57.74% of the test clips, with a recognition accuracy of 89.55%. Among them, clips of 4-5 s are the most numerous, accounting for 24.41%, with a recognition accuracy of 93.55%. Clips of 3-4 s and clips of 6-7 s account for roughly equal proportions, and both achieve recognition accuracies above 80%. From 6-7 s to 7-8 s, however, the proportion of clips gradually decreases and the recognition accuracy also tends to decline; for clips of 7 s and longer, the recognition accuracy is only 73.33%.
There are three main reasons for misclassification. First, during an aggressive episode, the body parts of the victim pig, such as the ears and tail, are occluded by the aggressor, so the model cannot detect the affected part and cannot judge whether aggression is occurring. Second, when the aggression is too mild and there is essentially no visible change between frames, the model cannot capture motion information along the temporal dimension and produces a wrong result. Third, for longer videos, the frame-by-frame processing of the 3D ConvNet takes too long and temporal motion information is easily lost, so its contribution to recognition is limited and the network relies too heavily on spatial information, leading to a higher misclassification rate for long videos.
2.3.3 Effect of poor illumination on model performance
In the experiments, the LED lights that provide illumination according to the pigs' daily routine were on from 7:00 to 18:00; at other times the chamber was lit only by natural light through the side windows. In practice, some aggressive behavior was observed under poor illumination. To evaluate recognition of aggressive behavior under poor illumination, the clips recorded under poor lighting were selected from the test sets and tested separately; the results are shown in Table 5.
Table 5 Recognition results for videos under poor illumination
Behavior class | Total | Correctly classified | Misclassified | Accuracy/%
Aggressive | 43 | 34 | 9 | 79.07
Non-aggressive | 27 | 27 | 0 | 100
Total | 70 | 61 | 9 | 87.14
Table 5 shows that the model still achieves a recognition accuracy of 79.07% for aggressive behavior under poor illumination. Poor lighting is common in practical pig production, and the proposed aggressive behavior detection model can still obtain good results for recognizing aggressive behavior under such conditions, showing good adaptability to lighting conditions and further supporting its application on real intensive pig farms.
2.4 Model parameter settings
In the feature fusion and extraction network of the proposed 3D ConvNet, the kernel size is 3×3×3 and each convolution block contains 5 layers. These settings were determined as the optimal parameters for recognizing aggressive behavior by experimentally testing different parameters of the feature fusion and extraction network. The comparison experiments show that with a 3×3×3 kernel and 5 convolutional layers per block, the network achieves a recognition accuracy of 95.70%. With a 3×3×1 kernel, the accuracy is only 49.22%. When each block contains 9 convolutional layers, the number of trainable parameters increases to 7 401×10^3 and the accuracy is only 63.67%.
2.5 Comparison of different models
The proposed network was compared with the C3D model and with other improved models based on C3D. The training and validation sets were the datasets described above, and the training parameters were kept identical. The recognition performance of the four models on the validation set is given in Table 6.
The C3D model achieved only 52.23% recognition accuracy on the validation set of our data. After deepening the convolutional layers of C3D to 19 layers, the resulting C3D_1 model improved the accuracy only to 64.58%, and training C3D_1 took a large amount of time. Adding BN layers to C3D_1 yields the C3D_2 (BN) model; on the same dataset, C3D_2 obtained only a small further gain in accuracy, reaching 65.63%, but the BN layers accelerated convergence during training, with the loss approaching convergence after only 5 epochs. Compared with C3D, the 3D ConvNet improves validation accuracy by 43.47 percentage points; compared with C3D_1 and C3D_2, which share the same backbone structure, the 3D ConvNet raises the recognition accuracy to 95.70% after introducing multi-scale feature fusion, and the time needed to train the model to convergence is greatly reduced.
Table 6 Comparison of the performance of different recognition networks
Model | Accuracy/% | Number of trainable parameters/10^3 | Average recognition time per frame/s
C3D | 52.23 | 78 003 | 2.3
C3D_1 (19 layers) | 64.58 | 116 603 | 3.1
C3D_2 (BN) | 65.63 | 116 616 | 3.0
3D ConvNet | 95.70 | 1 741 | 0.5
Comparing the average per-frame recognition times, the 3D ConvNet both widens and deepens the network relative to C3D, yet thanks to the Dropout and Batch Normalization layers the number of trainable parameters is reduced to 1 741×10^3. Its average recognition time per frame is 0.5 s, the shortest of all the compared networks and 1.8 s less than the 2.3 s of the C3D model, the next fastest, which greatly improves recognition efficiency.
For the purpose of this study, real-time monitoring of aggressive behavior of group-housed pigs on intensive farms makes the model size and the average per-frame recognition time extremely important: an oversized model is difficult to load and run, occupies too much memory on mobile devices, and its long detection time prevents real-time detection. The trained model proposed in this paper is only 76.3 MB, so porting it to mobile devices is not constrained, and the per-frame detection time on a CPU (Intel(R) Core(TM) i5-7500) is 0.50 s, which basically meets the requirement of real-time detection of aggressive behavior in intensively raised group pigs.
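To check a per-frame recognition time of this kind on one's own hardware, a simple timing sketch like the one below can be used. The tiny stand-in model, input shape, and label names are assumptions made only so the snippet runs; in practice the real trained 3D ConvNet weights would be loaded instead (e.g., with tf.keras.models.load_model on a hypothetical saved-model path).

```python
import time
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers as L

# Stand-in for the trained 3D ConvNet (replace with the real loaded model).
model = tf.keras.Sequential([
    L.Conv3D(16, 3, padding="same", activation="relu", input_shape=(16, 112, 112, 3)),
    L.GlobalAveragePooling3D(),
    L.Dense(2, activation="softmax"),
])

clip = np.random.rand(1, 16, 112, 112, 3).astype("float32")   # assumed input shape

model.predict(clip)                        # warm-up run (graph tracing, allocation)
start = time.perf_counter()
probs = model.predict(clip)
elapsed = time.perf_counter() - start

label = ("Nor", "Attack")[int(np.argmax(probs))]
print(f"prediction: {label}  confidence: {probs.max():.3f}  time: {elapsed:.2f} s")
```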
3 Conclusions
This paper studied a deep learning network model for recognizing aggressive behavior in group-housed pigs; the model achieved good results on the 1 066 video clips of the test sets. The specific conclusions are as follows:
1) Based on the C3D network, a 3D ConvNet model for recognizing aggressive behavior in group-housed pigs was proposed, with the network structure improved in both width and depth. In depth, the number of convolutional layers was increased and BN and Dropout layers were added; in width, multi-scale feature fusion was introduced into the network, enabling the model to judge whether aggressive behavior occurs and to output a confidence score.
2) The 3D ConvNet model achieved an accuracy of 92.03% on the test sets: of the 1 066 clips, 981 clips (aggressive + non-aggressive) were classified correctly. The precision for aggressive behavior is 94.86%, the recall is 89.57%, and the F1 score, which balances recall and precision, is 92.14%. The model also showed good generalization on test sets of different pig batches in the same environment and under poor illumination.
3) Compared with the C3D, C3D_1 (19 layers), and C3D_2 (BN) networks under the same training and validation sets, the 3D ConvNet exceeds all three in recognition accuracy on the validation set, reaching 95.70%, and it processes a single frame fastest, taking only 0.5 s. The model thus combines high accuracy with faster detection, showing good effectiveness and real-time capability.
The results show that the 3D convolution based network for recognizing aggressive behavior in group-housed pigs is stable and effective. The algorithm provides a method and ideas for recognizing aggressive behavior in group-housed pigs and lays a foundation for subsequent automatic monitoring and recognition of pig behavior in intensive farming environments.
[References]
[1] Turner S P, Farnworth M J, White I M S, et al. The accumulation of skin lesions and their use as a predictor of individual aggressiveness in pigs[J]. Applied Animal Behaviour Science, 2006, 96(3/4): 245-259.
[2] Kongsted A G. Stress and fear as possible mediators of reproduction problems in group housed sows: A review[J]. Acta Agriculturae Scandinavica, Section A - Animal Science, 2004, 54(2): 58-66.
[3] Zhu Zhiqian. Effects of factory pig farming on pig behavior and performance and countermeasures[J]. Animal Husbandry & Veterinary Medicine, 2007(12): 40-41. (in Chinese)
[4] Verdon M, Hansen C F, Rault J L, et al. Effects of group housing on sow welfare: A review[J]. Journal of Animal Science, 2015, 93(5): 1999.
[5] Shi Zhengxiang, Li Baoming, Zhang Xiaoying, et al. Behaviour of weaning piglets under intensive farm environment[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2004, 20(2): 220-225. (in Chinese with English abstract)
[6] Zhang Zhenling, Peden R S E, Turner S P, et al. Research progress on aggressive behavior of pigs at mixing[J]. Swine Industry Science, 2018, 35(12): 34-37. (in Chinese)
[7] Yang Feiyun, Zeng Yaqiong, Feng Ze, et al. Research progress on environmental control and intelligent equipment technology for livestock and poultry farming[J]. Bulletin of the Chinese Academy of Sciences, 2019, 34(2): 163-173. (in Chinese)
[8] He Dongjian, Liu Dong, Zhao Kaixuan. Review of perceiving animal information and behavior in precision livestock farming[J]. Transactions of the Chinese Society for Agricultural Machinery, 2016, 47(5): 231-244. (in Chinese with English abstract)
[9] Ma Li, Ji Bin, Liu Hongshen, et al. Differentiating profile based on single pig contour[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2013, 29(10): 168-174. (in Chinese with English abstract)
[10] Zhang Meng, Zhong Nan, Liu Yingying. Estimation method of pig lean meat percentage based on image of pig shape characteristics[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2017, 33(12): 308-314. (in Chinese with English abstract)
[11] Liu Longshen, Shen Mingxia, Bo Guangyu, et al. Sows parturition detection method based on machine vision[J]. Transactions of the Chinese Society for Agricultural Machinery, 2014, 45(3): 237-242. (in Chinese with English abstract)
[12] Oczak M, Viazzi S, Ismayilova G, et al. Classification of aggressive behaviour in pigs by activity index and multilayer feed forward neural network[J]. Biosystems Engineering, 2014, 119: 89-97.
[13] Viazzi S, Ismayilova G, Oczak M, et al. Image feature extraction for classification of aggressive interactions among pigs[J]. Computers and Electronics in Agriculture, 2014, 104: 57-62.
[14] Chen C, Zhu W, Ma C, et al. Image motion feature extraction for recognition of aggressive behaviors among group-housed pigs[J]. Computers and Electronics in Agriculture, 2017, 142: 380-387.
[15] Jonguk L, Long J, Daihee P, et al. Automatic recognition of aggressive behavior in pigs using a Kinect depth sensor[J]. Sensors, 2016, 16(5): 631-641.
[16] Sun Yu, Zhou Yan, Yuan Mingshuai, et al. UAV real-time monitoring for forest pest based on deep learning[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2018, 34(21): 74-81. (in Chinese with English abstract)
[17] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[C]//International Conference on Neural Information Processing Systems. Curran Associates Inc., 2012: 1097-1105.
[18] Ren S, He K, Girshick R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis & Machine Intelligence, 2017, 39(6): 1137-1149.
[19] Zeiler M D, Fergus R. Visualizing and understanding convolutional networks[C]//European Conference on Computer Vision. Springer, Cham, 2014: 818-833.
[20] Zhang Z, Fidler S, Urtasun R. Instance-level segmentation for autonomous driving with deep densely connected MRFs[C]//Computer Vision & Pattern Recognition, 2016.
[21] Yang Qiumei, Xiao Deqin, Lin Sicong. Feeding behavior recognition for group-housed pigs with the Faster R-CNN[J]. Computers and Electronics in Agriculture, 2018, 144: 453-460.
[22] Yang Aqing, Huang Huasheng, Zheng Chan. High-accuracy image segmentation for lactating sows using a fully convolutional network[J]. Biosystems Engineering, 2018, 176: 36-47.
[23] Yang Qiumei, Xiao Deqin, Zhang Genxin. Automatic pig drinking behavior recognition with machine vision[J]. Transactions of the Chinese Society for Agricultural Machinery, 2018, 49(6): 232-238. (in Chinese with English abstract)
[24] Zheng Chan, Zhu Xunmu, Yang Xiaofan. Automatic recognition of lactating sow postures from depth images by deep learning detector[J]. Computers and Electronics in Agriculture, 2018, 147: 51-63.
[25] Gao Yun, Chen Zhenhan, Wang Yu, et al. Design for pig breeding chamber under multiple environment variable control and analysis of internal flow field[J]. Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), 2019, 35(2): 203-212. (in Chinese with English abstract)
[26] Gao Yun, Guo Jiliang, Li Xuan, et al. Instance-level segmentation method for group pig images based on deep learning[J]. Transactions of the Chinese Society for Agricultural Machinery, 2019, 50(4): 179-187. (in Chinese with English abstract)
[27] Goodfellow I, Bengio Y, Courville A. Deep Learning[M]. Beijing: Posts & Telecom Press, 2016.
[28] Tran D, Bourdev L, Fergus R, et al. Learning spatiotemporal features with 3D convolutional networks[C]//2015 IEEE International Conference on Computer Vision, 2015: 4694-4702.
[29] Srivastava N, Hinton G, Krizhevsky A, et al. Dropout: A simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15(1): 1929-1958.
[30] Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[C]//International Conference on Machine Learning. JMLR.org, 2015.
[31] Zhang Z, Zhang X, Peng C, et al. ExFuse: Enhancing feature fusion for semantic segmentation[C]//European Conference on Computer Vision. Springer, Cham, 2018.
[32] Liu W, Anguelov D, Erhan D, et al. SSD: Single shot MultiBox detector[C]//European Conference on Computer Vision, 2016.
[33] Redmon J, Farhadi A. YOLOv3: An incremental improvement[EB/OL]. [2018-04-08]. https://arxiv.org/pdf/1804.02767.pdf.
Recognition method for aggressive behavior of group pigs based on
deep learning
Gao Yun1,2, Chen Bin1, Liao Huimin1, Lei Minggang2,3, Li Xuan1,2, Li Jing1, Luo Junjie1
(1. College of Engineering, Huazhong Agricultural University, Wuhan 430070, China; 2. Cooperative Innovation Center for Sustainable
Pig Production, Wuhan 430070, China; 3. College of Animal Science and Technology, College of Animal Medicine,
Huazhong Agricultural University, Wuhan 430070, China)
Abstract: Pigs tend to fight with each other to establish a hierarchy within the group, and aggressive behaviors, mostly fighting, are frequently observed in intensive pig raising facilities. Strong aggression can leave other pigs short of food and water, slow their growth, and cause wounds, sickness, and even death in serious cases. This considerably reduces the health and welfare of pigs and further decreases the economic benefits of the pig industry. Monitoring and recognizing aggressive behaviors within a pig group is the first step toward managing aggression in group-housed pigs effectively. The traditional manual recording method is time-consuming and labor-intensive and cannot be applied 24 hours a day, 7 days a week. Machine vision provides an automatic monitoring method to solve this problem. In this paper, we introduce a new method for monitoring aggressive behaviors based on deep learning. The experiments were conducted under controlled conditions in an environment-controlled chamber designed previously; the details of the chamber are described in a published paper by our research group. Nursery pigs were raised under three concentration levels of NH3 (<3.80 mg/m3, 15.18 mg/m3, and 37.95 mg/m3), with a suitable temperature of around 27 ℃ and a comfortable relative humidity between 50% and 70%. Each nursery group consisted of six pigs weighing around 9.6 kg. During each 28-day experiment at the three NH3 concentration levels, videos were taken from the top of the chamber. An end-to-end network, named 3D ConvNet, is proposed in this paper for recognizing aggressive behavior of group pigs; it is based on the C3D network and built with 3D convolution kernels. The structure of the 3D ConvNet was improved in both the width and depth dimensions: the number of main convolutional layers was increased to 19, and extra batch normalization and dropout layers were added to deepen the network. Furthermore, multi-scale feature fusion was introduced to widen the network. These improvements considerably bettered the performance of the algorithm. To train the 3D ConvNet, 380 aggressive videos (14 074 frames) and 360 non-aggressive videos (13 040 frames) were chosen from the recordings of the first experimental batch. These videos were randomly divided into a training set and a validation set at a ratio of 3:1. Another 556 aggressive and 510 non-aggressive videos from the three experimental batches were chosen to build the test set, and there was no overlap among the training, validation, and test sets. The results show that a total of 981 videos, including aggressive and non-aggressive behaviors, were correctly recognized out of the 1 066 test videos, so the accuracy of the 3D ConvNet on the test set was 92.03%. The precision, recall, and F1-score for aggressive behaviors were 94.86%, 89.57%, and 92.14%, respectively. The accuracies for the three NH3 concentration levels were 94.29%, 89.44%, and 85.91%, respectively, which demonstrates the generalization performance of the 3D ConvNet. Under similar thermal environments, the 3D ConvNet also performed well under different illumination conditions. In comparison with the C3D, C3D_1 (19 layers), and C3D_2 (BN) networks, the 3D ConvNet reached 95.7% accuracy on the validation set, 43.47 percentage points higher than the C3D network, and its recognition time for a single image was only 0.5 s, much faster than the other three networks. Therefore, the 3D ConvNet is effective and robust for recognizing aggressive behavior among group-housed pigs. The algorithm provides a new method and technique for automatic monitoring of aggressive behavior in group-housed pigs and helps improve the establishment of automatic monitoring systems on pig farms and the management level of the pig industry.
Keywords: convolutional neural network; machine vision; models; behavior recognition; aggressive behavior; deep learning;
group pigs