Brain Tumor Segmentation Based on the Fusion of Deep Semantics and
Edge Information in Multimodal MRI
Zhiqin Zhu (a), Xianyu He (a), Guanqiu Qi (b), Yuanyuan Li (a), Baisen Cong (c), Yu Liu (d)
(a) College of Automation, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
(b) Computer Information Systems Department, State University of New York at Buffalo State, Buffalo, NY 14222, USA
(c) Diagnostics Digital, DH (Shanghai) Diagnostics Co., Ltd., a Danaher company, Shanghai 200335, China
(d) Department of Biomedical Engineering, Hefei University of Technology, Hefei 230009, China
Corresponding author. Email addresses: zhuzq@cqupt.edu.cn (Zhiqin Zhu), s210301012@stu.cqupt.edu.cn (Xianyu He), qig@buffalostate.edu (Guanqiu Qi), liyy@cqupt.edu.cn (Yuanyuan Li), bcong@dhdiagnostics.com (Baisen Cong), yuliu@hfut.edu.cn (Yu Liu)
Preprint submitted to Information Fusion, October 2, 2022
Abstract
Brain tumor segmentation in multimodal MRI has great significance in clinical diagnosis and treatment.
The utilization of multimodal information plays a crucial role in brain tumor segmentation. However, most
existing methods focus on the extraction and selection of deep semantic features, while ignoring some features
with specific meaning and importance to the segmentation problem. In this paper, we propose a brain tumor
segmentation method based on the fusion of deep semantics and edge information in multimodal MRI, aiming
to achieve a more sufficient utilization of multimodal information for accurate segmentation. The proposed
method mainly consists of a semantic segmentation module, an edge detection module and a feature fusion
module. In the semantic segmentation module, the Swin Transformer is adopted to extract semantic features
and a shifted patch tokenization strategy is introduced for better training. The edge detection module is
designed based on convolutional neural networks (CNNs) and an edge spatial attention block (ESAB) is
presented for feature enhancement. The feature fusion module aims to fuse the extracted semantic and edge
features, and we design a multi-feature inference block (MFIB) based on graph convolution to perform feature
reasoning and information dissemination for effective feature fusion. The proposed method is validated on
the popular BraTS benchmarks. The experimental results verify that the proposed method outperforms
a number of state-of-the-art brain tumor segmentation methods. The source code of the proposed
method is available at https://github.com/HXY-99/brats.
Keywords: Brain tumor segmentation, Transformer, convolutional neural networks, edge feature, feature
fusion
1. Introduction
Medical image segmentation is an important topic in the community of medical image processing. Among
them, brain tumor segmentation aims to localize multiple types of tumor regions from images, which is of
great significance to clinical practice [1]. Owing to its good capacity in providing high-resolution anatomical
structures for soft tissues, magnetic resonance imaging (MRI) is mostly utilized in the diagnosis and treat-
ment of brain tumor diseases. To obtain comprehensive information for accurate segmentation, multimodal
MRI scans with different imaging parameters are usually required in brain tumor segmentation. Commonly-
used modalities include fluid attenuation inversion recovery (FLAIR), T1-weighted (T1), contrast enhanced
T1-weighted (T1ce) and T2-weighted (T2). Images of different modalities capture different pathological
information and they can complement each other effectively [2], which plays a crucial role in segmenting
multiple types of brain tumor regions such as edema (ED), necrosis and non-enhancing tumor (NCR/NET),
and enhancing tumor (ET). An example of multimodal MRI for brain tumor segmentation is shown in Fig.
1. For simplicity, only a slice is selected from the entire scan. Fig. 1(a) shows the ground truth (GT)
segmentation label provided by domain experts. The green, yellow and red indicate ED, ET and NCR/NET
regions, respectively. From Fig. 1(b)-(e), it can be seen that the characteristics of different modalities vary
significantly. For example, the FLAIR modality can well capture the ED regions with distinct edges or
boundaries between the tumor and normal tissues, while the T1ce modality is more effective in detecting
the tumor core (i.e., the union of ET and NCR/NET) with high contrast [3].
Figure 1: An example of multimodal MRI for brain tumor segmentation. (a) The ground truth (GT) segmentation label
provided by domain experts (the green, yellow and red represent edema (ED), enhancing tumor (ET), and necrosis and non-
enhancing tumor (NCR/NET), respectively). (b) The FLAIR modality. (c) The T1 modality. (d) The T1ce modality. (e) The
T2 modality.
A variety of brain tumor segmentation approaches have been presented in the literature. In recent years,
deep learning-based methods have emerged as the mainstream in this field [4]. The most popular way is
to adopt semantic segmentation-oriented convolutional neural networks (CNNs) such as fully convolutional
networks (FCNs) [5], U-Net [6] and V-Net [7] to segment brain tumors. The CNN-based models can well
capture local features in 2D or 3D spaces. However, CNNs are limited by the receptive field of convolutions,
leading to difficulty in characterizing the global dependencies of features, which is essentially an important
clue in semantic segmentation. To address this issue, Transformer-based models [8, 9] have been introduced
into the study of brain tumor segmentation [10]. By establishing the connection between feature base
units (i.e., tokens) and adopting a self-attention mechanism, the Transformer-based models demonstrate
better capacity in capturing global contextual information. However, the input size of the standard vision
Transformer (ViT) [9] is fixed, which causes the problem of excessive computational cost for semantic
segmentation that requires pixel-wise dense prediction. By constructing a hierarchical structure to obtain
feature maps like CNNs, the Swin Transformer [11] well solves this problem and achieves efficient dense
prediction, which exhibits clear advantages for semantic segmentation problems. Nevertheless, the Swin
Transformer still suffers from the problem of low locality inductive bias, which means that it usually requires
a very large amount of training data to obtain satisfactory visual representation [12]. Since the dataset size
of most medical image analysis problems is typically very small, a pre-trained model is generally needed
when utilizing ViT and its variants including Swin Transformer. However, an appropriate pre-trained model
is not always available in practice and performing pre-training by oneself is not an easy task.
In most existing brain tumor segmentation methods, the multimodal MRI scans are simply stacked
as the model input for semantic segmentation, which may cause the insufficient utilization of multimodal
information [13, 14]. In fact, the role of some modalities tends to be more significant in the segmentation task,
as they contain more distinctive information. For instance, the FLAIR and T1ce modalities can effectively
capture the edges of multiple types of tumor regions such as ED and ET. The edge information is actually very
important with regard to brain tumor segmentation, as it not only helps to achieve better localization for the
tumors, but also benefits the boundary quality (e.g., sharpness and accuracy) of segmented regions [15, 16].
It is believed that the edge information could be a good complement to the deep semantic information. It
is worth noting that although some methods [2, 17, 18] consider the complementary features of multimodal
inputs and explore multimodal fusion accordingly, they mainly focus on the extraction and fusion of deep
semantic features without considering the importance of the specific edge features for segmentation.
In this paper, we propose a brain tumor segmentation method based on the fusion of deep semantics
and edge information in multimodal MRI, aiming to achieve a more sufficient utilization of multimodal
information for accurate segmentation. Specifically, the proposed segmentation framework consists of three
main modules: semantic segmentation, edge detection and feature fusion. The semantic segmentation mod-
ule adopts the Swin Transformer as the backbone due to its advantages mentioned above. Moreover, we
introduce a shifted patch tokenization strategy [12] into the Swin Transformer to increase its locality in-
duction bias, so as to achieve easier training for small-size datasets. The edge detection module is designed
based on CNNs to extract edge features from the FLAIR and T1ce modalities by considering their char-
acteristics. The feature fusion module is designed to fuse semantic features and edge features extracted
from MRI of different modalities. This module adopts graph convolution to structure different areas into
different vertices and collects similar semantic features and edge features under the same vertex. It realizes
the reasoning and dissemination of information between semantic features and edge features, leading to the
improvement of feature fusion effect. Through the above designs, the proposed segmentation framework can
effectively extract and fuse deep semantic features and edge features in multimodal MRI, and experimental
results on BraTS benchmarks in 2018, 2019 and 2020 demonstrate its superior performance in brain tumor
segmentation.
In summary, the contributions of this paper are four-fold.
1. The primary contribution of this paper is that we propose a deep learning-based brain tumor segmen-
tation method that simultaneously utilizes deep semantic features and specific edge features. To the
best of our knowledge, this manner is different from existing works that mostly focus on the extraction
and fusion of semantic features from multimodal MRI. To achieve this target, three modules including
a semantic segmentation module, an edge detection module, and a feature fusion module are designed,
leading to the following three technical contributions.
2. We present a Swin Transformer-based semantic segmentation module to extract semantic features
for brain tumor segmentation. In particular, to address the problem caused by the lack of locality
inductive bias, we introduce a shifted patch tokenization strategy into the Swin Transformer, leading
to easier training for small-size datasets.
3. We present a CNN-based edge detection module to extract edge features from the FLAIR and T1ce
modalities. In this module, an edge spatial attention block (ESAB) using the Sobel operator is designed
to enhance edge features, which are extracted in a progressive manner.
4. We present a feature fusion module to fuse the extracted deep semantic features and edge features.
Specifically, a multi-feature inference block (MFIB) based on graph convolution is designed to achieve
effective feature fusion.
2. Related Work
Various methods for brain tumor segmentation have been proposed in recent years. These methods
can be broadly classified into two categories: the generative model-based methods and the discriminative
model-based methods [19]. The generative model-based methods focus on the appearance characteristics
of tumorous and healthy tissues, thus requiring related domain-specific prior information, which is usually
obtained through probabilistic image atlases. Menze et al. [20] augmented a probabilistic atlas of healthy
tissue priors with a latent atlas of the lesion and derived an estimation algorithm to extract tumor boundaries
and the latent atlas from the image data. Heinrich et al. [21] employed discrete optimization and self-
similarity for multimodal medical image segmentation under a discrete medical image registration framework.
The discriminative model-based methods regard tumor segmentation as a classification problem to determine
the property of voxels. Owing to the rapid development of machine learning techniques, the discriminative
model-based methods have gradually become the main trend in this field. Early methods in this category
mainly rely on hand-crafted features such as local histograms [22] and texture features [23], and then employ
discriminative models such as decision trees [24] and conditional random fields [25] for classification.
In the past few years, deep learning has rapidly become the mainstream in the study of brain tumor
segmentation. Some early approaches adopt a patch-based classification strategy and design CNNs to
predict the class of the center voxel of a 2D or 3D patch [19, 26]. However, it is difficult for such patch-
based methods to fully consider the correlation among neighboring patches within a relatively large range.
To address this issue, end-to-end semantic segmentation models such as U-Net [6], attention U-Net [27]
and U-Net++ [28] have become more popular in brain tumor segmentation. Myronenko [3] proposed a
segmentation network that adds a variational auto-encoder branch to reconstruct the input image for more
effective feature learning. Liu et al. [29] introduced pixel-level image fusion as an auxiliary task to regularize
feature learning and presented a multi-task model for brain tumor segmentation. Isensee et al. [30] proposed
an adaptive framework based on 2D U-Net, 3D U-Net and U-Net Cascade. The framework automatically
adjusts all hyperparameters without human intervention.
Although the CNN-based methods have achieved great success in brain tumor segmentation, it is known
that CNNs suffer from the limitation of capturing global contextual information, which is a crucial clue
for semantic segmentation. To solve this problem, the Transformer-based methods have gained increasing
attention in the field of medical image segmentation, with some representative models being proposed, such as
the TransUNet [31], which combines Transformer and U-Net, and the MedT [32], which presents gated axial-attention
for segmentation. In the study of brain tumor segmentation, the TransBTS proposed by Wang et al. [10] is
the first work that uses Transformer for segmentation and achieves good performance. The above methods
are based on the standard ViT [9], in which the input size should be fixed, leading to high computational
complexity for dense prediction problems such as semantic segmentation. The Swin Transformer [11], which
adopts hierarchical structure to obtain feature maps like CNNs, can effectively alleviate this problem. This
improvement greatly enhances the potential of Transformer models for semantic segmentation and motivates
us to adopt the Swin Transformer for brain tumor segmentation in this work. Nevertheless, similar to the
standard ViT, the Swin Transformer still suffers from a defect, i.e., the lack of locality inductive bias [12].
As a consequence, it is difficult to utilize the Swin Transformer for small-size datasets without pre-
training, leading to some inconvenience when applying it to medical image analysis tasks including brain
tumor segmentation, since an appropriate pre-trained model is not always available in practice. To tackle
this problem, we introduce a shifted patch tokenization strategy [12] into the Swin Transformer for brain
tumor segmentation, so that the model can be trained from scratch.
In order to obtain more accurate segmentation results, the use of multimodal MRI data has become an
interesting topic in brain tumor segmentation. Most existing methods simply adopt a multi-channel input
by stacking multimodal MRI scans and do not fully consider the differences in their importance to
brain tumor segmentation. Pereira et al. [33] designed a convolutional network for automatic brain tumor
segmentation using a four-channel format for multimodal images. Wang et al. [34] achieved the segmentation
of different brain tumor regions by constructing three cascaded networks. In order to use multimodal
information more effectively, some feature fusion and selection approaches have appeared based on specific
architectures such as attention mechanism. Dolz et al. [35] extended dense connections to multimodal image
segmentation based on DenseNets, where each modality is input into the network individually as a branch, and
dense connections between branches are used to fuse features from different modalities.
Liu et al. [36] proposed an attention-based modality selection feature fusion module for multimodal feature
refinement to address the difference among multiple modalities for a given segmentation target. Zhang
et al. [37] used FCN to extract features from images of different modalities, and designed a modality-
aware module for more efficient information exchange across different modalities. Mo et al. [2] divided the
different modalities into main modality and auxiliary modality, and applied the attention mechanism for
feature fusion.
Although the above-mentioned methods make good efforts to utilize multimodal MRI information, they
all focus on the extraction and selection of deep semantic features, while ignoring some features with specific
meaning and importance to the segmentation problem. In this paper, in addition to the semantic features, we
further concentrate on extracting the edge information of multiple types of tumor regions from some relevant
modalities including FLAIR and T1ce, which are of great significance for improving segmentation quality, since
the edge information is helpful for obtaining more accurate locations and boundaries of tumors. The extracted
edge features are merged with the semantic features, aiming to utilize multimodal MRI more effectively
and improve the segmentation accuracy accordingly.
3. The Proposed Method
3.1. Overview
The framework of the proposed brain tumor segmentation method is shown in Fig. 2. It is mainly
composed of a semantic segmentation module, an edge detection module and a feature fusion module.
The semantic module adopts an improved Swin Transformer block to extract deep semantic features from
multimodal MRI scans including FLAIR, T1, T1ce and T2. The edge detection module aims to extract edge
features by employing a convolutional network as the backbone and designing edge spatial attention blocks
(ESABs) for feature enhancement. Considering the modal characteristics of MRI modalities, only FLAIR
and T1ce are selected as the input of the edge detection module. The feature fusion module consists of several
multi-feature inference blocks (MFIBs), aiming to fuse the semantic features obtained from the semantic
segmentation module and the edge features obtained from the edge detection module at multiple levels. To reconstruct
the segmentation result, a successive expanding decoder that is widely adopted in U-Net-like architectures
is employed. For the edge detection task, the result is directly obtained by bilinear interpolation. For the
Figure 2: The framework of the proposed brain tumor segmentation method. It mainly consists of three modules: a semantic
segmentation module, an edge detection module and a feature fusion module.
feature fusion module, the output includes both an edge detection result and a segmentation result. To train
the network, the semantic segmentation module and edge detection module are first trained individually.
Then, the output of the feature fusion module is used to supervise the training of the entire model.
3.2. Semantic Segmentation Module
In the semantic segmentation module, we apply the Swin Transformer with an improved patch merging
approach to extract semantic features for the segmentation tasks. The Swin Transformer consists of four
stages, as shown in Fig. 2. For the last three stages, the original Swin Transformer blocks with a patch merging
step are adopted [11]. For the first stage, the steps of patch partition and linear embedding are required prior
to the Swin Transformer blocks. Let X ∈ R^{H×W×C} denote the input, where H×W represents the size
of the input feature map and C represents the number of channels. The input image is first divided into
N patches of size P×P, and then each patch is reshaped into a 1D vector, giving x_p ∈ R^{N×(P²·C)}. Next, these
patches are flattened and mapped to D dimensions through a trainable linear projection E ∈ R^{(P²·C)×D} to
obtain the visual token z involving a learnable position variable E_pos ∈ R^{N×D} as
z = x_p E + E_pos,   (1)
where z is input to the Transformer block as an embedding sequence. Since the Transformer directly divides
the features, the local information in the patch is difficult to capture, thereby making the Transformer lack
the ability of locality inductive bias.
7
Figure 3: The architecture of a Swin Transformer block with the shifted patch tokenization strategy. Both W-MSA and SW-
MSA are multi-head attention, which represent the regular window and the shifted window, respectively.
To address this problem, as shown in Fig. 3, this paper introduces a shifted patch tokenization strategy
[12], which can embed more spatial information into the visual token, increasing the locality induction ability
of the Transformer to avoid extensive pre-training. Specifically, the input is shifted before patch partition
by a patch size from four directions, and then the original input and its shifted versions are spliced. Finally,
patch partition and linear embedding are performed.
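To make the tokenization step concrete, the following is a minimal PyTorch sketch of the shifted patch tokenization described above. It is not the released implementation: the module name, the use of torch.roll (a cyclic shift rather than zero-padded shifting), and the patch/embedding sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ShiftedPatchTokenization(nn.Module):
    """Sketch: splice the input with four diagonally shifted copies,
    then perform patch partition and linear embedding in one stride-P conv."""
    def __init__(self, in_chans=4, patch_size=4, embed_dim=128):
        super().__init__()
        self.patch_size = patch_size
        # original input + 4 shifted copies -> 5x the channels
        self.proj = nn.Conv2d(in_chans * 5, embed_dim,
                              kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):                                  # x: (B, C, H, W)
        s = self.patch_size                                # shift amount
        shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]      # four directions
        shifted = [torch.roll(x, sh, dims=(2, 3)) for sh in shifts]
        x = torch.cat([x] + shifted, dim=1)                # splice original and shifts
        x = self.proj(x)                                   # patch partition + embedding
        x = x.flatten(2).transpose(1, 2)                   # (B, N, embed_dim) tokens
        return self.norm(x)

tokens = ShiftedPatchTokenization()(torch.randn(1, 4, 240, 240))  # -> (1, 3600, 128)
```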
Each stage in the Swin Transformer consists of L blocks, each composed of multi-head self-attention (MSA)
and a multilayer perceptron (MLP). The structure of each block is shown in Fig. 3. Layer normalization
(LN) is first applied and residual connections are used. The MLP contains two fully connected layers with
GELU activation. The above process can be expressed as
ẑ^l = W-MSA(LN(z^{l-1})) + z^{l-1},   (2)
z^l = MLP(LN(ẑ^l)) + ẑ^l,   (3)
ẑ^{l+1} = SW-MSA(LN(z^l)) + z^l,   (4)
z^{l+1} = MLP(LN(ẑ^{l+1})) + ẑ^{l+1},   (5)
where ẑ^l and z^l represent the output features of the W-MSA (or SW-MSA) module and the MLP module at the
l-th block; W-MSA and SW-MSA denote window-based multi-head self-attention using regular and
shifted window partitioning configurations, respectively. At the end of the Transformer layer, the output z^l
goes through an LN layer to obtain the final output z:
z = LN(z^l).   (6)
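As a reading aid, the block structure of Eqs. (2)-(5) can be sketched in PyTorch as below. Standard multi-head attention stands in for W-MSA/SW-MSA so the snippet stays self-contained; the window partitioning and cyclic shift of the real Swin Transformer block are deliberately omitted.

```python
import torch
import torch.nn as nn

class SwinBlockPair(nn.Module):
    """Sketch of Eqs. (2)-(5): a regular-window block followed by a
    shifted-window block, each with pre-LN residual attention and MLP."""
    def __init__(self, dim=128, heads=4, mlp_ratio=4):
        super().__init__()
        self.norm = nn.ModuleList([nn.LayerNorm(dim) for _ in range(4)])
        self.wmsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.swmsa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.mlp1 = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                  nn.Linear(dim * mlp_ratio, dim))
        self.mlp2 = nn.Sequential(nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
                                  nn.Linear(dim * mlp_ratio, dim))

    def forward(self, z):                              # z: (B, N, dim)
        h = self.norm[0](z)
        z = self.wmsa(h, h, h)[0] + z                  # Eq. (2)
        z = self.mlp1(self.norm[1](z)) + z             # Eq. (3)
        h = self.norm[2](z)
        z = self.swmsa(h, h, h)[0] + z                 # Eq. (4), window shift omitted
        z = self.mlp2(self.norm[3](z)) + z             # Eq. (5)
        return z

z = SwinBlockPair()(torch.randn(1, 3600, 128))
```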
To generate a hierarchical representation, the Swin Transformer reduces the number of tokens and
increases the dimensionality through patch merging. In each patch merging step, the features of adjacent
2×2 patches are concatenated and the concatenated features are linearized to increase the dimensionality.
Specifically, the following output features are obtained after the four stages:
X^1_seg ∈ R^{(H/2)×(W/2)×128},  X^2_seg ∈ R^{(H/4)×(W/4)×256},
X^3_seg ∈ R^{(H/8)×(W/8)×512},  X^4_seg ∈ R^{(H/16)×(W/16)×1024}.
These features are subsequently input into the feature fusion module and fused with the output features
of the edge detection module to achieve better segmentation performance.
3.3. Edge Detection Module
The segmentation performance can be improved by supplementing the edge information of brain tumors.
However, the edge features are shallow features. Directly using features of the last convolution block will
force the deep network to capture the shallow edge features, thus affecting the extraction performance of
the edge features. At the same time, the middle layers can also bring rich convolutional features about edge
information [38, 39]. Therefore, it is necessary to utilize all the convolution layers to obtain richer features.
To this end, this paper designs an edge detection module to utilize the features of multiple convolution
layers simultaneously. As shown in the edge detection module of Fig. 2, after the image is input into the network,
features of different dimensions are extracted through four convolution blocks. Each convolution block consists
of two 3×3 convolutional layers, a regularization layer and a 2×2 max-pooling layer. Specifically, when
the image to be detected is input to the edge detection module, the output features obtained after four
convolution blocks are given as
X^1_edge ∈ R^{(H/2)×(W/2)×128},  X^2_edge ∈ R^{(H/4)×(W/4)×256},
X^3_edge ∈ R^{(H/8)×(W/8)×512},  X^4_edge ∈ R^{(H/16)×(W/16)×1024}.
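A plausible sketch of one such convolution block is given below; the choice of batch normalization as the "regularization layer" and of ReLU activations is an assumption, and the channel widths follow the feature sizes listed above (with FLAIR and T1ce as the two input channels).

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """One edge-branch block: two 3x3 convolutions, a normalization layer
    (batch norm assumed here) and 2x2 max-pooling, halving the resolution."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.BatchNorm2d(out_ch),
        nn.MaxPool2d(kernel_size=2),
    )

# FLAIR + T1ce slices in, four blocks matching the listed channel widths.
edge_backbone = nn.ModuleList([conv_block(2, 128), conv_block(128, 256),
                               conv_block(256, 512), conv_block(512, 1024)])
```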
In order to enhance the edge features, the features of each convolution block are refined by an edge
spatial attention block (ESAB). The architecture of the ESAB is shown in Fig. 4.
Figure 4: The architecture of the edge spatial attention block (ESAB) designed for edge feature enhancement.
The Sobel convolution is used in the ESAB. A 1×1 convolution reduces the feature volume to a single-channel map to obtain
the output feature. Furthermore, the output features of a certain layer are added to the output features of
the next layer, leading to a progressive manner of obtaining richer edge features. Finally, the output features
are interpolated to the original input size to reconstruct the edge detection result.
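The following is a minimal sketch of the ESAB idea: a 1×1 convolution produces a single-channel map, fixed Sobel kernels extract its gradients, and the resulting edge response is used as a spatial attention map. The exact arrangement of Fig. 4 may differ; this is an illustration, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ESAB(nn.Module):
    """Edge spatial attention block (sketch): 1x1 reduction to one channel,
    fixed Sobel filtering, and spatial re-weighting of the input features."""
    def __init__(self, channels):
        super().__init__()
        self.reduce = nn.Conv2d(channels, 1, kernel_size=1)
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", sobel_x.view(1, 1, 3, 3))
        self.register_buffer("ky", sobel_x.t().contiguous().view(1, 1, 3, 3))

    def forward(self, x):
        m = self.reduce(x)                              # (B, 1, H, W) edge map
        gx = F.conv2d(m, self.kx, padding=1)            # Sobel gradients
        gy = F.conv2d(m, self.ky, padding=1)
        attn = torch.sigmoid(torch.sqrt(gx ** 2 + gy ** 2 + 1e-6))
        return x * attn, m                              # enhanced features, edge map

feat, edge_map = ESAB(128)(torch.randn(1, 128, 120, 120))
```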
3.4. Feature Fusion Module
The feature fusion module that consists of four multi-feature inference blocks (MFIBs) is designed to
fuse semantic features and edge features extracted by the above two modules. In recent years, graph-
based applications have become increasingly widespread and have been verified to be an effective way of
relational reasoning, which makes it a suitable tool to implement multi-feature fusion [40]. In this work, we
design the feature fusion module based on graph convolution by referring to a recent work on graph-based
global reasoning [41]. The architecture of our MFIB is shown in Fig. 5(a).
The features obtained by the semantic segmentation module and the edge detection module are denoted
as X_seg ∈ R^{H×W×C} and X_edge ∈ R^{H×W×C}, respectively. To achieve better integration of semantic and edge
features, we map the input features from the spatial domain X to the graph domain G ∈ R^{N×F}, where N
represents the number of nodes in the graph and F represents the features contained in a node [41]. In this
way, pixels with similar features can be aggregated into a node as an anchor to generate a semantic-aware
graph feature. The feature fusion process is detailed below. Let X_seg and X_edge denote the input semantic
and edge features of a given MFIB, respectively. They are mapped to the graph domain to obtain G_seg and
G_edge through two convolutional layers as
G_seg = v(X_seg; W_v) ⊗ w(X_seg; W_w),   (7)
G_edge = v(X_edge; W_v) ⊗ w(X_edge; W_w),   (8)
where v(·) represents the convolution operation used for graph projection and w(·) represents the convolution
operation used for feature dimensionality reduction. W_v and W_w denote the learnable kernels of v(·)
and w(·), respectively. The symbol ⊗ indicates matrix multiplication. More details of the above projection
can be found in [41].
After projection, in order to learn the relationship between the related node features of the semantic
graph and edge graph, the graph convolution [42] is adopted to learn edge weights corresponding to the
features of each node for reasoning on the fully connected graph. The input of the graph convolution unit,
G, is obtained by adding G_seg and G_edge as
G = G_seg + G_edge.   (9)
Figure 5: The architecture of the multi-feature inference block (MFIB) for feature fusion. (a) The whole architecture of the
i-th MFIB. (b) The specific structure of graph convolution.
The architecture of the graph convolution unit is shown in Fig. 5(b), which is implemented by two 1D
convolutions in channel-wise and node-wise directions. As a result, the output can be expressed as
Ĝ = ((I − A_g) G) W_g,   (10)
where I ∈ R^{N×N} represents the identity matrix, A_g ∈ R^{N×N} represents the adjacency matrix, and W_g
represents the update parameters. A_g and W_g are both randomly initialized during training and optimized by
gradient descent.
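A compact sketch of Eq. (10) is given below, with the learnable adjacency A_g and the state update W_g realised as 1D convolutions over the node and channel dimensions, respectively; the module and argument names are illustrative.

```python
import torch
import torch.nn as nn

class GraphConvUnit(nn.Module):
    """Sketch of Eq. (10): (I - A_g) G followed by the update W_g,
    realised as node-wise and channel-wise 1D convolutions."""
    def __init__(self, num_nodes, num_feats):
        super().__init__()
        # node-wise 1D conv plays the role of the learnable adjacency A_g
        self.adj = nn.Conv1d(num_nodes, num_nodes, kernel_size=1)
        # channel-wise 1D conv plays the role of the update weights W_g
        self.update = nn.Conv1d(num_feats, num_feats, kernel_size=1)

    def forward(self, g):                      # g: (B, N, F)
        h = g - self.adj(g)                    # (I - A_g) G via node mixing
        h = self.update(h.transpose(1, 2))     # W_g applied over channels
        return h.transpose(1, 2)               # back to (B, N, F)

g_hat = GraphConvUnit(num_nodes=64, num_feats=128)(torch.randn(2, 64, 128))
```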
The fused graphs for semantic and edge features are further calculated as
Ĝ_seg = G_seg + Ĝ,   (11)
Ĝ_edge = G_edge + Ĝ.   (12)
Then, we remap Ĝ_seg and Ĝ_edge back to the original spatial domain through the projection operation v(·)
obtained in the mapping step to obtain the fused features X̂_seg and X̂_edge as
X̂_seg = X_seg + v(X_seg; W_v)^T ⊗ Ĝ_seg,   (13)
X̂_edge = X_edge + v(X_edge; W_v)^T ⊗ Ĝ_edge.   (14)
In the proposed method, the inputs of the first MFIB are exactly the semantic and edge features obtained
at the first stage of the semantic segmentation module and the edge detection module, respectively. For the
last three MFIBs, the fused features obtained by the previous block are added to the corresponding original
features to generate the input, as shown in Fig. 5(a).
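The MFIB computation of Eqs. (7)-(14) can be summarised by the following sketch, which reuses the GraphConvUnit above. The 1×1 convolutions for v(·) and w(·), the node/feature sizes, and the final 1×1 convolution that maps node features back to the input channel width (needed to make the residual addition dimensionally consistent) are assumptions rather than the authors' exact configuration.

```python
import torch.nn as nn

class MFIB(nn.Module):
    """Sketch of Eqs. (7)-(14): project semantic/edge features to a graph,
    reason jointly with a graph convolution, and project back residually."""
    def __init__(self, channels, num_nodes=64, node_feats=128):
        super().__init__()
        self.v = nn.Conv2d(channels, num_nodes, kernel_size=1)   # graph projection v(.)
        self.w = nn.Conv2d(channels, node_feats, kernel_size=1)  # dim. reduction w(.)
        self.gcn = GraphConvUnit(num_nodes, node_feats)          # from the sketch above
        self.back = nn.Conv2d(node_feats, channels, kernel_size=1)

    def project(self, x):                        # Eqs. (7)-(8)
        v = self.v(x).flatten(2)                 # (B, N, HW)
        f = self.w(x).flatten(2)                 # (B, F, HW)
        return v @ f.transpose(1, 2), v          # graph (B, N, F), projection

    def forward(self, x_seg, x_edge):
        g_seg, v_seg = self.project(x_seg)
        g_edge, v_edge = self.project(x_edge)
        g_hat = self.gcn(g_seg + g_edge)         # Eqs. (9)-(10)
        out = []
        for x, g, v in ((x_seg, g_seg + g_hat, v_seg),
                        (x_edge, g_edge + g_hat, v_edge)):   # Eqs. (11)-(12)
            b, _, h, w = x.shape
            spatial = v.transpose(1, 2) @ g      # (B, HW, F), Eqs. (13)-(14)
            spatial = spatial.transpose(1, 2).reshape(b, -1, h, w)
            out.append(x + self.back(spatial))   # residual fusion
        return out                               # fused semantic, fused edge features
```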
3.5. Loss Function
The proposed segmentation network is trained by three loss functions: L_seg, L_edge and L_fusion. L_seg is
the loss of the semantic segmentation module for learning semantic features. The BCEDiceLoss, which is a
combination of the binary cross-entropy (BCE) loss and the Dice loss [7], is used to define L_seg as
L_seg = Σ 0.5·(−y log ŷ − (1 − y) log(1 − ŷ)) + 1 − 2|y ∩ ŷ| / (|y| + |ŷ|),   (15)
where y represents the GT and ŷ represents the prediction result.
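A per-class sketch of Eq. (15) is shown below, assuming the prediction is already a probability map (e.g., after a sigmoid).

```python
import torch

def bce_dice_loss(pred, target, eps=1e-6):
    """Sketch of Eq. (15): 0.5 * binary cross-entropy + Dice loss,
    computed on probability maps `pred` and binary targets `target`."""
    pred = pred.clamp(eps, 1 - eps)
    bce = -(target * pred.log() + (1 - target) * (1 - pred).log()).mean()
    inter = (pred * target).sum()
    dice = 1 - 2 * inter / (pred.sum() + target.sum() + eps)
    return 0.5 * bce + dice

loss = bce_dice_loss(torch.rand(2, 3, 240, 240),
                     torch.randint(0, 2, (2, 3, 240, 240)).float())
```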
L_edge is the loss of the edge detection module for learning edge features. For the edge detection
problem, class imbalance is important because most samples are negative. To address this
problem, L_edge is defined by combining the edge loss presented in [38] and the Dice loss as
L_edge = 0.5·L_CRF + L_Dice,   (16)
where the definition of L_CRF is given as follows:
L^j_i = { α·log(1 − ŷ^j_i),   if y_i = 0
          0,                  if 0 < y_i < η
          β·log(ŷ^j_i),       otherwise,   (17)
where ŷ^j_i represents the predicted value of the i-th pixel of the j-th edge map, and η is a pre-defined threshold.
This means that if a pixel is marked as positive by fewer than a fraction η of the annotators, it is discarded when the loss
is calculated and is not treated as a positive sample. β is the percentage of negative samples.
α = λ·(1 − β), where λ is a hyperparameter for balancing positive and negative samples.
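The class-balanced term of Eqs. (16)-(17) can be sketched as follows. Pixels whose consensus label lies in (0, η) are ignored, positives are weighted by β and negatives by α = λ·(1 − β); the negative-log form is used so the expression behaves as a loss, and reading the reported default of 1.1 (Section 4.2) as the balancing hyperparameter λ is our assumption.

```python
import torch

def balanced_edge_loss(pred, target, eta=0.3, lam=1.1, eps=1e-6):
    """Sketch of Eq. (17): class-balanced BCE for edge maps. `target` holds
    annotator consensus in [0, 1]; pixels with 0 < target < eta are ignored."""
    pos = (target >= eta).float()
    neg = (target == 0).float()
    valid = pos + neg                                   # ambiguous pixels excluded
    beta = neg.sum() / valid.sum().clamp(min=1.0)       # fraction of negatives
    alpha = lam * (1.0 - beta)                          # alpha = lambda * (1 - beta)
    pred = pred.clamp(eps, 1 - eps)
    loss = -(beta * pos * pred.log() + alpha * neg * (1 - pred).log())
    return loss.sum() / valid.sum().clamp(min=1.0)
```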
L_fusion is the loss of the feature fusion module. It is defined as
L_fusion = L′_seg + γ·L′_edge,   (18)
where γ is a weight parameter. L′_seg and L′_edge have the same definitions as L_seg and L_edge, but are
computed using the predictions of the feature fusion module.
In this paper, we first train the semantic segmentation module and the edge detection module using L_seg
and L_edge, respectively. Then, the entire model is further trained using L_fusion.
4. Experiments
4.1. Dataset and Implementation Details
In the experiments, the training and testing datasets are all from BraTS2018, BraTS2019 and BraTS2020
benchmarks [43–45]. As an important public dataset for multimodal brain tumor segmentation, BraTS is
used in the annual MICCAI brain tumor segmentation challenge and widely adopted in the study of this
topic. Samples are added, deleted or replaced in each year's competition to enrich the dataset's scale. BraTS2018,
2019, and 2020 have 285, 335, and 369 annotated brain tumor samples for model training, respectively.
Each case has MRI scans of four different modalities (FLAIR, T1, T1ce and T2) and is labeled by domain
experts. The labels contain four classes: background, NCR/NET, ED and ET. The evaluation is based on
three different brain tumor regions: Whole Tumor (WT = NCR/NET + ED + ET), Tumor Core (TC =
NCR/NET + ET) and Enhancing Tumor (ET). In this paper, we adopt the two most commonly used metrics in
medical image segmentation for performance assessment, which are the Dice score and the 95% Hausdorff
distance (HD).
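For reference, the Dice score on a binary region mask (WT, TC or ET) can be computed as below; the 95% Hausdorff distance is usually obtained with external evaluation tooling and is not re-implemented here.

```python
import torch

def dice_score(pred_mask, gt_mask, eps=1e-6):
    """Dice score between two binary region masks (e.g., WT, TC or ET)."""
    pred_mask = pred_mask.float()
    gt_mask = gt_mask.float()
    inter = (pred_mask * gt_mask).sum()
    return (2 * inter + eps) / (pred_mask.sum() + gt_mask.sum() + eps)

wt_pred = torch.randint(0, 2, (240, 240))
wt_gt = torch.randint(0, 2, (240, 240))
print(f"WT Dice: {dice_score(wt_pred, wt_gt):.4f}")
```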
In the preprocessing stage, the size of each scan is 240×240×155. In this paper, the scans of all
modalities are sliced, and the size of each slice is 240×240. For the semantic segmentation module, all
four modalities are used as the input. For the edge detection module, the input consists of the FLAIR and
T1ce modalities. In addition, the popular z-score normalization is performed on the raw data to resolve
inconsistencies in image contrast under different modalities.
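A sketch of this preprocessing is given below, assuming each modality is available as a 240×240×155 array; computing the z-score statistics over non-zero (brain) voxels is a common convention that we assume rather than one stated in the text.

```python
import numpy as np

def zscore_normalize(volume):
    """Per-modality z-score normalization over non-zero (brain) voxels."""
    brain = volume[volume > 0]
    out = volume.astype(np.float32).copy()
    out[volume > 0] = (brain - brain.mean()) / (brain.std() + 1e-8)
    return out

def to_slices(volume):
    """Split a 240x240x155 scan into 155 axial slices of size 240x240."""
    return [volume[:, :, k] for k in range(volume.shape[2])]

flair = zscore_normalize(np.random.rand(240, 240, 155))  # stand-in for a loaded scan
slices = to_slices(flair)
```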
All the programs were implemented under the PyTorch framework. The training process was conducted
on four Tesla P100 GPUs. The optimizer used in the experiments is Adam [46]. The momentum is set to
0.9. The initial learning rate, weight decay and batch size are set to 1e-3, 1e-5, and 16, respectively.
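Expressed as a hedged PyTorch snippet, the reported training configuration looks as follows; the placeholder model and the reading of the reported momentum as Adam's beta1 are assumptions.

```python
import torch

model = torch.nn.Conv2d(4, 4, 3)   # placeholder for the full segmentation network
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1e-3,             # initial learning rate
    betas=(0.9, 0.999),  # beta1 = 0.9 read as the reported momentum
    weight_decay=1e-5,
)
# Batch size 16; training distributed over four Tesla P100 GPUs.
```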
4.2. Parameter Analysis
In Section 3.5, several adjustable parameters such as η, β and γ are involved in the loss functions.
The parameters η and β are set to the default values of 0.3 and 1.1 according to [?]. In this subsection, we
mainly analyze the effect of the parameter γ in Eq. (18) on the segmentation performance obtained by the
proposed method. The function of the parameter γ is to balance the semantic segmentation loss and the
edge detection loss to ensure both of them have sufficient contributions. In this experiment, we set γ to a
set of values including 1, 0.5, 0.1 and 0.05 to study its impact. The BraTS 2018 benchmark is used in this
experiment. The corresponding evaluation results are shown in Table 1. It can be seen from the results
that the performance on HD tends to be better when γ increases. This is because the edge information receives
more attention when γ is larger, leading to higher accuracy in terms of the tumor boundaries, while the
metric HD is sensitive to that. For the metric Dice, the proposed method tends to obtain higher performance
when a relatively smaller γ is used, which indicates that the semantic features may be more important in
terms of this metric. Based on the above observations, we set γ to 0.1 by default in our method to achieve
a good trade-off between the performances on the two metrics.
Table 1: The segmentation performance of the proposed method using different values of the parameter γ in Eq. (18).
WT TC ET Average
Dice HD Dice HD Dice HD Dice HD
γ=1 90.15 3.364 87.91 4.615 81.37 3.327 86.48 3.769
γ=0.5 90.85 3.720 88.08 4.569 81.62 3.615 86.85 3.968
γ=0.1 90.89 3.923 87.96 5.217 81.94 3.440 86.93 4.193
γ=0.05 90.63 4.419 88.14 5.545 81.41 4.289 86.72 4.289
4.3. Comparison with Other Methods
To verify the superiority of the proposed method for brain tumor segmentation, several state-of-the-art
segmentation methods that have been tested on the BraTS2018-2020 benchmarks are used for comparison,
which include 2D or 3D CNN-based segmentation methods [3, 13, 14, 27, 28, 47–50], Transformer-based
segmentation methods [10, 31] and the methods that focus on multimodal feature fusion [17, 18, 51]. A
brief description of these methods is given in Table 2. Since the source code of many existing brain tumor
segmentation methods was not released, and to avoid the bias introduced in model re-training, we directly
refer to related publications to obtain the evaluation results of the corresponding methods, which is a
commonly used manner in the study of brain tumor segmentation. The evaluation results of different
methods on BraTS2018, BraTS2019 and BraTS2020 benchmarks are listed in Table 3, Table 4 and Table
5, respectively. The best-performed values are indicated in bold. The corresponding results are visualized
for better comparison in Fig. 6 and Fig. 7, which illustrate the performance of different brain tumor
segmentation methods on two metrics Dice and HD, respectively. The best-performed method in each case
is marked by a star on the corresponding bar.
According to the results reported in the above Tables and Figures, the proposed method achieves more
competitive performance when compared with other methods. Specifically, regarding the average Dice score,
the proposed method achieves 86.71%, 88.22% and 87.95% on the BraTS2018-2020 benchmarks, which out-
performs other reference methods by 0.58% to 7.81%, 0.42% to 8.95%, and 2.89% to 4.93%, respectively.
Compared with TransBTS, which jointly uses Transformer and U-Net for semantic segmentation, the pro-
posed method achieves better results in all cases with clear advantages, and the improvement for the tumor
core is most significant. Additionally, compared with the latest RFNet method that considers multimodal
feature fusion, the proposed method achieves obvious improvement for both tumor core and enhancing tumor
regions.
Fig. 8 shows the visual effect comparison of the brain tumor segmentation results obtained by different
Figure 6: Performance comparison of different brain tumor segmentation methods on the metric Dice. The best-performed
method in each case is marked by a star.
Figure 7: Performance comparison of different brain tumor segmentation methods on the metric HD. The best-performed
method in each case is marked by a star.
Figure 8: Visual effect comparison of brain tumor segmentation results obtained by different methods (the columns include
the input image, U-Net, U-Net++, CENET, TransUNet, the proposed method and the GT). The green, yellow and
red indicate ED, ET and NCR/NET regions, respectively.
Figure 9: Performance comparison of different segmentation methods in terms of tumor boundary accuracy (the columns are
the input image, U-Net, U-Net++, TransUNet, the proposed method and the GT; the whole-tumor HD value is given for each
result).
Table 2: A brief description of the methods used for performance comparison in our experiments.
Method: Brief description
Myronenko [3]: Proposes a U-Net-based segmentation framework by adding a VAE branch to regularize feature learning.
NoNewNet [13]: Designs an improved U-Net architecture for segmentation.
Attention Unet [27]: Proposes an attention mechanism based on U-Net for segmentation.
U-Net++ [28]: Adds a series of dense skip connections to the U-Net for segmentation.
N3D [14]: Proposes a 3D U-Net for brain tumor segmentation.
Z. Jiang [49]: Proposes an end-to-end cascading U-Net architecture for segmentation.
CENET [47]: Designs a context extractor to generate more advanced semantic feature maps.
HNF-Net [50]: Proposes a 3D high-resolution and non-local feature network for segmentation.
T. Zhou [18]: Designs four independent encoding paths to extract features from four modalities and then fuse them.
D. Zhang [17]: Proposes a task-structured brain tumor segmentation network by considering multimodal fusion.
TransBTS [10]: Proposes an encoder-decoder structure consisting of Transformer and U-Net for segmentation.
RFNet [51]: Proposes a region-aware fusion network that exploits different combinations of multimodal data.
TransUnet [31]: Proposes a universal segmentation framework by combining Transformer and U-Net.
Point-UNet [48]: Designs a U-Net to perform a fine-class segmentation of the input point cloud.
Table 3: Objective evaluation results of different brain tumor segmentation methods on the BraTS2018 benchmark.
WT TC ET Average
Dice HD Dice HD Dice HD Dice HD
Myronenko[3] 90.40 4.483 85.90 8.278 81.40 3.805 85.90 5.500
NoNewNet[13] 90.80 4.790 84.32 8.160 79.59 3.120 84.90 5.357
U-Net++[28] 88.96 5.327 84.65 8.535 79.49 4.285 84.36 6.049
CENET[47] 89.53 5.271 84.31 8.493 79.95 4.379 84.60 6.193
D. Zhang[17] 89.60 5.733 82.40 9.270 78.20 3.567 83.40 6.190
TransUnet[31] 90.25 4.390 87.19 5.539 80.41 3.731 85.95 4.553
Point-UNet[48] 90.55 - 87.09 - 80.76 - 86.13 6.010
Proposed 90.89 3.923 87.96 5.217 81.94 3.440 86.93 4.193
methods. By referring to the ground truth (GT), the proposed method achieves more accurate segmentation
results, especially for the tumor edges, than other methods, which demonstrates the effectiveness of the
edge features extracted for segmentation.
Fig. 9 illustrates an example to compare the performance of different segmentation methods in terms of
tumor boundary accuracy. As mentioned above, the metric HD is more sensitive to the boundary shape, so
the corresponding HD scores of the whole tumor are provided as well. Among all the methods, the proposed
one obtains the best results in terms of both the HD scores and the visual effect. These results further show
Table 4: Objective evaluation results of different brain tumor segmentation methods on the BraTS2019 benchmark.
WT TC ET Average
Dice HD Dice HD Dice HD Dice HD
Attention Unet[27] 88.81 7.756 77.20 8.258 75.96 5.202 80.66 7.072
U-Net++[28] 89.67 6.345 87.13 5.521 80.25 3.313 85.68 5.060
Z. Jiang[49] 90.94 4.263 86.47 5.439 80.21 3.146 85.87 4.283
N3D[14] 91.60 6.547 88.80 6.219 83.00 3.543 87.80 5.436
HNF-Net[50] 91.11 4.136 86.40 5.250 80.96 3.490 86.16 4.292
T. Zhou[18] 89.70 6.700 77.50 9.300 70.60 7.400 79.27 7.800
TransBTS[10] 90.00 5.644 81.94 6.049 78.93 3.736 83.62 5.143
Proposed 91.58 3.866 89.24 5.118 83.84 3.080 88.22 4.021
Table 5: Objective evaluation results of different brain tumor segmentation methods on the BraTS2020 benchmark.
WT TC ET Average
Dice HD Dice HD Dice HD Dice HD
U-Net++[28] 89.77 6.299 85.57 5.483 79.83 4.328 85.06 5.370
Point-UNet[48] 89.67 - 82.97 - 76.43 - 83.02 8.260
TransBTS[10] 90.09 4.964 81.73 9.769 78.73 17.947 83.52 10.893
RFNet[51] 91.11 - 85.21 - 78.00 - 84.77 -
Proposed 91.03 4.719 88.22 5.985 84.61 3.051 87.95 4.585
that edge features can benefit the brain tumor segmentation task.
4.4. Ablation Study
To further verify the effectiveness of the main components including the shifted patch tokenization
strategy, the edge detection module and the MFIB in the feature fusion module that are designed in our
method, an ablation study is conducted in this subsection.
Table 6: Objective evaluation results for the ablation study on the BraTS2018 benchmark.
WT TC ET Average
Dice HD Dice HD Dice HD Dice HD
SwinTrans 88.97 6.276 85.72 6.563 80.31 4.364 85.00 5.734
SwinTrans+SPD 89.00 5.720 85.95 6.453 80.62 4.338 85.19 5.504
SwinTrans+SPD+ED 89.93 4.259 87.24 5.398 81.19 3.728 85.95 4.462
Proposed 90.89 3.923 87.96 5.217 81.94 3.440 86.93 4.193
In this experiment, we use the standard Swin Transformer [11] as the baseline model, and then add
different components one by one to validate their effect. Specifically, we mainly compare the performance
of the following four models:
Figure 10: Visual effect comparison of segmentation results obtained by different models in the ablation study (the columns
are the input image, SwinTrans, SwinTrans+SPD, SwinTrans+SPD+ED, the completed model and the ground truth).
-SwinTrans: Just using the Swin Transformer for brain tumor segmentation via a well pre-trained
model. This is the baseline model.
-SwinTrans+SPD: Introducing the shifted patch tokenization strategy into the Swin Transformer for
segmentation without using pre-training.
-SwinTrans+SPD+ED: Further adding the edge detection module based on the above model. This
model has a similar framework to the proposed one, but it simply concatenates the semantic and
edge features for fusion, instead of using the proposed MFIB.
-Completed Model: The complete model (i.e., SwinTrans+SPD+ED+MFIB) proposed in this paper.
Therefore, the comparison between SwinTrans and SwinTrans+SPD is used to demonstrate the effective-
ness of the shifted patch tokenization strategy adopted in the Swin Transformer. The comparison between
SwinTrans+SPD and SwinTrans+SPD+ED can validate the effect of the edge detection module (please
kindly note that the ED module cannot be individually used without the semantic segmentation module).
The comparison between SwinTrans+SPD+ED and the Completed Model is used to show the effectiveness of the
designed MFIB for feature fusion.
Table 6 lists the objective performance of different models. We can see that each of the above
components leads to some improvement of the segmentation results. Among them, the effect of adding the
edge detection module and using the MFIB for feature fusion is more obvious.
The visual effect comparison of segmentation results obtained by different models in the ablation study
is shown in Fig. 10. Some interesting observations include: 1) After adding the shifted patch tokenization, some
unnecessary disturbance noise is eliminated. 2) After adding the edge detection module, the segmentation
accuracy of tumor edges is obviously higher when compared with the GT. 3) After adding the MFIB for
feature fusion, the tumor edges are visually more natural than with the simple concatenation manner.
5. Conclusion
This paper proposes a novel deep learning-based brain tumor segmentation method by jointly utilizing
deep semantics and edge information in multimodal MRI. To achieve this target, three functional modules are
designed. Specifically, we present a semantic segmentation module based on an improved Swin Transformer
by introducing the shifted patch tokenization strategy for better training. In addition, a CNN-based edge
detection module is designed to extract edge features from the input MRI scans. Finally, we present a
feature fusion module by designing a multi-feature inference block based on graph convolution to fuse the
deep semantic features and specific edge features. Experimental results demonstrate the effectiveness of the
key components designed in our method. Moreover, the proposed method achieves better performance
when compared with some state-of-the-art methods on the BraTS benchmarks. Future work may focus
on exploring the feasibility of some other specific features for brain tumor segmentation and extending the
proposed approach to other semantic segmentation problems.
References
[1] F. Piccialli, V. Di Somma, F. Giampaolo, S. Cuomo, G. Fortino, A survey on deep learning in medicine: Why, how and
when?, Information Fusion 66 (2021) 111–137.
[2] S. Mo, M. Cai, L. Lin, R. Tong, Q. Chen, F. Wang, H. Hu, Y. Iwamoto, X.-H. Han, Y.-W. Chen, Multimodal priors
guided segmentation of liver lesions in mri using mutual information based graph co-attention networks, in: International
Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2020, pp. 429–438.
[3] A. Myronenko, 3d mri brain tumor segmentation using autoencoder regularization, in: International MICCAI Brainlesion
Workshop, Springer, 2018, pp. 311–320.
[4] A. Barredo Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. Garcia, S. Gil-Lopez,
D. Molina, R. Benjamins, R. Chatila, F. Herrera, Explainable artificial intelligence (xai): Concepts, taxonomies, oppor-
tunities and challenges toward responsible ai, Information Fusion 58 (2020) 82–115.
[5] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in: Proceedings of the IEEE
conference on computer vision and pattern recognition, 2015, pp. 3431–3440.385
[6] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International
Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241.
[7] F. Milletari, N. Navab, S.-A. Ahmadi, V-net: Fully convolutional neural networks for volumetric medical image segmen-
tation, in: 2016 fourth international conference on 3D vision (3DV), IEEE, 2016, pp. 565–571.
[8] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin, Attention is all you
need, Advances in neural information processing systems 30.
[9] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer,
G. Heigold, S. Gelly, et al., An image is worth 16x16 words: Transformers for image recognition at scale, arXiv preprint
arXiv:2010.11929.
[10] W. Wang, C. Chen, M. Ding, H. Yu, S. Zha, J. Li, Transbts: Multimodal brain tumor segmentation using transformer, in:
International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2021, pp. 109–119.
[11] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, B. Guo, Swin transformer: Hierarchical vision transformer using
shifted windows, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 10012–10022.
[12] S. H. Lee, S. Lee, B. C. Song, Vision transformer for small-size datasets, arXiv preprint arXiv:2112.13492.
[13] F. Isensee, P. Kickingereder, W. Wick, M. Bendszus, K. H. Maier-Hein, No new-net, in: International MICCAI Brainlesion
Workshop, Springer, 2018, pp. 234–244.
[14] F. Wang, R. Jiang, L. Zheng, C. Meng, B. Biswal, 3d u-net based brain tumor segmentation and survival days prediction,
in: International MICCAI Brainlesion Workshop, Springer, 2019, pp. 131–141.
[15] T. Cheng, X. Wang, L. Huang, W. Liu, Boundary-preserving mask r-cnn, in: European conference on computer vision,
Springer, 2020, pp. 660–676.
[16] D. Acuna, A. Kar, S. Fidler, Devil is in the edges: Learning semantic boundaries from noisy annotations, in: Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 11075–11083.
[17] D. Zhang, G. Huang, Q. Zhang, J. Han, J. Han, Y. Wang, Y. Yu, Exploring task structure for brain tumor segmentation
from multi-modality mr images, IEEE Transactions on Image Processing 29 (2020) 9032–9043.
[18] T. Zhou, S. Ruan, Y. Guo, S. Canu, A multi-modality fusion network based on attention mechanism for brain tumor
segmentation, in: 2020 IEEE 17th international symposium on biomedical imaging (ISBI), IEEE, 2020, pp. 377–380.
[19] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, H. Larochelle, Brain tumor
segmentation with deep neural networks, Medical image analysis 35 (2017) 18–31.
[20] B. H. Menze, K. v. Leemput, D. Lashkari, M.-A. Weber, N. Ayache, P. Golland, A generative model for brain tumor
segmentation in multi-modal images, in: International Conference on Medical Image Computing and Computer-Assisted
Intervention, Springer, 2010, pp. 151–159.
[21] M. P. Heinrich, O. Maier, H. Handels, Multi-modal multi-atlas segmentation using discrete optimisation and self-
similarities., VISCERAL Challenge@ ISBI 1390 (2015) 27.
[22] M. Goetz, C. Weber, J. Bloecher, B. Stieltjes, H.-P. Meinzer, K. Maier-Hein, Extremely randomized trees based brain
tumor segmentation, Proceeding of BRATS challenge-MICCAI (2014) 006–011.
[23] N. K. Subbanna, D. Precup, D. L. Collins, T. Arbel, Hierarchical probabilistic gabor and mrf segmentation of brain
tumours in mri volumes, in: International conference on medical image computing and computer-assisted intervention,
Springer, 2013, pp. 751–758.
[24] D. Zikic, B. Glocker, E. Konukoglu, A. Criminisi, C. Demiralp, J. Shotton, O. M. Thomas, T. Das, R. Jena, S. J. Price,
Decision forests for tissue-specific segmentation of high-grade gliomas in multi-channel mr, in: International Conference
on Medical Image Computing and Computer-Assisted Intervention, Springer, 2012, pp. 369–376.
[25] W. Wu, A. Y. Chen, L. Zhao, J. J. Corso, Brain tumor detection and segmentation in a crf (conditional random fields)
framework with pixel-pairwise affinity and superpixel-level features, International journal of computer assisted radiology
and surgery 9 (2) (2014) 241–253.
[26] K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, B. Glocker, Efficient
multi-scale 3d cnn with fully connected crf for accurate brain lesion segmentation, Medical image analysis 36 (2017) 61–78.
[27] O. Oktay, J. Schlemper, L. L. Folgoc, M. Lee, M. Heinrich, K. Misawa, K. Mori, S. McDonagh, N. Y. Hammerla, B. Kainz,
et al., Attention u-net: Learning where to look for the pancreas, arXiv preprint arXiv:1804.03999.
[28] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, Unet++: Redesigning skip connections to exploit multiscale features
in image segmentation, IEEE transactions on medical imaging 39 (6) (2019) 1856–1867.
[29] Y. Liu, F. Mu, Y. Shi, X. Chen, Sf-net: A multi-task model for brain tumor segmentation in multimodal mri via image
fusion, IEEE Signal Processing Letters 29 (2022) 1799–1803.
[30] F. Isensee, P. F. Jaeger, S. A. Kohl, J. Petersen, K. H. Maier-Hein, nnu-net: a self-configuring method for deep learning-
based biomedical image segmentation, Nature methods 18 (2) (2021) 203–211.
[31] J. Chen, Y. Lu, Q. Yu, X. Luo, E. Adeli, Y. Wang, L. Lu, A. L. Yuille, Y. Zhou, Transunet: Transformers make strong
encoders for medical image segmentation, arXiv preprint arXiv:2102.04306.
[32] J. M. J. Valanarasu, P. Oza, I. Hacihaliloglu, V. M. Patel, Medical transformer: Gated axial-attention for medical image
segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer,
2021, pp. 36–46.
[33] S. Pereira, A. Pinto, V. Alves, C. A. Silva, Brain tumor segmentation using convolutional neural networks in mri images,
IEEE transactions on medical imaging 35 (5) (2016) 1240–1251.
[34] G. Wang, W. Li, S. Ourselin, T. Vercauteren, Automatic brain tumor segmentation using cascaded anisotropic convolu-
tional neural networks, in: International MICCAI brainlesion workshop, Springer, 2017, pp. 178–190.
[35] J. Dolz, K. Gopinath, J. Yuan, H. Lombaert, C. Desrosiers, I. B. Ayed, Hyperdense-net: a hyper-densely connected cnn
for multi-modal image segmentation, IEEE transactions on medical imaging 38 (5) (2018) 1116–1126.
[36] Y. Liu, F. Mu, Y. Shi, J. Cheng, C. Li, X. Chen, Brain tumor segmentation in multimodal mri via pixel-level and
feature-level image fusion, Frontiers in Neuroscience 16 (2022) 1000587.
[37] Y. Zhang, J. Yang, J. Tian, Z. Shi, C. Zhong, Y. Zhang, Z. He, Modality-aware mutual learning for multi-modal medical
image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention,
Springer, 2021, pp. 589–599.
[38] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, X. Bai, Richer convolutional features for edge detection, in: Proceedings of the
IEEE conference on computer vision and pattern recognition, 2017, pp. 3000–3009.
[39] X. Chen, C. Dong, J. Ji, J. Cao, X. Li, Image manipulation detection by multi-view multi-scale supervision, in: Proceedings
of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 14185–14193.
[40] L. Jiao, J. Chen, F. Liu, S. Yang, C. You, X. Liu, L. Li, B. Hou, Graph representation learning meets computer vision: A
survey, IEEE Transactions on Artificial Intelligence.
[41] Y. Chen, M. Rohrbach, Z. Yan, Y. Shuicheng, J. Feng, Y. Kalantidis, Graph-based global reasoning networks, in: Pro-
ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 433–442.
[42] T. N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907.
[43] B. H. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby, Y. Burren, N. Porz, J. Slotboom, R. Wiest,
et al., The multimodal brain tumor image segmentation benchmark (brats), IEEE transactions on medical imaging 34 (10)
(2014) 1993–2024.
[44] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B. Freymann, K. Farahani, C. Davatzikos, Advancing
the cancer genome atlas glioma mri collections with expert segmentation labels and radiomic features, Scientific data 4 (1)
(2017) 1–13.
[45] S. Bakas, M. Reyes, A. Jakab, S. Bauer, M. Rempfler, A. Crimi, R. T. Shinohara, C. Berger, S. M. Ha, M. Rozycki,
et al., Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall
survival prediction in the brats challenge, arXiv preprint arXiv:1811.02629.
[46] D. P. Kingma, J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980.
[47] Z. Gu, J. Cheng, H. Fu, K. Zhou, H. Hao, Y. Zhao, T. Zhang, S. Gao, J. Liu, CE-Net: Context encoder network for 2D
medical image segmentation, IEEE transactions on medical imaging 38 (10) (2019) 2281–2292.
[48] N.-V. Ho, T. Nguyen, G.-H. Diep, N. Le, B.-S. Hua, Point-Unet: A context-aware point-based neural network for volumetric
segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer,
2021, pp. 644–655.
[49] Z. Jiang, C. Ding, M. Liu, D. Tao, Two-stage cascaded U-Net: 1st place solution to BraTS challenge 2019 segmentation
task, in: International MICCAI brainlesion workshop, Springer, 2019, pp. 231–241.
[50] H. Jia, Y. Xia, W. Cai, H. Huang, Learning high-resolution and efficient non-local features for brain glioma segmentation
in MR images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer,
2020, pp. 480–490.
[51] Y. Ding, X. Yu, Y. Yang, RFNet: Region-aware fusion network for incomplete multi-modal brain tumor segmentation, in:
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, pp. 3975–3984.