Fig 6
Typical digital microscope frames for the 12 surgical phases: 1-preparation, 2-betadine injection, 3-lateral corneal incision, 4-principal corneal incision, 5-viscoelastic injection, 6-capsulorhexis, 7-phacoemulsification, 8-cortical aspiration of the big pieces of the lens, ...

Source publication
Article
Full-text available
The need for a better integration of the new generation of computer-assisted-surgical systems has been recently emphasized. One necessity to achieve this objective is to retrieve data from the operating room (OR) with different sensors, then to derive models from these data. Recently, the use of videos from cameras in the OR has demonstrated its ef...

Context in source publication

Context 1
... the OPMI Lumera surgical microscope (Carl Zeiss) with an initial resolution of 720 x 576 at 25 fps. Considering the goal of recognizing only high-level surgical tasks, each video was down-sampled to 1 fps. Original frames were also spatially down-sampled by a factor of 4 with a 5-by-5 Gaussian kernel. Twelve surgical phases were defined (Fig. ...
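
As a rough illustration of the preprocessing described above, the following sketch (assuming OpenCV; the function name and frame-rate fallback are illustrative, not from the paper) keeps one frame per second and shrinks each frame by a factor of 4 after smoothing with a 5-by-5 Gaussian kernel:

```python
import cv2

def downsample_video(path, target_fps=1.0, spatial_factor=4):
    """Yield frames down-sampled to ~1 fps and spatially reduced 4x.

    Illustrative sketch of the preprocessing in the excerpt: temporal
    down-sampling to 1 fps, then 5x5 Gaussian smoothing and a factor-4
    spatial reduction (e.g., 720x576 -> 180x144).
    """
    cap = cv2.VideoCapture(path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # source videos are 25 fps
    step = max(1, round(src_fps / target_fps))    # e.g., keep every 25th frame
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            blurred = cv2.GaussianBlur(frame, (5, 5), 0)  # anti-alias before shrinking
            h, w = blurred.shape[:2]
            yield cv2.resize(blurred, (w // spatial_factor, h // spatial_factor),
                             interpolation=cv2.INTER_AREA)
        idx += 1
    cap.release()
```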

Citations

... However, the past decade has witnessed a surge in the automated analysis of surgical video data. Surgical video analysis techniques offer surgeons various benefits, such as generating post-operative reports [1,2], evaluating surgical skills for training purposes [3], and creating educational content. Furthermore, real-time video analysis holds promise for intraoperative communication with surgeons, including the development of automated warning or recommendation systems based on the real-time recognition of surgical tasks, steps, or gestures [4][5][6]. ...
Article
Full-text available
Background: Open surgery relies heavily on the surgeon’s visual acuity and spatial awareness to track instruments within a dynamic and often cluttered surgical field. Methods: This system utilizes a head-mounted depth camera to monitor surgical scenes, providing both image data and depth information. The video captured from this camera is scaled down, compressed using MPEG, and transmitted to a high-performance workstation via the RTSP (Real-Time Streaming Protocol), a reliable protocol designed for real-time media transmission. To segment surgical instruments, we utilize the enhanced U-Net with GridMask (EUGNet) for its proven effectiveness in surgical tool segmentation. Results: For rigorous validation, the system’s performance reliability and accuracy are evaluated using prerecorded RGB-D surgical videos. This work demonstrates the potential of this system to improve situational awareness, surgical efficiency, and generate data-driven insights within the operating room. In a simulated surgical environment, the system achieves a high accuracy of 85.5% in identifying and segmenting surgical instruments. Furthermore, the wireless video transmission proves reliable with a latency of 200 ms, suitable for real-time processing. Conclusions: These findings represent a promising step towards the development of assistive technologies with the potential to significantly enhance surgical practice.
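
As an illustration of the transmission step, here is a minimal receiver-side sketch assuming OpenCV built with FFmpeg support; the stream URL is hypothetical, and the paper's actual MPEG settings, depth-channel handling, and EUGNet inference are not reproduced:

```python
import cv2

# Hypothetical RTSP endpoint for the head-mounted camera; the real
# address, codec settings, and depth-channel handling are not given
# in the abstract.
STREAM_URL = "rtsp://192.168.0.10:8554/surgical_feed"

cap = cv2.VideoCapture(STREAM_URL)  # requires OpenCV built with FFmpeg
if not cap.isOpened():
    raise RuntimeError("could not open RTSP stream")

while True:
    ok, frame = cap.read()          # one decoded color frame
    if not ok:
        break                       # stream ended or dropped
    # ... run the segmentation model (e.g., EUGNet) on `frame` here ...
    cv2.imshow("feed", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```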
... However, the past decade has witnessed a surge in automated analysis of surgical video data. These techniques offer surgeons various benefits, such as generating post-operative reports [1,2], evaluating surgical skills for training purposes [3], or creating educational content. Furthermore, real-time video analysis holds promise for intraoperative communication with surgeons. ...
Preprint
Full-text available
... Similar work has been done on other surgical tasks, especially laparoscopy or eye surgery. The first proposed methods were based on hand-crafted features [23,24], which can be very efficient but are hard to generalize to a wider dataset. More advanced methods using deep learning were also explored. ...
Article
Full-text available
Colorectal cancer is the third most common type of cancer, with almost two million new cases worldwide. It develops from neoplastic polyps, most commonly adenomas, which can be removed during colonoscopy to prevent colorectal cancer from occurring. Unfortunately, up to a quarter of polyps are missed during colonoscopies. Studies have shown that polyp detection during a procedure correlates with the time spent searching for polyps, called the withdrawal time. The different phases of the procedure (cleaning, therapeutic, and exploration phases) make it difficult to precisely measure the withdrawal time, which should only include the exploration phase. Separating this from the other phases requires manual time measurement during the procedure, which is rarely performed. In this study, we propose a method to automatically detect the cecum, which marks the start of the withdrawal phase, and to classify the different phases of the colonoscopy, which allows precise estimation of the final withdrawal time. This is achieved using a ResNet for both detection and classification, trained with two public datasets and a private dataset composed of 96 full procedures. Out of 19 testing procedures, 18 have their withdrawal time correctly estimated, with a mean error of 5.52 seconds per minute per procedure.
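
A minimal sketch of how the withdrawal time could be derived once per-frame phase labels and the cecum frame are available; the function, label names, and frame rate are illustrative, as the abstract does not describe the exact post-processing:

```python
def withdrawal_time_seconds(phase_per_frame, cecum_frame, fps,
                            exploration_label="exploration"):
    """Estimate withdrawal time from per-frame phase predictions.

    Illustrative sketch: `phase_per_frame` is a list of phase labels
    (e.g., "cleaning", "therapeutic", "exploration") produced by a
    frame classifier such as the ResNet described in the abstract;
    `cecum_frame` is the index where the cecum was detected. Only
    exploration-phase frames after cecal intubation are counted.
    """
    exploring = sum(1 for label in phase_per_frame[cecum_frame:]
                    if label == exploration_label)
    return exploring / fps

# Toy usage: 3 of 5 post-cecum frames are exploration at 1 fps -> 3.0 s
print(withdrawal_time_seconds(
    ["cleaning", "exploration", "therapeutic", "exploration", "exploration"],
    cecum_frame=0, fps=1.0))
```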
... Anatomy was described in eight (23%) studies [16,17,[36][37][38][39][40], and specific anatomy described in three [38][39][40]. Anatomical characteristics (normal or pathologic) were never reported. ...
... Before 2020 (i.e., between 2008 and 2020), among 19 articles, only 6 (31.6%) studies had ethics committee approval. Only one (5%) study had clinical data within the dataset. [Table excerpt (study: task): Shi et al. [25]: surgical workflow recognition; Jin et al. [22]: tool presence detection and phase recognition; Twinanda et al. [26]: tool presence detection and phase recognition; Bodenstedt et al. [19]: surgical workflow recognition; Bodenstedt et al. [20]: procedure duration prediction; Jalal et al. [56]: surgical workflow recognition; Lecuyer et al. [23]: surgical workflow recognition; CHOLEC120 = CHOLEC80 + 40.] After 2020 (i.e., between 2020 and 2022), among 15 articles, 9 (60%) studies had ethics committee approval. Three (20%) studies had clinical data within the dataset. ...
Article
Full-text available
Background Annotated data are foundational to applications of supervised machine learning. However, there seems to be a lack of common language used in the field of surgical data science. The aim of this study is to review the process of annotation and semantics used in the creation of surgical process models (SPM) for minimally invasive surgery videos. Methods For this systematic review, we reviewed articles indexed in the MEDLINE database from January 2000 until March 2022. We selected articles using surgical video annotations to describe a surgical process model in the field of minimally invasive surgery. We excluded studies focusing on instrument detection or recognition of anatomical areas only. The risk of bias was evaluated with the Newcastle Ottawa Quality assessment tool. Data from the studies were visually presented in a table using the SPIDER tool. Results Of the 2806 articles identified, 34 were selected for review. Twenty-two were in the field of digestive surgery, six in ophthalmologic surgery only, one in neurosurgery, three in gynecologic surgery, and two in mixed fields. Thirty-one studies (88.2%) were dedicated to phase, step, or action recognition and mainly relied on a very simple formalization (29, 85.2%). Clinical information in the datasets was lacking for studies using available public datasets. The process of annotation for surgical process models was lacking and poorly described, and the description of the surgical procedures was highly variable between studies. Conclusion Surgical video annotation lacks a rigorous and reproducible framework. This leads to difficulties in sharing videos between institutions and hospitals because of the different languages used. There is a need to develop and use a common ontology to improve libraries of annotated surgical videos.
... Moreover, it can avoid medical accidents caused by leftover surgical instruments. Early work addressed different datasets by using different hand-crafted characteristics for detection [2,3]. However, these methods are limited: they cannot easily extract more efficient high-level features, are influenced by people's subjectivity, and thus ignore the more important fine-grained characteristics [4]. ...
Article
Full-text available
In minimally invasive laparoscopic surgery, it is of practical significance to quickly locate the position and category of surgical instruments. This can alert medical personnel to the irreversible injury caused to patients when surgical instruments are left behind after an operation. In this paper, a Gaussian kernel is introduced into each ground truth, which makes full use of label information to allocate positive and negative samples and improves the accuracy of localization and classification. We then introduce the SIoU Loss and Harmonic Loss functions into the total loss. The former uses relative coordinates to make the network converge more quickly, and the latter solves the problem of asynchronous optimization of the two branches of classification and regression. Our experiments show that the strategy based on Gaussian-kernel sample allocation is very effective on the public dataset m2cai16-tool-locations, demonstrating that our method achieves noticeably higher classification and regression accuracy than other work.
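
The abstract does not spell out the exact allocation rule, but a common way to "introduce a Gaussian kernel into each ground truth" is a CenterNet-style heatmap, where each ground-truth centre receives a soft target that decays with distance; a minimal sketch, with illustrative sizes:

```python
import numpy as np

def gaussian_target_map(centers, shape, sigma=2.0):
    """Place a 2-D Gaussian kernel at each ground-truth centre.

    Sketch of Gaussian-kernel label encoding (CenterNet-style); the
    paper's exact positive/negative allocation rule is not given in
    the abstract, so this only illustrates the general idea: pixels
    near a centre get soft positive weights that decay with distance.
    `centers` is a list of (x, y) coordinates on the feature map.
    """
    heatmap = np.zeros(shape, dtype=np.float32)
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    for cx, cy in centers:
        g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        heatmap = np.maximum(heatmap, g)   # keep the strongest response per pixel
    return heatmap

# One instrument centre at x=20, y=12 on a 64x64 feature map
hm = gaussian_target_map([(20, 12)], (64, 64), sigma=2.5)
```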
... Some of the early works on surgical tool detection used radiofrequency identification tags (Kranzfelder et al. 2013), Viola-Jones detection algorithm (Lalys et al. 2011) and segmentation, contour delineation and three-dimensional modelling (Speidel et al. 2009). With the advent of DL-based approaches using convolutional neural networks, computer vision methods have evolved with remarkable growth and demonstrated promising outcomes (Russakovsky et al. 2015). ...
Article
Full-text available
Surgical tool detection in minimally invasive surgery is an essential part of computer-assisted interventions. Current approaches are mostly based on supervised methods requiring large annotated datasets. However, labelled datasets are often scarce. Semi-supervised learning (SSL) has recently emerged as a viable alternative, showing promise in producing models that retain competitive performance compared to supervised methods. Therefore, this paper introduces an SSL framework in the surgical tool detection paradigm, which aims to mitigate training data scarcity and data imbalance problems through a knowledge distillation approach. In the proposed work, we train a model with labelled data which initialises the Teacher-Student joint learning, where the Student is trained on Teacher-generated pseudo-labels from unlabelled data. We also propose a multi-class distance with a margin-based classification loss function in the region-of-interest head of the detector to segregate the foreground-background region effectively. Our results on the m2cai16-tool-locations dataset indicate the superiority of our approach on different supervised data settings (1%, 2%, 5% and 10% of annotated data), where our model achieves overall improvements of 8%, 12%, and 27% in mean average precision on 1% labelled data over the state-of-the-art SSL methods and the supervised baseline, respectively. The code is available at https://github.com/Mansoor-at/Semi-supervised-surgical-tool-detection.
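
A minimal sketch of the Teacher-Student pseudo-labelling loop described here, simplified to image classification for brevity (the paper applies it to a detector); the EMA teacher update, confidence threshold, and tiny architecture are common SSL choices assumed here, not details confirmed by the abstract:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 7))  # 7 tool classes
teacher = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 7))
teacher.load_state_dict(student.state_dict())   # teacher initialised from student
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.SGD(student.parameters(), lr=0.01)

def train_step(x_lab, y_lab, x_unlab, conf_thresh=0.9, ema_decay=0.99):
    # Supervised loss on the small labelled set
    sup_loss = F.cross_entropy(student(x_lab), y_lab)

    # Teacher generates pseudo-labels on unlabelled data
    with torch.no_grad():
        probs = teacher(x_unlab).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
    mask = conf > conf_thresh                    # keep confident predictions only
    unsup_loss = (F.cross_entropy(student(x_unlab), pseudo, reduction="none")
                  * mask.float()).mean()

    loss = sup_loss + unsup_loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Teacher tracks the student via an exponential moving average
    with torch.no_grad():
        for tp, sp in zip(teacher.parameters(), student.parameters()):
            tp.mul_(ema_decay).add_(sp, alpha=1 - ema_decay)
    return loss.item()
```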
... Existing studies mainly focus on modeling high-dimensional visual features or the time sequence information for surgical phase recognition. In terms of visual feature extraction, early studies used manually designed descriptors to extract features, such as intensity and gradient [9], shape, color, and texture-based features [10]. Meanwhile, in time sequence feature modeling, several studies have utilized linear statistical models to capture the temporal structure of surgical videos, including dynamic time warping [11,12], conditional random fields [13][14][15], and variants of hidden Markov models (HMMs) [16,17]. ...
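
For reference, the dynamic time warping mentioned above aligns two sequences of per-frame features via the textbook recurrence; a minimal sketch (the generic algorithm, not any specific paper's variant):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping between two feature sequences.

    `a` and `b` are (n, d) and (m, d) arrays of per-frame features;
    returns the cost of the best monotonic alignment between them.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])     # local frame distance
            D[i, j] = cost + min(D[i - 1, j],               # insertion
                                 D[i, j - 1],               # deletion
                                 D[i - 1, j - 1])           # match
    return D[n, m]
```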
Article
Full-text available
Background Surgical video phase recognition is an essential technique in computer-assisted surgical systems for monitoring surgical procedures, which can assist surgeons in standardizing procedures and enhancing postsurgical assessment and indexing. However, the high similarity between the phases and temporal variations of cataract videos still poses the greatest challenge for video phase recognition. Methods In this paper, we introduce a global–local multi-stage temporal convolutional network (GL-MSTCN) to explore the subtle differences between high similarity surgical phases and mitigate the temporal variations of surgical videos. The presented work consists of a triple-stream network (i.e., pupil stream, instrument stream, and video frame stream) and a multi-stage temporal convolutional network. The triple-stream network first detects the pupil and surgical instruments regions in the frame separately and then obtains the fine-grained semantic features of the video frames. The proposed multi-stage temporal convolutional network improves the surgical phase recognition performance by capturing longer time series features through dilated convolutional layers with varying receptive fields. Results Our method is thoroughly validated on the CSVideo dataset with 32 cataract surgery videos and the public Cataract101 dataset with 101 cataract surgery videos, outperforming state-of-the-art approaches with 95.8% and 96.5% accuracy, respectively. Conclusions The experimental results show that the use of global and local feature information can effectively enhance the model to explore fine-grained features and mitigate temporal and spatial variations, thus improving the surgical phase recognition performance of the proposed GL-MSTCN.
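
A minimal sketch of one dilated temporal-convolution stage of the kind the abstract describes, where the receptive field doubles with each layer; channel widths and layer counts are illustrative, not the paper's:

```python
import torch
import torch.nn as nn

class TCNStage(nn.Module):
    """One temporal-convolution stage with exponentially dilated layers.

    Sketch of the multi-stage TCN idea described in the abstract
    (dilated convolutions with growing receptive fields); sizes are
    illustrative, not the GL-MSTCN configuration.
    """
    def __init__(self, in_dim, hidden=64, n_classes=12, n_layers=8):
        super().__init__()
        self.inp = nn.Conv1d(in_dim, hidden, kernel_size=1)
        self.layers = nn.ModuleList(
            nn.Conv1d(hidden, hidden, kernel_size=3,
                      padding=2 ** i, dilation=2 ** i)   # receptive field doubles per layer
            for i in range(n_layers))
        self.out = nn.Conv1d(hidden, n_classes, kernel_size=1)

    def forward(self, x):            # x: (batch, in_dim, time)
        h = self.inp(x)
        for conv in self.layers:
            h = h + torch.relu(conv(h))   # residual dilated block
        return self.out(h)           # per-frame phase logits

# Per-frame features (e.g., from a frame encoder): 2048-d over 300 frames
logits = TCNStage(2048)(torch.randn(1, 2048, 300))  # -> (1, 12, 300)
```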
... Real-time object detection has recently been widely used in artificial intelligence surgery, and surgical instrument detection is a promising AI sub-project [6][7][8]. Its value lies in helping surgeons monitor medical conditions and tools and make correct decisions, by providing detection results with high accuracy. It can also avoid medical accidents caused by leftover surgical instruments. Early work responded to different datasets by using different hand-crafted characteristics for detection; however, these methods are limited and cannot easily extract more efficient high-level features, are influenced by people's subjectivity, and thus ignore the more important fine-grained characteristics [9]. In addition, the application of convolutional neural network algorithms has played a revolutionary role in the development of computer science and deep learning, making it possible to design an efficient positive and negative sample allocation strategy with higher accuracy than before. ...
Preprint
Full-text available
In minimally invasive laparoscopic surgery, it is of practical significance to quickly locate the position and category of surgical instruments. This can alert medical personnel to the irreversible injury caused to patients when surgical instruments are left behind after an operation. In this paper, a Gaussian kernel is introduced into each ground truth, which makes full use of label information to allocate positive and negative samples and improves the accuracy of localization and classification. We then introduce the SIoU Loss and Harmonic Loss functions into the total loss. The former uses relative coordinates to make the network converge more quickly, and the latter solves the problem of asynchronous optimization of the two branches of classification and regression. Our experiments show that the strategy based on Gaussian-kernel sample allocation is very effective on the public dataset m2cai16-tool-locations, demonstrating that our method achieves noticeably higher classification and regression accuracy than other work.
... Some of the early works on surgical tool detection used radio frequency identification tags (Kranzfelder et al. 2013), Viola-Jones detection algorithm (Lalys et al. 2011) and segmentation, contour delineation and three-dimensional modeling (Speidel et al. 2009). With the advent of deep learning-based approaches using convolutional neural networks, computer vision methods have evolved with remarkable growth and demonstrated promising outcomes (Russakovsky et al. 2015). ...
Preprint
Full-text available
Surgical tool detection in minimally invasive surgery is an essential part of computer-assisted interventions. Current approaches are mostly based on supervised methods, which require large, fully labeled datasets to train supervised models and suffer from pseudo-label bias because of class imbalance issues. However, large image datasets with bounding box annotations are often scarcely available. Semi-supervised learning (SSL) has recently emerged as a means of training large models using only a modest amount of annotated data, apart from reducing the annotation cost. SSL has also shown promise in producing models that are more robust and generalizable. Therefore, in this paper we introduce a semi-supervised learning (SSL) framework in the surgical tool detection paradigm which aims to mitigate the scarcity of training data and the data imbalance through a knowledge distillation approach. In the proposed work, we train a model with labeled data which initialises the Teacher-Student joint learning, where the Student is trained on Teacher-generated pseudo labels from unlabeled data. We propose a multi-class distance with a margin-based classification loss function in the region-of-interest head of the detector to effectively segregate foreground classes from the background region. Our results on the m2cai16-tool-locations dataset indicate the superiority of our approach on different supervised data settings (1%, 2%, 5%, 10% of annotated data), where our model achieves overall improvements of 8%, 12% and 27% in mAP (on 1% labeled data) over the state-of-the-art SSL methods and a fully supervised baseline, respectively. The code is available at https://github.com/Mansoor-at/Semi-supervised-surgical-tool-det
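
One plausible reading of the "multi-class distance with a margin-based classification loss" is a prototype-distance loss with a hinge margin; the paper's exact formulation is not given in the abstract, so the following only sketches the general mechanism:

```python
import torch
import torch.nn.functional as F

def distance_margin_loss(features, prototypes, labels, margin=1.0):
    """Multi-class distance loss with a margin (illustrative reading).

    `features`: (batch, d) RoI-head embeddings; `prototypes`: (C, d)
    learnable class centres (class 0 could be background). Each
    embedding is pulled toward its own class centre and pushed at
    least `margin` further from the nearest wrong centre. This is an
    assumption about the mechanism, not the paper's exact loss.
    """
    dists = torch.cdist(features, prototypes)          # (batch, C) pairwise distances
    pull = dists[torch.arange(len(labels)), labels]    # distance to own centre
    push = dists.clone()
    push[torch.arange(len(labels)), labels] = float("inf")
    nearest_other = push.min(dim=1).values             # closest wrong centre
    return (pull + F.relu(margin + pull - nearest_other)).mean()
```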
... Surgical tool detection is one of the longest-running and most researched sub-applications of surgical AI; it can assist surgeons in monitoring the status of medical tools during minimally invasive surgery, making a correct objective judgment of the current surgical outcome, and alerting to possible adverse medical events. Early work improved the accuracy of surgical tool detection by using hand-crafted features and improvements to traditional machine learning algorithms [12][13][14] to reduce the cost of manual supervision. Still, these methods were limited in that they could not efficiently extract higher-level features. ...
Article
Full-text available
In minimally invasive laparoscopic surgery, accurate detection of the location and specific category of surgical tools assists the surgeon in making a correct objective judgment of the current surgical outcome and alerts to possible adverse medical events. In this paper, an error-analysis method is proposed to reveal the specific sources of error in previous work. We then confirm that the main bottlenecks of current methods are a high level of missing error and localization error. To reduce these errors, we equip the backbone with a three-dimensional attention mechanism and adopt a double-headed detection head design to replace the single detection head. Moreover, we propose an enhanced multilayer perceptron called Mconv to strengthen the localization branch of the double-headed detection head. Our experiments evaluate our improved approach with our proposed error-analysis method on the public dataset m2cai16-tool-locations, showing that our method yields remarkably higher detection accuracy than others.
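
A minimal sketch of the double-headed detection head idea (a fully connected branch for classification and a convolutional branch for localization, as in Double-Head R-CNN-style designs); the paper's Mconv perceptron and 3-D attention backbone are not reproduced, and all dimensions are illustrative:

```python
import torch
import torch.nn as nn

class DoubleHead(nn.Module):
    """Separate classification and localization branches (sketch).

    Illustrates the double-headed design mentioned in the abstract:
    a fully connected branch for class scores and a convolutional
    branch for box regression, replacing a single shared head.
    """
    def __init__(self, in_ch=256, pool=7, n_classes=8):
        super().__init__()
        self.cls_branch = nn.Sequential(               # fc branch -> class scores
            nn.Flatten(),
            nn.Linear(in_ch * pool * pool, 1024), nn.ReLU(),
            nn.Linear(1024, n_classes))
        self.loc_branch = nn.Sequential(               # conv branch -> box deltas
            nn.Conv2d(in_ch, in_ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(in_ch, 4))

    def forward(self, roi_feats):     # roi_feats: (num_rois, 256, 7, 7)
        return self.cls_branch(roi_feats), self.loc_branch(roi_feats)

scores, boxes = DoubleHead()(torch.randn(10, 256, 7, 7))  # (10, 8), (10, 4)
```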