The newest form of computer-based automation, in which many different types of data are captured and analyzed autonomously, as in self-driving cars, is advancing rapidly in all fields of daily life.1 However, this huge amount of data can be difficult to interpret. To make these data usable, artificial intelligence (AI) and its subdomains such as machine learning (ML) can be applied. AI is generally defined as the ability of machines or computers to show intelligent behavior, for example, learning and solving tasks for which they were not explicitly programmed.2 ML can roughly be described as the automated extraction of knowledge from data3 and is already commonly used in fields such as economics, advertising, the military, and engineering.4 Its implementation in healthcare has proven more difficult and has thus lagged behind other disciplines,5 though it is rapidly gaining popularity as well.2,6,7 For example, ML models have recently demonstrated success in identifying cancerous skin lesions in images8 and in identifying diabetic retinopathy,9 with the first medical device using ML receiving Food and Drug Administration (FDA) approval in 2018.10,11 Furthermore, Shademan et al built an autonomous robot capable of suturing an intestinal anastomosis in an animal model that was of equal quality to anastomoses performed by surgeons.12 Likewise, the number of hits on Medline for the search terms "artificial intelligence" or "machine learning" shows a steep increase over the last decade (Fig. 1).
The technological advancements in surgery, especially minimally invasive and robotic surgery, have led to a growing number of devices in the operating room. These devices can provide increasingly more information from surgical procedures (eg, surgical video, instrument use, pneumoperitoneal pressure, staff participation, table position, and instrument trajectories).13,14 As the amount of information has increased, the field of surgical data science, which subsumes the collection, processing, and analysis of surgical data to help improve patient care, has become increasingly important. Within this field, the automation of data analysis is essential to reduce complexity while maximizing the utility of the data and enabling new opportunities. Potential future outcomes include automated intraoperative assistance and cognitive guidance for surgeons in real time, as well as automated and improved feedback for trainees.15–18 Besides increasing surgical quality itself, autonomously analyzing these data also offers the potential to optimize the entire workflow in surgical departments.19–21 Having organized, annotated data during surgery could enable standardization and objectivity of surgical care, including assessment, early detection of errors, and identification of deviations from the normal operative course, potentially resulting in improved patient care.
Automated surgical phase recognition is a cornerstone for the realization of the aforementioned applications of ML in surgery.22 Surgical phases are defined as higher-level tasks that constitute an entire surgical procedure, eg, dissecting Calot's triangle to achieve the critical view of safety in laparoscopic cholecystectomy (LC), as illustrated in Figure 2. As the surgical literature involving ML expands, surgeons should have an understanding of the types of studies being conducted. Thus, the aim of this systematic review was to summarize the ML models used for automated phase recognition in general surgery.
METHODS
This systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement23 and the A MeaSurement Tool to Assess systematic Reviews (AMSTAR 2) criteria.24 Additionally, the concept of the review was registered with PROSPERO (CRD42018108907).
Literature Search
A comprehensive literature search was conducted using Medline (via PubMed), Web of Science, and the IEEE Xplore database, the latter to account for technical papers. The full PubMed, Web of Science, and IEEE Xplore search strategies can be found in Supplementary File 3, https://links.lww.com/SLA/C591. To optimize sensitivity, the search was designed according to the Problem, Intervention, Comparison, Outcome (PICO) criteria.22 Free-text words and Medical Subject Headings (MeSH) terms were combined using Boolean operators. The search was performed by a professional librarian at Heidelberg University in May 2018 (see Acknowledgments; Supplementary File 3, https://links.lww.com/SLA/C591). No language restrictions were applied.
To account for grey literature and further articles, specifically technical papers not published in a medical or IEEE journal, a manual search of Google Scholar (scholar.google.com), CiteSeerX, and the references of relevant articles was performed. Finally, experts at the interface of ML and medicine were contacted.
Inclusion and Exclusion Criteria
Inclusion criteria were: a general surgery procedure; the capture of intraoperative signals; and phase recognition based on ML models. Exclusion criteria were: a study not performed on human patients (eg, models or animal experiments) and a surgical procedure from a specialty other than general surgery.
Study Selection
Two reviewers independently screened titles and abstracts for relevance. Disagreements were resolved by consensus or with the help of a third reviewer. The same process was then applied to eligible full-texts. Data were extracted into a dedicated spreadsheet, which was pretested on 5 studies to confirm its suitability.
Outcomes
The primary aim of this systematic review was to summarize the most important ML and computer models used to realize phase recognition in general surgery.
Secondary aims were to present commonly used vocabulary and concepts of AI and ML to the readership. Additionally, types of data that can be used as input for phase recognition are presented. Finally, accuracy rates and limitations of currently implemented methods were summarized.
Bias
Due to the nature of the review, bias analysis according to the Newcastle-Ottawa scale or the Cochrane tool for assessing risk of bias was not applicable. However, all studies were discussed in the author panel and, if doubt about methodological quality was raised, results were interpreted carefully and limitations discussed in the review.
RESULTS
The literature search retrieved a total of 2254 studies. After title and abstract screening, the full-texts of 68 studies were analyzed and 35 studies were found eligible for inclusion (Fig. 3).
Figure 4 provides a comprehensive overview of the entire phase recognition process, incorporating data types, ML algorithms, surgical phases, and applications. Although ML concepts are introduced in general terms in the manuscript, more detailed descriptions are provided in Supplementary File 1, https://links.lww.com/SLA/C571. Table 1 provides general information about the included studies with references, and Supplementary File 2, https://links.lww.com/SLA/C572 includes a detailed summary table of each study included in this review for further reading.
TABLE 1 - General Information Summary of All Papers Discovered in the Systematic Review

| Author | Country | Year | # Procedures | Procedure | # Phases | ML Type | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Ahmadi et al25 | Germany | 2006 | 6 | LC | 14 | DTW | 92% |
| Padoy et al26 | Germany | 2007 | 6 | LC | 5, 14 | HMM | 2.4% error |
| Padoy et al27 | Germany | 2007 | 10 | LC | 14 | DTW | 5% error |
| Blum et al28 | Germany | 2008 | 12 | LC | 14 | HMM | 6.7% error |
| Klank et al29 | Germany | 2008 | 6 | LC | 14 | SVM | 86% |
| Padoy et al30 | Germany | 2008 | 11 | LC | 14 | HMM | 7.6% error |
| Bouarfa et al31 | Netherlands | 2009 | 4 | LC | 13 | HMM | 0.83 area under ROC curve |
| Blum et al32 | Germany | 2010 | 10 | LC | 14 | HMM, SVM, DTW | 76.8% |
| Bouarfa et al33,34 | Netherlands | 2011 | 10 | LC | 5 | HMM | 91% |
| Padoy et al35 | Germany | 2012 | 16 | LC | 14 | DTW, HMM | 97.3% |
| Stauder et al36,37 | Germany | 2014 | 4 | LC | 7 | RF | 68.8% |
| DiPietro et al38 | Germany | 2015 | 42 | LC | 7 | SVM, HMM | 75.9% |
| Cadene et al39 | France | 2016 | 27‡ | LC | 8 | CNN | 93.5% |
| Dergachyova et al40,41 | France | 2016 | 7∗ | LC | 7 | HMM | 88.9% |
| Jin et al42 | China | 2016 | 27‡ | LC | 8 | RNN | 78.2 Jaccard score |
| Lea et al43 | USA | 2016 | 25 + 7∗ | C & LC | 7 | CNN, SMM, DTW | 92.8% |
| Liu et al44 | USA | 2016 | 39 | LC | 20 | SD | 86% |
| Primus et al45 | Austria | 2016 | 6 | LC | 6 | SVM | Correct within 22 seconds |
| Sahu et al46 | Germany | 2016 | 27‡ | LC | 8 | CNN, RF | 53.1 F1 score |
| Stauder et al47 | Germany | 2016 | 20 | LC | 8 | CNN | 52.4 Jaccard score |
| Twinanda et al48 | France | 2016 | 80† + 27‡ | LC | 7, 8 | RNN, CNN, HMM | 80.7% |
| Aksamentov et al49 | France | 2017 | 80† + 40 | LC | 7 | RNN | 89% |
| Bodenstedt et al50 | Germany | 2017 | 324 + 7∗ + 9 | LC & CR | 7, 8 | RNN | 74.5% |
| Hashimoto et al51 | USA | 2017 | 40 | LSG | 7 | N/A | 92% |
| Stauder et al52 | Germany/USA | 2017 | 5 + 18 | LC | 7 | RF, HMM | 82.4% |
| Twinanda et al53 | France/Germany | 2017 | 80† + 7∗ | LC | 7, 8 | CNN, SVM, HMM | 81% |
| Volkov et al54 | USA | 2017 | 10 | LSG | 7 | SVM | 92.8% |
| Funke et al55 | Germany | 2018 | 80† | LC | 7 | RNN | 92.7% |
| Jin et al56 | China | 2018 | 27‡ + 80† | LC | 3–8 | RNN | 92.4% |
| Loukas et al57 | Greece | 2018 | 27‡ | LC | 8 | RNN | 86% |
| Namazi et al58 | USA | 2018 | 80† | LC | 7 | RNN | 96.3% |
| Yengera et al59 | France | 2018 | 120 | LC | 7 | RNN | 86.7% |

F1 score: calculated from precision and sensitivity, where 1 indicates perfect precision and sensitivity. Jaccard score: a measure of similarity between the actual and predicted phase groups, where 100 indicates perfect prediction. ROC curve: a plot of the true positive rate versus the false positive rate, where an area under the curve of 1 represents 100% accuracy.
∗EndoVis workflow challenge dataset.
†Cholec80 dataset.
‡MICCAI 2016 workflow challenge dataset.
C indicates cholecystectomy; CR, colorectal surgery; LC, laparoscopic cholecystectomy; LSG, laparoscopic sleeve gastrectomy.
ML Methods and Other Algorithms
All of the ML models presented in this review were based on supervised learning, in which humans manually annotate a dataset with the appropriate labels, in this case, labeling surgical video with the appropriate phases (Fig. 4). ML models are then trained on the annotated surgical data, allowing the models to derive general rules that can later be used to make informed inferences about new, unseen data during testing (Fig. 4). Learning is guided by optimization techniques that minimize error. In the case of phase recognition, ML models perform a classification task: assigning segments of surgical video to a given operative phase based on their characteristics. Table 2 summarizes the most frequently used algorithms identified in this review, whereas Supplementary File 1, https://links.lww.com/SLA/C571 provides in-depth explanations of each type of model.
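To make this concrete, the following is a minimal sketch, not the pipeline of any included study, of how phase recognition can be framed as supervised frame-wise classification: each frame is represented by a feature vector (here, hypothetical binary instrument-use signals), labeled with its annotated phase, and a model trained on labeled frames then predicts phases for held-out data. Python with scikit-learn is assumed.

```python
# Minimal sketch: surgical phase recognition as supervised frame-wise
# classification. Data, feature layout, and phase labels are hypothetical;
# with random synthetic data, accuracy will be near chance. The point is
# the train-on-annotated / predict-on-unseen structure.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
N_FRAMES, N_INSTRUMENTS, N_PHASES = 5000, 7, 7

# Each frame: a binary vector of which instruments are in use (annotated).
X = rng.integers(0, 2, size=(N_FRAMES, N_INSTRUMENTS))
# Each frame: its annotated surgical phase (0..6), the supervision signal.
y = rng.integers(0, N_PHASES, size=N_FRAMES)

# Hold out frames to simulate an unseen procedure at test time.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = SVC(kernel="rbf")        # a support vector machine classifier
model.fit(X_train, y_train)      # learn general rules from annotations
y_pred = model.predict(X_test)   # infer phases for unseen frames

print(f"Frame-wise accuracy: {accuracy_score(y_test, y_pred):.2f}")
```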
Dynamic time warping (DTW) was used to construct a reference model in 2 studies, which were the first to attempt phase recognition for a complete surgery.25,26 Two other studies used DTW in combination with another ML model, for the purposes of developing an average phase model of a surgery35 and accounting for temporal information.43 The hidden Markov model (HMM) was one of the most frequently used ML models for phase recognition. One research group from the Technical University of Munich has performed extensive research using HMMs for phase recognition in LC,28,30,32,35 with other groups performing similar research.31,33,34,40,41 An additional 4 studies investigated the use of HMMs in combination with another ML model, typically to take temporal information into account and improve accuracy.38,39,53,54 Several of these papers achieved an error rate of less than 10%. Another 4 studies tested support vector machines (SVM) as the main model for phase prediction. The first study to utilize an SVM for phase recognition was published in 2008 by Klank et al.29 Other literature describing SVMs was not published until 2015 and later.38,45,54 One study fed results from its convolutional neural network (CNN) model into an SVM for final phase prediction, achieving 92% accuracy.53 Random forests (RF) for surgical phase recognition were first proposed in 2014 and were less frequently reported in the present review, with 3 studies published in the literature.36,37,52 Like SVMs, RFs can be used in conjunction with neural networks and other approaches. Sahu et al fed results from their CNN model into 2 RFs for final phase recognition.46
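The role HMMs typically play here, enforcing temporal consistency on otherwise independent frame-wise predictions, can be illustrated with a small Viterbi decoder. This is an illustrative sketch rather than the implementation of any included study; the phase count, transition probabilities, and frame-wise probabilities are invented for the example.

```python
# Sketch of HMM-style temporal smoothing: Viterbi decoding over frame-wise
# phase probabilities with a transition model that favors staying in the
# current phase. All numbers are illustrative.
import numpy as np

def viterbi(frame_probs: np.ndarray, trans: np.ndarray) -> np.ndarray:
    """Most likely phase sequence given per-frame phase probabilities
    (T x K) and a K x K phase transition matrix."""
    T, K = frame_probs.shape
    log_obs = np.log(frame_probs + 1e-12)
    log_trans = np.log(trans + 1e-12)
    score = log_obs[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans   # score of each (prev, cur) pair
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    path = np.zeros(T, dtype=int)
    path[-1] = score.argmax()
    for t in range(T - 2, -1, -1):          # trace the best path backwards
        path[t] = backptr[t + 1, path[t + 1]]
    return path

K = 4                                       # hypothetical number of phases
trans = np.full((K, K), 0.1)
np.fill_diagonal(trans, 0.7)                # phases tend to persist

# Noisy frame-wise probabilities from some upstream classifier (invented):
frame_probs = np.array([[0.7, 0.1, 0.1, 0.1],
                        [0.2, 0.5, 0.2, 0.1],   # isolated misclassification
                        [0.6, 0.2, 0.1, 0.1],
                        [0.1, 0.7, 0.1, 0.1],
                        [0.1, 0.6, 0.2, 0.1]])
print(viterbi(frame_probs, trans))          # [0 0 0 1 1]: the blip is smoothed
```

Note how the frame-by-frame argmax would be [0, 1, 0, 1, 1], whereas the decoded path suppresses the implausible single-frame phase switch; this is the accuracy gain the combined models above aim for.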
Overall, artificial neural networks (ANN) have been the most frequently tested ML type, with 14 published studies included in the present systematic review.39,42,43,46–50,53,55–59 ANNs are capable of learning the most important discriminating features simply by receiving information, such as a raw image with phase annotations, as input (feature learning from video). Although ANNs require more annotated data and higher computational power than other approaches, they also have the potential to improve accuracy with increased data inputs and to simplify data acquisition. Interestingly, the first phase recognition paper utilizing an ANN was not published until 2016, with 6 of these studies published in response to the International Conference on Medical Image Computing & Computer Assisted Intervention (MICCAI) 2016 Workflow Challenge.39,42,46,48,56,57
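As an illustration of feature learning from video, the sketch below combines a CNN that maps each frame to a feature vector with a recurrent network that models the temporal sequence, the general CNN-RNN pattern common among the ANN studies cited above. The architecture sizes and the random "video" are placeholders, not a reproduction of any reviewed model; PyTorch and torchvision are assumed.

```python
# Sketch of feature learning from video: a CNN encodes each frame, an LSTM
# models temporal context, and a linear layer predicts the surgical phase.
# Sizes and data are placeholders; this mirrors the generic CNN-RNN pattern,
# not any specific reviewed architecture.
import torch
import torch.nn as nn
from torchvision.models import resnet18

NUM_PHASES = 7

class PhaseRecognitionNet(nn.Module):
    def __init__(self, num_phases: int):
        super().__init__()
        self.cnn = resnet18(weights=None)   # frame-level feature extractor
        self.cnn.fc = nn.Identity()         # keep 512-d features, drop classifier
        self.lstm = nn.LSTM(input_size=512, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, num_phases)  # per-frame phase logits

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, time, 3, H, W)
        b, t, c, h, w = video.shape
        feats = self.cnn(video.reshape(b * t, c, h, w))  # (b*t, 512)
        feats = feats.reshape(b, t, -1)                  # (b, t, 512)
        out, _ = self.lstm(feats)                        # temporal context
        return self.head(out)                            # (b, t, num_phases)

model = PhaseRecognitionNet(NUM_PHASES)
dummy_clip = torch.randn(1, 8, 3, 224, 224)  # 8 placeholder frames
logits = model(dummy_clip)
print(logits.shape)                          # torch.Size([1, 8, 7])
```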
Data Types
To identify the phase of a surgery using ML, at least 1 type of data has to be used as input to an algorithm (Fig. 4A). In this context, an input can be defined as any data that provide information about different time points during a surgery. Table 3 provides explanations of each data type used as input, whereas Supplementary File 1, https://links.lww.com/SLA/C571 can be referenced for further information on data types. The most frequently used data types were instrument use from manual annotation (n = 13) and feature learning from surgical videos (n = 12). Other data types included: instrument use from radio-frequency identification tags (n = 3); intraoperative characteristics (n = 4); instrument use detected from surgical videos (n = 5); and feature extraction from surgical videos (n = 7). Studies often combined multiple data types as input to improve accuracy (a minimal sketch of such a combined input follows Table 3).
TABLE 3 - Explanation of Data Types

| Name | Explanation | Advantages | Disadvantages | References |
| --- | --- | --- | --- | --- |
| Intraoperative characteristics (IOC) | Recording of intraoperative characteristics, such as intra-abdominal pressure, suction and irrigation bag weights, and the inclination of the surgical table. Typically used in combination with one of the other techniques below. | Simple to record and does not require any extra equipment in the OR. | Often must be recorded manually, making the technique time-consuming. | 36–38,52 |
| Instrument use – manual annotation (IU-M) | Manual annotation of the time points when each instrument is put into or taken out of use. | High accuracy rate. Strongly correlated to the underlying surgical workflow.25,26 Does not require any extra equipment in the OR. | Time-consuming technique. | 25–28,30,31,33–35,40,41,44,47 |
| Instrument use – radio-frequency identification tags (IU-R) | Detection of instrument use by attaching radio-frequency identification tags to each instrument and placing antennas throughout the OR. The antennas identify an instrument as "activated" once the surgeon picks it up. | Avoids the work-heavy manual annotation of IU-M. Strongly correlated to the underlying surgical workflow. | Equipment and set-up in the OR required. | 36–38 |
| Instrument use – automatic detection from video (IU-V) | Automatic detection of instrument use from the laparoscopic video using ML models. | Avoids the work-heavy manual annotation of IU-M and does not require any extra equipment in the OR. Strongly correlated to the underlying surgical workflow. | Slight loss in accuracy compared with manual annotation. | 42,43,48,53,55 |
| Feature extraction from video (FE-V) | Manual definition of various types of features in a video, such as texture, color histograms, and shape and object detection. | Takes features besides instrument use into account to determine phases. Requires no additional equipment in the OR. | Features are manually created and decided in advance, meaning that other information that would be useful for ML algorithms could be lost. | 29,30,32,40,41,49,54 |
| Feature learning from video (FL-V) | Certain models (ANNs, genetic programming) are capable of automatically learning and identifying the important features of a laparoscopic video. | Learned features can provide the most discriminative power for phase recognition, because they consider all data. The algorithms learn the features in a self-supervised manner. | Can be difficult, time-consuming, and computationally expensive to train. | 39,42,43,46–48,50,53,55,57–59 |
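The sketch referenced above shows how several of the data types in Table 3 could be combined into a single per-frame input vector; the signal names, dimensions, and values are hypothetical, chosen only to illustrate the concatenation.

```python
# Sketch: combining several data types from Table 3 into one per-frame input
# vector (binary instrument-use signals plus intraoperative characteristics
# plus learned visual features). All names and values are hypothetical.
import numpy as np

instrument_use = np.array([1, 0, 0, 1, 0, 0, 0])  # IU-M/IU-R/IU-V: grasper, clipper, ...
intraop = np.array([12.0, 150.0, 0.0])            # IOC: pressure (mmHg), irrigation (g), table tilt (deg)
visual_feats = np.random.default_rng(0).normal(size=16)  # FE-V/FL-V: visual features

# One frame's combined input, as used by many studies to improve accuracy:
frame_input = np.concatenate([instrument_use, intraop, visual_feats])
print(frame_input.shape)                          # (26,)
```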
Accuracy and Types of Surgery
LC was the most common procedure used to create and test the ML models. There was heterogeneity among the definitions of the different phases of LC: the number of defined phases ranged from 3 to 20. However, there was consistency for certain key steps such as Calot triangle dissection, clipping and cutting of the cystic duct and artery, and gallbladder dissection and removal (see Table 1 and Fig. 2).
Publicly available datasets that provided annotated phase information of LC for training and testing were utilized by multiple studies (see Table 1). The EndoVis workflow challenge dataset was used in 4 studies,40,41,43,50,53 the Cholec80 dataset in 6 studies,48,49,53,55,56,58 and the MICCAI 2016 workflow challenge dataset in 6 studies.39,42,46,48,56,57
Other studies focused on bariatric surgery and colorectal surgery. Laparoscopic sleeve gastrectomy was investigated in 2 studies,51,54 with both achieving a phase recognition accuracy over 90%. The same ML model was compared between LC and colorectal surgery in 1 study,50 achieving 74% and 67% accuracy, respectively. However, because colorectal procedures are less standardized and more complex than LC, the authors stated that the lower accuracy for colorectal procedures was an expected outcome.
The reported accuracy for automated phase recognition varied widely based on a number of factors. The type and design of the ML model greatly affected accuracy. Although many of the studies compared multiple model designs to discover the best-performing one, this can also be demonstrated by looking at studies that used the same dataset but different model designs, such as Cadene et al (93.5% accuracy)39 and Loukas et al (86% accuracy).57 The type of data used as input and how it was annotated also had a large effect on accuracy. For example, Twinanda et al compared the performance of the same model design trained on the MICCAI 2016 and Cholec80 datasets, finding an accuracy of 79.5% and 71.1%, respectively.48 In addition, studies reported their accuracy using a variety of metrics, making direct comparison between all of the studies difficult. Although most studies simply reported percent accuracy, others reported percent error or used more complex measures such as the area under the ROC curve or the Jaccard score. Explanations of these measures can be found in the Table 1 footnotes. Overall, however, accuracy did not appear to increase over time or with more complex ML methods.
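Because the same predictions can yield several different-looking numbers depending on the metric chosen, the sketch below shows how the metrics reported across the reviewed studies are computed on a set of invented labels; it is illustrative only, assuming scikit-learn.

```python
# Sketch: one set of phase predictions scored with the metrics reported
# across the reviewed studies. Labels and predictions are invented.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, jaccard_score,
                             roc_auc_score)

y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3])  # annotated phases
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 3, 3, 3])  # model predictions

acc = accuracy_score(y_true, y_pred)
print(f"accuracy: {acc:.1%}")                       # percent accuracy
print(f"error rate: {1 - acc:.1%}")                 # percent error
print(f"macro F1: {f1_score(y_true, y_pred, average='macro'):.2f}")
print(f"macro Jaccard: {jaccard_score(y_true, y_pred, average='macro'):.2f}")

# Area under the ROC curve additionally needs per-class scores rather than
# hard labels; shown here for a binary toy case with invented probabilities.
y_bin = np.array([0, 0, 1, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7])
print(f"ROC AUC: {roc_auc_score(y_bin, scores):.2f}")
```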
DISCUSSION
This review provides an overview of ML for automated surgical phase recognition in general surgery procedures, with 35 included studies reporting on various types of ML models and data inputs. The most commonly used ML models were HMM and ANN, whereas the most frequently used data types were instrument use from manual annotation and feature learning from surgical videos. From the first study on this topic, published in 2006, to the most recent studies, the complexity of the ML models and the data types has increased, and improved computational power has contributed in part to the increased use of computation-heavy ANNs. Automated surgical phase recognition can currently be performed with high accuracy, depending on the granularity of the phases, the ML model, the type of data input, and the complexity of the surgical procedure.
The majority of the studies found in this review used LC as their operation of choice. The fact that publicly available datasets, such as Cholec80,53 focus on LC is likely a contributing factor. Additionally, as discussed in Klank et al, "cholecystectomy is a common surgery that is performed minimally invasively in 95% of the cases, with a low conversion rate and well-defined phases. It is for this reason often used for phase recognition."29 Though LC does follow a fairly standardized workflow, it is interesting to note the disparity in the number of formally identified phases found in this review. Table 1 shows that the number of phases described in the studies ranged from 3 to 20. Despite this heterogeneity, a certain consistency in including key steps of LC was apparent. As mentioned by Padoy et al,26 a smaller number of phases should theoretically be easier for an ML model, because each phase has more data with which to train the algorithm. However, a larger number of phases can be beneficial because increased granularity may be more clinically relevant.60
The ubiquity of LC in general surgery practice makes it a useful model to illustrate concepts in phase recognition. For example, the use of specific instruments can improve the identification of surgical phases, such as clipping the cystic duct or artery, as the only action that can be performed with a clip applier is clipping. However, instrumentation can also provide clues to deviation from an expected operative trajectory, such as placing additional clips to stop unexpected bleeding. Such examples are important for establishing the distribution of expected deviations from the procedural "standard" that would still be within normal limits, which has been explored in the surgical education literature.61 By establishing a distribution of "normal" in operations, deviations from normal could be utilized to detect abnormalities due to unique operative circumstances, operative errors, or unsafe technique.2,60 For example, video analysis showing blood in the field and the reinsertion of the clipping device could indicate clinically significant bleeding and an increased risk of intra- or postoperative complications. Data about additional fluid or medications given by anesthesia could further signal or rule out hemodynamically significant bleeding. Therefore, a single data input may not suffice to recognize a deviation from the normal course, but the comprehensive, automated, and simultaneous interpretation of multiple inputs could.
Automated surgical phase recognition is a foundational step for many different applications (see Fig. 4D). In the future, surgical data will likely be annotated and indexed automatically during surgery, allowing surgeons to quickly review certain steps of previous operations while creating informative and focused educational material for students and residents. Automated phase recognition is a prerequisite for clinical decision support systems that provide information to surgeons relative to the actual phase and course of the operation. Such support could include the depiction of target or at-risk structures such as tumor margins, vessels, or nerves through augmented reality overlays.62,63 Additionally, automated phase recognition can improve efficiency in the OR by estimating the remaining surgery duration in real time and initiating the preparation of the next patient at the optimal time point. Thus, during the perioperative process, automated phase recognition could provide current information to robotic surgical assistants, provide context-specific information for the surgical phase, improve teamwork, predict the remaining length of the procedure to optimize operating room management, identify the risk for complications, or assess performance.22,64
Furthermore, as the use of robotics in surgery has increased, more accurate tracking based on instrument kinematics has become possible. Groups led by Hung, Jarc, and Gao have introduced the use of automated performance metrics with ML to assess surgeons' performance, recognize surgical activity, and even anticipate surgical outcomes.64–66 Automated performance metrics also provide a mechanism for robotic systems to potentially learn how to perform surgical tasks and procedures. However, the plausibility of robotic systems operating autonomously, even for limited steps of an operation, remains to be demonstrated.67
Despite the increasing number of studies in the field of ML for automated surgical phase recognition, limitations remain; data annotation, in particular, is a bottleneck in the process. As supervised learning is the dominant methodology, raw data are of little utility without annotation. In an effort to reduce the amount of manual annotation needed, Funke et al investigated model accuracy when only partial annotations were provided to an algorithm, finding that annotating one fourth of the data led to only about a 4% decrease in accuracy compared with three fourths.55 Other methods for improving the annotation process have been suggested as well, such as active learning with deep Bayesian networks that can preselect parts of the image data to be manually annotated and then learn iteratively with a decreasing amount of manual work needed over time.68,69 Furthermore, standardization of annotation is necessary to enable comparison of results across studies.
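A simple version of the idea behind such active-learning approaches, letting the model flag the frames whose predictions it is least certain about so that annotators label only those, can be sketched as follows. The entropy-based selection rule here is a generic illustration, not the deep Bayesian method of the cited work, and all probabilities are invented.

```python
# Sketch of uncertainty-based selection for annotation: rank unlabeled frames
# by the entropy of the model's predicted phase distribution and send only
# the most uncertain ones to a human annotator. Probabilities are invented.
import numpy as np

def entropy(p: np.ndarray) -> np.ndarray:
    """Shannon entropy of each row of a (frames x phases) probability matrix."""
    return -(p * np.log(p + 1e-12)).sum(axis=1)

# Predicted phase probabilities for 5 unlabeled frames (hypothetical):
probs = np.array([[0.97, 0.01, 0.02],
                  [0.40, 0.35, 0.25],   # the model is unsure here
                  [0.90, 0.05, 0.05],
                  [0.34, 0.33, 0.33],   # and here
                  [0.02, 0.95, 0.03]])

budget = 2                              # frames we can afford to annotate
to_annotate = np.argsort(entropy(probs))[::-1][:budget]
print(f"Frames selected for manual annotation: {sorted(to_annotate.tolist())}")
# -> frames 1 and 3, the most uncertain predictions
```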
The collection of sufficient amounts of data for research and implementation of surgical phase recognition is still difficult in everyday clinical practice. Integrated ORs, which allow for easier collection and synchronization of various input data and other perioperative information, have still not been widely implemented. Accuracy of ML for surgical phase recognition can still vary widely, and many models require a prohibitive amount of manual annotation and computer processing power. In addition, legal issues exist in accessing and using video and other data from a patient's surgery, with laws varying by country. These challenges have prevented surgical phase recognition from being utilized outside of academic settings to date. The more widespread implementation of integrated ORs, solutions to decrease manual annotation, improved computer performance, and the standardization of data types are prerequisites for the research on ML for surgical phase recognition to translate into clinical practice in the future.
A challenge to the acceptance of ML in the surgical realm is that ML works differently from human intelligence. It is often difficult to interpret the processes happening within an ML model, if this information is available at all. Additionally, the discriminative features learned by an ANN often differ from those that would be identified by an expert surgeon, even though the ML technique may be highly accurate in surgical phase recognition. This may lead to distrust in ML's ability to successfully perform its assigned tasks. However, as demonstrated in this review, ML is slowly being welcomed into the field of surgery and recognized as a new technology that can vastly improve healthcare. The acceptance of ML in surgery will depend not only on accuracy but also on transparency, termed "explainable artificial intelligence."70 ML for automated surgical phase recognition is a crucial step for many different applications and not an end in itself. Rather, it is the basis for comprehensive workflow optimization to further improve the quality of healthcare in the future.
CONCLUSIONS
ML for automated surgical phase recognition is a growing research topic with the potential for important innovation in the future. Automated surgical phase recognition identifies the different phases and steps of an operation through data inputs such as videos or instrument use. This information can then serve to optimize the workflow, including intraoperative assistance, automation, surgical training, and patient safety. For operations with well-defined, simple workflows, such as LC, automated phase recognition can be performed with high accuracy, though analysis of more complex surgical procedures remains challenging. There is a need for the collection of surgical data ready for ML analysis and for standardization of surgical phases in workflow annotation. Given the multiple potential applications of the technology, the use of ML in the field of surgery will likely benefit patients and complement the knowledge and expertise of surgeons to enable higher quality care and improved efficiency.
Acknowledgment
The authors would like to thank Mr. Dietmar Fleischer from Heidelberg University's Library for conducting the literature search.
REFERENCES
1. Feußner H, Park A. Surgery 4.0: the natural culmination of the industrial revolution? Innov Surg Sci 2017; 2:105–108.
2. Hashimoto DA, Rosman G, Rus D, et al. Artificial intelligence in surgery: promises and perils. Ann Surg 2018; 268:70–76.
3. Kassahun Y, Yu B, Tibebu AT, et al. Surgical robotics beyond enhanced dexterity instrumentation: a survey of machine learning techniques and their role in intelligent and autonomous surgical actions. Int J Comput Assist Radiol Surg 2016; 11:553–568.
4. Mellit A, Kalogirou SA. Artificial intelligence techniques for photovoltaic applications: a review. Prog Energy Combust Sci 2008; 34:574–632.
5. Herzlinger RE. Why innovation in health care is so hard. Harv Bus Rev 2006; 84:58–66.
6. Obermeyer Z, Emanuel EJ. Predicting the future—big data, machine learning, and clinical medicine. N Engl J Med 2016; 375:1216.
7. Hinton G. Deep learning—a technology with the potential to transform health care. JAMA 2018; 320:1101–1102.
8. Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017; 542:115–118.
9. Gulshan V, Peng L, Coram M, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 2016; 316:2402–2410.
10. FDA permits marketing of artificial intelligence-based device to detect certain diabetes-related eye problems. U.S. Food and Drug Administration; 2018. Available at: https://www.fda.gov/newsevents/newsroom/pressannouncements/ucm604357.htm. Accessed August 2018.
11. Stead WW. Clinical implications and challenges of artificial intelligence and deep learning. JAMA 2018; 320:1107–1108.
12. Shademan A, Decker RS, Opfermann JD, et al. Supervised autonomous robotic soft tissue surgery. Sci Transl Med 2016; 8:337ra64.
13. Maier-Hein L, Vedula SS, Speidel S, et al. Surgical data science for next-generation interventions. Nat Biomed Eng 2017; 1:691–696.
14. Kranzfelder M, Schneider A, Fiolka A, et al. Reliability of sensor-based real-time workflow recognition in laparoscopic cholecystectomy. Int J Comput Assist Radiol Surg 2014; 9:941–948.
15. Kenngott H, Wagner M, Preukschas A, et al. Intelligent operating room suite: from passive medical devices to the self-thinking cognitive surgical assistant. Der Chirurg 2016; 87:1033–1038.
16. Franke S, Rockstroh M, Hofer M, et al. The intelligent OR: design and validation of a context-aware surgical working environment. Int J Comput Assist Radiol Surg 2018; 16:1–8.
17. Kowalewski K-F, Garrow CR, Schmidt MW, et al. Sensor-based machine learning for workflow detection and as key to detect expert level in laparoscopic suturing and knot-tying. Surg Endosc 2019; 33:3732–3740.
18. Kowalewski K-F, Hendrie JD, Schmidt MW, et al. Development and validation of a sensor- and expert model-based training system for laparoscopic surgery: the iSurgeon. Surg Endosc 2017; 31:2155–2165.
19. Katic D, Julliard C, Wekerle AL, et al. LapOntoSPM: an ontology for laparoscopic surgeries and its application to surgical phase recognition. Int J Comput Assist Radiol Surg 2015; 10:1427–1434.
20. Neumuth T. Surgical process modeling. Innov Surg Sci 2017; 2:123–137.
21. Kenngott HG, Apitz M, Wagner M, et al. Paradigm shift: cognitive surgery. Innov Surg Sci 2017; 2:139–143.
22. Weede O, Dittrich F, Wörn H, et al. Workflow analysis and surgical phase recognition in minimally invasive surgery. 2012 IEEE International Conference on Robotics and Biomimetics (ROBIO): IEEE; 2012:1074–1080.
23. Moher D, Liberati A, Tetzlaff J, et al. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. Ann Intern Med 2009; 151:264–269.
24. Shea BJ, Grimshaw JM, Wells GA, et al. Development of AMSTAR: a measurement tool to assess the methodological quality of systematic reviews. BMC Med Res Methodol 2007; 7:10.
25. Ahmadi SA, Sielhorst T, Stauder R, et al. Recovery of surgical workflow without explicit models. International Conference on Medical Image Computing and Computer-Assisted Intervention: Springer; 2006:420–428.
26. Padoy N, Horn M, Feussner H, et al. Recovery of surgical workflow: a model-based approach. 21st International Congress and Exhibition-Computer Assisted Radiology and Surgery (CARS); 2007.
27. Padoy N, Blum T, Essa I, et al. A boosted segmentation method for surgical workflow analysis. Med Image Comput Comput Assist Interv 2007; 10:102–109.
28. Blum T, Padoy N, Feußner H, et al. Modeling and online recognition of surgical phases using hidden Markov models. International Conference on Medical Image Computing and Computer-Assisted Intervention: Springer; 2008:627–635.
29. Klank U, Padoy N, Feussner H, et al. Automatic feature generation in endoscopic images. Int J Comput Assist Radiol Surg 2008; 3:331–339.
30. Padoy N, Blum T, Feussner H, et al. On-line recognition of surgical activity for monitoring in the operating room. Association for the Advancement of Artificial Intelligence Conference: AAAI; 2008:1718–1724.
31. Bouarfa L, Jonker P, Dankelman J. Surgical context discovery by monitoring low-level activities in the OR. MICCAI Workshop on Modeling and Monitoring of Computer Assisted Interventions (M2CAI). London, UK; 2009.
32. Blum T, Feussner H, Navab N. Modeling and segmentation of surgical workflow from laparoscopic video. Med Image Comput Comput Assist Interv 2010; 13:400–407.
33. Bouarfa L, Jonker PP, Dankelman J. Discovery of high-level tasks in the operating room. J Biomed Inform 2011; 44:455–462.
34. Bouarfa L, Stassen LP, Dankelman PPJJ. In-vivo measuring surgical workflow activities in the OR. Measuring Behavior 2010; 2010:66.
35. Padoy N, Blum T, Ahmadi SA, et al. Statistical modeling and recognition of surgical workflow. Med Image Anal 2012; 16:632–641.
36. Stauder R, Okur A, Navab N. Detecting and analyzing the surgical workflow to aid human and robotic scrub nurses. The Hamlyn Symposium on Medical Robotics: Imperial College London; 2014:91–92.
37. Stauder R, Okur A, Peter L, et al. Random forests for phase detection in surgical workflow analysis. International Conference on Information Processing in Computer-Assisted Interventions: Springer; 2014:148–157.
38. DiPietro R, Stauder R, Kayis E, et al. Automated surgical-phase recognition using rapidly-deployable sensors. Proceedings of the Modeling and Monitoring of Computer Assisted Interventions Workshop in Conjunction with Medical Image Computing and Computer Assisted Interventions; 2015.
39. Cadene R, Robert T, Thome N, et al. M2CAI workflow challenge: convolutional neural networks with time smoothing and hidden Markov model for video frames classification. arXiv 2016; arXiv-1610.
40. Dergachyova O, Bouget D, Huaulme A, et al. Automatic data-driven real-time segmentation and recognition of surgical workflow. Int J Comput Assist Radiol Surg 2016; 11:1081–1089.
41. Dergachyova O, Bouget D, Huaulmé A, et al. Data-driven surgical workflow detection: technical report for M2CAI 2016 surgical workflow challenge. IEEE Trans Med Imaging 2016.
42. Jin Y, Dou Q, Chen H, et al. EndoRCN: recurrent convolutional networks for recognition of surgical workflow in cholecystectomy procedure video. IEEE Trans Med Imaging 2016.
43. Lea C, Choi JH, Reiter A, et al. Surgical phase recognition: from instrumented ORs to hospitals around the world. Medical Image Computing and Computer-Assisted Intervention M2CAI—MICCAI Workshop; 2016:45–54.
44. Liu R, Zhang X, Zhang H. Web-video-mining-supported workflow modeling for laparoscopic surgeries. Artif Intell Med 2016; 74:9–20.
45. Primus MJ, Schoeffmann K, Böszörmenyi L. Temporal segmentation of laparoscopic videos into surgical phases. 14th International Workshop on Content-Based Multimedia Indexing (CBMI): IEEE; 2016:1–6.
46. Sahu M, Mukhopadhyay A, Szengel A, et al. Tool and phase recognition using contextual CNN features. arXiv 2016; arXiv-1610.
47. Stauder R, Ostler D, Kranzfelder M, et al. The TUM LapChole dataset for the M2CAI 2016 workflow challenge. arXiv 2016; arXiv-1610.
48. Twinanda AP, Mutter D, Marescaux J, et al. Single- and multi-task architectures for surgical workflow challenge at M2CAI. arXiv 2016; arXiv-1610.
49. Aksamentov I, Twinanda AP, Mutter D, et al. Deep neural networks predict remaining surgery duration from cholecystectomy videos. International Conference on Medical Image Computing and Computer-Assisted Intervention: Springer; 2017:586–593.
50. Bodenstedt S, Wagner M, Katić D, et al. Unsupervised temporal context learning using convolutional neural networks for laparoscopic workflow analysis. arXiv 2017; arXiv-1702.
51. Hashimoto DA, Rosman G, Volkov M, et al. Artificial intelligence for intraoperative video analysis: machine learning's role in surgical education. J Am Coll Surg 2017; 225:S171.
52. Stauder R, Kayis E, Navab N. Learning-based surgical workflow detection from intra-operative signals. arXiv 2017; arXiv-1706.
53. Twinanda AP, Shehata S, Mutter D, et al. EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans Med Imaging 2017; 36:86–97.
54. Volkov M, Hashimoto DA, Rosman G, et al. Machine learning and coresets for automated real-time video segmentation of laparoscopic and robot-assisted surgery. 2017 IEEE International Conference on Robotics and Automation (ICRA); 2017:754–759.
55. Funke I, Jenke A, Mees ST, et al. Temporal coherence-based self-supervised learning for laparoscopic workflow analysis. Cham: Springer International Publishing; 2018:85–93.
56. Jin Y, Dou Q, Chen H, et al. SV-RCNet: workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans Med Imaging 2018; 37:1114–1126.
57. Loukas C. Surgical phase recognition of short video shots based on temporal modeling of deep features. 12th International Joint Conference on Biomedical Engineering Systems and Technologies: SCITEPRESS; 2019; 2:21–29.
58. Namazi B, Sankaranarayanan G, Devarajan V. Automatic detection of surgical phases in laparoscopic videos. International Conference on Artificial Intelligence: The Steering Committee of the World Congress in Computer Science, Computer Engineering, and Applied Computing; 2018:124–130.
59. Yengera G, Mutter D, Marescaux J, et al. Less is more: surgical phase recognition with less annotations through self-supervised pre-training of CNN-LSTM networks. arXiv 2018; arXiv-1805.
60. Hashimoto DA, Rosman G, Witkowski ER, et al. Computer vision analysis of intraoperative video: automated recognition of operative steps in laparoscopic sleeve gastrectomy. Ann Surg 2019; 270:414–421.
61. Hashimoto DA, Axelsson CG, Jones CB, et al. Surgical procedural map scoring for decision-making in laparoscopic cholecystectomy. Am J Surg 2019; 217:356–361.
62. Nickel F, Kenngott HG, Neuhaus J, et al. Navigation system for minimally invasive esophagectomy: experimental study in a porcine model. Surg Endosc 2013; 27:3663–3670.
63. Kenngott HG, Wagner M, Gondan M, et al. Real-time image guidance in laparoscopic liver surgery: first clinical experience with a guidance system based on intraoperative CT imaging. Surg Endosc 2014; 28:933–940.
64. Hung AJ, Chen J, Gill IS. Automated performance metrics and machine learning algorithms to measure surgeon performance and anticipate clinical outcomes in robotic surgery. JAMA Surg 2018; 153:770–771.
65. Gao Y, Vedula SS, Reiley CE, et al. JHU-ISI gesture and skill assessment working set (JIGSAWS): a surgical activity dataset for human motion modeling. MICCAI Workshop: M2CAI 2014; 3:3.
66. Jarc AM, Curet MJ. Viewpoint matters: objective performance metrics for surgeon endoscope control during robot-assisted surgery. Surg Endosc 2017; 31:1192–1202.
67. Panesar S, Cagle Y, Chander D, et al. Artificial intelligence and the future of surgical robotics. Ann Surg 2019; 270:223–226.
68. Bodenstedt S, Rivoir D, Jenke A, et al. Active learning using deep Bayesian networks for surgical workflow analysis. Int J Comput Assist Radiol Surg 2019; 14:1079–1087.
69. Yu T, Mutter D, Marescaux J, et al. Learning from a tiny dataset of manual annotations: a teacher/student approach for surgical phase recognition. arXiv 2018; arXiv-1812.
70. Gordon L, Grantcharov T, Rudzicz F. Explainable artificial intelligence for safe intraoperative decision support. JAMA Surg 2019; 154:1064–1065.