Achieving Fast Operational Intelligence in NASA's Deep
Space Network Through Complex Event Processing
Joshua S. Choi1, Rishi Verma2, and Shan Malhotra3
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA, 91109
SpaceOps 2016 Conference, 16-20 May 2016, Daejeon, Korea. DOI: 10.2514/6.2016-2375. Copyright © 2016 by the American Institute of Aeronautics and Astronautics, Inc. The U.S. Government has a royalty-free license to exercise all rights under the copyright claimed herein for Governmental purposes. All other rights are reserved by the copyright owner.
1 Engineering Applications Software Engineer, Mission Control Systems Section, 4800 Oak Grove Drive, M/S: 301-480, Pasadena, CA 91109-8099, United States of America.
2 Scientific Applications Software Engineer, Instrument Software and Science Data Systems Section, 4800 Oak Grove Drive, M/S: 158-242, Pasadena, CA 91109-8099, United States of America.
3 Engineering Applications Software Engineer, Planning & Execution Systems Section, 4800 Oak Grove Drive, M/S: 301-250D, Pasadena, CA 91109-8099, United States of America.
NASA's Deep Space Network (DSN) is a complex, global project in which the expertise of human operators remains crucial to successful operation. To find ways to save costs in operations and to improve its services, a number of modernization efforts are underway in the DSN. One such effort is a research and technology development task at the Jet Propulsion Laboratory that is investigating the use of complex event processing (CEP) for intelligent assessment of situations, trend analysis, and advanced automation. The technology leverages the significant business intelligence (BI) and data science advancements made in the enterprise industries over the last several years. The open source big data processing engine Apache Spark™ and the high-throughput, distributed messaging system Apache Kafka form the core of the DSN Complex Event Processing (DCEP) framework. This paper discusses the systems engineering perspective of why achieving efficient, lower-cost operations in the DSN is a challenging problem, how the DCEP system handles the use cases that help realize intelligent operations, and how this solution fits into the overall model of the planned DSN Follow-the-Sun Operations (FtSO).
Nomenclature
3LPO = three links per operator
API = application programming interface
BI = business intelligence
CEP = complex event processing
DCEP = DSN Complex Event Processing
DR = Discrepancy Report
DSCC = Deep Space Communications Complex
DSN = Deep Space Network
DSS = Deep Space Station
DRMS = Discrepancy Reporting Management System
FtSO = Follow-the-Sun Operations
GUI = graphical user interface
JMS = Java Message Service
JPL = Jet Propulsion Laboratory
JSON = JavaScript Object Notation
LCO = Link Control Operator
LTPS = Light Time Physics Service
MCIS = Monitor and Control Infrastructure Services
MDS = Monitor Data Service
MDR = Master DR
MON-2 = DSN Monitor and Control Standard
NASA = National Aeronautics and Space Administration
NMC = Network Monitor and Control (subsystem)
NOCC = Network Operations and Control Center
Pc/N0 = carrier power to noise spectral density ratio (signal-to-noise ratio)
PL/SQL = Procedural Language/Structured Query Language
REST = representational state transfer
RO = remote operations
SOE = Sequence of Events
SPS = Service Preparation Subsystem
SQA = Service Quality Assessment (subsystem)
TDN = Temporal Dependency Network
WAN = wide area network
XML = Extensible Markup Language
I. Introduction
NASA's Deep Space Network (DSN) is a global network of space communication facilities and powerful antennas capable of communicating with interplanetary spacecraft. Operated by the Jet Propulsion Laboratory (JPL) for NASA, the DSN supports between 40 and 50 space missions at any given time. The DSN consists of three Deep Space Communications Complexes (DSCCs) around the world, geographically spaced apart by approximately 120 degrees on Earth for uninterrupted view of any spacecraft in deep space (see Figure 1). The DSCCs are operated onsite and, for the most part, independently of each other. Global mission support activities in the DSN are coordinated and monitored from the Network Operations and Control Center (NOCC) at JPL in Pasadena, California. At every DSCC and also at the NOCC, numerous heterogeneous hardware and software systems interoperate with each other. Operating and managing these critical components is not a straightforward endeavor. To this day, human operators are heavily relied upon for their expert knowledge of these systems and space communication. Much of the heavy lifting in day-to-day DSN Operations is still done by the operators themselves. With the significant advances that have been made in computing over the years, both in hardware (e.g. affordable, faster processors and memory) and software (e.g. automation, rule engines, business intelligence, frameworks for processing big data, high-performance messaging systems, data science, and machine learning), this may no longer have to be the case. By leveraging the combination of these modern, advanced technologies, DSN Operations can be improved on multiple levels: quality of service, additional capabilities, and reduced operation costs.
Figure 1. DSN’s DSCC locations around the world. By dividing the circumference of Earth into approximate
thirds, the DSCCs have a view to any spacecraft in deep space at any point in time, even as the Earth rotates.
The three DSCC locations are: Goldstone, California, USA; Madrid, Spain; and Canberra, Australia. (Image
credit: Deep Space Network Now; http://eyes.nasa.gov/dsn/dsn.html)
Taking a cue from other industries, a research and technology development team at JPL has been exploring the application of complex event processing (CEP) in the domain of DSN Operations. CEP is a method of combining streams of data from multiple sources in order to identify meaningful events or patterns. What makes CEP now viable in the domain of DSN Operations are all of the computing advancements mentioned previously. CEP can enable more comprehensive automation of DSN Operations, reducing the dependency on human intervention. By also using CEP to perform deeper analysis of situations and non-obvious trends, valuable intelligence can be garnered, possibly matching or surpassing a human expert's ability to do so in real time. Such operational intelligence is the DSN equivalent of the business intelligence (BI) that is increasingly sought after in commercial industries. Furthermore, CEP fits naturally into the model of the grander Follow-the-Sun Operations (FtSO) that the DSN is moving toward. DSN Operations' CEP system, the DSN Complex Event Processing (DCEP) system, provides the very building blocks needed to achieve FtSO's objective: a significant reduction of operations cost.
Currently still in the prototyping stages, the DCEP system has been developed to answer a number of questions:
1) Can CEP really help realize better automation in the DSN?
2) What current problems within DSN Operations does CEP actually solve?
3) What new use cases can CEP handle that serve as enablers for FtSO?
These are the questions that this paper addresses in the following sections. In the process of planning and developing the DCEP system, a number of hurdles that exist in the DSN have been identified, and they are distilled in Section II. After a brief description of FtSO (Section III), a more in-depth discussion of what CEP is and how it works is provided in Section IV. Section V presents a set of six real use cases, which the DCEP team used or is planning to use the DCEP system to handle, in order to validate CEP's usefulness in the DSN. Sections VI and VII introduce the two open source software packages that form the core of the current DCEP system; the sections explain why these were selected and what role they each play in the overall DCEP framework. Section VIII revisits the six use cases and describes the CEP solution for each. Section IX provides additional details on how the CEP rules can generally be categorized, and how machine learning can be incorporated to improve CEP. Section X lists some of the work that remains for the DCEP project, and Section XI concludes the paper.
In this paper, when the term CEP is used, it refers to the processing of complex events, and not the specific
system developed to perform CEP. When the term “DCEP system” is mentioned, it refers to the specific CEP system
that the authors developed for the DSN (in other words, DSN’s CEP system).
Figure 2. DSN facilities. Counterclockwise from top left: Goldstone DSCC, Madrid DSCC, Canberra DSCC,
and NOCC (JPL).
II. Current Challenges in DSN Operations
Currently, 49 missions of wide variety, including interstellar missions such as Voyager, planetary missions such as Mars Science Laboratory, and space telescope missions such as the Spitzer Space Telescope, all use the DSN for their spacecraft communication needs. These are not just NASA missions but include those of other space agencies around the world, such as the European Space Agency (ESA), the Japan Aerospace Exploration Agency (JAXA), and the Indian Space Research Organisation (ISRO). The DSN has contributed to the successes of these current missions and numerous past missions by continuously supplying reliable, high-performing space communication services. In addition to providing communication links to spacecraft, the DSN itself serves as a valuable scientific instrument. Owing to its catalog of large antennas and precision equipment, it plays a significant role in radio astronomy studies, such as those involving near-Earth objects.
However, the DSN has existed for over 52 years, and throughout those years it has gradually grown in size, capabilities, and complexity. As technology advanced, the computing industry adopted different system and communication standards that came and went in phases. Meanwhile, new subsystems continued to be developed for and delivered to the DSN during that time, and this resulted in modern systems coexisting and interacting with outdated, legacy systems that were introduced years, even decades, before. To address the ever-growing complexity in the DSN and to bring its overall capabilities up to date, a number of modernization efforts were undertaken throughout DSN's history, and series of improvements were made to its architecture and infrastructure. However, a complete revamp of the DSN and its constituent elements is simply too costly, and so despite the earnest reengineering efforts, much of the DSN remains outdated and heterogeneous. This situation has held back any improvements beyond what would be considered marginal, or at best incremental, in the DSN, particularly in its operations.
A major goal for the DSN is to reduce the cost of its operations. The financial budget allocated for the DSN is appropriated annually, and understandably, much of it is allocated to the maintenance and incremental upgrades of its existing infrastructure, which includes much aged equipment and many physical structures. A significant portion of the budget is also spent on DSN's day-to-day operations, and this is the area of the DSN where there is continuous concern about costs. The NASA Office of Inspector General recently audited the DSN and concluded that reduced budgets have resulted in several deficiencies in its operation. The audit also warned that, as a result, some of DSN's future plans may be in jeopardy.1 Finding ways to cut spending in DSN Operations is now more critical than ever.
The current high cost of DSN Operations is a direct result of DSN's legacy, complex, and heterogeneous infrastructure. DSN Operations still relies heavily on human operators for much of its activities, particularly on the Link Control Operators (LCOs) who manage the spacecraft tracks.* There is automation software currently in use by DSN Operations, called the Temporal Dependency Network (TDN). Actions that the LCOs would normally perform manually, based on the planned sequence of events, are automated using TDN scripts. Although very useful, the capabilities and scope of this automation are limited, and a human operator still needs to be dedicated to monitoring and controlling a link (in recent years, up to two links2) for actions that the TDN cannot perform. Also, virtually all analysis of anomalies, trends, and patterns needs to be performed manually, as the current tools and processes in DSN Operations provide no automation for those tasks.
* In DSN Operations, the term "link" is used to denote the logical connection of subsystems and equipment that allows them to interoperate in support of one or more spacecraft tracks.
† Not all of these types of data are required for automation of operations, but they are all valuable for analysis and gaining deeper insight into operations.
If a greater share of operator and analyst responsibilities could be automated, it would lead to large cost savings in operations and improved DSN performance. In addition, the groundwork established for such a greater level of automation may give rise to opportunities for extracting new insights and faster operational intelligence. However, the DSN currently has technical characteristics that present a significant challenge to introducing such comprehensive automation. Here are some of those characteristics:
A. Heterogeneous Data
Table 1 shows a sample of the various forms of data produced by different sources in the DSN, some of which the human operators constantly monitor and inspect. These include real-time streams of monitor data generated by the individual subsystems active in a link, and also by the Network Monitor and Control (NMC) software
itself.* These monitor data flow through the Monitor Data Service (MDS), which is part of the Monitor and Control Infrastructure Services (MCIS). Monitor data is exchanged in a publish-subscribe fashion, and accessing this data requires the use of the NASA-proprietary MCIS library, which is based in the C language. Monitor data provides continuous indication of the health status, as well as some activity and configuration states, of the different subsystems.
* NMC is a subsystem used to configure, control, and monitor other subsystems. It serves as the main conduit for LCOs.

Table 1. Heterogeneous data in DSN Operations. (Italicized items are those data types that exist in the DSN but have not yet been determined as being required for DSN Operations.)

| Data Type | Temporal Nature | Producers | Source for DCEP | Format |
|---|---|---|---|---|
| Monitor Data | Real-time | DSN subsystems (includes NMC) | MDS (currently: MON-2/JMS bridge) | Proprietary binary data |
| NMC Logs (contains operator/TDN directives and responses) | Real-time and historical | NMC subsystem (also TDN) | NMC FileSystem (NMCFS) | Plain text |
| Other Logs | Real-time and historical | Yet unknown | Yet unknown | Plain text (most likely) |
| Other Real-Time Indicators | Real-time | Yet unknown | Yet unknown | Yet unknown |
| Schedule Items | Planned | SPS | SPS REST server | XML |
| SOEs | Planned and predicted | SPS | SPS REST server | XML |
| Astrophysics Data | Predicted | LTPS | LTPS REST server | Plain text |
| SQA Data | Historical | DSN subsystems (includes NMC) | SQA server | PL/SQL collections |
| DRs and MDRs | Historical | Operators | DRMS | Form data in English |
Logs are another important category of data. NMC, TDN, subsystems, and others generate activity and system logs that provide valuable information about both the real-time events that are occurring and what has happened in the past. The LCOs, using a graphical user interface (GUI) running on their computer workstations, issue directives to NMC and subsystems in order to have them perform certain actions (e.g. lock on the signal) or change their configuration states (e.g. changing the bit rate). Both these directives and the resulting responses are recorded and retained in the NMC logs. Also, TDN-generated log entries (included as part of the NMC logs) give indication of its automation status, and the LCOs monitor this in case they need to intervene and resolve issues.
The Service Preparation Subsystem (SPS) produces Schedule Items, each of which contains information about a spacecraft's track, such as start and end times, the antenna used, and the higher-level support activity to which it belongs. These Schedule Items are produced well in advance of the actual track, to support operations planning activities. SPS also generates the Sequence of Events (SOE) files, which list in time order the configurations and actions that subsystems need to take, as well as the external events that are predicted to occur (e.g. Mars occultation). SOEs are provided both in a human-readable text file format and, more recently, in XML. Both of these data types describe future activities, but they are also archived, which makes them useful for post-analysis. TDN relies on SOEs and the aforementioned real-time monitor data for its automated execution of operator actions.
Another type of useful data is produced by the Light Time Physics Service (LTPS), which provides highly accurate and precise astrophysics data in real time. For example, an LTPS query can return the value of the one-way light time between a DSN antenna and the spacecraft of interest, such as Voyager 2, at a specific moment in time. This information can provide greater context during analysis: for example, helping to clarify the source of downlink signal and noise level fluctuations observed during a track.
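As an illustration of such an on-demand query, the sketch below fetches a one-way light time over a RESTful interface. The host name, path, and parameter names are hypothetical stand-ins, not the actual LTPS API; per Table 1, a plain-text response is assumed.

import scala.io.Source

// Hypothetical LTPS-style query: one-way light time between a station and a
// spacecraft at a given UTC time. Host, path, and parameter names are illustrative.
def oneWayLightTimeSeconds(station: String, spacecraft: String, utc: String): Double = {
  val url = s"http://ltps-host/owlt?station=$station&spacecraft=$spacecraft&time=$utc"
  Source.fromURL(url).mkString.trim.toDouble // plain-text numeric response assumed
}

// Example: oneWayLightTimeSeconds("DSS-43", "VGR2", "2016-05-16T00:00:00Z")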
Also of important value and interest is the Service Quality Assessment (SQA) subsystem's repository of historical monitor data. The real-time monitor data mentioned at the outset of this subsection is completely transient. SQA captures a key subset of the monitor data and stores it in a database. The monitor data archived in this persistent
data store ranges in time from 24 hours to several years in the past. This information repository can be valuable for
comparing patterns observed at the present time to snapshots in the past.
The final type of data that needs mention is the reports filed by the operators themselves, called Discrepancy Reports (DRs). Any time there is a failure to support a scheduled DSN activity or an interruption during a support, the operator writes a DR and enters it into an online repository called the Discrepancy Reporting Management System (DRMS). If there is a recurring problem, rather than filing a separate DR with each occurrence, a single Master DR (MDR) is filed. Previously filed DRs and MDRs are useful when assessing a newly encountered problem because they inform the operator whether the same problem has occurred in the past and what corrective actions were taken. Currently, trying to accurately match a new problem with any of the problem signatures captured in existing DRs and MDRs involves much manual analysis.
All of these different types of data, each with its own unique format and properties, are available only from disparate sources. An intelligent system that can provide truly extensible automation and analytical capabilities must consume all of them, and this is not a trivial objective to accomplish.
B. Inconsistent Naming Conventions
As mentioned earlier, over the course of time new subsystems have been developed and integrated into the DSN in stages, and the newer subsystems did not always stick to the conventions and interfaces of their legacy predecessors. One area where this presents a problem is the lack of standard naming, particularly with monitor data. For example, the Antenna Control Assembly (ACA) instances provide the monitor data of the 70-meter antennas in Madrid and Canberra and of the 34-meter high-efficiency (HEF) antennas. ACA publishes the current azimuth value of its antenna using the character string "AZANG" and the elevation as "ELANG". On the other hand, the Antenna Pointing & Control Assembly (APCA) instances, which provide the monitor data of the 70-meter antenna in Goldstone and of the 34-meter beam waveguide (BWG) antennas, publish the same pair of values under different character strings: "AzimuthAngle" and "ElevationAngle". This is merely one example of the yet undetermined number of cases in the DSN where subsystems do not share a common naming convention.
This situation adds complexity for automation and analysis, as additional associations need to be declared. In the example above, the system must somehow know that the "AZANG" and "AzimuthAngle" monitor data are not distinct data items, but that any computation performed on one can equally be done on the other.
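One minimal way to absorb these naming differences, sketched below, is a canonicalization table consulted before any rule or query runs. The canonical names (e.g. antenna.azimuthAngle) and the mapping structure are hypothetical; only the "AZANG"/"AzimuthAngle" examples come from the text.

// A minimal sketch of monitor-data name canonicalization. Each subsystem-specific
// item name maps to a single canonical name so that downstream CEP rules can treat
// "AZANG" and "AzimuthAngle" as the same quantity.
object MonitorDataNames {
  private val canonical: Map[String, String] = Map(
    "AZANG"          -> "antenna.azimuthAngle",   // ACA convention
    "AzimuthAngle"   -> "antenna.azimuthAngle",   // APCA convention
    "ELANG"          -> "antenna.elevationAngle",
    "ElevationAngle" -> "antenna.elevationAngle"
  )
  // Fall back to the original name when no canonical mapping is known yet.
  def normalize(rawName: String): String = canonical.getOrElse(rawName, rawName)
}

// Usage: MonitorDataNames.normalize("AZANG") == MonitorDataNames.normalize("AzimuthAngle")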
C. Distinct Attributes Among Same Subsystem Types
The same class of subsystems may exhibit different properties, and these inherent differences also present challenges. Even within a single class of subsystems, one instance may produce data with very different values than another instance, even though both are effectively in the same state and their data indicate as much. Using again the antenna pointing assemblies as an example, the assembly for one antenna at Canberra will report the "AzimuthAngle" and "ElevationAngle" pair of monitor data as 0.10 and 90.07, respectively, when the antenna is in the stow position. The assembly for another antenna in the same complex, however, will report the pointing angle values of 45.00 and 89.00 when stowed. These types of attribute discrepancies among seemingly identical subsystems add yet another layer of complexity.
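A per-instance reference table is one minimal way to absorb such differences. In the sketch below, the two Canberra angle pairs come from the example above, while the station IDs, the tolerance, and the overall structure are assumptions for illustration.

// Per-antenna stow references; the angle values are the examples from the text,
// the station IDs and tolerance are hypothetical.
case class StowReference(azimuth: Double, elevation: Double, toleranceDeg: Double = 0.5)

val stowReferences: Map[String, StowReference] = Map(
  "DSS-43" -> StowReference(0.10, 90.07),
  "DSS-34" -> StowReference(45.00, 89.00)
)

// One predicate answers "is this antenna stowed?" for every instance, regardless
// of its particular reference angles.
def isStowed(antennaId: String, azimuth: Double, elevation: Double): Boolean =
  stowReferences.get(antennaId).exists { ref =>
    math.abs(azimuth - ref.azimuth) <= ref.toleranceDeg &&
    math.abs(elevation - ref.elevation) <= ref.toleranceDeg
  }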
D. Non-Scaling Monitor Data Service
MDS, as well as its parent MCIS, was created in the 1990s to provide a common monitor and control communication infrastructure for the operators and the DSN subsystems. It had a robust design, with redundancies and failover features built in, and even to this day it continues to provide its services reliably. In the years that followed, however, more and more MCIS client software was produced, and it became apparent that MCIS, particularly its MDS server software, is prone to degraded performance and even failures when large numbers of clients tax its services at the same time. This is an issue because new software that subscribes to monitor data cannot be freely introduced into the MCIS ecosystem, for fear that DSN's support of missions will be adversely affected, possibly to the point of complete outages. Furthermore, the MCIS client library has its own limitations: it simply cannot handle large numbers of monitor data subscriptions. The subsystems in the DSN, as a collection, produce thousands of monitor data items at very high rates. A standard MCIS client is unable to subscribe to all of these items without its memory usage growing unbounded.
Changing the MCIS design, or replacing it entirely with one that can scale to meet the growing demand on its services, is deemed too costly, both financially and in terms of risk. So in order to augment the DSN to enable better automation and operational intelligence, which will certainly need access to monitor data in real time, while maintaining the same level of service quality for the missions it supports, the infrastructural handicap that exists in MCIS needs to be overcome.
E. Restricted Wide Area Network Bandwidth
Another limitation in the current DSN infrastructure is in the network's physical layer: its wide area network (WAN) has restricted bandwidth. This severely curbs many potential capabilities in the DSN, particularly remote operations and DSN-wide analytics. Much of the raw data generated at the DSCCs needs to be downsampled, or perhaps curated, before it is transmitted over the WAN. This, in fact, is the case: the DSN replicates in real time just a couple of hundred monitor data items from the DSCCs to the NOCC, and at a downsampled rate. This reality is less than ideal, because the products we can extract from analysis and operational intelligence can only be as good as the data being collected.
The sample of technical issues just discussed leads to a number of undesirable realities in current DSN Operations. For one, automation becomes non-trivial. Also, analyzing situations and making decisions is slowed because much of the process is manual. Anomalies and deviations from the norm go undetected because there is no system or process that takes into account all the metrics available. Similarly, opportunities are being missed to recognize patterns and trends that could provide knowledge that contributes to improving the operations. Because a system that can provide fast, intelligent notifications and recommendations to the human operators does not yet exist, the operators continue to monitor the subsystems and make all the decisions without any help, except for the small-scale automation that TDN currently provides. In the big picture, these all result in continued heavy reliance on human operators, leading to the high cost of DSN Operations.
III. Follow-the-Sun Operations
JPL has undertaken a project called Follow-the-Sun Operations (FtSO) to improve the cost-efficiency of DSN operations in the upcoming years.2 FtSO involves a pair of shifts from the current operational paradigm:
A. Remote Operations (RO)
Presently, LCOs are staffed at each DSCC twenty-four hours a day, seven days a week, across three working shifts per day. FtSO aims to reduce this staffing load by making use of remote operations. In the new paradigm, each DSCC will have only one working shift per day, during daytime, and the LCOs on duty will not only operate the local DSCC assets but also remotely operate assets at the two other DSCCs. The term "follow-the-Sun" derives from this strategy: all DSN Operations are handled at the DSCC where the Sun is visible in the sky.
B. Three Links per Operator (3LPO)
As mentioned in the previous section, each LCO currently manages up to two links at a time. In FtSO, each operator will handle up to three links, reducing the required human staffing even further.
FtSO is promising in that the efficient staffing of human operators will achieve significant savings in operations costs while continuing to provide high-quality service to DSN users. In order to realize FtSO, however, the technical difficulties explained in the previous section need to be addressed. The feasibility of RO is contingent upon larger amounts of information being exchanged between the DSCCs over the WAN than is currently taking place. And if the switch to 3LPO happens without reducing the LCOs' workload per link (theoretically, by at least one-third), it will overburden the staff. The DCEP research and technology development task is investigating solutions to these problems as part of the FtSO project.
IV. Complex Event Processing
CEP, as a technical term, is a method of taking as input a variety of real-time data from different sources, and then combining them to determine their significance and to take action.3 The data may be of different types, and they may contain information that is seemingly unrelated. A CEP system transforms the data as necessary, and then analyzes them to detect relationships, trends, and patterns. This continuous process often results in real-time actions, in addition to producing artifacts for analysis that provide deeper insight into the collective data.
Many industries have been using CEP successfully for years: financial markets, retail, homeland security, intelligence agencies, social applications, et cetera. As an example, when a person's credit card is swiped at a sales register in Pasadena, California, but the same card is swiped again at a register 2220 kilometers (1379 miles) away in Houston, Texas, merely five minutes later, the second transaction is automatically denied. This is because the CEP system at the credit card company quickly processed the credit card information of these two transactions, the times and geographical locations at which they were made, and probably also the historical usage pattern
of the credit card, and determined that the second of the two transactions that just took place was likely fraudulent. The different pieces of information involved in this analysis may have been collected from very different data sources. This is an example of how CEP detects meaningful events.
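The core of this fraud example is a simple spatiotemporal correlation between two events. The sketch below shows that correlation in isolation, with hypothetical types and thresholds; a production CEP system would evaluate such rules continuously over event streams.

// A minimal, self-contained sketch of the correlation at the heart of the fraud
// example (all names and thresholds are hypothetical, not a real CEP product).
case class Swipe(cardId: String, timeMillis: Long, latitude: Double, longitude: Double)

object FraudRule {
  // Great-circle distance in kilometers (haversine formula).
  private def distanceKm(a: Swipe, b: Swipe): Double = {
    val r = 6371.0
    val dLat = math.toRadians(b.latitude - a.latitude)
    val dLon = math.toRadians(b.longitude - a.longitude)
    val h = math.pow(math.sin(dLat / 2), 2) +
      math.cos(math.toRadians(a.latitude)) * math.cos(math.toRadians(b.latitude)) *
      math.pow(math.sin(dLon / 2), 2)
    2 * r * math.asin(math.sqrt(h))
  }

  // Flag the pair if the implied travel speed is physically implausible.
  def suspicious(prev: Swipe, curr: Swipe, maxKmPerHour: Double = 1000.0): Boolean = {
    val hours = (curr.timeMillis - prev.timeMillis).toDouble / 3600000.0
    prev.cardId == curr.cardId && hours > 0 && distanceKm(prev, curr) / hours > maxKmPerHour
  }
}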
CEP is also useful for predicting meaningful events that are likely to occur. In the stock market, for example, many traders depend on algorithmic and high-frequency trading systems to maximize their profits. The success of such trading systems is largely driven by their ability to simultaneously process large volumes of information, some of which is discrete in nature. To illustrate, a CEP system in this domain may process not only real-time streams of global stock prices and foreign exchange rates but also news headlines. News of company mergers and of wars are just some of the data that can drive market prices up or down. A CEP system with the intelligence to accurately predict what events will likely follow (e.g. plummeting stock prices) will produce actions to take advantage of opportunities and to avoid disasters (e.g. unload the entire portfolio of stocks that are about to lose value).
The real advantage of CEP systems over other types of systems, such as business rules engines, is their ability to correlate data, thereby framing data into contexts. To illustrate its importance, we can use as an example a real-life incident that occurred in 2012. A young Irishman posted a message on Twitter, saying: "Free this week, for quick gossip/prep before I go and destroy America."4 A few weeks later, when he arrived at the Los Angeles International Airport, he was taken into custody by Department of Homeland Security agents. The United States government constantly scans social networking services for information that may suggest upcoming acts of terror, and in this case, the young man's tweet had become a serious item of interest because of his use of certain key words ("destroy America"). After interrogation, however, it was clear that the young man had no intention of committing terrorist acts. The word "destroy" is common British slang for partying and getting drunk, similar to the American slang "(getting) trashed." Had the word-watch system taken into account, or put into context, that the originator of the message was from the British Isles, where the word "destroy" may be used as slang to mean something not so threatening, the traveler could have been spared his ordeal. Although this particular incident did not result in any permanent or serious damage, there are situations where intelligent association of information to form proper contexts can either lead to important consequences or avoid disastrous ones. CEP systems can be used for processing, recognizing, and correlating such different types of relevant information, as well as for supporting temporal correlation of events.
Many industries, such as finance, retail, and social media, have leveraged CEP to reduce their operating costs while adding improved services and capabilities that simply would not have been possible without it. More recently, these enterprise industries have been investing in two areas to improve their operations: business intelligence (BI) and data science. According to Gartner, an information technology research company, BI "is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance."* Data science is a relatively new, multidisciplinary field that is focused on extracting knowledge and insights from data in various forms. As many enterprises faced the challenge of extracting value out of their big data,† this naturally ushered in data science. By many definitions, the DSN also is an enterprise, and using CEP as the enabling technology, BI and data science methods can equally be applied to DSN Operations, much to its benefit.
To begin with, CEP can provide solutions to the two main problems faced by the FtSO project, namely the bandwidth restrictions in the DSN's WAN and the heavy workload imposed on the LCOs under the 3LPO scheme. Rather than transmitting the entire volume of data produced at a controlled DSCC to the controlling DSCC (RO complex), the CEP system running at the controlled DSCC can process all the data onsite, and then transmit only the events or data items of interest to the RO complex. The objective is to 'find the signal in the noise' of endlessly generated volumes of data and use the WAN resources only for these 'signals'. This entails the CEP system processing all of the heterogeneous data available at the DSCC, from real-time monitor data to SOEs to DRMS records, and correlating them. (See Figure 3.) Meanwhile, as CEP takes over more of the data analysis and forensics on behalf of human operators, the amount of attention and routine action required of the LCOs can be reduced. As automation is pushed toward a more lights-out process, the LCOs' workload can be kept at reasonable levels. To that end, the following section discusses some of the specific use cases that the CEP system in the DSN Operations domain, the DCEP system, needs to handle.
* http://www.gartner.com/it-glossary/business-intelligence-bi/
† "Big data" can be defined as extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
Figure 3. Using DCEP as the building blocks for FtSO. The DCEP systems running at each DSCC process the voluminous, locally generated data. After performing CEP on the data, only meaningful information is transmitted across the WAN. For instance, when everything is going well at the remote DSCCs, the DSCC serving as the RO Center does not need to receive much data, only periodic notifications that indicate all services are healthy.
V. Use Cases
Many use cases have been identified through which CEP can demonstrate its usefulness by providing novel solutions, as these use cases presently pose challenges both to current DSN Operations and to the future FtSO. In this section, six of them are introduced.
A. Standard Naming
As discussed in a previous section, DSN data currently falls short of having a uniform, standard naming scheme, and this complicates automation and analysis. When an analyst queries the azimuth and elevation values for all the antennas, for example, the analyst should not have to worry about whether the query covered them all or whether needed data fell outside the query because of naming differences. Someone with many years of experience in the DSN and extensive knowledge of it may not consider this too big a hassle, but this sort of intricacy is what keeps the operation of the DSN heavily dependent on domain experts. CEP should, therefore, handle these naming complexities rather than leaving them for the operators and analysts to figure out.
B. Framing Events in Context
The DCEP system should go above and beyond simple limit checking when alerting (or not alerting) the operators about a situation. For example, when the downlink signal from a spacecraft is lost in the middle of a track, this can indicate a fault, and operator intervention may be required. However, it is also possible that the spacecraft has simply entered a planetary, solar, or lunar occultation, in which direct visibility is lost. In this case, the loss of signal is not a fault but a normal event. The LCO need not be needlessly alerted by this event, since there is no action the LCO can take to recapture the signal, and at the end of the occultation the downlink communication should resume
by itself. To put it humorously, the LCO can continue to have his coffee. Even if it is desirable that the operator be
alerted, he/she should be provided with the additional contextual information that the loss of signal was due to an
occultation and therefore is not a deviation in the DSN service.
It is worthwhile to consider the converse situation. If the spacecraft is in an occultation but the downlink subsystem reports that it is receiving a signal (possibly due to a malfunction), this would be an anomalous situation, and the operator may want to be alerted when it occurs, or at least have the occurrence automatically recorded so that an investigation can be performed later. With systems that perform only simple condition or limit checking, this sort of anomaly would go undetected unless the LCO is monitoring the situation closely.
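The alerting logic described in this use case amounts to joining a real-time link state with the occultation windows predicted in the SOE. The sketch below expresses that join as a small decision table; the types and field names are hypothetical, not the actual DCEP rule format.

// A minimal sketch of context-aware alerting for the occultation case.
case class LinkState(signalInLock: Boolean, timeMillis: Long)
case class OccultationWindow(startMillis: Long, endMillis: Long) {
  def contains(t: Long): Boolean = t >= startMillis && t <= endMillis
}

sealed trait Assessment
case object Nominal extends Assessment            // no operator action needed
case object SignalLossFault extends Assessment    // alert the LCO
case object AnomalousSignal extends Assessment    // signal present during occultation

def assess(state: LinkState, soeOccultations: Seq[OccultationWindow]): Assessment = {
  val occulted = soeOccultations.exists(_.contains(state.timeMillis))
  (state.signalInLock, occulted) match {
    case (false, true)  => Nominal          // expected loss: spacecraft is occulted
    case (false, false) => SignalLossFault  // unexpected loss: intervention may be needed
    case (true, true)   => AnomalousSignal  // converse case: record for investigation
    case (true, false)  => Nominal
  }
}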
C. Detecting Deviations from the Norm
Some deviations happen so gradually over a long period of time that operations may not be aware of them until there is a total failure. To illustrate, Deep Space Station (DSS) 43 at the Canberra (Australia) DSCC regularly tracks the Voyager 2 spacecraft. Suppose that the operator notices the signal-to-noise ratio (Pc/N0) measured by the downlink subsystem to be 0.1 units lower than the day before. Nonetheless, the operator is not concerned, because the signal is in lock and data is being received. A few days later, during another support for Voyager 2 (again on DSS 43 and at around the same time of day), the operator notices that the Pc/N0 is back up by 0.1 units. So everything seems very stable. However, compared to the year before, the average Pc/N0 during those few days may actually be off by 10 units. There can be any number of reasons for this, such as failing hardware or loss of calibration. The takeaway is that changes in these kinds of measurements may take place so gradually that unless statistical comparisons are made between current and historical data, some deviations will never be caught until visible failures occur. Detecting these deviations early, thereby avoiding after-the-fact investigation work and downtime, would mean a higher quality of service to DSN customers.
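The statistical comparison this use case calls for can be as simple as testing the current reading against a long-baseline mean and spread rather than against yesterday's value. The sketch below is one such test, with an assumed three-sigma threshold.

// A minimal sketch of gradual-drift detection (hypothetical threshold): compare the
// current Pc/N0 reading against a long-term historical baseline.
def deviatesFromBaseline(current: Double,
                         historicalMean: Double,
                         historicalStdDev: Double,
                         maxSigma: Double = 3.0): Boolean = {
  // Day-to-day jitter stays within the baseline spread; slow multi-unit drift does not.
  math.abs(current - historicalMean) > maxSigma * historicalStdDev
}

// e.g. a reading of 35.0 against a year-old baseline of 45.0 +/- 1.2 would be flagged,
// even though it changed by only 0.1 since yesterday.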
D. Matching Incidents to Known Discrepancies
During operations, when there is a failure or an interruption of service, one of the things that greatly assists problem resolution is finding out whether the encountered anomaly has occurred in the past and how it was resolved. As mentioned previously, in DSN Operations these issues are recorded as DRs, and if a particular discrepancy is a recurring problem, it is instead documented as an MDR. So, for example, if a problem were to arise and the incident were quickly matched to an existing MDR, that alone would save the operator the work of filing a new DR and the work of searching through past DRs and MDRs to see if it was a recurrence. This would be one of the most basic benefits that DSN Operations can gain from this use case. Furthermore, the ability to match a new problem to an existing DR or MDR in real time will most likely provide the operator with useful knowledge of how to address the problem, without having to analyze and problem-solve from scratch.
E. Generic Tools
Currently there is no way to quickly create an application or write a script that easily interfaces with the different data sources that exist in the DSN. Being able to do so would be extremely valuable, however, because over time new service requirements are placed on the DSN, and providing new capabilities can be expensive without such a framework. For instance, in 2014, the JPL Executive Policy Committee asked that the DSN track the spacecraft command-loss timer for each mission as part of the NASA Continuity of Operations Plan (COOP). Many DSN mission spacecraft carry a timer onboard: if a command from Earth is not received before the timer expires, the spacecraft will enter safe mode. The new requirement was placed on the DSN in order to avoid those situations. To track when each spacecraft will time out due to not receiving a command, the option that provides the best accuracy is to process the command counter data contained in the downlinked spacecraft telemetry. However, there is no existing interface for the DSN to access each mission's spacecraft telemetry in real time. There is another option: keep track of the last CLTU radiation time for each spacecraft. At best this helps estimate the timeout values, since a spacecraft will not receive the command bits until the delay of one-way light time (OWLT) has passed, and there is no guarantee that the radiated data will be captured by the spacecraft. However, both types of data needed for this estimation already exist inside the DSN: the real-time monitor data that indicates when the last CLTU was radiated, and the table of timeout values for each spacecraft. In order to produce an automated tool that processes these data and tracks the spacecraft command-loss timers, one would presently have to create a software application that uses the MCIS library, and also undergo scrutinizing reviews and testing to ensure that the existing MCIS services will not be adversely affected by the introduction of this tool into real-time DSN Operations. Also, because of the dependency on the MCIS library, the software may need to be written in C or C++, severely limiting the implementation options. Ideally, analysts, engineers, and operators should be able to quickly and easily write custom tools that can process DSN data without the heavy overhead and inflexibility in implementation.
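As a sketch of the CLTU-based estimation described above (the names and the simplistic model are assumptions, not the actual tool), the earliest time a spacecraft could have received the last command is its radiation time plus the OWLT, and a safe-mode deadline estimate follows from the mission's timer value.

// A minimal sketch of the command-loss timer estimate (hypothetical field names).
case class SpacecraftConfig(id: String, commandLossTimerMillis: Long)

def estimatedSafeModeDeadline(lastCltuRadiatedMillis: Long,
                              owltMillis: Long,            // e.g. queried from LTPS
                              config: SpacecraftConfig): Long = {
  // Earliest time the command bits could have arrived on board...
  val earliestReceipt = lastCltuRadiatedMillis + owltMillis
  // ...which (optimistically) restarts the onboard timer.
  earliestReceipt + config.commandLossTimerMillis
}

// An alerting rule could then fire when the current time approaches this estimate,
// prompting operations to schedule an uplink before the spacecraft enters safe mode.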
Downloaded by 99.189.1.209 on December 26, 2018 | http://arc.aiaa.org | DOI: 10.2514/6.2016-2375
American Institute of Aeronautics and Astronautics
11
F. Postage Stamp
As part of one of the latest DSN Operations modernization efforts, the Human Interfaces for Mission Operations group at JPL is conducting a parallel study on a new user experience design for the LCOs. This new design would provide real-time link situation information to the LCOs in a more intelligent and user-friendly way than what is currently provided. Figure 4 shows a sample of this new design. Because of the way the GUI elements are laid out on the display, the name given to this new design is Postage Stamp. In order to provide this user experience, the Postage Stamp system needs to process real-time data and produce higher-level, correlated information. The required real-time data include information on downlink and uplink ranging, commanding, symbol status, stage of track, predicts mode, antenna state, et cetera. In the existing DSN infrastructure, the Postage Stamp team would need to create an MCIS client software application to subscribe to the various monitor data that carry the aforementioned real-time information, write custom algorithms to derive the desired higher-level, correlated information from those data, and then have the resulting data displayed on the GUI. However, similar to the previous use case, this entails much development and testing work. It would be far more desirable for the Postage Stamp system to access the data in a much easier way, so that the Postage Stamp design team can focus on their primary work, which is user experience.
Figure 4. Sample of the Postage Stamp display. Large numbers indicate the DSSes, and the smaller icons display the different states of the links. Icons change colors depending on whether things are going as expected (green) or not. (Image credit: Dr. Alexandra Holloway)
So far, a number of use cases that CEP can address to improve upon the existing capabilities of the DSN have been
described. Successfully handling these use cases will bring much benefit to DSN Operations. In the following sections,
the approach that the DCEP team has taken to handle the use cases, with its design and implementation, will be
discussed.
VI. Fast Data Processing Using Apache Spark
A CEP system that will process the large volumes of real-time, historical, and predicted (commonly referred to as "predicts") data in the DSN needs to be high-performing and scalable. As a result of many years of development and experience in this field by the enterprise industries, several viable products, most of which are commercial, provide CEP: TIBCO's StreamBase, EsperTech's Esper, SAP's Event Stream Processor, and Red Hat's Drools, to name a few. Products such as Splunk market themselves as a "platform for Operational Intelligence." The DCEP team evaluated some of these commercial off-the-shelf (COTS) products. As with many
COTS products, these solutions have a number of significant disadvantages: high licensing costs, vendor lock-in, and lack of freedom due to the software being closed-source. It is unclear whether the DSN operations costs could ultimately be reduced despite these factors; even the investment in research and prototyping work using these COTS solutions alone would have incurred a significant cost. The DCEP team therefore turned its attention to free and open source software solutions.
With the advent of big data challenges in the enterprise industries and other fields, a number of open source software platforms that can perform large-scale parallel and distributed data processing have emerged. At present, one of the most popular of these platforms is Apache Spark™. Apache Spark is "a fast and general engine for large-scale data processing."* Unlike Hadoop MapReduce, another big data processing engine, Spark stores data in memory rather than exclusively on disk. This allows Spark to achieve processing speeds that are ten to one hundred times faster than Hadoop MapReduce.5 Using Spark, it is possible to quickly process and combine large volumes of data from multiple sources, and so it can serve as a solid foundation for complex event processing.
Apache Spark has a modular design, with a core engine (Spark Core) and four built-in libraries working on top of that engine: SQL, Streaming, MLlib, and GraphX. Figure 5 is a visualization of this design. Particularly useful to the needs of CEP are the SQL and Streaming libraries (in future work, MLlib also). Spark SQL allows Spark applications to query structured data using SQL, and these queries can be uniform regardless of the data source. To demonstrate how this works, Figure 6 shows sample Spark application code that uses Spark SQL to extract data of interest. This particular code is written in the Scala language, although there are other language choices, such as Java, Python, and R, for writing Spark applications. The example code loads the latest captured real-time monitor data (saved as a JSON object in an Amazon GovCloud S3 bucket, just to demonstrate) and also queries the historical monitor data repository in the SQA subsystem. The objective is to visually check whether the Pc/N0 just observed for the Cassini spacecraft is the same as or different from the historical average. Because both data sets are structured, we can use simple SQL queries to extract the data. At the same time, we can use Spark's MapReduce programming pattern to create a flow of data transformations† and actions. In Spark, no work is actually executed on transformations of data, only on actions. Multiple transformations can therefore be chained together without triggering any processing of the data. This lazy evaluation model is preferable when dealing with large streaming data sets, as it ensures that only those computations directly related to the chosen action are executed; others are ignored.
* http://spark.apache.org/
† These are called lazy transformations.
Figure 5. The Apache Spark™ stack: the core engine (bottom) and its built-in libraries (top). (Image credit: http://spark.apache.org/)
val sc: SparkContext // An existing SparkContext.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

// Load the latest real-time monitor data (a JSON object in an S3 bucket), keep only
// Cassini's Pc/N0 items, and register the result as a temporary table.
val monDataDF = sqlContext.read.format("json").load("s3n://bucket/latest-mon-data.json")
monDataDF.filter(monDataDF("mdItemName") === "receiver.pcno")
  .filter(monDataDF("mission") === "CAS")
  .registerTempTable("latestCASpcno")

// Query the historical monitor data repository in the SQA subsystem over JDBC and
// register the matching rows as another temporary table.
val historicalDF = sqlContext.read.format("jdbc").options(Map(
  "url" -> "jdbc:oracle:thin:user/password@//sqahost:1521/archivedmondatadb",
  "dbtable" -> "historydb")).load()
historicalDF.filter(historicalDF("mdItemName") === "receiver.pcno")
  .filter(historicalDF("mission") === "CAS")
  .registerTempTable("historicalCASpcno")

// Show the latest observed value next to the historical (2015) average.
sqlContext.sql("SELECT mdItemName, value FROM latestCASpcno").show()
sqlContext.sql("SELECT mdItemName, AVG(value) FROM historicalCASpcno " +
  "WHERE timestamp LIKE '2015%'").show()
Figure 6. Sample Spark application code in Scala.
The Spark Streaming library allows Spark applications to consume continual streams of data in a scalable, high-throughput, and fault-tolerant way. It provides a number of stream receivers out of the box, for TCP sockets and popular data streaming systems: Apache Kafka, Apache Flume, Twitter, ZeroMQ, and Amazon Kinesis. Custom receivers can also be implemented (say, for example, one that subscribes directly to MCIS data). The same transformations and actions available in Spark Core can be performed on data received by Streaming. Figure 7 shows how Streaming divides an incoming data stream into small batches, which can then be processed like any other data in Spark. However, problems arise when the streaming data is blindly divided into microbatches: two or more data items that should be computed together may end up split apart (e.g. the azimuth value in batch N and the elevation value in batch N+1). With this in mind, the Streaming library provides windowed computations, which allow applications to apply transformations over a sliding window of data whose size can be larger than that of a single batch. The library also offers configurable persistence levels, such as replicating the input stream of data onto multiple nodes in order to provide fault tolerance (i.e. no streaming data is lost even when one or more nodes fail).
Figure 7. Spark's Streaming library splits the incoming data flow into small batches. These in turn can be processed just like any other data set in Spark. (Image credit: http://spark.apache.org/docs/latest/streaming-programming-guide.html)
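As a minimal sketch of the windowed computation just described (the socket source, message format, and window sizes are hypothetical), a pair DStream can be grouped over a sliding window so that related items, such as an azimuth/elevation pair published moments apart, land in the same computation:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("DcepWindowSketch")
val ssc = new StreamingContext(conf, Seconds(1)) // one-second microbatches

// Hypothetical source: lines of "itemName value" arriving on a TCP socket.
val pairs = ssc.socketTextStream("localhost", 9999).map { line =>
  val Array(name, value) = line.split(" ")
  (name, value.toDouble)
}

// A 10-second window sliding every 2 seconds keeps related items together even
// when they arrive in different one-second batches.
val windowed = pairs.groupByKeyAndWindow(Seconds(10), Seconds(2))
windowed.print()

ssc.start()
ssc.awaitTermination()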
The DCEP team currently runs a Spark cluster of thirty-two CPU cores, spread across different virtual machines, in order to prototype and demonstrate the CEP use cases. (See Figure 8.) A set of perpetual Spark jobs is run to process the input stream of real-time monitor data from the DSCCs. As part of the processing, these real-time data are compared against the planned operational events (the SOEs of the spacecraft tracks) to check whether the DSN support is proceeding satisfactorily. Another job continuously correlates and translates the real-time monitor data to produce the JSON data consumed by the Postage Stamp system. Many of these operations involve SQL queries on the streaming data, and all of the jobs are programmed as Spark applications in either Scala or Java. Further discussion of how this Apache Spark-based framework is being used to handle the different use cases introduced in Section V appears in a later section.
Figure 8. Web user interface of the Spark cluster's master node.
The CEP Spark applications need access to different data sources in order to handle the DSN Operations use cases. Some of the data types listed in Table 1 have RESTful APIs*: Schedule Items, SOEs, and Astrophysics Data. The nature of these data types is also "on demand": the CEP Spark applications only need to pull or fetch the data as needed, rather than consuming them as streams. These data are small, on the order of kilobytes, and so they are quick to retrieve. Using REST's GET operation directly in the application code is therefore suitable. Accessing the archived historical data in the SQA subsystem, on the other hand, is not as straightforward. The repository contains the last nine years of collected DSN monitor data, and its size, already on the order of tens of terabytes, will only grow over time.
* REST stands for representational state transfer. If a system or an interface conforms to the REST constraints and
supports its operations, it is said to be RESTful. API stands for application programming interface.
The current API design for accessing this data involves the use of PL/SQL functions*, with a set of historical metrics pre-calculated by SQA. This strategy helps ensure that data retrievals take a reasonable amount of time and that the SQA subsystem's resources are not overburdened (which would happen if the DCEP system directly submitted SQL queries on non-indexed data via JDBC†). At this time, there is no interface for Spark applications to access the DRs stored in DRMS. Because DRs are written by the operators in human language, they require a measure of extract, transform, and load (ETL, a common process in data warehousing) before they can be retrieved and processed programmatically. Alternatively, machine learning may be used to have the DCEP system automatically interpret the original DRs; this approach will be explored in future work. In the meantime, use cases that involve DRs are handled by implementing the discrepancy conditions directly inside Spark applications.
The most critical data source for automating operations in the DSN is the real-time monitor data continuously generated by the DSN subsystems. For receiving the input stream of this data, rather than implementing a custom Spark receiver that taps directly into the MCIS infrastructure, the DCEP team implemented a more modular data streaming configuration using an existing MON-2‡ to Java Message Service (JMS) bridge and the Apache Kafka messaging system. In addition to monitor data, the DCEP system leverages this messaging configuration to consume real-time logs, such as the NMC (including TDN) logs. The following section focuses on this input data stream solution using Apache Kafka.
* PL/SQL stands for Procedural Language/Structured Query Language. It was developed by the Oracle Corporation to allow procedural operations to be performed on their databases, extending the capabilities afforded by standard SQL alone.
† JDBC stands for Java Database Connectivity. It is an API for the Java programming language. Java applications use this API to connect to databases and access their data.
‡ Strictly speaking, MON-2 refers to the DSN Monitor and Control Standard. At times, however, the term is used to refer to the monitor data transport protocol, as is the case here.
VII. Scalable Messaging Using Apache Kafka
Although Apache Spark allows for creating custom receivers for input stream data, the DCEP team opted for an alternative solution. When evaluating and prototyping with different processing engines, a more cost-effective approach is to modularize the data input stream separately from the processing system, so that different processing applications can be swapped in and out without having to write custom code for the input adapters. So rather than implementing a custom receiver in Spark that directly subscribes to the MCIS monitor data (the DCEP system's main real-time data type), the DCEP team created a "plumbing" software component that receives the monitor data and pushes it out to a message bus. This is beneficial in at least two ways: (1) because the monitor data is published to a common-access message bus, as long as the message bus is scalable, any number of clients can consume the data without adversely impacting the existing DSN service; and (2) subscribing to the DSN monitor data actually requires complicated logic, so it is better to abstract this away from the processing engines under evaluation. (DSN monitor data subscription contexts are dynamic and change frequently according to the current link configurations, so keeping track of these link states and performing unsubscribe-subscribe operations when they change is imperative for assuring a continual flow of data.) In this data streaming configuration, the DCEP team chose Apache Kafka as the message bus.
Apache Kafka is a distributed publish-subscribe messaging system that works well with Apache Spark. Kafka shares many features with other message queueing systems, such as using topics as the primary abstraction for publishing and subscribing to related data. However, Kafka has some unique features that make it advantageous over traditional messaging systems. One of those features is that Kafka treats each topic partition as a log. A new message placed into the partition is assigned an incrementing offset and is simply appended to the end of the "log." (See Figure 9.)
* PL/SQL stands for Procedural Language/Structured Query Language. It was developed by the Oracle Corporation to allow procedural operations to be performed on its databases, extending the capabilities afforded by standard SQL alone.
† JDBC stands for Java Database Connectivity. It is an API for the Java programming language. Java applications use this API to connect to databases and access their data.
‡ Strictly speaking, MON-2 refers to the DSN Monitor and Control Standard. At times, however, the term is used to refer to the monitor data transport protocol, as is the case here.
Figure 9. In Kafka, a message topic is nothing more than a partitioned log. Each appended message is assigned an incrementing offset. (Image credit: http://kafka.apache.org/documentation.html)
The responsibility for ensuring reliable message delivery rests on the consumer: the consuming client needs to track the offset of the last message it received. If the consuming client's input stream is interrupted (or the client fails entirely), then upon recovery the consumer must provide Kafka the offset value from which it wants to resume receiving data. This design allows consumers to come and go with little impact on the Kafka cluster or other consumers. Coupled with other features, such as replicable partitions and allowing no more than one consumer per partition within a consumer group, Kafka takes advantage of fast sequential disk access, which reduces I/O overhead and results in high throughput.* Apache Kafka, like Apache Spark, is being widely embraced by industry for scalable data processing tasks.
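Kafka's consumer-side offset model lends itself to a small sketch. The following Java fragment, a minimal sketch against the Kafka 0.9-era consumer API, resumes a monitor data stream from the last offset the client recorded; the topic name, broker address, and offset store are illustrative assumptions, not DCEP specifics.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ResumableMonitorDataConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092"); // illustrative
        props.put("group.id", "dcep-demo");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        long lastProcessedOffset = loadOffsetFromLocalStore();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Attach to one partition of a hypothetical monitor data topic and
            // seek past the last message this client successfully processed.
            TopicPartition partition = new TopicPartition("dsn.monitor.raw", 0);
            consumer.assign(Collections.singletonList(partition));
            consumer.seek(partition, lastProcessedOffset + 1);

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    process(record.value());
                    saveOffsetToLocalStore(record.offset()); // client bookkeeping
                }
            }
        }
    }

    private static long loadOffsetFromLocalStore() { return -1L; } // stub
    private static void saveOffsetToLocalStore(long offset) { }    // stub
    private static void process(String sample) { System.out.println(sample); }
}
```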
The plumbing software that does the actual subscribing to the DSN monitor data takes advantage of an existing MON-2 infrastructure between the DSCCs and JPL. This infrastructure converts a limited, but most important, set of monitor data from the DSCCs into JMS messages and makes it available for other software applications in the JPL network to subscribe to. As a precautionary measure, in order not to impact existing services that already make use of the MON-2/JMS bridge, the DCEP team uses a replicated JMS broker (i.e., a repeater of the main broker) and its own MON-2/JMS bridge. This ensures that the DCEP team's work does not have any impact on either the operational DSN services or other JPL services that rely on the bridged monitor data. Figure 10 shows the overall architecture of the DCEP system, its data sources, and the Kafka message bus.
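The plumbing pattern itself can be sketched compactly: a JMS subscriber that republishes each bridged MON-2 message onto a Kafka topic. The JMS topic and Kafka topic names below are assumptions for illustration, and the real DCEP plumbing software additionally tracks link states and re-subscribes as link configurations change.

```java
import java.util.Properties;
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.jms.Topic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

// Minimal sketch of JMS-to-Kafka "plumbing". Provider lookup, topic names,
// and broker address are illustrative assumptions.
public class JmsToKafkaBridge {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka-broker:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        final KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        ConnectionFactory factory = lookupJmsConnectionFactory(); // provider-specific
        Connection connection = factory.createConnection();
        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = session.createTopic("MON2.BRIDGED.DATA"); // hypothetical name

        session.createConsumer(topic).setMessageListener(new MessageListener() {
            @Override
            public void onMessage(Message message) {
                try {
                    String body = ((TextMessage) message).getText();
                    // Republish onto the common-access bus; any number of
                    // downstream clients can now consume independently.
                    producer.send(new ProducerRecord<>("dsn.monitor.raw", body));
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        });
        connection.start(); // provider listener threads keep the JVM alive
    }

    private static ConnectionFactory lookupJmsConnectionFactory() {
        throw new UnsupportedOperationException("obtain from JNDI or the JMS provider");
    }
}
```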
VIII. Operational Intelligence
Using the CEP solution described thus far, each of the use cases introduced in Section V that pave the way for fast and intelligent DSN Operations can now be handled. This section revisits those individual use cases and presents the CEP solution for each.
A. Standard Naming
Addressing the lack of standard naming of data should be considered a prerequisite for automating DSN Operations, because the handling of other use cases can be made simpler as a result. Using the current CEP solution, standard naming of the real-time monitor data is achieved by executing a name translation job in Apache Spark. This involves two topics in the common Apache Kafka message bus: a raw input topic and a standardized-name output topic. Figure 11 shows how this flow of data and name standardization works.
* http://kafka.apache.org/documentation.html#maximizingefficiency
Figure 10. Current architecture of the DCEP system. The dashed boundary line shows what is included in the scope of the DCEP. NMC logs and DR/MDRs will be integrated in the future as additional input data sources.
Currently the plumbing software adds additional metadata to each sample of monitor data to make it more meaningful, such as the originating subsystem, timestamp, unit of measurement, a lengthier descriptive name, and the associated Schedule Item, DSS, and spacecraft. In the future, this task will be handled by the name translation job in Spark instead, so that all monitor data identity transformation and association work is contained in a single module, in addition to making use of Spark's fast processing capabilities. With the two Kafka topics, other client software can easily consume the untransformed real-time monitor data, the transformed data, or both. CEP Spark jobs that handle other use cases can simply depend on the topic that carries the name-standardized data.

Figure 11. DCEP's name standardization Spark application translates real-time data with nonuniform identifiers into identifiers trivially recognized by the consumers.
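A minimal sketch of such a name translation job follows, assuming the Spark 1.x Java Streaming API and the Kafka 0.8 direct-stream integration that were current at the time of writing; the topic names and lookup table contents are illustrative, not the DSN's actual identifiers.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import kafka.serializer.StringDecoder;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class NameStandardizationJob {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("DCEP-NameStandardization");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(1));

        Map<String, String> kafkaParams = new HashMap<>();
        kafkaParams.put("metadata.broker.list", "kafka-broker:9092");
        Set<String> topics = Collections.singleton("dsn.monitor.raw");

        JavaPairInputDStream<String, String> raw = KafkaUtils.createDirectStream(
                jssc, String.class, String.class,
                StringDecoder.class, StringDecoder.class, kafkaParams, topics);

        // Hypothetical mapping from subsystem-specific identifiers to
        // standardized names, e.g. "AntAz" -> "antenna.azimuth.degrees".
        final Map<String, String> nameTable = loadNameTable();

        raw.mapToPair(record -> new scala.Tuple2<>(
                nameTable.getOrDefault(record._1(), record._1()), record._2()))
           .foreachRDD(rdd -> rdd.foreachPartition(partition -> {
                // One short-lived producer per partition publishes onto the
                // standardized topic that downstream CEP jobs depend on.
                Properties p = new Properties();
                p.put("bootstrap.servers", "kafka-broker:9092");
                p.put("key.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
                p.put("value.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
                try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
                    while (partition.hasNext()) {
                        scala.Tuple2<String, String> rec = partition.next();
                        producer.send(new ProducerRecord<>("dsn.monitor.std",
                                rec._1(), rec._2()));
                    }
                }
            }));

        jssc.start();
        jssc.awaitTermination();
    }

    private static HashMap<String, String> loadNameTable() {
        HashMap<String, String> table = new HashMap<>();
        table.put("AntAz", "antenna.azimuth.degrees"); // illustrative entry
        return table;
    }
}
```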
B. Framing Events in Context
By correlating the planned events data (Schedule Items and SOEs), operator/TDN log data (NMC logs), real-time monitor data, and other sources (astrophysics data), CEP can provide more intelligent situational insight than simple condition or limit checking. A use case instance that has been demonstrated using the DCEP system is the determination of whether a loss of spacecraft signal in the middle of a track is a deviation from the planned service. For this demonstration, the Dawn spacecraft was selected as the candidate due to its frequent occultation around the dwarf planet Ceres (September 2015). During a sample pass, the downlink receiver would record the Pc/N0 measurement (signal-to-noise ratio) as switching back and forth between -300 dB-Hz (which indicates that the measurement is not valid) and ~25 dB-Hz. This measurement is published in real time as monitor data. The LCO supporting the track would not be startled to see this, understanding that it happens as a result of Dawn orbiting Ceres. If the human operator were taken out of this use case, however, then CEP would be needed to verify that the observed data is not a sign of an anomaly. Also, CEP itself can produce another useful metric: how accurately the various data correlate with each other.
To apply CEP to this particular scenario, another type of data that the DCEP system needs to process in real time is the SOE. SOEs for Dawn include the predicted occultation times, both when the spacecraft will enter and exit each occultation. In addition, Dawn's SOEs even include the specific signal loss times as they will be observed on Earth. This latter event provides the best accuracy when trying to correlate the SOE data with the Pc/N0 monitor data's drop to -300 dB-Hz. To allow DCEP's Spark applications to access this SOE data on the fly, the DCEP team created a Java library that uses SPS's RESTful API to fetch any SOE on the order of seconds. Fetched SOEs are cached in memory until their window for useful real-time correlation expires or new versions are available from SPS. The library also pre-extracts a set of key events listed within the SOEs and makes them available through its API. (These events include symbol rate settings, downlink receiver activation/deactivation, uplink transmitter activation/deactivation, occultation times, signal loss times, and so forth.) This allows a Spark application that is performing CEP to easily and quickly access the events of interest from any applicable SOE in real time.
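The fetch-and-cache behavior of that library can be sketched as follows. The SPS endpoint URL and cache policy shown are assumptions for illustration; the real library also pre-extracts key events and tracks SOE versions.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a REST fetch-and-cache helper for SOE documents.
public class SoeCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    /** Returns the SOE document for a pass, fetching it over REST on a miss. */
    public String getSoe(String passId) throws Exception {
        String cached = cache.get(passId);
        if (cached != null) {
            return cached;
        }
        // Hypothetical SPS endpoint; the real URL is not reproduced here.
        URL url = new URL("https://sps.example.jpl.nasa.gov/api/soe/" + passId);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        String soe = body.toString();
        cache.put(passId, soe); // evicted when the correlation window expires
        return soe;
    }

    /** Drops an SOE when its useful real-time correlation window has passed. */
    public void evict(String passId) {
        cache.remove(passId);
    }
}
```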
The demonstration showed that the instances where the Pc/N0 dropped to -300 dB-Hz in the particular Dawn pass closely followed (in time) the signal loss events predicted in the SOE. Figure 12 shows one such instance from the demonstration. Therefore, the loss of signal observed is not a deviation of service, and the rest of the track can proceed normally. Although this result confirms what the LCOs can already determine fairly quickly during a live pass, it is meaningful because it demonstrates just one building block out of many that can be processed together and correlated to interpret situations within contexts, all without a human operator involved. Also, unless the LCO is very watchful, the fact that there was a time discrepancy of 7 seconds between the SOE's predicted signal loss time and the actual drop to -300 dB-Hz in the Pc/N0 measurement may not have been clearly noticed during the track. These sorts of metrics determined through CEP (in this case, the differences between predicted and actual) can provide valuable information. They can indicate whether the DSN services are operating optimally and whether there exist problem areas that need to be examined.

Figure 12. Signal loss predicted in the SOE for Dawn (LOS) and the actual loss observed (pcnoEst value of -300). Automatic correlation of information from disparate data sources will frame events in context, resulting in minimal false-positive and false-negative real-time situation assessments.
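The heart of that correlation can be reduced to a small, hedged sketch: compare the SOE's predicted loss-of-signal time against the observed moment Pc/N0 dropped to -300 dB-Hz, and treat small gaps as expected behavior. The tolerance value and class names are illustrative assumptions, not the DCEP implementation.

```java
import java.time.Duration;
import java.time.Instant;

public class SignalLossCorrelator {

    /** Maximum predicted-vs-observed gap still considered nominal (assumed). */
    private static final Duration TOLERANCE = Duration.ofSeconds(30);

    public static boolean isExpectedSignalLoss(Instant predictedLos,
                                               Instant observedDropTo300) {
        Duration gap = Duration.between(predictedLos, observedDropTo300).abs();
        // A small gap means the loss matches the SOE prediction, so the event
        // is framed as expected occultation rather than an anomaly. The gap
        // itself is also a useful metric of prediction accuracy.
        return gap.compareTo(TOLERANCE) <= 0;
    }

    public static void main(String[] args) {
        Instant predicted = Instant.parse("2015-09-14T12:00:00Z");
        Instant observed  = Instant.parse("2015-09-14T12:00:07Z"); // 7 s later
        System.out.println("Expected loss? "
                + isExpectedSignalLoss(predicted, observed));      // true
    }
}
```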
C. Detecting Deviations from the Norm
Going a step further, by adding historical data to the mix of the complex events processed, the DCEP system can automatically detect deviations from what has been the norm in the past. This opens up many possibilities for performing live statistical analysis and thereby improving operations with the additional intelligence. For example, if a spacecraft track is in progress at the moment, it can be useful to find out whether the performance of the current pass is coherent with that of the last pass, given that the two tracks' configurations are similar. This performance can be measured in several terms. If a track that has just ended recorded N good telemetry frames during its service duration time T, then comparing the N/T ratio against the spacecraft's last similar pass (e.g., same DSS), last few passes, or the historical average of such passes over the past Y years will undoubtedly yield useful information about pass performance. For such analysis involving historical data, the DCEP system uses the SQA subsystem's repository via the PL/SQL interface previously mentioned. The interface allows CEP to retrieve historical metrics that are periodically pre-calculated, so that correlations can be performed in real time or near real time.
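A minimal sketch of that comparison, assuming the historical mean and standard deviation of the good-frames-per-second rate arrive pre-calculated from SQA; the numbers and method names are illustrative only.

```java
public class PassPerformance {

    /** How many standard deviations the current pass sits from the historical
     *  mean; large negative values flag an underperforming pass. */
    public static double performanceZScore(long goodFrames, double durationSeconds,
                                           double historicalMeanRate,
                                           double historicalStdDev) {
        double currentRate = goodFrames / durationSeconds; // N / T
        return (currentRate - historicalMeanRate) / historicalStdDev;
    }

    public static void main(String[] args) {
        // Illustrative numbers: 432,000 good frames over an 8-hour track.
        double z = performanceZScore(432_000, 28_800.0, 15.2, 0.4);
        if (z < -2.0) {
            System.out.println("Pass performed well below the historical norm");
        } else {
            System.out.printf("Pass within the norm (z = %.2f)%n", z);
        }
    }
}
```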
Another type of useful analysis made possible by the DCEP system is the comparison of one element under examination to its counterparts in the DSN. For instance, comparing the historical performance of one 70-meter antenna to that of the DSN's other 70-meter antennas could yield valuable information. Lastly, it is possible to determine trends (generally over time, or conditioned by external events) using CEP.
D. Matching Incidents to Known Discrepancies
When the DCEP system is able to use the DRMS repository as a data source and match ongoing situations in real time to incidents previously captured as DRs or MDRs, considerable human effort will be saved when anomalies are encountered during operations. In order for this to be possible, the DCEP system will need to be able to understand
the problem symptoms written up in these reports. This presents a huge challenge because the reports are, for the most part, composed in human language (English). The contents of these reports will have to be either translated or encoded in some way in order for the DCEP system to accurately correlate the real-time situation with any recorded incidents in these reports. As one example of this use case, there exists an MDR that documents a recurring problem where the 'Downlink Channel Controller subsystem stops outputting telemetry to the Data Capture and Delivery subsystem.' The report lays out what symptoms the operator should watch for: the block count stopping while telemetry is in lock, and missing low-criticality progress messages in the NMC logs. Detecting this anomaly alone thus requires three data sources: the MDR repository, monitor data (real-time), and the NMC logs (also real-time). Once the current situation is matched to an existing MDR, the DCEP system can both immediately notify the LCO and, if configured to do so, start the execution of the recovery procedure, which is also documented in the report. Much work by the DSN operators has gone into documenting discrepancies that occur during operations. By leveraging that work, the DCEP system provides a way for future operational anomalies to be handled much more effectively.
E. Generic Tools
In line with the use of the open-source software Apache Kafka and Apache Spark, the DCEP system is designed to be an open system. Interoperability through the use of open standards and extensibility are two of the design choices made with regard to the DCEP system. This allows the system to be 'tapped into' by other tools, and these beneficiary tools can leverage the messaging and processing already performed by the DCEP system. There is currently a tool in use by the DSN Project that proves this point. As mentioned previously, a new requirement was recently added to the DSN: track each DSN spacecraft's on-board command-loss timer. Estimating these values requires access to the real-time monitor data. Because the DCEP system was already in place, consuming the real-time monitor data stream, the data was already on the Kafka message bus. From here, two choices could be made: implement the handling of the use case in Spark, or create a separate tool that uses only the DCEP system's message bus. Since the CEP engine (Spark) and its applications are still being experimented with (but the command-loss timer tracking is a real project requirement that needs to be satisfied now), the DCEP team created a Java application that consumes the data from Kafka and calculates the timer estimates from that data. (The other data sources are a missions-informed table of timeout values for each spacecraft and astrophysics data for the OWLT, or one-way light time, values.) The results are then published every few seconds on JPL's intranet as a web page, so that DSN engineers and managers can readily access the latest information showing the risk of a command-loss timeout event. Because the scalable, distributed messaging infrastructure has been established as part of the DCEP work, many generic tools such as the Command-Loss Timer Tracker can now be created and immediately access the data streams (both raw and complex-event-processed) with relative ease, extending the possibilities of operational intelligence further.
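The core arithmetic of such a tracker can be sketched as follows; the actual tool consumes its inputs from Kafka, mission tables, and astrophysics data, and its real logic is more involved than this simplified assumption.

```java
import java.time.Duration;
import java.time.Instant;

public class CommandLossTimerEstimator {

    /** Estimate the margin remaining before the on-board timer expires,
     *  assuming the spacecraft restarts its timer roughly one OWLT after a
     *  command leaves the DSN antenna. A simplified model, not the tool's
     *  exact calculation. */
    public static Duration estimateRemaining(Instant lastCommandRadiated,
                                             Duration oneWayLightTime,
                                             Duration onboardTimeout) {
        Instant onboardRestart = lastCommandRadiated.plus(oneWayLightTime);
        Instant expiry = onboardRestart.plus(onboardTimeout);
        return Duration.between(Instant.now(), expiry);
    }

    public static void main(String[] args) {
        Duration remaining = estimateRemaining(
                Instant.now().minus(Duration.ofHours(30)), // last command 30 h ago
                Duration.ofMinutes(28),                    // illustrative OWLT
                Duration.ofHours(48));                     // illustrative timeout
        System.out.println("Estimated margin: " + remaining.toHours() + " h");
    }
}
```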
F. Postage Stamp
The use case to support the Postage Stamp user experience project demonstrates the processing that the DCEP system can handle on behalf of external users, sparing them from the nitty-gritty, low-level nature of the DSN data. Postage Stamp requires real-time data that includes additional context. For example, not only does Postage Stamp require the current azimuth and elevation angle values of a DSN antenna, it also needs to know the antenna's operating state at the moment: on point, slewing, stopped, stowing, stowed, et cetera. The subsystem that controls the antenna movement does not publish these state values. Therefore, determining the real-time operating state of an antenna requires an algorithm. For instance, to determine that the antenna is in the stow position, it is necessary to first obtain its current azimuth and elevation angles, then compare them to the angle values of the stow position (known ahead of time), and also verify that the antenna angles have not changed in the last few seconds (which could indicate that it is slewing or stowing instead). Since mechanical parts are involved, the algorithm also needs to take into account margins of error in the measurements. To complicate things further, DSN antennas do not share uniform stow position angles, so the algorithm also needs to handle such differences.
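A minimal sketch of this stow-detection logic, with illustrative tolerance and quiet-period values standing in for the per-antenna figures:

```java
public class AntennaStateEstimator {

    private static final double ANGLE_TOLERANCE_DEG = 0.2; // assumed margin of error
    private static final long QUIET_PERIOD_MS = 5_000;     // assumed "no motion" window

    /** True if the antenna sits at its (antenna-specific) stow angles and has
     *  been motionless long enough to rule out slewing or stowing. */
    public static boolean isStowed(double azimuthDeg, double elevationDeg,
                                   double stowAzimuthDeg, double stowElevationDeg,
                                   long lastMotionTimestampMs, long nowMs) {
        boolean atStowAngles =
                Math.abs(azimuthDeg - stowAzimuthDeg) <= ANGLE_TOLERANCE_DEG
             && Math.abs(elevationDeg - stowElevationDeg) <= ANGLE_TOLERANCE_DEG;
        // The antenna must also have been motionless for the last few seconds;
        // otherwise it may still be slewing into (or out of) the stow position.
        boolean motionless = (nowMs - lastMotionTimestampMs) >= QUIET_PERIOD_MS;
        return atStowAngles && motionless;
    }
}
```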
The DCEP team implemented these kinds of algorithmic data processing required by Postage Stamp inside a Spark application. As the necessary calculations are completed by the Spark jobs (which happens periodically, since they use the Spark Streaming library), a JSON data object is created and made available over the network using a WebSocket. Postage Stamp then consumes this data and reflects the information directly in its GUI. The DCEP system takes care of much of the cumbersome processing work needed on the raw DSN data. In this way, future technology advancement projects may no longer require expert knowledge of the DSN. The DCEP system can be used to manage the complexity and shield external users from DSN's intricacies.
IX. Rule Implementation
In order for the DCEP system to handle the aforementioned use cases and others, the logic that performs the actual CEP has to be implemented inside Spark applications. An individual unit of such CEP logic can be referred to as a rule. There are two key methods by which real-time decision-making, operator notification (e.g., alarms), and action rules can be implemented in the DCEP system. First, a rule can be based on an expert-driven model of the DSN system, such that threshold-based rules are executed against incoming data streams and warehoused data repositories to indicate the presence of special conditions. The second method is a data-driven model, where machine-learned relationships between the data constituents of DSN streams and repositories identify the presence of irregularities. The DCEP system will use both methodologies, so that existing expert-based knowledge and machine-learned relationships can both play a role in fault detection and correlation analysis.
A. Expert-Based Rules
Over decades of successful operation of the DSN, a body of operational threshold-based rules has been built up, against which real-time monitor data and other types of data can be judged. The DCEP team plans to fully leverage this expert-based knowledge and implement it as a set of rules within the DCEP system's knowledge base, such that the system matches incoming data against relevant rules and makes decisions upon data arrival. Implementation-wise, this knowledge base consists of a set of procedures that map incoming data points against rule conditionals to generate expert-driven decisions. This implementation model allows for the chaining of rules into more complex conditionals. Although the knowledge base is not necessarily a logical programming inference engine, the ability to add new rules or facts at runtime is retained, and thus decision-making based on expert rules can increase in capability over time. In fact, implementing logical rules by this method turns out to be a sound means of adhering to the DSN operational requirements, where rules are, in many cases, explicitly specified and mandated. An example of such a case is the conditional alarm-sounding requirement for wind speeds at DSN antenna complexes. Each DSN antenna type has specific requirements that dictate safe operational use based on outside wind speeds. These conditions, for example requiring antennas to be driven to stow (pointing to zenith) if outside wind speeds surpass 50 miles per hour, represent well-documented thresholds that require operator action. The DCEP system can automatically match real-time situations to such predefined conditional rules, and it can then provide continuous alerts for all antennas based on the rules available in the system's knowledge base.
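Such a threshold rule might be expressed in the knowledge base along the following lines, using the documented 50 mph stow requirement as the example; the Rule interface and event type are assumptions about how these procedures could be encoded, not the DCEP system's actual classes.

```java
// Minimal sketch of an expert-based threshold rule.
interface Rule<T> {
    boolean matches(T event);
    String action(T event);
}

class WindSpeedStowRule implements Rule<Double> {
    private static final double STOW_THRESHOLD_MPH = 50.0;

    @Override
    public boolean matches(Double windSpeedMph) {
        return windSpeedMph > STOW_THRESHOLD_MPH;
    }

    @Override
    public String action(Double windSpeedMph) {
        return String.format(
            "ALERT: wind at %.1f mph exceeds %.1f mph; drive antenna to stow (zenith)",
            windSpeedMph, STOW_THRESHOLD_MPH);
    }
}

public class RuleDemo {
    public static void main(String[] args) {
        Rule<Double> rule = new WindSpeedStowRule();
        double observed = 53.4; // illustrative real-time wind sample
        if (rule.matches(observed)) {
            System.out.println(rule.action(observed));
        }
    }
}
```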
B. Machine-Learned Rules
Where high-quality training data is available and usable,6 the DCEP system is able to derive machine-learned relationships among constituent data that can help in more accurate automatic decision-making. The key idea is to leverage machine-learning algorithms to train classifiers that map expected fault detection scenarios against relevant input data, based upon relationships inherent in the training data. This methodology stands in contrast to explicitly programming threshold conditions by which to judge and correlate data. Regression models as well as deep-learning techniques are viable candidate methods for producing machine-learned, decision-making classifiers. The DCEP team ran a prototype study of the viability of machine learning for CEP, in which over thirty-three million data points were processed to develop a spacecraft identity classifier. The results of this study showed that a machine-learned approach is viable for CEP in the DSN. In many cases, however, the low-quality labeling of training data and the lack of sufficient data points turned out to be a severe hindrance to effective classifiers. Irrespective of the exact machine-learning approach used, one of the DCEP team's goals is to leverage a knowledge base of procedures (in this case, classifiers) that will be invoked at runtime based on real-time data characteristics. In other words, the same high-level interface as the expert-driven rule system will be used, so that the machine-learned and the expert-driven conditionals for fault detection can be used interchangeably.
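That interchangeability goal can be illustrated with a sketch in which expert-driven and machine-learned detectors share one interface; the stub classifier below stands in for a trained model (e.g., a regression classifier) and is an assumption, not the DCEP implementation.

```java
import java.util.Arrays;
import java.util.List;

// Minimal sketch: one interface invoked uniformly by the CEP engine.
interface FaultDetector {
    boolean indicatesFault(double[] features);
}

class ThresholdDetector implements FaultDetector {       // expert-driven
    @Override
    public boolean indicatesFault(double[] features) {
        return features[0] > 50.0;                       // e.g. a wind speed threshold
    }
}

class LearnedDetector implements FaultDetector {         // data-driven
    @Override
    public boolean indicatesFault(double[] features) {
        // Stand-in for model.predict(features) from a trained classifier.
        double score = 0.8 * features[0] - 0.3 * features[1];
        return score > 40.0;
    }
}

public class DetectorDemo {
    public static void main(String[] args) {
        List<FaultDetector> knowledgeBase =
                Arrays.asList(new ThresholdDetector(), new LearnedDetector());
        double[] sample = {53.4, 12.0};                  // illustrative feature vector
        for (FaultDetector detector : knowledgeBase) {
            System.out.println(detector.getClass().getSimpleName()
                    + " -> fault: " + detector.indicatesFault(sample));
        }
    }
}
```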
X. Future Work
Much work still remains in order to raise the technology readiness level of the DCEP system so that it can qualify to assume operational responsibilities in the DSN. Up front, there remain additional data sources that need to serve as input to the DCEP system: logs (e.g., NMC logs), DRs and MDRs in the DRMS repository, and possibly others that have yet to be identified. Furthermore, to become operations-ready, the DCEP system's existing data inputs have to be made more robust. This will involve switching from consuming the real-time monitor data via the MON-2/JMS bridge to receiving it directly from the MCIS infrastructure at the DSCCs (in other words, closer to the source). This change will enable key improvements in CEP: the propagation delay from the data source to the DCEP system will be minimized; the entire set of monitor data at the DSCCs will be available for subscription (whereas via the MON-2/JMS bridge, only a limited set can be subscribed to); and the DCEP system will receive
streaming data at its highest rate (whereas via the MON-2/JMS bridge, each individual monitor data item is metered to five seconds). Making this switch, however, will require careful engineering to make sure that the existing DSN services are not adversely affected by the introduction of the DCEP system into the operational ecosystem. Also, as the use cases become more comprehensively defined, the interface between the DCEP system and the SQA subsystem will undergo refinements so that additional historical metrics can be made available for CEP. All of this work is critical and, at the same time, extensive.
It will be important for the operational DCEP system to work around the unavoidable WAN bandwidth limitation that exists in the DSN. As mentioned, the strategy for overcoming this challenge is to process all voluminous DSCC data locally, at the DSCC itself, and transmit only intelligent information that is smaller in size to the other DSCCs and the NOCC. The ultimate goal is to enable FtSO this way. Design-wise, this strategy entails two levels of CEP: CEP performed at the local DSCCs and a global CEP that processes key events from the entire DSN. Forming the requirements and design for this multi-level CEP in the DSN is another area of future work.
SQA is an excellent subsystem that warehouses a treasure trove of historical DSN monitor data valuable for CEP, but there never was a requirement placed on the SQA for it to serve as a data provider to the DCEP system. The current PL/SQL interface was designed to minimize the burden on the SQA subsystem when the DCEP system accesses its data. It is now clear that this is a big data problem, and for the DCEP system to make more flexible use of the historical data, a different solution may be needed. For instance, rather than using the SQA subsystem itself as a CEP data source, a separate big data database (e.g., Apache Cassandra) can be created and initialized with a copy of the SQA repository's data. As the DCEP system consumes real-time monitor data from the DSN, the data is also loaded into this new database, thereby keeping the historical data constantly up to date. With this new elastic database, the DCEP system will be able to access monitor data from the past, and its statistics, with greater freedom and better performance.
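As a rough sketch of this dual-write idea, using the DataStax Java driver with hypothetical keyspace, table, and column names; this is a speculative illustration of future work, not an existing DCEP component.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Minimal sketch: each monitor data sample consumed from Kafka is also
// inserted into Cassandra so the historical store stays current.
public class HistoricalWriter implements AutoCloseable {
    private final Cluster cluster;
    private final Session session;
    private final PreparedStatement insert;

    public HistoricalWriter(String contactPoint) {
        cluster = Cluster.builder().addContactPoint(contactPoint).build();
        session = cluster.connect("dcep_history"); // hypothetical keyspace
        insert = session.prepare(
            "INSERT INTO monitor_data (channel, sample_time, value) VALUES (?, ?, ?)");
    }

    /** Called for every sample consumed from the Kafka monitor data topic. */
    public void write(String channel, java.util.Date sampleTime, double value) {
        session.execute(insert.bind(channel, sampleTime, value));
    }

    @Override
    public void close() {
        session.close();
        cluster.close();
    }
}
```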
Another area of future work is devising a novel way of managing the CEP logic, or rules. As mentioned, there exist a number of rule-based systems in the enterprise industry for doing business intelligence. The DCEP team will investigate and glean from the rule management solutions that already exist in these systems. Most likely, due to the unique set of requirements and the operational nature of the DSN, a customized strategy will have to be devised for managing the CEP rules in the DSN.
Finally, it is important to ascertain the performance capabilities of the DCEP system as a whole, as well as those of its individual parts (i.e., Spark, Kafka, et cetera). Understanding the system's performance and its limitations will inform further design choices for the DCEP system. To that end, the DCEP team plans to establish testing scenarios that push the CEP and the messaging infrastructure to their limits, to get a clearer picture of what is possible (and what is not) with the DCEP system.
XI. Conclusion
The DSN currently provides reliable, high-quality services to the missions it supports, but further improvements need to be made to its operational cost-effectiveness. Using complex event processing, greater intelligence can be extracted from operations in real time: anomalies can be detected quickly; false positives and false negatives can be significantly reduced through the association of contexts; using the knowledge base of previously documented discrepancies, newly encountered ones can be automatically labeled and remediated directly by the DCEP system; and through machine learning, the DCEP system will train itself to detect events ever more intelligently over time. These benefits are in addition to the immediate, more basic solutions that the DCEP system provides, such as normalizing the irregularities in DSN's data identifiers and deriving higher-level state information through algorithms for outside systems like Postage Stamp. Further development, prototyping with more use cases, and testing need to be done in order for DCEP to be formally accepted as part of DSN Operations; but the progress up to this point indicates that CEP is a truly viable solution that will help usher DSN Operations into the future and serve as a key enabler for the Follow-the-Sun Operations.
Acknowledgments
The authors thank: Bach X. Bui, Silvino C. Zendejas, Ara Kassabian, Michael A. Rueckert, and Saman Saeedi for
their valuable contributions to the DCEP task; and Dr. Alexandra Holloway for providing the sample graphic of
Postage Stamp. The authors also give special thanks to Michael E. Levesque and Jay E. Wyatt for their support and
sponsorship of the task. The aforementioned individuals are all affiliated with JPL.
The work described in this paper was carried out at the Jet Propulsion Laboratory (JPL), California Institute of
Technology (Caltech), under a contract with the National Aeronautics and Space Administration (NASA). The work
was funded by the Space Networking and Mission Automation Tech Program.
References
1Office of Audits, "NASA's Management of the Deep Space Network," Office of Inspector General, Report No. IG-15-013, 26 Mar. 2015.
2Johnston, M. D., Levesque, M., Malhotra, S., Tran, D., Verma, R., and Zendejas, S., "NASA Deep Space Network: Automation Improvements in the Follow-the-Sun Era," 24th International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 2015.
3Davenport, T. H., "The Confusing Landscape of 'Cognitive Computing'," The CIO Report (The Wall Street Journal), 17 Dec. 2014, URL: http://blogs.wsj.com/cio/2014/12/17/the-confusing-landscape-of-cognitive-computing/?mg=id-wsj.
4Springer, K., "British Tourists' Tweets Get Them Denied Entry to the U.S.," TIME Newsfeed, 31 Jan. 2012, URL: http://newsfeed.time.com/2012/01/31/british-tourists-tweets-get-them-denied-entry-to-the-u-s/.
5Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., and Stoica, I., "Spark: Cluster Computing with Working Sets," 2nd USENIX Workshop on Hot Topics in Cloud Computing, Boston, MA, 2010.
6Jones, N., "Computer science: The learning machines," Nature, Vol. 505, No. 7482, Jan. 2014, pp. 146-148.