Figure 5 - uploaded by Roselyne Barreto Tchoua
Calculator flowchart 

Source publication
Article
Full-text available
The emergence of leadership class computing is creating a tsunami of data from petascale simulations. Results are typically analyzed by dozens of scientists. In order for the scientist to digest the vast amount of data being produced from the simulations and auxiliary programs, it is critical to automate the effort to manage, analyze, visualize, an...

Context in source publication

Context 1
... presents the scientists with pre-defined visualizations. In other words, users visualize the output of pre-determined workflows that run a series of operations on raw data to generate images and movies. The purpose of analysis on the dashboard is to help users extract scientific knowledge by post-processing the data themselves, giving them more choices and options in manipulating information. There are currently two approaches to integrating analysis on the dashboard: the first is the eSimMon calculator, and the second uses an external tool, notably R. These two methods were implemented after gathering a set of commonly used analysis routines across applications, from simple subtractions to statistical analysis on physical variables. This type of analysis can be considered "built-in" to the dashboard. In the future eSimMon will include a "plug-in" capability, allowing users to upload and run their own customized analysis routines.

The calculator is a tool originally implemented for XGC1 variables. It works with xy-plots generated from NetCDF data files and can be extended to other types of xy-plots. Figure 3 shows a snapshot of the calculator. The idea is as follows. When users look at movies (flv files) on the dashboard, we want them to think of variables such as density, temperature, etc. This is more intuitive for scientists; they do not want to think of tracking a .png file back to the original NetCDF file to perform analysis. Let's say they quickly want to subtract two different types of densities whose data is stored in different files. This operation may not be common enough to be part of the monitoring workflow, yet it may be telling about a certain phenomenon before more complex analysis is performed. The purpose of the calculator is to provide a quick and easy way to do simple operations and observe a particular trend without getting into too many details.

The back-end component of the calculator works as a serial process, a series of steps that generates a result movie. On the front end, users are oblivious to analysis workflows. They simply alternate between clicking on plots to select variables and on calculator buttons to select operators. For example, for an XGC1 run, to take the difference between the same variable ion__density from two different runs, a user would click on the first variable, then select the minus sign on the calculator, then the second variable, and finally hit the equal button on the calculator to start the analysis. Figure 4 illustrates this process on the front end, while Figure 5 is a flowchart of the back-end processing. Simple checks are performed as the user goes through these steps, to guide him or her and to make sure the operation does not contain obvious problems such as two variables side by side without an operator in between. The user sees the operation being composed as he or she clicks on different variables and operators. The goal is to be as intuitive and simple as possible on the front end while a more involved process gets started on the back end.

First, the back end needs to identify which shot (the dashboard allows viewing results from two different shots at a time), which variables, and which symbols are in the expression. The back end then checks whether provenance information is available for each variable. When this is done, it proceeds to check whether there is an analysis directory for this particular user and shot; if not, it creates one. 
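The expression checks and back-end bookkeeping described above can be summarized in a short Python sketch. The 'name@shot' token format, the provenance dictionary, and the directory layout below are assumptions for illustration, not the actual eSimMon code.

```python
import os

OPERATORS = {"+", "-", "*", "/"}

def parse_expression(expression):
    """Split e.g. 'ion__density@shot1 - ion__density@shot2' into variables and
    operators, rejecting malformed input such as two variables with no operator
    in between."""
    tokens = expression.split()
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expression must alternate variable, operator, variable, ...")
    for i, tok in enumerate(tokens):
        if (i % 2 == 1) != (tok in OPERATORS):
            raise ValueError(f"malformed expression near token {i}: {tok!r}")
    return tokens[0::2], tokens[1::2]          # variables, operators

def prepare_analysis(expression, user, provenance, root="analysis"):
    """Identify shots and variables, verify provenance is recorded for each
    variable, and create a per-user, per-shot analysis directory if needed."""
    variables, operators = parse_expression(expression)
    shots = {v.split("@")[1] for v in variables if "@" in v}
    missing = [v for v in variables if v not in provenance]
    if missing:
        raise RuntimeError(f"no provenance information for: {missing}")
    for shot in shots:
        os.makedirs(os.path.join(root, user, shot), exist_ok=True)
    return variables, operators, shots
```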
Since in our case the raw data is in NetCDF files, the back end runs a conversion routine to create separate ASCII files containing the data for a single variable at a single time step. After that, it creates a separate Python routine to perform the actual operation, replacing each operator with its Python equivalent and each variable with a filename. The Python routine loops through files and time steps to generate a series of images. The back end then runs the Python program, followed by a command that assembles all the images into a movie, and finally returns an flv file. The steps may seem involved, but the user is unaware of them. As he or she clicks on the equal sign, a new analysis appears in the left menu, and when he or she drags it onto the main panel, the movie appears. Some shots have several hundred time steps; in these cases the process is not instantaneous. Users see a "busy" cursor which lets them know when the analysis is done and they can drag and drop its result. Currently the intermediate analysis data is stored in ASCII files, which becomes slow as the simulation data grows. In the future we plan on writing these data files in a more efficient format such as NetCDF instead of using ...
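As a rough Python sketch of the kind of routine the back end might generate for a subtraction (the two-column ASCII layout, the file naming scheme, and the ffmpeg call used to assemble the movie are assumptions for illustration, not the actual generated code):

```python
import glob
import subprocess

import numpy as np
import matplotlib
matplotlib.use("Agg")                     # render off-screen on the server
import matplotlib.pyplot as plt

def difference_movie(pattern_a, pattern_b, out_dir):
    """Subtract two variables stored as one two-column (x, y) ASCII file per
    time step, write one PNG per step, and assemble the frames into an flv movie."""
    files_a = sorted(glob.glob(pattern_a))
    files_b = sorted(glob.glob(pattern_b))
    for step, (fa, fb) in enumerate(zip(files_a, files_b)):
        xa, ya = np.loadtxt(fa, unpack=True)
        _, yb = np.loadtxt(fb, unpack=True)
        plt.figure()
        plt.plot(xa, ya - yb)             # the '-' operator becomes a NumPy subtraction
        plt.xlabel("x")
        plt.ylabel("difference")
        plt.savefig(f"{out_dir}/frame_{step:04d}.png")
        plt.close()
    subprocess.run(                       # stitch the images into a movie
        ["ffmpeg", "-y", "-i", f"{out_dir}/frame_%04d.png", f"{out_dir}/result.flv"],
        check=True,
    )
```

In practice the dashboard would substitute the concrete variable filenames and output directory, resolved from the provenance records, before running such a routine.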

Similar publications

Conference Paper
Full-text available
It is difficult to write parallel programs that are correct. This is because of the potential for data races, when parallel tasks access shared data in complex and unexpected ways. A classic approach to addressing this problem is dynamic race detection, which has the benefits of working transparently to the programmer and not raising any false ala...

Citations

... In addition Adobe has invested considerable effort in making their ubiquitous Flash Player fast, robust, scalable and reliable. Among the Web 2.0 technologies available, Flash provides all the desired features for the dashboard, including interactivity and cross-browser consistency but more importantly scalability and support for videos and vector graphics [16]. On the back end, the eSimMon dashboard uses PHP and a MySQL database to make the links between user requests on the interface and raw data files. ...
Research
Full-text available
Abstract—The purpose of the Fusion Simulation Project is to develop a predictive capability for integrated modeling of magnetically confined burning plasmas. In support of this mission, the Center for Plasma Edge Simulation has developed an End-to-end Framework for Fusion Integrated Simulation (EFFIS) that combines critical computer science technologies in an effective manner to support leadership class computing and the coupling of complex plasma physics models. We describe here the main components of EFFIS and how they are being utilized to address our goal of integrated predictive plasma edge simulation.
... In addition Adobe has invested considerable effort in making their ubiquitous Flash Player fast, robust, scalable and reliable. Among the Web 2.0 technologies available, Flash provides all the desired features for the dashboard, including interactivity and cross-browser consistency but more importantly scalability and support for videos and vector graphics [16]. On the back end, the eSimMon dashboard uses PHP and a MySQL database to make the links between user requests on the interface and raw data files. ...
... To that end, we built the Framework for Integrated End to End Scientific Data Management Technologies for Applications (FIESTA) shown in Figure 1. The foundation components of FIESTA are: 1) fast adaptable I/O, 2) workflow management, and 3) a collaborative portal we refer to as the eSimMon [2] dashboard. FIESTA's auxiliary components are components for code coupling, provenance and metadata capture, wide-area data movement, and visualization. ...
Article
Full-text available
Collaboratively monitoring and analyzing large scale simulations from petascale computers is an important area of research and development within the scientific community. This paper addresses these issues when teams of colleagues from different research areas work together to help understand the complex data generated from these simulations. In particular, we address the issues when geographically diverse teams of disparate researchers work together to understand the complex science being simulated on high performance computers. Most application scientists want to focus on the sciences and spend a minimum amount of time learning new tools or adopting new techniques to monitor and analyze their simulation data. The challenge of eSimMon, our web-based system, is to decrease or eliminate some of the hurdles on the scientists' path to scientific discovery, and allow these collaborations to flourish.
... True artificial intelligence is still the "stuff" of science fiction stories and possibly the future. To avoid and mitigate mistakes in computer-driven modeling and decision making we often include a human in "the loop" — for example, FAA flight controllers, real-time network and computer security threat identification and mitigation analysts, inclement weather decision making personnel, or scientists involved in very expensive high-end simulations, e.g., [1]. Well designed complex computer analytics and decision making systems, and workflows, include points where humans are inserted into the process to monitor, augment, direct, take over, or stop processes. ...
Article
Today real-time analytics of large data sets is invariably computer-assisted and often includes a "human-in-the-loop". Humans differ from each other and all have a very limited innate capacity to process new information in real-time. This introduces statistical and systematic uncertainties into observations, analyses and decisions humans make when they are "in the loop". Humans also have unconscious and conscious biases, and these can introduce (major) systematic errors into human assisted or human driven analytics. This note briefly discusses the issues and the (considerable) implications they can have on real-time analytics that involves humans, including software interfaces, learning, and reaction of humans in emergencies. © 2012 IFIP International Federation for Information Processing.
... The eSiMon dashboard was created to help scientists monitor, manage, and collaborate efficiently with teams of researchers working on large high-performance computing (HPC) machines [BKM+09]. eSiMon was designed to be used efficiently across all browsers and platforms and uses Adobe Flash for the frontend of the system. The five main forms of data that are presented to the user are (1) a list of the variables created during the run, (2) extra metadata, such as input files, used during the run, (3) movies and images of the variables, which are constantly updated during and after the run, (4) postprocessing data, along with the provenance information for this data, and (5) vector data. ...
Article
Full-text available
In 2006, the SciDAC Scientific Data Management (SDM) Center proposed to continue its work deploying leading edge data management and analysis capabilities to scientific applications. One of three thrust areas within the proposed center was focused on Scientific Process Automation (SPA) using workflow technology. As a founding member of the Kepler consortium [LAB+09], the SDM Center team was well positioned to begin deploying workflows immediately. We were also keenly aware of some of the deficiencies in Kepler when applied to high performance computing workflows, which allowed us to focus our research and development efforts on critical new capabilities which were ultimately integrated into the Kepler open source distribution, benefiting the entire community. Significant work was required to ensure Kepler was capable of supporting large-scale production runs for SciDAC applications. Our work on generic actors and templates has improved the portability of workflows across machines and provided a higher level of abstraction for workflow developers. Fault tolerance and provenance tracking were obvious areas for improvement within Kepler given the longevity and complexity of our target workflows. To monitor workflow execution, we developed and deployed a web-based dashboard. We then generalized this interface and released it so it could be deployed at other locations. Outreach has always been a primary focus of our work and we had many successful deployments across a number of scientific domains while continually publishing and presenting our work. This short paper describes our most significant accomplishments over the past 5 years. Additional information about the SDM Center can be found in the companion paper: The Scientific Data Management Center: Available Technologies and Highlights.
... In addition Adobe has invested considerable effort in making their ubiquitous Flash Player fast, robust, scalable and reliable. Among the Web 2.0 technologies available, Flash provides all the desired features for the dashboard, including interactivity and cross-browser consistency but more importantly scalability and support for videos and vector graphics [16]. On the back end, the eSimMon dashboard uses PHP and a MySQL database to make the links between user requests on the interface and raw data files. ...
Article
Full-text available
EFFIS is a set of tools developed for working with large-scale simulations. EFFIS is used by researchers in the Center for Plasma Edge Simulation, as well as many other areas of science. EFFIS is composed of services including adaptable I/O, workflows, dashboards, visualization, code coupling, wide-area data movement, and provenance capturing. One of the unique aspects of EFFIS is that it transparently allows users to switch from code coupling on disk to coupling in memory, using the concept of a shared space in a staging area. The staging area is a small fraction of the compute nodes needed to run the large-scale simulation, but it is used for the construction of I/O pipelines and a code-coupling infrastructure. This allows the scientist to make minor changes for the code to work with ADIOS, and then with no changes perform complex transformations and analytics, which all occur in situ with the simulation. In this talk, we will focus on the technologies CPES uses, which are scalable and can be used on anything from workstations to petascale machines.
... To that end, we built the Framework for Integrated End to End Scientific Data Management Technologies for Applications (FIESTA) shown in Figure 1. The foundation components of FIESTA are: 1) fast adaptable I/O, 2) workflow management, and 3) a collaborative portal we refer to as the eSimMon [2] dashboard. FIESTA's auxiliary components are components for code coupling, provenance and metadata capture, wide-area data movement, and visualization. ...
Conference Paper
Full-text available
Collaboratively monitoring and analyzing large scale simulations from petascale computers is an important area of research and development within the scientific community. This paper addresses these issues when teams of colleagues from different research areas work together to help understand the complex data generated from these simulations. In particular, we address the issues when geographically diverse teams of disparate researchers work together to understand the complex science being simulated on high performance computers. Most application scientists want to focus on the sciences and spend a minimum amount of time learning new tools or adopting new techniques to monitor and analyze their simulation data. The challenge of eSimMon, our web-based system, is to decrease or eliminate some of the hurdles on the scientists' path to scientific discovery, and allow these collaborations to flourish.
... By transparently allowing this to occur in the I/O layer of the simulation, the workflow monitoring system can start seeing files being generated, and transfer the file(s) over to another computer system, and then run a series of analysis and visualization tasks. The generated images are accessible on our dashboard [5], a web portal, where the scientist can see the results and then kill the simulation if the results look bad, otherwise let the workflow turn off the memory-to-disk portion of the I/O to allow the coupled simulation to return back to full speed using only memory-to-memory coupling. Our approach to code coupling is to extend the ADIOS componentized I/O framework [6], described in section 3.1, to support in-memory coupling by switching the transport method from I/O to in-memory coupling. ...
... The Scientific Process Automation group (SPA) of the DOE Scientific Data Management Center (SDM) developed the Kepler provenance framework [13] that we use in EFFIS to record and retrieve the data lineage. For example, if a user wants to execute an analysis job on the dashboard, the user selects which variables to include in the analysis and executes the analysis without knowledge about the actual location and names of the files [5]. Data for analysis is selected by the user as a movie or a frame of the movie. ...
Conference Paper
Full-text available
In order to understand the complex physics of mother nature, physicists often use many approximations to understand one area of physics and then write a simulation to reduce these equations to ones that can be solved on a computer. Different approximations lead to different equations that model different physics, which can often lead to a completely different simulation code. As computers become more powerful, scientists can either write one simulation that models all of the physics or they produce several codes each for different portions of the physics and then 'couple' these codes together. In this paper, we concentrate on the latter, where we look at our code coupling approach for modeling a full device fusion reactor. There are many approaches to code coupling. Our first approach was using Kepler workflows to loosely couple three codes via files (memory-to-disk-to-memory coupling). This paper describes our new approach moving towards using memory-to-memory data exchange to allow for a tighter coupling. Our approach focuses on a method which brings together scientific workflows along with staging I/O methods for code coupling. Staging methods use additional compute nodes to perform additional tasks such as data analysis, visualization, and NxM transfers for code coupling. In order to transparently allow application scientists to switch from memory-to-memory coupling to memory-to-disk-to-memory coupling, we have been developing a framework that can switch between these two I/O methods and then automate other workflow tasks. Our hybrid approach allows application scientists to easily switch between in-memory coupling and file-based coupling on-the-fly, which aids debugging these complex configurations.
... The whole file metaphor disappears in favor of thinking about the data directly. The eSiMon [15] system from ORNL offers a view of how this might work from a user's perspective. ...
Article
Full-text available
Scientific simulations have a different relationship with all of the data generated than many data analysis systems that support applications like the Large Hadron Collider and the SLOAN Sky Survey. In many cases, simulations need to gen-erate large number of intermediate data sets that ultimately are thrown away once some analysis routines are applied to the data. This generates some summarized, derived result that inspires some scientific insight. Traditionally, these rou-tines use the storage array to persist the intermediate results between each step of the data analysis process. The volume and frequency of this data can be overwhelming compared with the available IO bandwidth on the machine. To han-dle this volume and frequency, current research efforts are determining how to move the storage of intermediate data from the storage array into the memory of the compute area. Then, the analysis routines are incorporated to create Inte-grated Application Workflows (IAWs). Data staging tech-niques require some mechanism to replace the semantics of-fered by the file system to control data movement and access. As part of an HPC-focused transaction services project, a first pass at a transactional metadata service for in compute area data storage is being developed.