Figure 5 - uploaded by Roselyne Barreto Tchoua
Calculator flowchart 

Source publication
Article
Full-text available
The emergence of leadership class computing is creating a tsunami of data from petascale simulations. Results are typically analyzed by dozens of scientists. In order for the scientist to digest the vast amount of data being produced from the simulations and auxiliary programs, it is critical to automate the effort to manage, analyze, visualize, an...

Context in source publication

Context 1
... presents the scientists with pre-defined visualizations. In other words, users visualize the output of pre-determined workflows that run a series of operations on raw data to generate images and movies. The purpose of analysis on the dashboard is to help users extract scientific knowledge by post-processing the data themselves, giving them more choices and options in manipulating information. There are currently two approaches to integrating analysis on the dashboard: the first is the eSimMon calculator, and the second uses an external tool, notably R. These two methods were implemented after gathering a set of commonly used analysis routines across applications, from simple subtractions to statistical analysis on physical variables. This type of analysis can be considered "built-in" to the dashboard. In the future eSimMon will include a "plug-in" capability, allowing users to upload and run their own customized analysis routines.

The calculator is a tool originally implemented for XGC1 variables. It works with xy-plots generated from NetCDF data files and can be extended to other types of xy-plots. Figure 3 shows a snapshot of the calculator. The idea is as follows. When users look at movies (flv files) on the dashboard, we want them to think of variables such as density, temperature, etc. This is more intuitive for scientists; they do not want to think of tracking a .png file back to the original NetCDF file to perform analysis. Let's say they quickly want to subtract two different types of densities whose data is stored in different files. This operation may not be common enough to be part of the monitoring workflow, yet it may be telling about a certain phenomenon before more complex analysis is performed. The purpose of the calculator is to provide a quick and easy way to do simple operations and observe a particular trend without getting into too many details.

The back-end component of the calculator works as a serial process, a series of steps that generates a result movie. On the front end, users are oblivious to analysis workflows. They simply alternate between clicking on plots to select variables and on calculator buttons to select operators. For example, for an XGC1 run, to take the difference between the same variable ion__density from two different runs, a user would click on the first variable, then select the minus sign on the calculator, then the second variable, and finally hit the equal button on the calculator to start the analysis. Figure 4 illustrates this process on the front end, while Figure 5 is a flowchart of the back-end processing. Simple checks are performed as the user goes through these steps, to guide him or her and to make sure the operation does not contain obvious problems such as two variables side by side without an operator in between. The user sees the operation being composed as he or she clicks on different variables and operators. The goal is to be as intuitive and simple as possible on the front end while a more involved process gets started on the back end.

First, the back end needs to identify which shot (the dashboard allows viewing results from two different shots at a time), which variables, and which symbols are in the expression. The back end then checks whether provenance information is available for each variable. When this is done, it proceeds to check whether there is an analysis directory for this particular user and shot; if not, it creates one. 
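The expression checks and back-end bookkeeping described above can be summarized in a short Python sketch. The 'name@shot' token format, the provenance dictionary, and the directory layout below are assumptions for illustration, not the actual eSimMon code.

```python
import os

OPERATORS = {"+", "-", "*", "/"}

def parse_expression(expression):
    """Split e.g. 'ion__density@shot1 - ion__density@shot2' into variables and
    operators, rejecting malformed input such as two variables with no operator
    in between."""
    tokens = expression.split()
    if not tokens or len(tokens) % 2 == 0:
        raise ValueError("expression must alternate variable, operator, variable, ...")
    for i, tok in enumerate(tokens):
        if (i % 2 == 1) != (tok in OPERATORS):
            raise ValueError(f"malformed expression near token {i}: {tok!r}")
    return tokens[0::2], tokens[1::2]          # variables, operators

def prepare_analysis(expression, user, provenance, root="analysis"):
    """Identify shots and variables, verify provenance is recorded for each
    variable, and create a per-user, per-shot analysis directory if needed."""
    variables, operators = parse_expression(expression)
    shots = {v.split("@")[1] for v in variables if "@" in v}
    missing = [v for v in variables if v not in provenance]
    if missing:
        raise RuntimeError(f"no provenance information for: {missing}")
    for shot in shots:
        os.makedirs(os.path.join(root, user, shot), exist_ok=True)
    return variables, operators, shots
```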
Since in our case the raw data is in NetCDF files, the back end runs a conversion routine to create separate ASCII files containing the data for a single variable at a single time step. After that, it creates a separate Python routine to perform the actual operation, replacing each operator with its Python equivalent and each variable with a filename. The Python routine loops through files and time steps to generate a series of images. The back end then runs the Python program, followed by a command that assembles all the images into a movie, and finally returns an flv file. The steps may seem involved, but the user is unaware of them. As he or she clicks on the equal sign, a new analysis appears in the left menu, and when he or she drags it onto the main panel, the movie appears. Some shots have several hundred time steps; in these cases the process is not instantaneous. Users see a "busy" cursor which lets them know when the analysis is done and they can drag and drop its result. Currently the intermediate analysis data is stored in ASCII files, which becomes slow as the simulation data grows. In the future we plan on writing these data files in a more efficient format such as NetCDF instead of using ...
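As a rough Python sketch of the kind of routine the back end might generate for a subtraction (the two-column ASCII layout, the file naming scheme, and the ffmpeg call used to assemble the movie are assumptions for illustration, not the actual generated code):

```python
import glob
import subprocess

import numpy as np
import matplotlib
matplotlib.use("Agg")                     # render off-screen on the server
import matplotlib.pyplot as plt

def difference_movie(pattern_a, pattern_b, out_dir):
    """Subtract two variables stored as one two-column (x, y) ASCII file per
    time step, write one PNG per step, and assemble the frames into an flv movie."""
    files_a = sorted(glob.glob(pattern_a))
    files_b = sorted(glob.glob(pattern_b))
    for step, (fa, fb) in enumerate(zip(files_a, files_b)):
        xa, ya = np.loadtxt(fa, unpack=True)
        _, yb = np.loadtxt(fb, unpack=True)
        plt.figure()
        plt.plot(xa, ya - yb)             # the '-' operator becomes a NumPy subtraction
        plt.xlabel("x")
        plt.ylabel("difference")
        plt.savefig(f"{out_dir}/frame_{step:04d}.png")
        plt.close()
    subprocess.run(                       # stitch the images into a movie
        ["ffmpeg", "-y", "-i", f"{out_dir}/frame_%04d.png", f"{out_dir}/result.flv"],
        check=True,
    )
```

In practice the dashboard would substitute the concrete variable filenames and output directory, resolved from the provenance records, before running such a routine.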

Similar publications

Conference Paper
Full-text available
It is difficult to write parallel programs that are correct. This is because of the potential for data races, when parallel tasks access shared data in complex and unexpected ways. A classic approach to addressing this problem is dynamic race detection, which has the benefits of working transparently to the programmer and not raising any false ala...

Citations

... In addition Adobe has invested considerable effort in making their ubiquitous Flash Player fast, robust, scalable and reliable. Among the Web 2.0 technologies available, Flash provides all the desired features for the dashboard, including interactivity and cross-browser consistency but more importantly scalability and support for videos and vector graphics [16]. On the back end, the eSimMon dashboard uses PHP and a MySQL database to make the links between user requests on the interface and raw data files. ...
Research
Full-text available
Abstract—The purpose of the Fusion Simulation Project is to develop a predictive capability for integrated modeling of magnetically confined burning plasmas. In support of this mission, the Center for Plasma Edge Simulation has developed an End-to-end Framework for Fusion Integrated Simulation (EFFIS) that combines critical computer science technologies in an effective manner to support leadership class computing and the coupling of complex plasma physics models. We describe here the main components of EFFIS and how they are being utilized to address our goal of integrated predictive plasma edge simulation.
... In addition Adobe has invested considerable effort in making their ubiquitous Flash Player fast, robust, scalable and reliable. Among the Web 2.0 technologies available, Flash provides all the desired features for the dashboard, including interactivity and cross-browser consistency but more importantly scalability and support for videos and vector graphics [16]. On the back end, the eSimMon dashboard uses PHP and a MySQL database to make the links between user requests on the interface and raw data files. ...
... To that end, we built the Framework for Integrated End to End Scientific Data Management Technologies for Applications (FIESTA) shown in Figure 1. The foundation components of FIESTA are: 1) fast adaptable I/O, 2) workflow management, and 3) a collaborative portal we refer to as the eSimMon [2] dashboard. FIESTA's auxiliary components are components for code coupling, provenance and metadata capture, wide-area data movement, and visualization. ...
Article
Full-text available
Collaboratively monitoring and analyzing large scale simulations from petascale computers is an important area of research and development within the scientific community. This paper addresses these issues when teams of colleagues from different research areas work together to help understand the complex data generated from these simulations. In particular, we address the issues when geographically diverse teams of disparate researchers work together to understand the complex science being simulated on high performance computers. Most application scientists want to focus on the sciences and spend a minimum amount of time learning new tools or adopting new techniques to monitor and analyze their simulation data. The challenge of eSimMon, our web-based system, is to decrease or eliminate some of the hurdles on the scientists' path to scientific discovery, and allow these collaborations to flourish.
... True artificial intelligence is still the "stuff" of science fiction stories and possibly the future. To avoid and mitigate mistakes in computer-driven modeling and decision making we often include a human in "the loop" — for example, FAA flight controllers, real-time network and computer security threat identification and mitigation analysts, inclement weather decision making personnel, or scientists involved in very expensive high-end simulations, e.g., [1]. Well designed complex computer analytics and decision making systems, and workflows, include points where humans are inserted into the process to monitor, augment, direct, take over, or stop processes. ...
Article
Today real-time analytics of large data sets is invariably computer-assisted and often includes a "human-in-the-loop". Humans differ from each other and all have a very limited innate capacity to process new information in real-time. This introduces statistical and systematic uncertainties into observations, analyses and decisions humans make when they are "in the loop". Humans also have unconscious and conscious biases, and these can introduce (major) systematic errors into human assisted or human driven analytics. This note briefly discusses the issues and the (considerable) implications they can have on real-time analytics that involves humans, including software interfaces, learning, and reaction of humans in emergencies. © 2012 IFIP International Federation for Information Processing.
... The eSiMon dashboard was created to help scientists monitor, manage, and collaborate efficiently with teams of researchers working on large high-performance computing (HPC) machines [BKM+09]. eSiMon was designed to be used efficiently across all browsers and platforms and uses Adobe Flash for the frontend of the system. The five main forms of data that are presented to the user are (1) a list of the variables created during the run, (2) extra metadata, such as input files, used during the run, (3) movies and images of the variables, which are constantly updated during and after the run, (4) postprocessing data, along with the provenance information for this data, and (5) vector data. ...
Article
Full-text available
In 2006, the SciDAC Scientific Data Management (SDM) Center proposed to continue its work deploying leading edge data management and analysis capabilities to scientific applications. One of three thrust areas within the proposed center was focused on Scientific Process Automation (SPA) using workflow technology. As a founding member of the Kepler consortium [LAB+09], the SDM Center team was well positioned to begin deploying workflows immediately. We were also keenly aware of some of the deficiencies in Kepler when applied to high performance computing workflows, which allowed us to focus our research and development efforts on critical new capabilities which were ultimately integrated into the Kepler open source distribution, benefiting the entire community. Significant work was required to ensure Kepler was capable of supporting large-scale production runs for SciDAC applications. Our work on generic actors and templates has improved the portability of workflows across machines and provided a higher level of abstraction for workflow developers. Fault tolerance and provenance tracking were obvious areas for improvement within Kepler given the longevity and complexity of our target workflows. To monitor workflow execution, we developed and deployed a web-based dashboard. We then generalized this interface and released it so it could be deployed at other locations. Outreach has always been a primary focus of our work and we had many successful deployments across a number of scientific domains while continually publishing and presenting our work. This short paper describes our most significant accomplishments over the past 5 years. Additional information about the SDM Center can be found in the companion paper: The Scientific Data Management Center: Available Technologies and Highlights.
... In addition Adobe has invested considerable effort in making their ubiquitous Flash Player fast, robust, scalable and reliable. Among the Web 2.0 technologies available, Flash provides all the desired features for the dashboard, including interactivity and cross-browser consistency but more importantly scalability and support for videos and vector graphics [16]. On the back end, the eSimMon dashboard uses PHP and a MySQL database to make the links between user requests on the interface and raw data files. ...
Article
Full-text available
EFFIS is a set of tools developed for working with large-scale simulations. EFFIS is used by researchers in the Center for Plasma Edge Simulation, as well as many other areas of science. EFFIS is composed of services including adaptable I/O, workflows, dashboards, visualization, code coupling, wide-area data movement, and provenance capturing. One of the unique aspects of EFFIS is that it transparently allows users to switch from code coupling on disk to coupling in memory, using the concept of a shared space in a staging area. The staging area is a small fraction of the compute nodes needed to run the large-scale simulation, but it is used for the construction of I/O pipelines and a code-coupling infrastructure. This allows the scientist to make minor changes for the code to work with ADIOS, and then with no changes perform complex transformations and analytics, which all occur in situ with the simulation. In this talk, we will focus on the technologies CPES uses, which are scalable and can be used on anything from workstations to petascale machines.
... To that end, we built the Framework for Integrated End to End Scientific Data Management Technologies for Applications (FIESTA) shown in Figure 1. The foundation components of FIESTA are: 1) fast adaptable I/O, 2) workflow management, and 3) a collaborative portal we refer to as the eSimMon [2] dashboard. FIESTA's auxiliary components are components for code coupling, provenance and metadata capture, wide-area data movement, and visualization. ...
Conference Paper
Full-text available
Collaboratively monitoring and analyzing large scale simulations from petascale computers is an important area of research and development within the scientific community. This paper addresses these issues when teams of colleagues from different research areas work together to help understand the complex data generated from these simulations. In particular, we address the issues when geographically diverse teams of disparate researchers work together to understand the complex science being simulated on high performance computers. Most application scientists want to focus on the sciences and spend a minimum amount of time learning new tools or adopting new techniques to monitor and analyze their simulation data. The challenge of eSimMon, our web-based system, is to decrease or eliminate some of the hurdles on the scientists' path to scientific discovery, and allow these collaborations to flourish.
... By transparently allowing this to occur in the I/O layer of the simulation, the workflow monitoring system can start seeing files being generated, and transfer the file(s) over to another computer system, and then run a series of analysis and visualization tasks. The generated images are accessible on our dashboard [5], a web portal, where the scientist can see the results and then kill the simulation if the results look bad, otherwise let the workflow turn off the memory-to-disk portion of the I/O to allow the coupled simulation to return back to full speed using only memory-to-memory coupling. Our approach to code coupling is to extend the ADIOS componentized I/O framework [6], described in section 3.1, to support in-memory coupling by switching the transport method from I/O to in-memory coupling. ...
... The Scientific Process Automation group (SPA) of the DOE Scientific Data Management Center (SDM) developed the Kepler provenance framework [13] that we use in EFFIS to record and retrieve the data lineage. For example, if a user wants to execute an analysis job on the dashboard, the user selects which variables to include in the analysis and executes the analysis without knowledge about the actual location and names of the files [5]. Data for analysis is selected by the user as a movie or a frame of the movie. ...
Conference Paper
Full-text available
In order to understand the complex physics of mother nature, physicists often use many approximations to understand one area of physics and then write a simulation to reduce these equations to ones that can be solved on a computer. Different approximations lead to different equations that model different physics, which can often lead to a completely different simulation code. As computers become more powerful, scientists can either write one simulation that models all of the physics or they produce several codes each for different portions of the physics and then 'couple' these codes together. In this paper, we concentrate on the latter, where we look at our code coupling approach for modeling a full device fusion reactor. There are many approaches to code coupling. Our first approach was using Kepler workflows to loosely couple three codes via files (memory-to-disk-to-memory coupling). This paper describes our new approach moving towards using memory-to-memory data exchange to allow for a tighter coupling. Our approach focuses on a method which brings together scientific workflows along with staging I/O methods for code coupling. Staging methods use additional compute nodes to perform additional tasks such as data analysis, visualization, and NxM transfers for code coupling. In order to transparently allow application scientists to switch from memory-to-memory coupling to memory-to-disk-to-memory coupling, we have been developing a framework that can switch between these two I/O methods and then automate other workflow tasks. Our hybrid approach allows application scientists to easily switch between in-memory coupling and file-based coupling on-the-fly, which aids debugging these complex configurations.
... The whole file metaphor disappears in favor of thinking about the data directly. The eSiMon [15] system from ORNL offers a view of how this might work from a user's perspective. ...
Article
Full-text available
Scientific simulations have a different relationship with all of the data generated than many data analysis systems that support applications like the Large Hadron Collider and the SLOAN Sky Survey. In many cases, simulations need to gen-erate large number of intermediate data sets that ultimately are thrown away once some analysis routines are applied to the data. This generates some summarized, derived result that inspires some scientific insight. Traditionally, these rou-tines use the storage array to persist the intermediate results between each step of the data analysis process. The volume and frequency of this data can be overwhelming compared with the available IO bandwidth on the machine. To han-dle this volume and frequency, current research efforts are determining how to move the storage of intermediate data from the storage array into the memory of the compute area. Then, the analysis routines are incorporated to create Inte-grated Application Workflows (IAWs). Data staging tech-niques require some mechanism to replace the semantics of-fered by the file system to control data movement and access. As part of an HPC-focused transaction services project, a first pass at a transactional metadata service for in compute area data storage is being developed.