
Laboratory information management system for membrane protein structure initiative - From gene to crystal

Abstract and Figures

The Membrane Protein Structure Initiative (MPSI) exploits the competencies of individual laboratories to work collaboratively and distribute work among the different sites. This is possible because protein structure determination proceeds through a series of steps: target selection, cloning, expression, purification, crystallization and finally structure determination. Distributed sites create a unique set of challenges for integrating and passing on information on the progress of targets. This role is played by the Protein Information Management System (PIMS), a laboratory information management system (LIMS) that serves as a hub for MPSI and allows collaborative structural proteomics to be carried out in a distributed fashion. It holds key information on the progress of cloning, expression, purification and crystallization of proteins. PIMS is employed to track the status of protein targets and to manage constructs, primers, experiments, protocols, sample locations and their detailed histories, thus playing a key role in MPSI data exchange. It also serves as the centre of a federation of interoperable information resources such as local laboratory information systems and international archival resources, like PDB or NCBI. During the challenging task of integrating PIMS within the MPSI, we identified a number of prerequisites for a successful integration. In this article we share our experiences and provide insights into the process of LIMS adaptation. This information should be of interest to partners who are thinking about using a LIMS as a data centre for their collaborative efforts.
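As an illustration of the status tracking described in the abstract, the following is a minimal sketch of how a target's progress through the pipeline stages, and the site responsible for each step, might be recorded. The stage names come from the abstract; the `Stage` and `Target` classes, the `advance` method, the target name and the site names are hypothetical and are not part of PIMS.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    # Stage order follows the pipeline listed in the abstract.
    TARGET_SELECTION = 0
    CLONING = 1
    EXPRESSION = 2
    PURIFICATION = 3
    CRYSTALLIZATION = 4
    STRUCTURE_DETERMINATION = 5


@dataclass
class Target:
    name: str  # hypothetical target identifier
    stage: Stage = Stage.TARGET_SELECTION
    history: list = field(default_factory=list)  # (completed stage, site) records

    def advance(self, site: str) -> Stage:
        """Record the completed stage and the site that did the work, then move on."""
        if self.stage is Stage.STRUCTURE_DETERMINATION:
            raise ValueError(f"{self.name} has already reached the final stage")
        self.history.append((self.stage, site))
        self.stage = Stage(self.stage + 1)
        return self.stage


target = Target("example-transporter")
target.advance("Site A")  # target selection completed at Site A
target.advance("Site B")  # cloning completed at Site B
print(target.stage.name)
print([(stage.name, site) for stage, site in target.history])
```

Keeping the history as an append-only list mirrors the idea of a detailed per-target history that any collaborating site can inspect; a real LIMS would persist these records in a shared database rather than in memory.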
... For example, in the Biotechnology and Biological Sciences Research Council (BBSRC)-funded Membrane Protein Structure Initiative (MPSI), which involves eight universities collaborating on the project, data produced at one site, such as the University of Leeds, have to be accessible to any other member of the consortium and indeed to the BBSRC funding agency. To achieve this, the MPSI consortium has employed a LIMS known as the Protein Information Management System (PIMS) [13,14] to record experimental information on the high throughput production and characterisation of membrane proteins. The present study describes the extension of this system for DNA sequencing facilities. ...
Article
Facilities that provide a service for DNA sequencing typically support large numbers of users and experiment types. The cost of services is often reduced by the use of liquid handling robots but the efficiency of such facilities is hampered because the software for such robots does not usually integrate well with the systems that run the sequencing machines. Accordingly, there is a need for software systems capable of integrating different robotic systems and managing sample information for DNA sequencing services. In this paper, we describe an extension to the Protein Information Management System (PIMS) that is designed for DNA sequencing facilities. The new version of PIMS has a user-friendly web interface and integrates all aspects of the sequencing process, including sample submission, handling and tracking, together with capture and management of the data. The PIMS sequencing extension has been in production since July 2009 at the University of Leeds DNA Sequencing Facility. It has completely replaced manual data handling and simplified the tasks of data management and user communication. Samples from 45 groups have been processed with an average throughput of 10000 samples per month. The current version of the PIMS sequencing extension works with Applied Biosystems 3130XL 96-well plate sequencer and MWG 4204 or Aviso Theonyx liquid handling robots, but is readily adaptable for use with other combinations of robots. PIMS has been extended to provide a user-friendly and integrated data management solution for DNA sequencing facilities that is accessed through a normal web browser and allows simultaneous access by multiple users as well as facility managers. The system integrates sequencing and liquid handling robots, manages the data flow, and provides remote access to the sequencing results. The software is freely available, for academic users, from http://www.pims-lims.org/.
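To make the sample-tracking side of this abstract concrete, here is a small sketch of assigning submitted samples to positions on a 96-well plate. The 8 x 12 layout is the standard 96-well format; the row-by-row fill order, the function names `well_for` and `assign_wells`, and the sample identifiers are assumptions made for illustration and do not describe how the PIMS sequencing extension actually lays out plates.

```python
import string

ROWS = string.ascii_uppercase[:8]  # plate rows A..H
COLUMNS_PER_ROW = 12               # plate columns 1..12


def well_for(index: int) -> str:
    """Map a zero-based sample index to a well label, filling row by row (assumed order)."""
    if not 0 <= index < len(ROWS) * COLUMNS_PER_ROW:
        raise ValueError("a 96-well plate holds sample indices 0..95")
    row, col = divmod(index, COLUMNS_PER_ROW)
    return f"{ROWS[row]}{col + 1}"


def assign_wells(sample_ids):
    """Return a well -> sample mapping for one plate's worth of samples."""
    return {well_for(i): sample for i, sample in enumerate(sample_ids)}


# Hypothetical sample identifiers.
plate = assign_wells(["S001", "S002", "S003"])
print(plate)
```

A mapping like this is the kind of record that lets a facility hand the same plate description to both a liquid handling robot and a sequencer.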
... This greatly facilitates the process of building and modifying modules, which is essential for the iterative process of defining and refining workflow specifications. Rapid template-based creation and customization of new modules is a key feature of WIST that distinguishes it from other open source LIMS software such as Sesame (Zolnai et al., 2003) and free-to-academics systems such as PIMS (Troshin et al., 2008). ...
Article
Workflow Information Storage Toolkit (WIST) is a set of application programming interfaces and web applications that allow for the rapid development of customized laboratory information management systems (LIMS). WIST provides common LIMS input components, and allows them to be arranged and configured using a flexible language that specifies each component's visual and semantic characteristics. WIST includes a complete set of web applications for adding, editing and viewing data, as well as a powerful setup tool that can build new LIMS modules by analyzing existing database schema. Availability and implementation: WIST is implemented in Perl and may be obtained from http://vimss.sf.net under the BSD license. Contact: jmchandonia@lbl.gov Supplementary information: Supplementary data are available at Bioinformatics online.
Article
Drawing on the functions and characteristics of the teaching platform for our school's agricultural informatization degree, we introduce the implementation process, main modules, database and platform construction of a network course teaching demonstration system for agricultural informatization. The detailed design is also discussed. Practical application shows that the teaching system can improve teaching research, help guarantee the quality of teaching, and have a good demonstration effect.
Article
The techniques used in protein production and structural biology have been developing rapidly, but techniques for recording the laboratory information produced have not kept pace. One approach is the development of laboratory information-management systems (LIMS), which typically use a relational database schema to model and store results from a laboratory workflow. The underlying philosophy and implementation of the Protein Information Management System (PiMS), a LIMS development specifically targeted at the flexible and unpredictable workflows of protein-production research laboratories of all scales, is described. PiMS is a web-based Java application that uses either Postgres or Oracle as the underlying relational database-management system. PiMS is available under a free licence to all academic laboratories either for local installation or for use as a managed service.
Article
A computing infrastructure (Sesame) has been designed to manage and link individual steps in complex projects. Sesame is being developed to support a large-scale structural proteomics pilot project. When complete, the system is expected to manage all steps from target selection to data-bank deposition and report writing. We report here on the design criteria of the Sesame system and on results demonstrating successful achievement of the basic goals of its architecture. The Sesame software package, which follows the client/server paradigm, consists of a framework, which supports secure interactions among the three tiers of the system (the client, server, and database tiers), and application modules that carry out specific tasks. The framework utilizes industry standards. The client tier is written in Java2 and can be accessed anywhere through the Internet. All the development on the server tier is also carried out in Java2 so as to accommodate a wide variety of computer platforms. The database tier employs a commercial database management system. Each Sesame application module consists of a simple user interface in the client tier, corresponding objects in the server tier, and relevant data stored in the centralized database. For security, access to stored data is controlled by access privileges. The system facilitates both local and remote collaborations. Because users interact with the system using Java Web Start or through a web browser, access is limited only by the availability of an Internet connection. We describe several Sesame modules that have been developed to the point where they are being utilized routinely to support steps involved in structural and functional proteomics. This software is available to parties interested in using it and assisting to guide its further development.
Article
Macromolecular crystallography requires simple yet effective means of organizing and managing the large amounts of data generated by crystallization experiments. There are several freely available web-based Laboratory Information Management Systems (LIMS) that assist in these tasks. These, however, rely on the limited user interfaces allowed in HTML-based web pages. To address this limitation, a new LIMS for protein crystallization, which features a novel rich graphical user interface (GUI) to a relational database, has been developed. This application, which is called CLIMS (Crystallography LIMS), assists in all aspects of protein-crystallization projects: protein expression, handling, crystallization optimization, visualization of results and preliminary diffraction data. Extensive use of templates, particularly for commercial screens and common optimization grid screens, exploits the redundancy in experimental setups. The crystallization tray is the central focus of the graphical interface, thus facilitating rapid visualization and annotation of results. CLIMS was developed specifically to cater for the needs of individual laboratories requiring an intuitive and robust system for managing crystallization experiments and is freely available.
Article
Vector NTI is a well-balanced desktop application integrated for molecular sequence analysis and biological data management. It has a centralised database and five application modules: Vector NTI, AlignX, BioAnnotator, ContigExpress and GenomBench. In this review, the features and functions available in this software are examined. These include database management, primer design, virtual cloning, alignments, sequence assembly, 3D molecular viewer and internet tools. Some problems encountered when using this software are also discussed. It is hoped that this review will introduce this software to more molecular biologists so they can make better-informed decisions when choosing computational tools to facilitate their everyday laboratory work. This tool can save time and enhance analysis but it requires some learning on the user's part and there are some issues that need to be addressed by the developer.
Article
Structural GenomiX, Inc. (SGX), four New York area institutions, and two University of California schools have formed the New York Structural GenomiX Research Consortium (NYSGXRC), an industrial/academic Research Consortium that exploits individual core competencies to support all aspects of the NIH-NIGMS funded Protein Structure Initiative (PSI), including protein family classification and target selection, generation of protein for biophysical analyses, sample preparation for structural studies, structure determination and analyses, and dissemination of results. At the end of the PSI Pilot Study Phase (PSI-1), the NYSGXRC will be capable of producing 100-200 experimentally determined protein structures annually. All Consortium activities can be scaled to increase production capacity significantly during the Production Phase of the PSI (PSI-2). The Consortium utilizes both centralized and de-centralized production teams with clearly defined deliverables and hand-off procedures that are supported by a web-based target/sample tracking system (SGX Laboratory Information Data Management System, LIMS, and NYSGXRC Internal Consortium Experimental Database, ICE-DB). Consortium management is provided by an Executive Committee, which is composed of the PI and all Co-PIs. Progress to date is tracked on a publicly available Consortium web site (http://www.nysgxrc.org) and all DNA/protein reagents and experimental protocols are distributed freely from the New York City Area institutions. In addition to meeting the requirements of the Pilot Study Phase and preparing for the Production Phase of the PSI, the NYSGXRC aims to develop modular technologies that are transferable to structural biology laboratories in both academe and industry. The NYSGXRC PI and Co-PIs intend the PSI to have a transforming effect on the disciplines of X-ray crystallography and NMR spectroscopy of biological macromolecules. 
Working with other PSI-funded Centers, the NYSGXRC seeks to create the structural biology laboratory of the future. Herein, we present an overview of the organization of the NYSGXRC and describe progress toward development of a high-throughput Gene-->Structure platform. An analysis of current and projected consortium metrics reflects progress to date and delineates opportunities for further technology development.
Article
SPINE (Structural Proteomics In Europe) was established in 2002 as an integrated research project to develop new methods and technologies for high-throughput structural biology. Development areas were broken down into workpackages and this article gives an overview of ongoing activity in the bioinformatics workpackage. Developments cover target selection, target registration, wet and dry laboratory data management and structure annotation as they pertain to high-throughput studies. Some individual projects and developments are discussed in detail, while those that are covered elsewhere in this issue are treated more briefly. In particular, this overview focuses on the infrastructure of the software that allows the experimentalist to move projects through different areas that are crucial to high-throughput studies, leading to the collation of large data sets which are managed and eventually archived and/or deposited.
Article
Data management has been identified as a crucial issue in all large-scale experimental projects. In this type of project, many different persons manipulate multiple objects in different locations; thus, unless complete and accurate records are maintained, it is extremely difficult to understand exactly what has been done, when it was done, who did it, and what exact protocol was used. All of this information is essential for use in publications, reusing successful protocols, determining why a target has failed, and validating and optimizing protocols. Although data management solutions have been in place for certain focused activities (e.g., genome sequencing and microarray experiments), they are just emerging for more widespread projects, such as structural genomics, metabolomics, and systems biology as a whole. The complexity of experimental procedures, and the diversity and high rate of development of protocols used in a single center, or across various centers, have important consequences for the design of information management systems. Because procedures are carried out by both machines and hand, the system must be capable of handling data entry both from robotic systems and by means of a user-friendly interface. The information management system needs to be flexible so it can handle changes in existing protocols or newly added protocols. Because no commercial information management systems have had the needed features, most structural genomics groups have developed their own solutions. This chapter discusses the advantages of using a LIMS (laboratory information management system), for day-to-day management of structural genomics projects, and also for data mining. 
This chapter reviews different solutions currently in place or under development with emphasis on three systems developed by the authors: Xtrack, Sesame (developed at the Center for Eukaryotic Structural Genomics under the US Protein Structural Genomics Initiative), and HalX (developed at the Yeast Structural Genomics Laboratory, in collaboration with the European SPINE project).
Article
One of the most complex aspects of document formatting is the processing of references to remote objects such as headings and figures. In the case of a forward reference to an object that occurs later in the document, two formatting passes are usually needed before the document converges to a stable state. Some documents require more than two passes to converge, and cases are known of documents that never converge but oscillate between two unstable states. This paper describes the techniques used for resolving references and detecting document convergence by the Interactive Composition and Editing Facility, Version 2 (ICEF2). ICEF2 is an interactive formatting system that allows users to move about in a document, editing and reformatting pages. The concepts of formatting pass and document convergence are discussed in the context of interactive formatting. A description is given of the ICEF2 data store, a small relational database manager with special features for detecting document convergence. A sample ICEF2 style definition is discussed to illustrate how ICEF2 deals with document elements whose appearance depends on their location on the page.
Article
With access to sequences of entire human genomes plus those of various model organisms and many important microbial pathogens, structural biology is on the verge of a dramatic transformation. Our newfound wealth of sequence information will serve as the foundation for an important initiative in structural genomics. We are poised to embark on a systematic program of high-throughput X-ray crystallography and NMR spectroscopy aimed at developing a comprehensive view of the protein structure universe. Structural genomics will yield a large number of experimental protein structures (tens of thousands) and an even larger number of calculated comparative protein structure models (millions). This enormous body of structural data will be freely available, and promises to accelerate scientific discovery in all areas of biological science, including biodiversity and evolution in natural ecosystems, agricultural plant genetics, breeding of farm and domestic animals, and human health and disease.
Article
High-throughput structural biology is a focus of a number of academic and pharmaceutical laboratories around the world. The use of X-ray crystallography in these efforts is critically dependent on high-throughput protein crystallization. The application of current protocols yields crystal leads for approximately 30% of the input proteins and well-diffracting crystals for a smaller fraction. Increasing the success rate will require a multidisciplinary approach that must invoke techniques from molecular biology, protein biochemistry, biophysics, artificial intelligence, and automation.
Article
Data management has emerged as one of the central issues in the high-throughput processes of taking a protein target sequence through to a protein sample. To simplify this task, and following extensive consultation with the international structural genomics community, we describe here a model of the data related to protein production. The model is suitable for both large and small facilities for use in tracking samples, experiments, and results through the many procedures involved. The model is described in Unified Modeling Language (UML). In addition, we present relational database schemas derived from the UML. These relational schemas are already in use in a number of data management projects.
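The idea of a relational schema for tracking samples, experiments and results can be sketched with a toy example. The three tables below (`target`, `sample`, `experiment`), their columns and the inserted values are hypothetical and are not taken from the published UML model or its derived schemas; they only illustrate how a target could be linked to its samples and to the experiments performed on them, using Python's built-in `sqlite3` module.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the links between tables

# Hypothetical, heavily simplified schema: target -> sample -> experiment.
conn.executescript("""
CREATE TABLE target (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE sample (
    id        INTEGER PRIMARY KEY,
    target_id INTEGER NOT NULL REFERENCES target(id),
    label     TEXT NOT NULL
);
CREATE TABLE experiment (
    id        INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(id),
    kind      TEXT NOT NULL,
    result    TEXT
);
""")

conn.execute("INSERT INTO target (name) VALUES (?)", ("example-target",))
conn.execute("INSERT INTO sample (target_id, label) VALUES (?, ?)", (1, "construct-1"))
conn.executemany(
    "INSERT INTO experiment (sample_id, kind, result) VALUES (?, ?, ?)",
    [(1, "expression", "soluble"), (1, "purification", "pure")],
)

# Trace every experiment back to its sample and target.
rows = conn.execute("""
    SELECT t.name, s.label, e.kind, e.result
    FROM experiment AS e
    JOIN sample AS s ON s.id = e.sample_id
    JOIN target AS t ON t.id = s.target_id
    ORDER BY e.id
""").fetchall()
for row in rows:
    print(row)
```

The foreign keys are what make the tracking reliable: an experiment cannot be recorded against a sample that does not exist, and every result can be traced back to the target it belongs to.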