Figure 1 - uploaded by Caroline Privault
shows a representation of the system document workflow in the. 

Source publication
Article
This paper describes a tool for assisting lawyers and paralegal teams during document review in eDiscovery. The tool combines a machine learning technology (CategoriX) and advanced multi-touch interface capable of not only addressing the usual cost, time and accuracy issues in document review, but also of facilitating the work of the review teams b...

Context in source publication

Context 1
... the end, the global set of annotated documents is modelled as a classifier, assessed for consistency through "possible mis-tagging" and outlier detection, and exported for further re-use or cross-validation. Figure 1 displays a representation of the global document workflow in 6 different stages. • Stage A' represents the un-organized set of documents that the user has to annotate. ...

Similar publications

Conference Paper
This paper describes a tool for assisting lawyers and paralegal teams during document review in eDiscovery. The tool combines a machine learning technology (CategoriX) and advanced multi-touch interface capabilities to not only address the usual cost, time and accuracy issues in document review, but to also facilitate the work of the review teams...

Citations

... A formal model of legal argumentation provides for the representation of arguments and their relations, and for the specification of their semantics. Researchers in AI and Law have produced a variety of computable models of legal argumentation [Verheij, 2003; Gordon et al., 2007; Prakken, 2010] that differ widely in these characteristics. ...
... Research in AI and Law has evolved in recent years, giving rise to a number of applications of machine learning and text analytics to the legal domain. These include: (1) legal text analytics for identifying and representing legal concepts, to define kinds of annotations, concepts, and relations in documents; (2) machine learning techniques ("predictive coding") to manage complex non-structured information sets [Privault et al., 2010] and to recognize facts and discussions in legal cases [Wiltshire Jr et al., 2002]; ...
Conference Paper
Artificial Intelligence and Law is undergoing a critical transformation. Traditionally focused on the development of expert systems and on a scholarly effort to develop theories and methods for knowledge representation and reasoning in the legal domain, this discipline is now adapting to a sudden change of scenery. No longer confined to the walls of academia, it has welcomed new actors, such as businesses and companies, who are willing to play a major role and seize new opportunities offered by the same transformational impact that recent AI breakthroughs are having on many other areas. As it happens, commercial interests create new opportunities but they also represent a potential threat to consumers, as the balance of power seems increasingly determined by the availability of data. We believe that while this transformation is still in progress, the time is ripe for the next frontier of this field of study, where a new shift of balance may be enabled by tools and services that can serve not only businesses but also consumers and, more generally, civil society. We call that frontier consumer-empowering AI.
... "Disco" is a prototype for assisting knowledge workers in reviewing documents in various domains: it combines a tangible interface with machine learning algorithms, rule-based entity extraction and advanced search capabilities [18]. The next diagram outlines the different components of the system (see Figure 1). ...
Conference Paper
As advanced technologies, such as data mining, become part of the everyday workflow of document review in litigation, keyword search still appears to serve as a cornerstone approach in responsive or privilege review. Keywords are conceptually easy to understand and help cull documents at the early stages of the review. But developing proper keywords to minimize the risk of under- or over-inclusiveness can lead to complex strategies. To cope with the burden of designing search terms, we propose to use word embedding techniques in a dynamic manner. This paper describes a system leveraging semantic models in a smart review environment in order to support knowledge workers in eDiscovery.
... Disco combines a tangible user interface (TUI) with machine learning and advanced search capabilities (Privault et al., 2010; Xerox, 2010). At session startup, the user loads a collection of documents that is displayed on a touchscreen in a "Wall view": each document is represented by a tile on the wall. ...
Conference Paper
This paper presents Disco, a prototype for supporting knowledge workers in exploring, reviewing and sorting collections of textual data. The goal is to facilitate, accelerate and improve the discovery of information. To this end, it combines Semantic Relatedness techniques with a review workflow developed in a tangible environment. Disco uses a semantic model that is leveraged on-line in the course of search sessions, and accessed through natural hand gestures, in a simple and intuitive way.
... A pair of roles is thus defined: the domain expert and domain novice pair. The intuition behind this contribution is that, in addition to supporting the paradigms of collaboration, a user must be able to access documents that match their level of expertise, which, particularly for novices (Agichtein et al., 2006; Privault et al., 2010), is refined as documents are visited. 2. The second model addresses a horizontal distinction between the collaborators' expertise levels, on the assumption that each user is an expert in a subdomain of the information need and contributes to resolving that need according to their area of competence. ...
... These two roles collaborate with clients to identify their needs and provide the useful documents. Likewise, the lawyers collaborate with one another so that the less experienced (contact reviewers) benefit from the experience of the experts (lead counsel) (Privault et al., 2010; Wang and Soergel, 2010). ...
... In this context, the notion of expertise has also been addressed in a taxonomy of roles (Golovchinsky et al., 2009b) that distinguishes expertise levels between users according to their search skills or their domain skills, the latter comprising their topical knowledge of the information need. We focus specifically on this second aspect, where the collaborators' expertise levels can be viewed under two approaches: (a) a vertical distinction between the collaborators' expertise levels, giving them the roles of domain expert and domain novice (Soulier et al., 2014b,c) (Attfield et al., 2010; Privault et al., 2010), the librarian domain (Rudd and Rudd, 1986; Twidale et al., 1997) and the academic domain (Foster, 2006 (White and Dumais, 2009). Under the approach based on the horizontal distinction of expertise levels, the profile must make it possible to identify the collaborators' degree of expertise with respect to the facets of the information need. ...
Article
The research topic of this document deals with a particular setting of information retrieval (IR), referred to as collaborative information retrieval (CIR), in which multiple collaborators share the same information need. Collaboration is particularly used for complex tasks in which an individual user may have insufficient knowledge and may benefit from the expertise, knowledge or complementarity of other collaborators. This multi-user context raises several challenges in terms of search interfaces as well as ranking models, since new paradigms must be considered, namely division of labor, sharing of knowledge and awareness. These paradigms aim at avoiding redundancy between collaborators in order to reach a synergic effect within the collaboration process. Several approaches have been proposed in the literature. First, search interfaces have been oriented towards user mediation in order to support collaborators' actions through information storage or communication tools. Second, and closer to our contributions, previous work focuses on the information access issue by designing ranking models adapted to collaborative environments, dealing with the challenges of (1) personalizing the result set for collaborators, (2) favoring the sharing of knowledge, (3) dividing the labor among collaborators and/or (4) considering particular roles of collaborators within the information seeking process. In this thesis, we focus more particularly on two main aspects of the collaboration:
- The expertise of collaborators, by proposing retrieval models adapted to the domain expertise level of collaborators. The expertise levels might be vertical, in the case of domain expert and novice, or horizontal, when collaborators have different subdomain expertise. We therefore propose two CIR models in two steps: a document relevance scoring with respect to each role, and a document allocation to user roles through the Expectation–Maximization (EM) learning method, applied to the document relevance scores in order to assign documents to the most likely suited user.
- The complementarity of collaborators throughout the information seeking process, by mining their roles on the assumption that collaborators might be different and complementary in some skills. We propose two algorithms, based either on predefined roles or on latent roles, which (1) learn about the roles of the collaborators using various search-related features for each individual involved in the search session, and (2) adapt the document ranking to the mined roles of collaborators.
... More particularly, the pair of roles of domain expert and domain novice, which is addressed in this paper, is based on the assumption that collaborators have different domain expertise levels. Examples of this pair of roles can be found in four application domains: 1) the medical domain (McMullan, 2006; ECDPC, 2011), in which patients and physicians collaborate in order to find and analyze medical information using the web, considering that patients have much more time and motivation and can leverage physicians' domain expertise in order to distinguish, for instance, similar symptoms; 2) the e-Discovery domain (Attfield et al., 2010; Privault et al., 2010), in which the assessment of privileged documents is performed by experts from different trades, namely lawyers, reviewers and lead counsel; 3) the librarian domain (Rudd and Rudd, 1986; Twidale et al., 1997), in which users and librarians collaborate to satisfy users' bibliographic information needs; and 4) the question-answering domain (Horowitz and Kamvar, 2010; White and Richardson, 2012), in which users collaborate with the asker to solve his/her own information need. Moreover, previous work on user search behavior (Allen, 1991; Hembrooke et al., 2005; White et al., 2009) found that users with these two types of roles based on domain expertise level act differently within a search session in terms of submitted queries, used vocabulary or search success. ...
... (b) The pair reviewers-lawyers, where the task is similar to the first pair. Lawyers, viewed as novices, can benefit from reviewers' experience with the subtleties of keyword search tools (Privault et al., 2010). ...
Article
Collaborative information retrieval involves retrieval settings in which a group of users collaborates to satisfy the same underlying need. One core issue of collaborative IR models involves either supporting collaboration with adapted tools or developing IR models for a multiple-user context and providing a ranked list of documents adapted for each collaborator. In this paper, we introduce the first document-ranking model supporting collaboration between two users characterized by roles relying on different domain expertise levels. Specifically, we propose a two-step ranking model: we first compute a document-relevance score, taking into consideration domain expertise-based roles. We introduce specificity and novelty factors into language-model smoothing, and then we assign, via an Expectation–Maximization algorithm, documents to the best-suited collaborator. Our experiments employ a simulation-based framework of collaborative information retrieval and show the significant effectiveness of our model at different search levels.
... This situation, however, is changing. Technology that can semiautomatically categorize the documents and collaboratively assist in e-discovery is being developed and trialed [19]. Likewise, database-like tools [11] have appeared on the market to assist the phase of case construction by letting teams store relevant entities and construct an outline of case defense or attack. ...
Conference Paper
We have designed a system to support collaborative legal case reasoning and building. The design is based on our understanding of the corporate litigation domain acquired through analysis of the literature, interviews of various parties involved in corporate litigation processes, and studies of the commercial tools already available. In this paper we illustrate the designed system and in particular the interaction modes that it supports that we believe address a number of the requirements that emerged through our analysis. We also describe its main components and their integration, including a knowledge model that represents the domain, and a natural language processing component for extracting semantic information. A description of a prototype system is also provided.
... Research in the areas of e-government and e-transparency has shown that this is an important issue [14,29]. There has been a lot of research in the AI & Law area for tools to assist people and specialists such as lawyers with searching, viewing, and working with government data [18,24]. We hope that with the addition of this outlier detection feature to the Many Bills system we can further contribute to this. ...
Conference Paper
Reading congressional legislation, also known as bills, is often tedious because bills tend to be long and written in complex language. In IBM Many Bills, an interactive web-based visualization of legislation, users of different backgrounds can browse bills and quickly explore parts that are of interest to them. One task users have is to be able to locate sections that don't seem to fit with the overall topic of the bill. In this paper, we present novel techniques to determine which sections within a bill are likely to be outliers by employing approaches from information retrieval. The most promising techniques first detect the most topically relevant parts of a bill by ranking its sections, followed by a comparison between these topically relevant parts and the remaining sections in the bill. To compare sections we use various dissimilarity metrics based on Kullback-Leibler Divergence. The results indicate that these techniques are more successful than a classification based approach. Finally, we analyze how the dissimilarity metrics succeed in discriminating between sections that are strong outliers versus those that are 'milder' outliers.
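The two-stage idea in the abstract above (rank sections to find the topically relevant core, then compare the remaining sections against that core with a Kullback-Leibler-based dissimilarity) can be illustrated with a minimal sketch. This is not the paper's implementation: the whitespace tokenization, the additive smoothing constant `alpha`, the `top_k` cutoff, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def unigram_dist(tokens, vocab, alpha=0.1):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    alpha is an assumed smoothing constant, not a value from the paper.
    """
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same vocabulary."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def outlier_scores(sections, top_k=2):
    """Score each non-core section by its divergence from the topical core.

    sections: list of token lists, one per section.
    Returns {section_index: KL(section || core)} for sections outside the core.
    """
    vocab = sorted({w for s in sections for w in s})
    whole = unigram_dist([w for s in sections for w in s], vocab)
    dists = [unigram_dist(s, vocab) for s in sections]
    # Stage 1: rank sections by closeness to the whole bill;
    # the top_k closest sections form the assumed "topical core".
    ranked = sorted(range(len(sections)),
                    key=lambda i: kl_divergence(dists[i], whole))
    core_idx = ranked[:top_k]
    core = unigram_dist([w for i in core_idx for w in sections[i]], vocab)
    # Stage 2: compare every remaining section against the core.
    return {i: kl_divergence(dists[i], core) for i in ranked[top_k:]}


# Toy "bill" with three tax-related sections and one unrelated section.
sections = [
    "tax credit for small business tax relief".split(),
    "small business tax deduction and credit rules".split(),
    "tax relief for business owners".split(),
    "funding for a highway bridge in alaska".split(),
]
scores = outlier_scores(sections, top_k=2)
most_outlying = max(scores, key=scores.get)
```

In this toy input the highway section (index 3) receives the highest score, since almost none of its vocabulary appears in the core. Smoothing is needed so that the divergence stays finite when a section uses words the core never does.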
Article
We have designed a system to support collaborative case reasoning and building in corporate litigation cases, that is, processes of bringing and pursuing lawsuits. The design is based on our understanding of the domain acquired through analysis of the literature, interviews of various parties involved in corporate litigation processes, and studies of the commercial tools already available. In this paper we illustrate the designed system and in particular the interaction modes that it supports that we believe address a number of the requirements that emerged through our analysis. We also describe its main components and their integration, including a knowledge model that represents the domain, and a natural language processing component for extracting semantic information. A description of a prototype system is also provided.