Figure 6 - uploaded by Zhangyang Wang
Weights map for a random split.

Source publication
Conference Paper
Full-text available
Over the past decade, a wide spectrum of machine learning models has been developed to model neurodegenerative diseases, associating biomarkers, especially non-intrusive neuroimaging markers, with key clinical scores that measure the cognitive status of patients. Multi-task learning (MTL) has been commonly utilized by these studies to address high...

Contexts in source publication

Context 1
... 5(c) reveals a clustering pattern among different tasks, where some tasks are similar to each other, which again supports the multi-task assumptions. Figure 6 shows the values of the weights associated with one random split, where most weights are close to zero. ...
Context 2
... we provide statistics for the data in real experiments. The mean and standard deviation for each given target are calculated, and the distributions for all targets are plotted. We also examine the relationship between targets by computing the correlation for pairwise targets. Results are shown in Figure 5. We can see that all the targets are non-negative, and most target means are within 50, with standard deviations following a similar trend. Figure 5(c) reveals a clustering pattern among different tasks, where some tasks are similar to each other, which again supports the multi-task assumptions. Figure 6 shows the values of the weights associated with one random split, where most weights are close to ...
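
The summary statistics described in this context (per-target mean and standard deviation, plus the pairwise correlation matrix) can be sketched in a few lines of NumPy. The data below are randomly generated placeholders, not the study's real targets:

```python
import numpy as np

# Placeholder data: 100 subjects x 4 non-negative targets (hypothetical).
rng = np.random.default_rng(0)
Y = rng.uniform(0, 50, size=(100, 4))

means = Y.mean(axis=0)               # per-target mean
stds = Y.std(axis=0)                 # per-target standard deviation
corr = np.corrcoef(Y, rowvar=False)  # 4x4 pairwise correlation matrix

print(means.shape, stds.shape, corr.shape)  # (4,) (4,) (4, 4)
```

Plotting the correlation matrix as a heatmap is what reveals the clustering pattern across tasks mentioned above.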

Similar publications

Article
Full-text available
As a foundational and typical task in natural language processing, text classification has been widely applied in many fields. However, most existing corpora, the basis of text classification, are imbalanced, which often causes a classifier to bias its performance toward the categories with more texts. In this paper, we propose a background knowledg...

Citations

... The distribution estimation is also important for option pricing [5] and computing confidence intervals [6], as described in [4]. In [7], neurodegenerative clinical scores are predicted based on brain imaging (structural magnetic resonance imaging). The clinical scores have lower and upper bounds [7]. ...
... In [7], neurodegenerative clinical scores are predicted based on brain imaging (structural magnetic resonance imaging). The clinical scores have lower and upper bounds [7]. Precipitation prediction from meteorological measurements taken at multiple stations across Germany was done in [8]. ...
... In our opinion, the work that is the closest to ours is the subspace network [7]. In this section, we elaborate on the similarities and differences between our work and the subspace network. ...
Article
Full-text available
In censored regression, the outcomes are a mixture of known values (uncensored) and open intervals (censored), meaning that the outcome is either known with precision or is an unknown value above or below a known threshold. The use of censored data is widespread, and correctly modeling it is essential for many applications. Although the literature on censored regression is vast, deep learning approaches have been less frequently applied. This paper proposes three loss functions for training neural networks on censored data using gradient backpropagation: the tobit likelihood, the censored mean squared error, and the censored mean absolute error. We experimented with three variations in the tobit likelihood that arose from different ways of modeling the standard deviation variable: as a fixed value, a reparametrization, and an estimation using a separate neural network for heteroscedastic data. The tobit model yielded better results, but the other two losses are simpler to implement. Another central idea of our research was that data are often censored and truncated simultaneously. The proposed losses can handle simultaneous censoring and truncation at arbitrary values from above and below.
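
The censored mean squared error described in the abstract can be sketched as below. The `censor` encoding (+1 right-censored, -1 left-censored, 0 uncensored) is an assumption for illustration, not necessarily the paper's exact formulation:

```python
import numpy as np

def censored_mse(y_pred, y_obs, censor):
    """Censored mean squared error (sketch).

    censor: 0 = uncensored, +1 = right-censored (true value >= y_obs),
            -1 = left-censored (true value <= y_obs).
    For censored points, the prediction is penalized only when it falls
    on the wrong side of the observed threshold.
    """
    err = y_pred - y_obs
    loss = np.where(censor == 0, err**2,
            np.where(censor == 1, np.minimum(err, 0.0)**2,
                                  np.maximum(err, 0.0)**2))
    return loss.mean()

# A right-censored point whose prediction already exceeds the threshold
# contributes zero loss:
print(censored_mse(np.array([2.0]), np.array([1.0]), np.array([1])))  # 0.0
```

Because every branch is piecewise differentiable, this loss can be dropped into standard gradient backpropagation, which is the property the paper exploits.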
... Here we derive Ω from the pre-trained weights W, which differs from previous low-rank techniques for model compression, since we perform the decomposition prior to training and cannot access ∆W before fine-tuning. In this way, we effectively assume that ∆W shares a similar crucial subspace Ω with W (Sun et al., 2018). We use the GreBsmo algorithm (Zhou & Tao, 2013) to solve this optimization problem. ...
Preprint
Full-text available
Gigantic pre-trained models have become central to natural language processing (NLP), serving as the starting point for fine-tuning towards a range of downstream tasks. However, two pain points persist for this paradigm: (a) as the pre-trained models grow bigger (e.g., 175B parameters for GPT-3), even the fine-tuning process can be time-consuming and computationally expensive; (b) the fine-tuned model has the same size as its starting point by default, which is neither sensible due to its more specialized functionality, nor practical since many fine-tuned models will be deployed in resource-constrained environments. To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights. Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter-efficient fine-tuning, by enforcing sparsity-aware weight updates on top of the pre-trained weights; and (ii) resource-efficient inference, by encouraging a sparse weight structure in the final fine-tuned model. We leverage sparsity in these two directions by exploiting both unstructured and structured sparse patterns in pre-trained language models via magnitude-based pruning and $\ell_1$ sparse regularization. Extensive experiments and in-depth investigations, with diverse network backbones (i.e., BERT, GPT-2, and DeBERTa) on dozens of datasets, consistently demonstrate highly impressive parameter-/training-/inference-efficiency, while maintaining competitive downstream transfer performance. For instance, our DSEE-BERT obtains about $35\%$ inference FLOPs savings with <1% trainable parameters and comparable performance to conventional fine-tuning. Code is available at https://github.com/VITA-Group/DSEE.
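
The two sparsity mechanisms the abstract names, magnitude-based pruning and $\ell_1$ sparse regularization, can be sketched roughly as below. Function names and the soft-threshold proximal step are illustrative assumptions, not DSEE's actual implementation:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    thresh = np.sort(np.abs(w), axis=None)[k - 1]
    return np.where(np.abs(w) <= thresh, 0.0, w)

def soft_threshold(delta, lam):
    """Proximal step for an l1 penalty on a weight update: shrinks small
    entries of delta to exactly zero, encouraging a sparse update."""
    return np.sign(delta) * np.maximum(np.abs(delta) - lam, 0.0)
```

Applying `soft_threshold` to the fine-tuning update ∆W sparsifies the update itself (objective (i)), while `magnitude_prune` on the final weights yields the sparse deployed model (objective (ii)).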
... We name this feature subspace learning. To make L2ight hardware-aware, we trade expensive full-space trainability for efficient subspace gradient evaluation, i.e., ∂L/∂Σ, which coincides with general frequency-domain ONNs [18,19] and the subspace NN design concept [42]. Since this learning stage involves stochasticity, it turns out to be the efficiency bottleneck, especially the backward pass. ...
Preprint
Silicon-photonics-based optical neural network (ONN) is a promising hardware platform that could represent a paradigm shift in efficient AI with its CMOS-compatibility, flexibility, ultra-low execution latency, and high energy efficiency. In-situ training on the online programmable photonic chips is appealing but still encounters challenging issues in on-chip implementability, scalability, and efficiency. In this work, we propose a closed-loop ONN on-chip learning framework L2ight to enable scalable ONN mapping and efficient in-situ learning. L2ight adopts a three-stage learning flow that first calibrates the complicated photonic circuit states under challenging physical constraints, then performs photonic core mapping via combined analytical solving and zeroth-order optimization. A subspace learning procedure with multi-level sparsity is integrated into L2ight to enable in-situ gradient evaluation and fast adaptation, unleashing the power of optics for real on-chip intelligence. Extensive experiments demonstrate our proposed L2ight outperforms prior ONN training protocols with 3-order-of-magnitude higher scalability and over 30X better efficiency, when benchmarked on various models and learning tasks. This synergistic framework is the first scalable on-chip learning solution that pushes this emerging field from intractable to scalable and further to efficient for next-generation self-learnable photonic neural chips. From a co-design perspective, L2ight also provides essential insights for hardware-restricted unitary subspace optimization and efficient sparse training. We open-source our framework at https://github.com/JeremieMelo/L2ight.
... The importance and originality of these clinical technologies lie in exploring biomarkers for locating the progression of the patient's disease. A variety of data-driven machine learning techniques [5], [16]-[18], such as deep learning models [19], [20], multi-task modeling [5], [16], [17], and survival models [18], have been investigated to deal with these data for better prediction of AD progression. ...
Conference Paper
Full-text available
The prediction and modeling of chronic diseases such as Alzheimer's disease (AD) have received widespread attention in recent years. The field is tightly integrated with medical care, and recent advances in machine learning provide opportunities to train AD disease progression models. This trend has led to the exploration and design of new machine learning techniques for multimodal medical and health datasets to predict the occurrence of AD and model its progression. The purpose of this article is to perform a longitudinal tracking analysis with machine learning models and to explore the core brain regions associated with AD progression that are important for brain degeneration. We summarized the biomarkers related to the progression of AD and checked them against neuropathology. On this basis, we further generalized to the corresponding functional brain blocks; the results are in line with findings in clinical medicine and provide technical support for physician-assisted diagnosis.
... These methods select the subspace while seeking the weights of the decision functions by minimizing a convex optimization problem over the sum of the joint regularization and the loss [21][22]. These regularized multi-task learning approaches [23][24][25][26][27][28] have been applied to many Alzheimer's disease progression studies and deliver promising performance in biomarker feature selection. MTFL techniques have some potential to explore the impact of factors on COVID-19 CFR. ...
... Structural regularization methods in MTL constrain optimization with regularization terms and share information between tasks. MTL has made outstanding contributions to research on Alzheimer's disease (AD) progression; many prior works [21][22][23][24][25][26][27][28][29] model relationships among tasks using novel regularization. To solve our regression model, we mainly considered two typical single-task models (Ridge and Lasso regression) [29] and one state-of-the-art MTFL method (fused sparse group lasso) [28] in our formulation. ...
Preprint
Full-text available
The recent outbreak of COVID-19 has led to rapid global spread. Many countries have implemented timely, intensive suppression to minimize infections, but this has resulted in high case fatality rates (CFR) due to critical demand on health resources. Other country-based factors, such as sociocultural issues and ageing populations, have also influenced the practical effectiveness of interventions on mortality in the early phase. Better understanding the relationship of these factors with COVID-19 CFR across different countries is of primary importance in preparing for a potential second wave of COVID-19 infections. In this paper, we propose a novel regularized multi-task learning based factor analysis approach for quantifying country-based factors affecting CFR in the early phase of the COVID-19 epidemic. We formulate the prediction of CFR progression as an ML regression problem with observed CFR and other country-based factors. In this formulation, all CFR-related factors were categorized into 6 sectors with 27 indicators. We propose a hybrid feature selection method combining filter, wrapper, and tree-based models to calibrate initial factors for a preliminary feature interaction. We then adopted two typical single-task models (Ridge and Lasso regression) and one state-of-the-art MTFL method (fused sparse group lasso) in our formulation. The fused sparse group lasso (FSGL) method allows the simultaneous selection of a common set of country-based factors across multiple time points of the COVID-19 epidemic and also enables incorporating the temporal smoothness of each factor over the whole early-phase period. Finally, we propose a novel temporal voting feature selection scheme to balance the weight instability of multiple factors in our MTFL model.
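
A minimal sketch of the fused sparse group lasso (FSGL) penalty described above, with one coefficient column per time point; the exact weighting and grouping in the cited work may differ:

```python
import numpy as np

def fsgl_penalty(W, lam1, lam2, lam3):
    """Fused sparse group lasso penalty (illustrative sketch).

    W: (n_features, n_time_points) coefficient matrix, one column per task.
    lam1 * ||W||_1                    -- elementwise sparsity
    lam2 * sum_i ||W_i||_2            -- row-wise group sparsity: selects a
                                         common factor set across time points
    lam3 * ||W[:, 1:] - W[:, :-1]||_1 -- temporal smoothness between
                                         adjacent time points
    """
    l1 = np.abs(W).sum()
    l21 = np.linalg.norm(W, axis=1).sum()
    fused = np.abs(np.diff(W, axis=1)).sum()
    return lam1 * l1 + lam2 * l21 + lam3 * fused
```

Adding this penalty to the squared regression loss yields the MTFL objective: the group term selects a shared set of country-based factors, and the fused term keeps each factor's weight smooth over the early-phase period.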
... Note that these assumptions rely on the fact that the model can differentiate these two groups accurately. An interval censoring approach deals with the disease progression problem for neurodegenerative diseases [193][194][195]. Interval censoring, in statistics, defines a sampling scheme or an incomplete data structure. ...
... The interval censoring approach has already been successfully applied to machine learning models, including both shallow [193,194] and deep learning approaches [195], to conveniently model Alzheimer's disease patients. Therefore, future work could benefit from a similar approach in the stratification of IPD patients. ...
Conference Paper
Full-text available
Prion diseases are a group of rare neurodegenerative conditions characterised by a high rate of progression and highly heterogeneous phenotypes. Whilst the most common form of prion disease occurs sporadically (sporadic Creutzfeldt-Jakob disease, sCJD), other forms are caused by inheritance of prion protein gene mutations or exposure to prions. To date, there are no accurate imaging biomarkers that can be used to predict a subject's future diagnosis or to quantify the progression of symptoms over time. Moreover, CJD is commonly mistaken for other forms of dementia. Due to the large heterogeneity of prion disease phenotypes and the lack of a consistent spatial pattern of disease progression, the approaches used to study other types of neurodegenerative disease are not satisfactory for capturing the progression of the human form of prion disease. Using a tailored framework, I extracted quantitative imaging biomarkers for the characterisation of patients with prion diseases. Following the extraction of patient-specific imaging biomarkers from multiple images, I implemented a Gaussian Process approach to correlate symptoms with disease types and stages. The model was used on three different tasks: diagnosis, differential diagnosis, and stratification, addressing an unmet need to automatically identify patients with, or at risk of developing, prion disease. The work presented in this thesis has been extensively validated in a unique prion disease cohort comprising both the inherited and sporadic forms of the disease. The model has been shown to be effective in the prediction of this illness. Furthermore, this approach may be used in other disorders with heterogeneous imaging features, adding value to the understanding of neurodegenerative diseases. Lastly, given the rarity of this disease, I also addressed the issue of missing data and the limitations it raises.
Overall, this work presents progress towards the modelling of prion diseases and identifies which computational methodologies are potentially suitable for its characterisation.
... Modeling the brain as a network using a connectome approach allows us to gain systems-level insights into large-scale neuronal communication abnormalities associated with brain diseases (such as AD) and may also yield novel features to assist diagnosis and prognosis. The brain's structural network, derived from a tractography algorithm applied to diffusion-weighted MRI data [16,17], can capture global structural changes caused by different brain diseases including Alzheimer's disease [15,[18][19][20][21]. Prior work [22] has shown the potential of analysis of brain structural networks in Alzheimer's research. ...
Chapter
Mild Cognitive Impairment (MCI) is a clinically intermediate stage in the course of Alzheimer’s disease (AD). MCI does not always lead to dementia. Some MCI patients may stay in the MCI status for the rest of their life, while others will develop AD eventually. Therefore, classification methods that help to distinguish MCI from earlier or later stages of the disease are important to understand the progression of AD. In this paper, we propose a novel computational framework - named Augmented Graph Embedding, or AGE - to tackle this challenge. In this new AGE framework, the random walk approach is first applied to brain structural networks derived from diffusion-weighted MRI to extract nodal feature vectors. A technique adapted from natural language processing is used to analyze these nodal feature vectors, and a multimodal augmentation procedure is adopted to improve classification accuracy. We validated this new AGE framework on data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI). Results show advantages of the proposed framework, compared to a range of existing methods.
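
The random-walk step of the AGE framework, generating node sequences from a brain structural network that are then fed to an NLP-style embedding model, can be sketched as below. Parameters and function names are hypothetical; the actual framework's walk strategy may differ:

```python
import numpy as np

def random_walks(adj, walk_len, n_walks, seed=0):
    """Generate random-walk node sequences on a (weighted) adjacency matrix.

    From each node, start n_walks walks; at each step, move to a neighbor
    with probability proportional to the edge weight. The resulting
    sequences can be embedded like sentences (word2vec-style).
    """
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    walks = []
    for _ in range(n_walks):
        for start in range(n):
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = adj[walk[-1]]
                total = nbrs.sum()
                if total == 0:        # isolated node: stop the walk early
                    break
                walk.append(rng.choice(n, p=nbrs / total))
            walks.append(walk)
    return walks
```

Each walk plays the role of a "sentence" whose "words" are brain regions, which is what lets NLP embedding techniques produce the nodal feature vectors described above.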
Conference Paper
Multi-task feature learning (MTFL) methods play a key role in predicting Alzheimer's disease (AD) progression. These studies adhere to a unified feature-sharing framework to promote information exchange among related disease progression tasks. MTFL not only utilises the inherent properties of tasks to enhance prediction performance, but also yields weights capable of indicating nuanced changes in related AD biomarkers. The task-regularized priors introduced by MTFL, however, lead to uncertainty in biomarker selection, particularly amidst a plethora of highly interrelated biomarkers in a high-dimensional space. Little attention has been paid to designing feasible experimental protocols for the assessment of MTFL models. To narrow this knowledge gap, we propose a Randomize Multi-task Feature Learning (RMFL) approach to effectively model and predict AD progression. As the number of tasks increases, the results show that RMFL is not only stable and interpretable, but also reduces normalized mean squared error by 0.2 compared to single-task models such as Lasso and Ridge. Our method is also adaptable as a general regression framework to predict the progression of other chronic diseases.
Chapter
Self-supervised (SS) learning has achieved remarkable success in learning strong representations for in-domain few-shot and semi-supervised tasks. However, when transferring such representations to downstream tasks with domain shifts, the performance degrades compared to its supervised counterpart, especially in the few-shot regime. In this paper, we propose to boost the transferability of self-supervised pre-trained models on cross-domain tasks via a novel self-supervised alignment step on the target domain, using only unlabeled data, before conducting the downstream supervised fine-tuning. A new reparameterization of the pre-trained weights is also presented to mitigate potential catastrophic forgetting during the alignment step. It involves a low-rank and sparse decomposition that can elegantly balance preserving the source-domain knowledge without forgetting (via fixing the low-rank subspace) against the extra flexibility to absorb new out-of-the-domain knowledge (via freeing the sparse residual). Our resultant framework, termed Decomposition-and-Alignment (DnA), significantly improves the few-shot transfer performance of the SS pre-trained model on downstream tasks with domain gaps. (The code is released at https://github.com/VITA-Group/DnA.)
Keywords: Self-supervised learning; Transfer few-shot; Low-rank
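
The low-rank and sparse decomposition idea can be sketched with a truncated SVD plus a thresholded residual. This is an illustrative approximation under assumed parameters, not DnA's actual optimization procedure:

```python
import numpy as np

def low_rank_plus_sparse(W, rank, keep_frac):
    """Decompose W into L + S (sketch).

    L: truncated-SVD reconstruction of W (the fixed low-rank subspace that
       preserves source-domain knowledge).
    S: the largest-magnitude entries of the residual W - L (the free sparse
       part that can absorb out-of-domain knowledge during alignment).
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    R = W - L
    k = int(R.size * keep_frac)
    S = np.zeros_like(R)
    if k > 0:
        idx = np.unravel_index(np.argsort(np.abs(R), axis=None)[-k:], R.shape)
        S[idx] = R[idx]
    return L, S
```

During alignment, only the sparse residual S (and possibly the singular values) would be updated while the low-rank factors stay frozen, which is how the decomposition trades off forgetting against flexibility.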