Schematic representation of the SVM algorithm classification process. We take as input the preselected training sample consisting of (in the case of this work) three distinct classes of objects. The SVM is taught how to distinguish one class from the others based on the discriminating properties chosen as feature vectors. Then, the classifier is trained by tuning the free parameters (C and γ). If the result reaches a high enough accuracy rate (the fraction of objects from the training sample that are correctly recognised by the classifier) without overfitting (that is, the resulting hyperplane does not confine the sources of a specific type too tightly), it will be used to classify the unknown objects (test sample). If the accuracy is not satisfactory, a different parameter space (or training sample, if possible) is chosen to tune C and γ. After a number of iterations, which allow the classifier to reach a high enough efficiency level, a real sample can be classified using the discriminant hyperplanes.
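The tuning loop described in the caption can be sketched as a plain grid search over C and γ. This is a minimal illustration, not the authors' code: `train_and_score` is a hypothetical callback that would train an SVM for one (C, γ) pair and return its accuracy, the power-of-two grids are an assumption, and `toy_score` only stands in for a real classifier so that the sketch runs on its own.

```python
from itertools import product
from math import log2


def accuracy(true_labels, predicted_labels):
    """Fraction of objects whose predicted class matches the known class."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def tune(train_and_score, c_grid, gamma_grid, target):
    """Score every (C, gamma) pair; return the best one and whether it meets the target."""
    score, c, gamma = max(
        (train_and_score(c, gamma), c, gamma)
        for c, gamma in product(c_grid, gamma_grid)
    )
    return score, c, gamma, score >= target


def toy_score(c, gamma):
    # Hypothetical stand-in for "train an SVM and measure its accuracy";
    # by construction it peaks at C = 8, gamma = 2**-5.
    return 1.0 / (1.0 + abs(log2(c) - 3.0) + abs(log2(gamma) + 5.0))


# Assumed exponentially spaced grids (powers of two).
C_GRID = [2.0 ** k for k in range(-5, 16, 2)]
GAMMA_GRID = [2.0 ** k for k in range(-15, 4, 2)]

best = tune(toy_score, C_GRID, GAMMA_GRID, target=0.9)
```

If the last element of `best` is False, the caption's prescription is to choose a different parameter space (or training sample) and repeat. Guarding against overfitting would additionally require a score measured on objects held out of training, which this sketch leaves to the callback.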

Source publication

The VIMOS Public Extragalactic Redshift Survey (VIPERS). A support vector machine classification of galaxies, stars, and AGNs

Article

Full-text available

Mar 2013

The aim of this work is to develop a comprehensive method for classifying sources in large sky surveys and we apply the techniques to the VIMOS Public Extragalactic Redshift Survey (VIPERS). Using the optical (u*, g', r', i') and NIR data (z', Ks), we develop a classifier, based on broad-band photometry, for identifying stars, AGNs and galaxies imp...

Context 1

... schematic representation of the SVM algorithm classification process, beginning with choosing the training sample, tuning the C and γ parameters, self-checking of the classifier, and finally classifying the real sample, is shown in Fig. 2. For our analysis we used LIBSVM (Chang & Lin 2011), an integrated software for support vector classification, which allows for multiclass classification. We used R, a free software environment for statistical computing and graphics, with the e1071 interface package (Meyer 2001) installed. The successful application of an SVM algorithm requires a carefully selected training sample: a set of objects with confirmed classes which will serve as a template for distinguishing the sources whose class we want to determine. Since this work is focused on the selection of galaxies, AGNs, and stars, we select as a training sample a set of sources whose basic class (galaxy, AGN or star) was established with the highest reliability thanks to their high quality spectra (their redshift being measured with the highest confidence flag within the VIPERS or VVDS surveys). For these sources, the accurate photometric information provided by the CFHTLS wide-survey and the WIRCam follow-up observations of the VIPERS/VVDS fields provided the colour information needed to create the discriminant vectors for training our SVM algorithm. We produced a model (the optimised C and γ parameters based on the training data), which predicts the target values of the test data given only the test data attributes (Hsu et al. 2010). As a galaxy training sample we used the sources with the best redshift measurements in both the W1 and W4 VIPERS fields (VIPERS Zflag = 4, corresponding to the highest confidence level of redshift measurements and thus of spectroscopic classification as a galaxy). It is useful to remember that VIPERS is preselected not only in magnitude (i < 22.5) but also in colours: (r − i) > 0.5 × (u* − g) or (r − i) > 0.7.
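The preselection quoted at the end of this paragraph can be written as a small predicate. This is a sketch under stated assumptions: the function and argument names are hypothetical, and combining the magnitude limit with the colour condition by a logical AND is an interpretation of "not only in magnitude but also in colours".

```python
def passes_vipers_preselection(i_mag, u_star, g, r):
    """Return True if a source satisfies the magnitude and colour cuts quoted above."""
    # Magnitude limit: i < 22.5
    bright_enough = i_mag < 22.5
    # Colour cut: (r - i) > 0.5 * (u* - g)  or  (r - i) > 0.7
    colour_cut = (r - i_mag) > 0.5 * (u_star - g) or (r - i_mag) > 0.7
    # Assumption: both conditions must hold for a source to be preselected.
    return bright_enough and colour_cut
```

For example, a source with i = 21.0, u* = 24.0, g = 23.0 and r = 21.8 has r − i = 0.8, which exceeds both 0.5 × (u* − g) = 0.5 and 0.7, so it passes.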
We have divided the galaxy training set into i-based apparent magnitude-binned samples and trained the classifier on each subset. As a galaxy training sample we used 16 271 galaxies: 1884, 5483, 6778, and 3226 for the 19 ≤ i ≤ 20, 20 ≤ i ≤ 21, 21 ≤ i ≤ 22, and 22 ≤ i < 22.5 apparent magnitude bins, respectively. Based on our initial tests, we decided to divide our galaxy sample into magnitude bins to separate more efficiently the different groups of galaxies seen in different i apparent magnitude ranges and so improve their classification. Figure 3 shows that galaxies in different magnitude bins occupy different areas of the colour–colour plots, partly because of their different redshift ranges and different morphologies. Given the small number of AGNs detected in the VIPERS fields with the VIPERS Zflag = 14, we increased the AGN sample by using all AGNs which had at least a 99% confidence level of spectroscopic classification (VIPERS Zflag 13 and 14, in total 398 objects). AGN spectra are quite easy to recognise, so a lower flag on the quality of the measured redshift does not infringe on the reliability of the classification as an AGN. There are two ways that an AGN can be observed in ...
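The magnitude binning described above can be sketched as follows, so that one classifier can be trained per bin. The names are hypothetical, and treating each bin as half-open (lower edge included, upper edge excluded) is an assumption made here so that every source falls in exactly one bin.

```python
from bisect import bisect_right

# Bin edges in i apparent magnitude, taken from the text.
BIN_EDGES = [19.0, 20.0, 21.0, 22.0, 22.5]
BIN_LABELS = ["19-20", "20-21", "21-22", "22-22.5"]


def magnitude_bin(i_mag):
    """Return the label of the bin containing i_mag, or None if it is out of range."""
    if not (BIN_EDGES[0] <= i_mag < BIN_EDGES[-1]):
        return None
    # Assumption: bins are half-open, [lower, upper).
    return BIN_LABELS[bisect_right(BIN_EDGES, i_mag) - 1]


def split_by_bin(sources):
    """Group (i_mag, record) pairs by magnitude bin; out-of-range sources are dropped."""
    groups = {label: [] for label in BIN_LABELS}
    for i_mag, record in sources:
        label = magnitude_bin(i_mag)
        if label is not None:
            groups[label].append(record)
    return groups
```

Each list in the returned dictionary would then serve as the galaxy training subset for the classifier of that bin.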

Fig. 1.-The architecture diagrams of the GAN frameworks used in this...

Fig. 2.-GAN with a simple architecture and no input. We trained the...

Fig. 3.-Five examples of GAN improving resolution of cutouts from...

Fig. 4.-Peak finder on different resolutions of simulated blend sample....

Fig. 5.-Performance of GAN in deblending as a function of distance and...

Deblending Galaxies with Generative Adversarial Networks

Preprint

Full-text available

Nov 2022

Deep generative models including generative adversarial networks (GANs) are powerful unsupervised tools in learning the distributions of data sets. Building a simple GAN architecture in PyTorch and training on the CANDELS data set, we generate galaxy images with the Hubble Space Telescope resolution starting from a noise vector. We proceed by modif...

Lyman break and UV-selected galaxies at z ~ 1: II. PACS-100um/160um FIR detections

Article

Full-text available

Jun 2013

We report the PACS-100um/160um detections of a sample of 42 GALEX-selected and FIR-detected Lyman break galaxies (LBGs) at z ~ 1 located in the COSMOS field and analyze their ultra-violet (UV) to far-infrared (FIR) properties. The detection of these LBGs in the FIR indicates that they have a dust content high enough so that its emission can be dire...

The VIMOS Public Extragalactic Redshift Survey (VIPERS). An unprecedented view of galaxies and large-scale structure at 0.5<z<1.2

Article

Full-text available

Mar 2013

We describe the construction and general features of VIPERS, the VIMOS Public Extragalactic Redshift Survey. This 'Large Programme' has been using the ESO VLT with the aim of building a spectroscopic sample of ~100,000 galaxies with i_AB < 22.5 and 0.5 < z < 1.5. The survey covers a total area of ~24 deg² within the CFHTLS-Wide W1 and W4 fields. VIPER...

HST/WFC3 Near-Infrared spectroscopy of quenched galaxies at z~1.5 from the WISP Survey: Stellar population properties

Article

Full-text available

Sep 2013

We combine Hubble Space Telescope (HST) G102 and G141 near-IR (NIR) grism spectroscopy with HST/WFC3-UVIS, HST/WFC3-IR, and Spitzer/IRAC [3.6 μm] photometry to assemble a sample of massive (log(M_star/M_☉) ~ 11.0) and quenched (specific star formation rate < 0.01 Gyr⁻¹) galaxies at z ~ 1.5. Our sample of 41 galaxies is the largest with G102+G141 NI...

Machine Learning Detects Multiplicity of the First Stars in Stellar Archaeology Data

Article

Full-text available

Mar 2023
ASTROPHYS J

In unveiling the nature of the first stars, the main astronomical clue is the elemental compositions of the second generation of stars, observed as extremely metal-poor (EMP) stars, in the Milky Way. However, no observational constraint was available on their multiplicity, which is crucial for understanding early phases of galaxy formation. We develop a new data-driven method to classify observed EMP stars into mono- or multi-enriched stars with support vector machines. We also use our own nucleosynthesis yields of core-collapse supernovae with mixing fallback that can explain many of the observed EMP stars. Our method predicts, for the first time, that 31.8% ± 2.3% of 462 analyzed EMP stars are classified as mono-enriched. This means that the majority of EMP stars are likely multi-enriched, suggesting that the first stars were born in small clusters. Lower-metallicity stars are more likely to be enriched by a single supernova, most of which have high carbon enhancement. We also find that Fe, Mg, Ca, and C are the most informative elements for this classification. In addition, oxygen is very informative despite its low observability. Our data-driven method sheds new light on solving the mystery of the first stars from the complex data set of Galactic archeology surveys.

Machine learning detects multiplicity of the first stars in stellar archaeology data

Preprint

Full-text available

Feb 2023

In unveiling the nature of the first stars, the main astronomical clue is the elemental compositions of the second generation of stars, observed as extremely metal-poor (EMP) stars, in our Milky Way Galaxy. However, no observational constraint was available on their multiplicity, which is crucial for understanding early phases of galaxy formation. We develop a new data-driven method to classify observed EMP stars into mono- or multi-enriched stars with Support Vector Machines. We also use our own nucleosynthesis yields of core-collapse supernovae with mixing-fallback that can explain many of the observed EMP stars. Our method predicts, for the first time, that 31.8% ± 2.3% of 462 analyzed EMP stars are classified as mono-enriched. This means that the majority of EMP stars are likely multi-enriched, suggesting that the first stars were born in small clusters. Lower metallicity stars are more likely to be enriched by a single supernova, most of which have high carbon enhancement. We also find that Fe, Mg, Ca, and C are the most informative elements for this classification. In addition, oxygen is very informative despite its low observability. Our data-driven method sheds new light on solving the mystery of the first stars from the complex data set of Galactic archaeology surveys.

Data mining techniques on astronomical spectra data. II : Classification Analysis

Preprint

Full-text available

Dec 2022

Classification is valuable and necessary in spectral analysis, especially for data-driven mining. Along with the rapid development of spectral surveys, a variety of classification techniques have been successfully applied to astronomical data processing. However, it is difficult to select an appropriate classification method in practical scenarios due to the different algorithmic ideas and data characteristics. Here, we present the second work in the data mining series - a review of spectral classification techniques. This work also consists of three parts: a systematic overview of current literature, experimental analyses of commonly used classification algorithms and source codes used in this paper. Firstly, we carefully investigate the current classification methods in astronomical literature and organize these methods into ten types based on their algorithmic ideas. For each type of algorithm, the analysis is organized from the following three perspectives. (1) their current applications and usage frequencies in spectral classification are summarized; (2) their basic ideas are introduced and preliminarily analysed; (3) the advantages and caveats of each type of algorithm are discussed. Secondly, the classification performance of different algorithms on the unified data sets is analysed. Experimental data are selected from the LAMOST survey and SDSS survey. Six groups of spectral data sets are designed from data characteristics, data qualities and data volumes to examine the performance of these algorithms. Then the scores of nine basic algorithms are shown and discussed in the experimental analysis. Finally, nine basic algorithms source codes written in python and manuals for usage and improvement are provided.

Spectral Energy Distributions in Three Deep-drilling Fields of the Vera C. Rubin Observatory Legacy Survey of Space and Time: Source Classification and Galaxy Properties

Article

Full-text available

Sep 2022
ASTROPHYS J SUPPL S

W-CDF-S, ELAIS-S1, and XMM-LSS will be three Deep-Drilling Fields (DDFs) of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), but their extensive multiwavelength data have not been fully utilized as done in the COSMOS field, another LSST DDF. To prepare for future science, we fit source spectral energy distributions (SEDs) from X-ray to far-infrared in these three fields mainly to derive galaxy stellar masses and star formation rates. We use CIGALE v2022.0, a code that has been regularly developed and evaluated, for the SED fitting. Our catalog includes 0.8 million sources covering 4.9 deg² in W-CDF-S, 0.8 million sources covering 3.4 deg² in ELAIS-S1, and 1.2 million sources covering 4.9 deg² in XMM-LSS. Besides fitting normal galaxies, we also select candidates that may host active galactic nuclei (AGNs) or are experiencing recent star formation variations and use models specifically designed for these sources to fit their SEDs; this increases the utility of our catalog for various projects in the future. We calibrate our measurements by comparison with those in well-studied smaller regions and briefly discuss the implications of our results. We also perform detailed tests of the completeness and purity of SED-selected AGNs. Our data can be retrieved from a public website.

Spectral Energy Distributions in Three Deep-Drilling Fields of the Vera C. Rubin Observatory Legacy Survey of Space and Time: Source Classification and Galaxy Properties

Preprint

Jun 2022

W-CDF-S, ELAIS-S1, and XMM-LSS will be three Deep-Drilling Fields (DDFs) of the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST), but their extensive multi-wavelength data have not been fully utilized as done in the COSMOS field, another LSST DDF. To prepare for future science, we fit source spectral energy distributions (SEDs) from X-ray to far-infrared in these three fields mainly to derive galaxy stellar masses and star-formation rates. We use CIGALE v2022.0, a code that has been regularly developed and evaluated, for the SED fitting. Our catalog includes 0.8 million sources covering $4.9~\mathrm{deg^2}$ in W-CDF-S, 0.8 million sources covering $3.4~\mathrm{deg^2}$ in ELAIS-S1, and 1.2 million sources covering $4.9~\mathrm{deg^2}$ in XMM-LSS. Besides fitting normal galaxies, we also select candidates that may host active galactic nuclei (AGNs) or are experiencing recent star-formation variations and use models specifically designed for these sources to fit their SEDs; this increases the utility of our catalog for various projects in the future. We calibrate our measurements by comparison with those in well-studied smaller regions and briefly discuss the implications of our results. We also perform detailed tests of the completeness and purity of SED-selected AGNs. Our data can be retrieved from a public website.

Deep transfer learning for star cluster classification: I. application to the PHANGS–HST survey

Article

Apr 2020
MON NOT R ASTRON SOC

We present the results of a proof-of-concept experiment that demonstrates that deep learning can successfully be used for production-scale classification of compact star clusters detected in Hubble Space Telescope (HST) ultraviolet-optical imaging of nearby spiral galaxies (D ≲ 20 Mpc) in the Physics at High Angular Resolution in Nearby GalaxieS (PHANGS)–HST survey. Given the relatively small nature of existing, human-labelled star cluster samples, we transfer the knowledge of state-of-the-art neural network models for real-object recognition to classify star cluster candidates into four morphological classes. We perform a series of experiments to determine the dependence of classification performance on neural network architecture (ResNet18 and VGG19-BN), training data sets curated by either a single expert or three astronomers, and the size of the images used for training. We find that the overall classification accuracies are not significantly affected by these choices. The networks are used to classify star cluster candidates in the PHANGS–HST galaxy NGC 1559, which was not included in the training samples. The resulting prediction accuracies are 70 per cent, 40 per cent, 40–50 per cent, and 50–70 per cent for class 1, 2, 3 star clusters, and class 4 non-clusters, respectively. This performance is competitive with consistency achieved in previously published human and automated quantitative classification of star cluster candidate samples (70–80 per cent, 40–50 per cent, 40–50 per cent, and 60–70 per cent). The methods introduced herein lay the foundations to automate classification for star clusters at scale, and exhibit the need to prepare a standardized data set of human-labelled star cluster classifications, agreed upon by a full range of experts in the field, to further improve the performance of the networks introduced in this study.

Deep Transfer Learning for Star Cluster Classification: I. Application to the PHANGS-HST Survey

Preprint

Sep 2019

We present the results of a proof-of-concept experiment which demonstrates that deep learning can successfully be used for production-scale classification of compact star clusters detected in HST UV-optical imaging of nearby spiral galaxies in the PHANGS-HST survey. Given the relatively small and unbalanced nature of existing, human-labelled star cluster datasets, we transfer the knowledge of neural network models for real-object recognition to classify star clusters candidates into four morphological classes. We show that human classification is at the 66%:37%:40%:61% agreement level for the four classes considered. Our findings indicate that deep learning algorithms achieve 76%:63%:59%:70% for a star cluster sample within 4Mpc < D <10Mpc. We tested the robustness of our deep learning algorithms to generalize to different cluster images using the first data obtained by PHANGS-HST of NGC1559, which is more distant at D = 19Mpc, and found that deep learning produces classification accuracies 73%:42%:52%:67%. We furnish evidence for the robustness of these analyses by using two different neural network models for image classification, trained multiple times from the ground up to assess the variance and stability of our results. We quantified the importance of the NUV, U, B, V and I images for morphological classification with our deep learning models, and find that the V-band is the key contributor as human classifications are based on images taken in that filter. This work lays the foundations to automate classification for these objects at scale, and the creation of a standardized dataset.

Identification of Young Stellar Object candidates in the Gaia DR2 x AllWISE catalogue with machine learning methods

Article

Aug 2019
MON NOT R ASTRON SOC

The second Gaia Data Release (DR2) contains astrometric and photometric data for more than 1.6 billion objects with mean Gaia G magnitude <20.7, including many Young Stellar Objects (YSOs) in different evolutionary stages. In order to explore the YSO population of the Milky Way, we combined the Gaia DR2 data base with Wide-field Infrared Survey Explorer (WISE) and Planck measurements and made an all-sky probabilistic catalogue of YSOs using machine learning techniques, such as Support Vector Machines, Random Forests, or Neural Networks. Our input catalogue contains 103 million objects from the DR2xAllWISE cross-match table. We classified each object into four main classes: YSOs, extragalactic objects, main-sequence stars, and evolved stars. At a 90 per cent probability threshold, we identified 1 129 295 YSO candidates. To demonstrate the quality and potential of our YSO catalogue, here we present two applications of it. (1) We explore the 3D structure of the Orion A star-forming complex and show that the spatial distribution of the YSOs classified by our procedure is in agreement with recent results from the literature. (2) We use our catalogue to classify published Gaia Science Alerts. As Gaia measures the sources at multiple epochs, it can efficiently discover transient events, including sudden brightness changes of YSOs caused by dynamic processes of their circumstellar disc. However, in many cases the physical nature of the published alert sources are not known. A cross-check with our new catalogue shows that about 30 per cent more of the published Gaia alerts can most likely be attributed to YSO activity. The catalogue can be also useful to identify YSOs among future Gaia alerts.

The Dark Energy Survey: Data Release 1

Article

Full-text available

Nov 2018

We describe the first public data release of the Dark Energy Survey, DES DR1, consisting of reduced single-epoch images, co-added images, co-added source catalogs, and associated products and services assembled over the first 3 yr of DES science operations. DES DR1 is based on optical/near-infrared imaging from 345 distinct nights (2013 August to 2016 February) by the Dark Energy Camera mounted on the 4 m Blanco telescope at the Cerro Tololo Inter-American Observatory in Chile. We release data from the DES wide-area survey covering ~5000 deg² of the southern Galactic cap in five broad photometric bands, grizY. DES DR1 has a median delivered point-spread function of g = 1.12, r = 0.96, i = 0.88, z = 0.84, and Y = 0.90 arcsec FWHM, a photometric precision of <1% in all bands, and an astrometric precision of 151 mas. The median co-added catalog depth for a 1.95 arcsec diameter aperture at signal-to-noise ratio (S/N) = 10 is g = 24.33, r = 24.08, i = 23.44, z = 22.69, and Y = 21.44 mag. DES DR1 includes nearly 400 million distinct astronomical objects detected in ~10,000 co-add tiles of size 0.534 deg² produced from ~39,000 individual exposures. Benchmark galaxy and stellar samples contain ~310 million and ~80 million objects, respectively, following a basic object quality selection. These data are accessible through a range of interfaces, including query web clients, image cutout servers, jupyter notebooks, and an interactive co-add image visualization tool. DES DR1 constitutes the largest photometric data set to date at the achieved depth and photometric precision.

Star-galaxy classification in the Dark Energy Survey Y1 dataset

Article

Sep 2018
MON NOT R ASTRON SOC

We perform a comparison of different approaches to star–galaxy classification using the broad-band photometric data from Year 1 of the Dark Energy Survey. This is done by performing a wide range of tests with and without external ‘truth’ information, which can be ported to other similar data sets. We make a broad evaluation of the performance of the classifiers in two science cases with DES data that are most affected by this systematic effect: large-scale structure and Milky Way studies. In general, even though the default morphological classifiers used for DES Y1 cosmology studies are sufficient to maintain a low level of systematic contamination from stellar misclassification, contamination can be reduced to the O(1 per cent) level by using multi-epoch and infrared information from external data sets. For Milky Way studies, the stellar sample can be augmented by ~20 per cent for a given flux limit. Reference catalogues used in this work are available at http://des.ncsa.illinois.edu/releases/y1a1.
