Article

Sample Selection Bias Specification Error


... To fortify the primary findings, we conducted various tests to address endogeneity, sensitivity and robustness concerns, consistent with prior research methodologies. Initially, an endogeneity test was performed using propensity score matching and Heckman (1979) techniques. Subsequently, we conducted global sensitivity analyses at the study level. ...
... To ensure the robustness of our analysis and account for potential sources of variation, we incorporate year- and industry-fixed effects into our research model. In addition, we also include endogeneity and robustness tests to strengthen our main findings, such as propensity score matching, Heckman (1979) and the entropy balancing approach. To address our research questions and hypotheses effectively, we have formulated the following research model: ...
... Propensity score matching procedures. Our initial endogeneity analysis involved conducting the Heckman (1979) analysis by performing a propensity score matching procedure (Rosenbaum and Rubin, 1983). This procedure aimed to select companies with similar sets of characteristics that could potentially influence corporate climate disclosure but that varied in whether their CEO possessed a STEM educational background. ...
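The matching step described in the snippet above can be sketched as follows. This is a minimal illustration, not the cited authors' procedure: it assumes the propensity scores have already been estimated, and the function name `match_nearest`, the greedy 1:1 rule without replacement, and the caliper of 0.05 are all assumptions made for the example.

```python
def match_nearest(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on propensity scores.

    treated, control: dicts mapping a unit id to its estimated
    propensity score (hypothetical inputs for illustration).
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(control)  # controls not yet used (no replacement)
    pairs = []
    # Assumption: process treated units from highest to lowest score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        # Accept the match only if it lies within the caliper.
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


# Hypothetical firms: treated = STEM CEO, control = non-STEM CEO.
treated_scores = {"A": 0.80, "B": 0.30, "C": 0.55}
control_scores = {"x": 0.78, "y": 0.33, "z": 0.10}
matched = match_nearest(treated_scores, control_scores)
```

In this toy input, firm C stays unmatched because no control firm lies within the caliper. Dropping such units is the usual price of requiring close matches.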
Article
Purpose This study aims to investigate the association between chief executive officer (CEO) educational backgrounds in science, technology, engineering and mathematics (STEM) and climate change disclosure within Indonesian companies. Design/methodology/approach Using data spanning from 2017 to 2022 from all publicly traded companies, the study uses ordinary least squares with fixed effects and robust standard error to evaluate the proposed hypothesis. In addition, a series of endogeneity tests are incorporated to bolster the robustness of the findings. Findings The study reveals that CEOs with a STEM educational background are more inclined to participate in corporate climate change disclosure compared to their counterparts with a non-STEM background. These results emphasize the significant role CEO educational backgrounds play in shaping a company’s approach to sustainability, specifically in the realm of climate change disclosure. The insights gleaned from this research hold valuable implications for various stakeholders, including top management and investors aiming to enhance corporate sustainability. Recognizing the influence of CEO characteristics, particularly a STEM educational background, proves pivotal in improving corporate climate change disclosure. Stakeholders can leverage this understanding to formulate and implement effective strategies toward realizing a company’s sustainability vision. Originality/value Notably, this study stands out as it was conducted within the context of Indonesia, a nation actively encouraging nonsocial graduates to assume crucial positions within the Republic of Indonesia.
... Considering the possibility that commuting workers do not represent a random sample of the population, that is, that both observable and unobservable characteristics influence the decision regarding commuting, it is necessary to add correction for sample selection bias to the wage equation. Proposed by Heckman (1979), the following equation is then formulated: ...
... By assuming that commuters do not constitute a random sample of the population, that is, that they form a positively selected group, the first stage of the method proposed by Heckman (1979) is applied, with the correction of sample selection bias. This is obtained through a probit model, so that the characteristics that influence the decision process regarding commuting are estimated as in Eq. (3). ...
... Under the procedure proposed by Heckman (1979), the vector of variables X, which contains the variables that influence the commuting decision, can certainly share variables with the vector Z, which is composed of the determinants of the income equation. However, at least one of the variables contained in X must not be included in the vector Z. ...
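The two-stage logic in the snippets above can be written out as a short worked derivation. The vectors X and Z follow the snippet's notation. The remaining symbols (s, y, u, ε, γ, β, ρ, σ, λ) are assumptions introduced here for illustration, and joint normality of the errors is the standard textbook assumption rather than something the snippet states.

```latex
% Stage 1 (selection, probit): commuting is observed when the latent index is positive
s = \mathbf{1}\left[ X\gamma + u > 0 \right], \qquad u \sim N(0, 1)

% Outcome (income) equation, observed only for s = 1
y = Z\beta + \varepsilon, \qquad
\operatorname{Var}(\varepsilon) = \sigma^{2}, \quad
\operatorname{Corr}(u, \varepsilon) = \rho

% Conditional mean in the selected sample
E\left[ y \mid Z, X, s = 1 \right]
  = Z\beta + \rho\,\sigma\,\lambda(X\gamma),
\qquad
\lambda(c) = \frac{\phi(c)}{\Phi(c)}
```

Here φ and Φ are the standard normal density and distribution function, and λ is the inverse Mills ratio. Stage 2 regresses y on Z and the estimated λ. The requirement that X contain at least one variable absent from Z matters because, without it, the coefficient on λ is identified only through the nonlinearity of λ.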
Article
This article aims to analyse whether there is favourable migration selectivity in formal work in Ceará. In other words, whether the worker's unobservable characteristics influence commuting and labour income differentials. For this purpose, microdata from the annual social information list—RAIS of the Brazilian Ministry of Economy—MEB for 2009 and 2019 were used. Furthermore, the methodology used was a two-stage Heckman model with correction for sample selection bias for both years. The results show that commuting migrants in formal work in Ceará are not positively selected. Furthermore, income differentials among commuting migrants are determined by individuals' socioeconomic and demographic characteristics and the labour market, such as education, race/colour, gender, and occupation sector.
... Because firms can voluntarily choose to disclose carbon emissions, the empirical test on market value has a self-selection bias. To control for self-selection, we employ a two-stage estimating approach (Heckman 1979). We add instrumental variables: FRNSALE, ENV_ISO, ENV_IRRG, INSINVESTOR and BTM. ...
... Next, we employ a regression model to test H2. To control for choice of disclosure on carbon emissions, we employ a two-stage approach to alleviate self-selection bias (Heckman 1979). Specifically, we calculate the inverse Mills ratio (IMR) from the Heckman first-stage ...
... Consistent with prior research on market value, we find positive and significant coefficients on SIZE, ORTNI, CGRANK, PTB, and SRET but a negative and significant coefficient on LEV (p-value < 0.05), indicating that larger firms with less financial leverage and better operating performance, market-to-book ratio, stock return, and corporate governance have higher market value (Table 5). Note: This table provides results that control for selection into carbon emission disclosure. We employ the two-stage approach of Heckman (1979): we estimate the inverse Mills ratio (IMR) from Equation (1) and then add the IMR to Equation (2) to correct for the self-selection problem. MV is defined as MKTVAL divided by total assets. ...
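As an illustration of the IMR step mentioned in these snippets, the sketch below computes the inverse Mills ratio from a first-stage probit index using only the Python standard library. The coefficient values and the helper `probit_index` are hypothetical. In practice the coefficients would come from a fitted probit model, which is not shown here.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def probit_index(features, coefficients, intercept=0.0):
    """Linear index of a probit first stage (hypothetical coefficients)."""
    return intercept + sum(x * g for x, g in zip(features, coefficients))


def inverse_mills_ratio(index):
    """IMR for selected observations: pdf(index) / cdf(index).

    Note: for very negative index values the cdf underflows to 0,
    so a production implementation would need extra numerical care.
    """
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical first-stage coefficients and one firm's covariates.
gamma = [0.4, -0.2]
firm_covariates = [1.5, 0.5]
imr = inverse_mills_ratio(probit_index(firm_covariates, gamma, intercept=-0.5))
```

The resulting `imr` value would then enter the second-stage regression as an additional regressor, which is the role the snippets describe for the IMR.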
Article
Our paper explores the association between management quality and carbon emission disclosures. We assert that high-quality managers have more abilities and resources to measure and manage their firm's carbon emissions, leading to increased voluntary carbon emission disclosures. As expected, our results show that high-quality management is positively associated with the likelihood of carbon emission disclosures. After controlling for self-selection bias, we further find that high-quality management can enhance the positive effects of carbon emission disclosures on market value. Finally, we observe that high-quality managers are positively associated with reduced carbon emissions. Overall, our study offers an incremental contribution to the extant literature by showing that management quality is a key factor driving carbon emission disclosures.
... Several robustness analyses are undertaken, namely: entropy balancing to mitigate observable selection bias arising from imbalances in firm characteristics; Heckman's (1979) two-stage approach to mitigate selection bias arising from unobservable heterogeneity; firm and country fixed effects to address time-invariant unobserved characteristics within firms/countries that correlate with the explanatory variables; and two-stage instrumental variable (IV) analysis to address reverse causality. ...
... These findings are interpreted to mean that firms with better CCP have a lower level of firm risk, with high-quality country-level governance accentuating this association. Our findings are robust using, as previously mentioned, entropy balancing; Heckman's (1979) two-stage analysis; firm and country fixed effects; and instrumental variable (IV) analysis. We also find that country-level business culture, emissions trading schemes (ETSs), climate change performance and attention to carbon emissions accentuate the negative association between CCP and firm risk. ...
... Consequently, the empirical association between CCP and firm risk could be influenced by unobservable self-selection bias. To address this potential bias, we utilise Heckman's (1979) two-stage analysis, following prior studies (Matsumura et al., 2014;Griffin et al., 2017;Bose et al., 2021). In the first stage, we analyse the firm's decision to report CCP data, expanding our sample to include firms that opted to not disclose this information during the sample period. ...
Article
This study examines the association between corporate carbon performance (CCP) and firm risk using a sample of 9,212 firm-year observations from 13 countries in the Asia-Pacific region over the period 2002-2021. We also examine the moderating role of the quality of country-level governance in the association between CCP and firm risk. We find that CCP is negatively associated with a firm's total, idiosyncratic and systematic risk and that country-level governance quality accentuates the negative association between CCP and firm risk. We also find that country-level business culture, emissions trading schemes, climate change performance and attention to carbon emissions accentuate the negative association between CCP and firm risk. Given the growing demands from regulatory bodies for increased transparency on carbon performance, the insights gained from our research hold significant relevance for regulators, policy makers, investors, financial analysts, scholars and businesses.
... Their method relies on the assumption that the refusal behaviour in different populations is comparable. Reniers et al, 8 Bärnighausen et al, 10 Hogan et al 11 adjusted for non-response bias using a Heckman-type selection model, 12 which allows non-response to be informative but requires the existence of a valid instrumental variable that satisfies the exclusion criteria of explaining non-response but not the outcome. Arpino et al 13 constructed bounds based on the partial identification approach of Manski. ...
... We use a set of two equations: a selection equation and an outcome equation, on the subset of observations with $S = 1$. The selection equation is defined as $S^{*} = \alpha_{0} + \alpha_{11}\,\text{residence} + \alpha_{12}\,\text{age} + \alpha_{13}\,\text{region} + \alpha_{14}\,\text{interviewer} + \alpha_{2} X_{2} + \alpha_{3} X_{3} + \alpha_{4} X_{4} + \varepsilon$ (S.12), and the outcome equation is defined as $Y^{*} = \beta_{0} + \beta_{11}\,\text{residence} + \beta_{12}\,\text{age} + \beta_{13}\,\text{region} + \beta_{14}\,\text{interviewer} + \beta_{2} X_{2} + \beta_{3} X_{3} + \beta_{4} X_{4} + \omega + \eta$ (S.13), where $(\varepsilon, \eta)$ are generated from a distribution with mean $0$ and covariance $\Sigma$. The quantity $\omega$ is used to model spatial correlations in HIV rates. ...
... For the method of Marra et al. 24 , we use only interviewer as the IV. The procedure is implemented via three equations: a selection equation with age, rural, and region as predictors, an outcome equation with the same variables as predictors and a third equation that models the copula between selection and outcome (see Heckman 12 ) using region only as predictor. In the simulations, we tried a selection of representative copulas: Normal, Frank, Clayton rotated 90 degrees, and Clayton rotated 270 degrees and then chose the best among them based on AIC. ...
Article
HIV estimation using data from the demographic and health surveys (DHS) is limited by the presence of non‐response and test refusals. Conventional adjustments such as imputation require the data to be missing at random. Methods that use instrumental variables allow the possibility that prevalence is different between the respondents and non‐respondents, but their performance depends critically on the validity of the instrument. Using Manski's partial identification approach, we form instrumental variable bounds for HIV prevalence from a pool of candidate instruments. Our method does not require all candidate instruments to be valid. We use a simulation study to evaluate and compare our method against its competitors. We illustrate the proposed method using DHS data from Zambia, Malawi and Kenya. Our simulations show that imputation leads to seriously biased results even under mild violations of non‐random missingness. Using worst case identification bounds that do not make assumptions about the non‐response mechanism is robust but not informative. Taking the union of instrumental variable bounds balances informativeness of the bounds and robustness to inclusion of some invalid instruments. Non‐response and refusals are ubiquitous in population based HIV data such as those collected under the DHS. Partial identification bounds provide a robust solution to HIV prevalence estimation without strong assumptions. Union bounds are significantly more informative than the worst case bounds without sacrificing credibility.
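The worst case bounds mentioned in the abstract above can be illustrated with a short sketch. It shows the standard Manski-style worst case bounds for a binary outcome and the bounds obtained from a single instrument, under the assumption that true prevalence does not vary with the instrument. It is not the cited authors' union-bounds procedure, and the function names and numbers are hypothetical.

```python
def worst_case_bounds(prev_respondents, response_rate):
    """Bounds on prevalence with no assumption about non-respondents.

    Lower bound: every non-respondent is negative.
    Upper bound: every non-respondent is positive.
    """
    lower = prev_respondents * response_rate
    upper = lower + (1.0 - response_rate)
    return lower, upper


def single_instrument_bounds(strata):
    """Intersect worst case bounds across the values of one instrument.

    strata: list of (prevalence among respondents, response rate),
    one entry per instrument value. Assumes prevalence is the same
    in every stratum, which is the instrument assumption.
    """
    per_stratum = [worst_case_bounds(p, r) for p, r in strata]
    lower = max(b[0] for b in per_stratum)
    upper = min(b[1] for b in per_stratum)
    return lower, upper  # lower > upper would refute the assumption


# Hypothetical survey: 10% prevalence among respondents, 80% response.
overall = worst_case_bounds(0.10, 0.80)
# Hypothetical instrument with two values (e.g. two interviewer groups).
narrowed = single_instrument_bounds([(0.10, 0.80), (0.12, 0.95)])
```

With these made-up inputs the instrument narrows the interval considerably, which shows why the bounds are only as credible as the instrument itself.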
... SSB has been well known for decades [25,26,3], and it has been discussed across several disciplines, including social science and econometrics [27,28], environmental studies [62], finance [35], causality [31,4], fairness [20], healthcare [44], and machine learning [66]. It is also an important consideration for clinical study design as the results of a biased study may not apply to a real patient population [6,48]. ...
... SSB was initially discussed in econometrics and received great attention [3,26,59] because in econometrics data collection often relies on surveys where participants are self-selected and are not representative of the population [25]. Heckman [28], in a Nobel Prize-winning work, developed a method to correct SSB, using the probability of selection into the sample. However, this method is only applicable to linear regression algorithms. ...
... For example, Du et al. [20] discuss SSB for fairness when dependent variable values of a set of samples from training data are missing as a result of another hidden process. Their framework adopts the classic Heckman model [28] for bias correction with Lagrange duality, achieving fairness in a regression setting based on a variety of fairness notions. Similarly, to address systematic biases in risk prediction across demographic groups, Hong et al. [32] employed imbalance learning, transfer learning, and federated learning techniques. ...
Preprint
While machine learning algorithms hold promise for personalised medicine, their clinical adoption remains limited. One critical factor contributing to this restraint is sample selection bias (SSB) which refers to the study population being less representative of the target population, leading to biased and potentially harmful decisions. Despite being well-known in the literature, SSB remains scarcely studied in machine learning for healthcare. Moreover, the existing techniques try to correct the bias by balancing distributions between the study and the target populations, which may result in a loss of predictive performance. To address these problems, our study illustrates the potential risks associated with SSB by examining SSB's impact on the performance of machine learning algorithms. Most importantly, we propose a new research direction for addressing SSB, based on the target population identification rather than the bias correction. Specifically, we propose two independent networks (T-Net) and a multitasking network (MT-Net) for addressing SSB, where one network/task identifies the target subpopulation which is representative of the study population and the second makes predictions for the identified subpopulation. Our empirical results with synthetic and semi-synthetic datasets highlight that SSB can lead to a large drop in the performance of an algorithm for the target population as compared with the study population, as well as a substantial difference in the performance for the target subpopulations that are representative of the selected and the non-selected patients from the study population. Furthermore, our proposed techniques demonstrate robustness across various settings, including different dataset sizes, event rates, and selection rates, outperforming the existing bias correction techniques.
... Therefore, the problem could be controlled through statistical methods such as instrumental variables (IVs) and Heckman estimation models (Baker, 2000). For this reason, Heckman's (1979) two-stage estimation model, with the inverse Mills ratio as a correction factor for selection bias, was used in preference to IVs. ...
... Since statistical analysis based on non-randomly selected samples could lead to erroneous conclusions about a project intervention, Heckman's (1979) two-stage estimation model was used to analyze a stratified non-random sample. This was applied primarily to distinguish the subsets of the sample with and without the intervention (Wooldridge, 2001). ...
Article
Development intervention is increasingly reported as a means of improving the livelihoods of vulnerable rural people. However, little information based on appropriate methodological approaches is available on diverse outcomes. This study was conducted to assess the livelihood impact of the Tanzania Social Action Fund intervention in agriculture for vulnerable communities in Makete and Rungwe districts. This research examined the effectiveness of the intervention for the food security of recipients in both districts. A quasi-experimental design, including triangulation approaches, was used to collect a sample of 192 recipient and 108 non-recipient households. The Heckman selection model with a two-stage estimation approach was employed to analyze cross-sectional data. Results show that there was no difference in food security between recipients and non-recipients. Based on these findings, it is concluded that participation had no positive effect on food security. Therefore, it is recommended that intervention should be on a prevention basis rather than on coping strategies.
... The findings of the robustness test are consistent with the baseline results. To address endogeneity concerns, we use a Heckman two-stage estimation method to correct and control for self-selection bias (Heckman 1979). To control for a potential bias arising from omitted variables in our model we conduct further tests using the parametrisation recommended by Oster (2019), excluding zero in the bounds for our main outcomes. ...
... The association between the risk committee and the IR index could be considered endogenous, bearing a self-selection bias due to the possibility of omitted and significant variables affecting the firm's decision to form a risk committee. To address this potential self-selection bias, this study uses Heckman's two-stage approach reported in Panel A of Table 5 (Heckman 1979). We construct a logistic model for the probability of the formation of a risk committee and compute the inverse Mills ratio (IMR) (Huang et al. 2014). ...
Article
This study investigates the relationship between the risk committee (existence and effectiveness) and the quality of integrated reports of the top 200 listed companies on the Australian Securities Exchange (ASX). A composite ordinal proxy for the firms’ integrated reporting was constructed using data that were hand‐collected from annual reports. The main result reports that the existence of a standalone risk committee is negatively and significantly associated with the quality of integrated reporting; however, integrated reporting is positively associated with firms adopting a combined risk and audit committee and risk committee effectiveness.
... Specifically, it is necessary to address the possibility that the quality of carbon disclosure may, in turn, affect these predictor variables. To mitigate endogenous effects and potential selection bias, Heckman's two-stage correction is employed (Heckman, 1979). Endogeneity is identified as a problem of reverse causality, in which one or more explanatory variables are influenced by the dependent variable (Greene, 2018; Wooldridge, 2010). ...
... To control for this endogenous influence, an exogenous variable is incorporated into the model (Rodríguez-Jasso, 2021). Selection bias is addressed by applying Heckman's two-stage correction technique (Broadstock et al., 2018; He et al., 2019; Heckman, 1979). ...
Thesis
Purpose – This research examines the effects of managerial discretion, corporate governance factors and ownership structure on the quality of carbon disclosure among firms that participate in the Carbon Disclosure Project and are listed on the Mexican stock market (Bolsa Mexicana de Valores and Bolsa Institucional de Valores). Theoretical framework – The study draws on agency theory and institutional theory, which consider both the internal and external dynamics of organizations. In addition, the research adopts the concept of socioemotional wealth pursued by family firms. Design/methodology/approach – A sample of 71 firms for the period 2016–2022 is used. An ordinal logistic panel regression is estimated in STATA 17 to assess the relationship between managerial discretion and carbon disclosure, along with the corporate governance and ownership structure variables. Findings – Managerial discretion and managerial duality do not affect the quality of carbon disclosure. Board independence and the presence of an environmental committee, for their part, prove effective for monitoring and improving sustainable practices. Finally, and unexpectedly, corporate centrality and family ownership negatively affect disclosure quality. Research, practical and social implications – The results provide a deeper understanding of the determinants that influence the quality of firms' carbon disclosure, with implications for strategic managerial decision-making and public policy in relation to sustainability. Originality/value – This study contributes to the advancement of administrative science by exploring the connection between managerial discretion and the quality of carbon disclosure, highlighting the relevance of the effects of corporate centrality and family ownership. It offers new perspectives for corporate and strategic practice in the context of environmental responsibility.
... The second form is sample selection bias, which occurs when the sample is comprised of a non-random subset of observations. This is addressed using the two-step method proposed by Heckman (1979). The third form is simultaneity or reverse causality, where the dependent and independent variables reciprocally affect each other. ...
... For the robustness test of the model, considering that the dependent variable, enterprise export DVAR, is typically based on imputed data, this study employs the Tobit model to assess model robustness, taking into account the bounds of imputation. Table 4, columns (1), (2), present the regression outcomes from employing Heckman's (1979) two-step method to evaluate sample selection bias. The inverse Mills ratio (IMR) noted in column (2) is −0.415 and achieves statistical significance, suggesting the presence of sample selection bias within this research. ...
Article
This paper examines the impact of environmental decentralization on the export domestic value-added rate of enterprises using combined data from 2000–2014 from China Industrial Enterprise Database, China Customs Database, WIOD, China Environment Yearbook and China Enterprise Patent Database. The research findings show that the overall environmental decentralization has an inverted U-shaped impact on enterprises’ export DVAR, with 94.4% of the sample in the promotion interval. 73.2% of ordinary trade enterprises and 85.7% of processing trade enterprises are in the suppressive interval of the U-shaped impact of administrative decentralization; 69.2% of ordinary trade enterprises are in the suppressive interval of the U-shaped impact of monitoring decentralization, and 85.7% of processing trade enterprises are in the promotion range of the inverted U-shaped impact; 66.0% of ordinary trade enterprises and 86.7% of processing trade enterprises are in the suppression range of the U-shaped impact of monitoring decentralization. In addition, cost markup and R&D innovation as mediating variables are important transmission channels for environmental decentralization to influence enterprises’ export DVAR.
... In all models, we adjusted for sample selectivity due to missing exposure and outcome data, relative to the initially recruited sample, using a two-stage Heckman selection strategy [40]. Initially, we predicted an indicator of selection with socio-demographic factors, namely, age, race, sex and PIR using probit regression, which yielded an inverse Mills ratio (IMR)-a function of the probability of being selected given those socio-demographic factors. ...
... Initially, we predicted an indicator of selection with socio-demographic factors, namely, age, race, sex and PIR using probit regression, which yielded an inverse Mills ratio (IMR)-a function of the probability of being selected given those socio-demographic factors. Subsequently, we estimated our Cox proportional hazards regression models adjusted for the IMR in addition to the aforementioned covariates [40,41]. ...
Article
Neurofilament light chain (NfL) is a neuron-specific structural protein released into the extracellular space, including body fluids, upon neuroaxonal damage. Despite evidence of a link in neurological disorders, few studies have examined the association of serum NfL with mortality in population-based studies. Data from the National Health and Nutrition Examination Survey were utilized including 2,071 Non-Hispanic White, Non-Hispanic Black and Hispanic adult participants and adult participants of other ethnic groups (20–85 years) with serum NfL measurements who were followed for ≤ 6 years till 2019. We tested the association of serum NfL with mortality in the overall population and stratified by sex with the addition of potential interactive and mediating effects of cardio-metabolic risk factors and nutritional biomarkers. Elevated serum NfL levels (above median group) were associated with mortality risk compared to the below median NfL group in the overall sample (P = 0.010), with trends observed within each sex group (P < 0.10). When examining Log_e NfL as a continuum, one standard deviation of Log_e NfL was associated with an increased mortality risk (HR = 1.88, 95% CI 1.60–2.20, P < 0.001) in the reduced model adjusted for age, sex, race, and poverty income ratio; a finding only slightly attenuated with the adjustment of lifestyle and health-related factors. Four-way decomposition indicated that there was, among others, mediated interaction between NfL and HbA1c and a pure inconsistent mediation with 25(OH)D3 in predicting all-cause mortality, in models adjusted for all other covariates. Furthermore, urinary albumin-to-creatinine ratio interacted synergistically with NfL in relation to mortality risk both on the additive and multiplicative scales. These data indicate that elevated serum NfL levels were associated with all-cause mortality in a nationally representative sample of US adults.
... The Heckman model employs a two-equation system to correct for this bias-one focusing on selection into the sample (opt-in when an outcome is observed-the sample selection equation), and the main equation linking the covariates of interest to the outcome (the outcome equation) [73]. The first equation, the selection equation, employs a logit model to estimate the likelihood of inclusion in the citizen scientist sample based on observable characteristics. ...
... The LR test is based on comparing the log-likelihood of the full model (allowing for correlation between errors) to the log-likelihood of a restricted model (omitted selection equation, thus assuming no correlation between errors). A significant LR test suggests that the full model with correlation between errors provides a better fit to the data than the restricted model [73,76]. Table 3 summarizes the results of the goodness of fit analyses for a one-factor model per latent variable. ...
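The likelihood-ratio comparison described in the snippet above can be sketched in a few lines. This is a minimal illustration that assumes the restriction removes exactly one parameter (the error correlation), so the statistic is compared against a chi-square distribution with one degree of freedom. The log-likelihood values are made up, and the function name `lr_test_one_df` is hypothetical.

```python
import math


def lr_test_one_df(loglik_full, loglik_restricted):
    """Likelihood-ratio test with one restricted parameter.

    Returns (statistic, p_value). For one degree of freedom the
    chi-square survival function equals erfc(sqrt(statistic / 2)).
    """
    statistic = 2.0 * (loglik_full - loglik_restricted)
    statistic = max(statistic, 0.0)  # guard against tiny negative values
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Made-up log-likelihoods: full model vs. model with no error correlation.
stat, p = lr_test_one_df(loglik_full=-100.0, loglik_restricted=-102.0)
significant = p < 0.05  # a significant result favours the full model
```

A small p-value here would be read the way the snippet describes: the model allowing correlated errors fits better than the restricted one.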
Article
Citizen science, where non-specialists collaborate with scientists, has surged in popularity. While it offers an innovative approach to research involvement, the domain of agri-environmental research participation, particularly in terms of citizen recruitment and retention, remains relatively unexplored. To investigate what factors influence initial and sustained participation in an agronomic citizen science project, we performed a large survey during the case-study “Soy in 1000 Gardens”. We obtained data on citizens' motivations, general values, environmental concern, prior citizen science experience, and knowledge regarding sustainable food consumption and garden management and applied a two-step selection model to correct for potential self-selection bias on our participation outcomes. Initially, citizen scientists appear to be mostly motivated by gaining knowledge, having fun social interactions and environmental concern with regards to the effects on others, while the desire for enhancing or protecting their ego is less prominent. They also display higher knowledge and self-transcending values. Sustained participants, however, are significantly older and share a stronger sense of moral obligation than their dropped-out counterparts. Moreover, prior experience seems to positively influence the length of their participation, while higher knowledge has a positive impact on the amount of data contributed. These insights offer strategies for tailored engagement that should emphasize collective impact, align with intrinsic values, and foster a sense of moral duty, with potential to enhance agri-environmental citizen science initiatives’ effectiveness in addressing environmental challenges.
... This may result in a situation where some households that intend to engage in land transfers actually do not participate. This implies that the sample consists of self-selected households, which could potentially introduce a sample selection problem into the model [45,46]. Additionally, considering that the explanatory variables in the model are binary, this paper uses the Heckman two-step model to address endogeneity issues arising from unobservable variables. ...
... When self-selection bias is present, it is necessary to assume that individuals choose whether to participate in land transfers based on unobservable variables. Therefore, following common practices in authoritative literature, this paper uses the Heckman two-step model to control for these issues [45]. ...
Preprint
The United Nations Sustainable Development Goals call for the eradication of poverty, and China has proposed the Rural Revitalization Strategy based on the achievements of its poverty alleviation efforts in 2020. As a vital component of this strategy, the impact of land transfer on farmers' income growth has become a hot topic in both theoretical and practical sectors. This paper utilizes data from the CFPS database for the years 2010-2018 to construct OLS regression models and a Heckman two-stage model to analyze the effects of land transfers on the incomes of different types and sources among farmers. The empirical results show: First, land transfers do promote income growth among farmers, but the effect is primarily positive for those transferring out land, with negligible impact on those acquiring land. Second, the impact of land transfers varies between different types of income for transferees and transferors. Wage income contributes up to 88.26% to the income growth of transferors, a significantly higher rate than the decrease in business income; however, the increase in business income for transferees does not offset the decline in wage income, leading to no significant change in overall income. Further sensitivity analysis using the Heckman two-stage model confirms the robustness of these findings. The conclusions of this study provide theoretical and empirical evidence to optimize land transfer policies, enhance participation in land transfers, and ultimately achieve the mission of rural revitalization.
... required. Additionally, it is necessary to generate estimates specifically for those who receive labour income. This requires an initial step to be taken before running the appropriate work model, in which it is indicated whether they are employed or not (Rodriguez Lozano, 2016). All of the previously indicated factors would lead to selection bias. Heckman (1979) presents a two-step methodology to address this issue. The initial step involves employing a probit model to assess the likelihood of an event taking place, with the dependent variable. Subsequently, the outcomes are examined using an OLS (Ordinary Least Squares) approach that utilises the inverse Mills ratio. Instead of this, Büchel and v ...
... In the second stage, the probability of having a suitable job is estimated (Büchel and van Ham, 2003). Given the binary nature of employment, where individuals are either hired or not, we utilise a probit model (Büchel and van Ham, 2003). Based on the two-step model of Heckman (1979) with the variation of Büchel and van Ham (2003), the employment decision and that of adequate employment depend on an index that is one or more explanatory variables for the purposes of this paper. ...
Article
Full-text available
This study employs the Heckman selection model to analyse the dynamics of adequate employment in Ecuador, particularly in the context of spatial determinants. The findings reveal that while formal education significantly boosts employment prospects, its impact on job adequacy is nuanced by regional disparities and local market characteristics. Furthermore, the study uncovers gender disparities, with women facing greater obstacles in both employment access and adequacy. The role of commuting emerges as a critical factor, where infrastructure and urban planning are shown to influence job suitability. Comparatively, the Ecuadorian labour market demonstrates a higher sensitivity to spatial determinants and accessibility issues than its European Union counterparts, indicating that local employment opportunities are more constrained by geographical and infrastructural factors. This research underscores the need for integrated employment policies that consider educational alignment, mobility enhancement, and inclusivity, aimed at strengthening the link between acquired skills and job market needs within the unique Ecuadorian context. JEL classification: I24, I25, J01, J24
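The two-step procedure quoted in the excerpts above (a probit first stage, then OLS augmented with the inverse Mills ratio) relies on one correction term. The sketch below is a minimal illustration rather than the cited authors' code: it only computes that term, λ(z) = φ(z)/Φ(z), from a first-stage probit index, using Python's standard library. The function name `inverse_mills_ratio` and the example index values are assumptions made for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_ratio(index: float) -> float:
    """Return phi(index) / Phi(index) for a first-stage probit index."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical fitted probit indices for three selected observations.
probit_indices = [-1.0, 0.0, 1.5]
correction_terms = [inverse_mills_ratio(z) for z in probit_indices]
```

In a second step, values such as `correction_terms` would enter the outcome regression as an additional regressor. For very negative indices the denominator can underflow, so a production implementation would need a numerically safer formula.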
... The empirical results of the 2SLS regression remain the same as those of the fixed effect model. In addition, a Heckman two-step regression (Heckman, 1979) for QFII's ownership and information asymmetry levels shows that our results are not affected by self-selection. Finally, when we employ the dummy variable of QFII's shareholdings and the volume of QFII's shareholdings as substitute measures of QFII's participation or take the stock market shocks of 2007, 2008, and 2015 into account in the robustness test, the quantitative relation between QFII's participation and liquidity remains the same. ...
Article
Full-text available
Our paper investigates how qualified foreign institutional investors (QFII) impact stock liquidity in the Chinese A-share market using data for 2005 to 2019. Contrary to previous findings, we find that QFII enhance stock liquidity. Specifically, QFII’s participation is negatively associated with the individual stock illiquidity and positively related to stock trading volume. Moreover, using a step-by-step procedure, we provide evidence that QFII raises stock liquidity by ameliorating information asymmetry. QFII will attract more market attention and improve firm disclosure quality. We address possible endogeneity with fixed effects and instrumental variables, and our findings are robust to self-selection bias, stock market shocks, and alternative explanatory variables.
... To formulate the joint distribution of the data and the missing-data pattern, we consider in this paper the selection model (Heckman 1979), which factorizes it into the product of the marginal data density and the missing-data mechanism (1). This approach has the great advantage of allowing imputation of the missing values and density estimation throughout the parameter estimation of the mixture model. ...
Article
Full-text available
Model-based unsupervised learning, as any learning task, stalls as soon as missing data occurs. This is even more true when the missing data are informative, that is, missing not at random (MNAR). In this paper, we propose model-based clustering algorithms designed to handle very general types of missing data, including MNAR data. To do so, we introduce a mixture model for different types of data (continuous, count, categorical and mixed) to jointly model the data distribution and the MNAR mechanism, remaining vigilant to the relative degrees of freedom of each. Several MNAR models are discussed, for which the cause of the missingness can depend on both the values of the missing variable themselves and on the class membership. However, we focus on a specific MNAR model, called MNARz, for which the missingness only depends on the class membership. We first underline its ease of estimation, by showing that the statistical inference can be carried out on the data matrix concatenated with the missing mask considering finally a standard MAR mechanism. Consequently, we propose to perform clustering using the Expectation Maximization algorithm, specially developed for this simplified reinterpretation. Finally, we assess the numerical performances of the proposed methods on synthetic data and on the real medical registry TraumaBase as well.
... The exclusion of those firms that did not appear on the list may have resulted in sample selection bias and endogeneity [76]. Hence, the research introduced the Heckman two-stage statistics method to address this problem [77]. The first step incorporated some control variables associated with dependent variables for selection with a probit model. ...
... This approach has a number of advantages over existing methods for incomplete datasets. Sample selection models consider regressions with samples where the dependent variable is sometimes missing, and obtain point identification by modeling the selection process (Heckman, 1979; Das et al., 2003). These models require that the data include a variable changing the probability of observation but not the dependent variable. ...
Preprint
Full-text available
Missing data is pervasive in econometric applications, and rarely is it plausible that the data are missing (completely) at random. This paper proposes a methodology for studying the robustness of results drawn from incomplete datasets. Selection is measured as the squared Hellinger divergence between the distributions of complete and incomplete observations, which has a natural interpretation. The breakdown point is defined as the minimal amount of selection needed to overturn a given result. Reporting point estimates and lower confidence intervals of the breakdown point is a simple, concise way to communicate the robustness of a result. An estimator of the breakdown point of a result drawn from a generalized method of moments model is proposed and shown root-n consistent and asymptotically normal under mild assumptions. Lower confidence intervals of the breakdown point are simple to construct. The paper concludes with a simulation study illustrating the finite sample performance of the estimators in several common models.
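The abstract above measures selection as a squared Hellinger divergence between the distributions of complete and incomplete observations. As a rough illustration for discrete distributions only, the sketch below uses one common normalization, H² = 1 − Σ √(p_i q_i), which ranges from 0 to 1. The paper's exact definition and scaling may differ, and the function name and example probabilities are hypothetical.

```python
from math import sqrt


def squared_hellinger(p: list[float], q: list[float]) -> float:
    """Squared Hellinger divergence between two discrete distributions.

    Assumes the normalization H^2 = 1 - sum(sqrt(p_i * q_i));
    other texts scale the quantity differently.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same support size")
    overlap = sum(sqrt(pi * qi) for pi, qi in zip(p, q))
    return 1.0 - overlap


# Hypothetical distributions of a three-valued covariate among
# complete and incomplete observations.
complete = [0.5, 0.3, 0.2]
incomplete = [0.2, 0.3, 0.5]
selection = squared_hellinger(complete, incomplete)
```

Identical distributions give a value of 0 and distributions with disjoint support give 1, so larger values of `selection` correspond to complete and incomplete cases that look less alike.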
... To carry this out, we use the wage-gap decomposition methodology known as Oaxaca-Blinder (OB), which was developed by Oaxaca (1973) and Blinder (1973) on the basis of an Ordinary Least Squares regression procedure. Although, as Fortin, Lemieux and Firpo (2011) point out, this methodology has been extended to address problems such as selection bias in the labour market, which can arise in the case of women, or differences in the gap across the distribution, we use the OB measure because the sample consists only of workers, so there is no need to apply a selection correction such as the one developed by Heckman (1979). ...
Chapter
Full-text available
The occupation of agricultural day labourer has historically been associated with precarious working conditions, job uncertainty and poverty. In Mexico there is an extensive literature documenting these conditions in various regions, which also considers differentiation by crop type (staple and commercial) and differences in conditions by gender and ethnic affiliation; however, most of these studies are specific approaches without national representativeness.
... In contrast to the graphical methods, the sensitivity analysis method models the mechanism of selective publication using the selection function and would give us a more insightful interpretation of PB. Copas (1999) and Shi (2000, 2001) first introduced a selection model based on the Heckman model (Heckman, 1976, 1979) and assumed whether a study would be published or not was determined by a latent Gaussian random variable. We refer to the sensitivity analysis method proposed by Copas and Shi (2001) as the Copas-Heckman selection model. ...
Preprint
Full-text available
Publication bias (PB) poses a significant threat to meta-analysis, as studies yielding notable results are more likely to be published in scientific journals. Sensitivity analysis provides a flexible method to address PB and to examine the impact of unpublished studies. A selection model based on t-statistics to sensitivity analysis is proposed by Copas. This t-statistics selection model is interpretable and enables the modeling of biased publication sampling across studies, as indicated by the asymmetry in the funnel-plot. In meta-analysis of diagnostic studies, the summary receiver operating characteristic curve is an essential tool for synthesizing the bivariate outcomes of sensitivity and specificity reported by individual studies. Previous studies address PB upon the bivariate normal model but these methods rely on the normal approximation for the empirical logit-transformed sensitivity and specificity, which is not suitable for sparse data scenarios. Compared to the bivariate normal model, the bivariate binomial model which replaces the normal approximation in the within-study model with the exact within-study model has better finite sample properties. In this study, we applied the Copas t-statistics selection model to the meta-analysis of diagnostic studies using the bivariate binomial model. To our knowledge, this is the first study to apply the Copas t-statistics selection model to the bivariate binomial model. We have evaluated our proposed method through several real-world meta-analyses of diagnostic studies and simulation studies.
... Researchers increasingly express concern about endogeneity bias in SLG diversity research (Yang et al., 2019). To address potential bias due to self-selection, we followed recommendations by Certo et al. (2016) to analyze the data using Heckman's two-stage self-selection model (Heckman, 1979). In the first stage of the method, we used a probit model to identify the selection of observations for the second stage. ...
Article
Full-text available
The demographic composition of a firm’s Board of Directors (BoD) and Top Management Team (TMT) has important consequences for organizational processes and outcomes. However, researchers have focused on the independent effects of diversity in these strategic leadership groups (SLGs), foregoing how it affects their interactions. We adopt a strategic leadership system perspective to account for tasks that a firm’s BoD and TMT perform independently, as well as shared tasks performed at their interface. Focusing on the innovation process as a context for strategic decision-making and implementation, we hypothesize inverted u-shaped associations for independent effects of BoD and TMT gender compositions on innovation inputs and TMT gender composition on outcomes. To account for interactions at their interface, we also propose moderating effects between BoD and TMT gender compositions on their relationships with innovation input and outcomes. We find support for our hypotheses within a panel of highly innovative U.S. firms between 2005 and 2018. These findings have important implications for strategic leadership and diversity researchers and may provide guidance on balancing the gender composition of SLGs at firms that pursue innovation.
... To estimate the ethnic penalty on income, Heckman models are used, separately by gender and region, which make it possible to account for the different selection of immigrants and natives into dependent employment (Heckman 1976; 1979). This method is particularly suitable, given that income can be observed only among those who work (in dependent employment, in the case of the RFL data). ...
Article
Full-text available
The model of migrant inclusion in Italian labour markets is characterised by a compromise, or trade-off, between a low ethnic penalty in the chances of being employed and a high penalty in terms of job quality. Using data from the Italian Labour Force Survey (2009-2020), this study describes the model of inclusion of foreign workers at the regional level, highlighting variations in the occupational trajectories of foreign workers.
... Under the sequential decision-making process, the econometric specification preceding equation 2 consists of market participation decision equations and livestock supply equations assumed to be mutually exclusive. In the market participation analysis, most of the studies apply either the sample selection model (Heckman 1979), Tobin's (1958) Tobit model, or the Cragg (1971) double-hurdle model. In this study, the mutual exclusivity assumption renders the participation decision as a set of discrete choices, and DH models were found ideal as they allow for a separation between the initial decision to participate (Y>0 vs Y=0) and the decision of how much quantity, Q, to sell given Q>0. ...
Article
This paper aims to investigate the role of transaction costs in determining market participation of smallholder livestock farmers in the southern rangelands of Kenya. A double-hurdle model was used to establish whether or not a household participated in cattle and small ruminants markets, and how much they sold conditional upon having decided to be market participants. Secondary data from the Agricultural Sectoral Development Support Program belonging to 1512 households spread across 10 pastoral and agro-pastoral counties was used in estimating the model. The transaction costs that influence the level of market participation include ownership of transport facility, access to veterinary services, distance to the livestock market, pasture land size, and size of the tropical livestock units. Those that hindered market participation included access to off-farm income while those who have large pasture lands and tropical livestock units, and access to veterinary services, motorcycles, or radio are more likely to participate. Policy measures, such as policies dealing with land reform and extension services are necessary while others require indirect intervention and private sector involvement such as road networks, market availability, and macro-credit facilities.
... Besides these considerations, we should remark that this variable was chosen as a suitable proxy that condensed "unmeasured ability" (motivation, general characteristics, and skills that determine the decision to register, or other issues such as disability), the omission of which seriously plagues statistical models. In fact, the omission of a covariate that affects both the outcome and the included covariates resulted in an omitted effect that was absorbed by the error term, leading to the classical problem of omitted ability bias (Heckman 1979), meaning that the model attributes the effect of the missing variables to those that were included. Hence, we specify PES for this reason. ...
Article
Full-text available
In line with the existing literature, the primary focus of the present paper is on understanding the multifaceted factors contributing to unequal employment opportunities for women and the potential implications for both individuals and society. Specifically, the objective is to identify meaningful risk factors that affect the probability of being employed for women in the 20–49 age group, exploring possible demographic, educational, social, and family factors, as well as territorial context factors. The analysis is conducted on the three most populous European countries (Italy, France, and Germany) as representatives of different welfare regimes. The analysis exploits the rich information available in the micro-data of the Labour Force Survey (2021) as well as Eurostat regional statistics considering individuals nested in regions (NUTS 2). A deep analysis of empirical findings sheds light on employment determinants and motivations for not working, which appear to be essentially related to family and demographic factors. These results reveal the country-specific profiles that indicate greater risk of non-employment and also provide a basis for suggesting different policy implications.
... Subsequently, these dimensions were quantified with a score between 0 and 100, according to the percentage of a text that corresponds to each dimension (e.g., cognitive language ranges in our database from 0 to 14.89). In a two-step procedure, we first predict (1) the probability of a future attack from a terror organization. In the second step, we further break down the dependent variable to predict (2) the days until the next attack from that organization and (3) the likelihood of a novel attack in the future from that terror organization. ...
Article
Full-text available
In this article, we argue that the process of predicting terrorist attacks needs to integrate the evolving dynamic of terrorism and we make a case for novelty as crucial feature to encompass terrorism's changing nature. To predict when and how terrorist organizations will conduct their next attack, and whether it will have a novel approach, we base our analysis on media coverage. As media continuously covers political, economic, and societal analyses on a national and international scale, it provides rich information that can fuel early-warning systems for terror attacks. We analyze the content of 2,173,544 newspaper articles, reporting on 42,252 terror attacks by 1,121 organizations. Our analyses show that content of media coverage relates to the interval until the following attack from the same terror organization as well as whether they will conduct a novel and even more devastating terror attack. Hence, our approach and findings can contribute to building early-warning systems.
... The sample selection model was first introduced by Heckman (1979). The sample selection model consists of two equations, namely the selection equation and the outcome equation. ...
Article
Full-text available
The linear regression model is a statistical tool used to model the causal relationship of a dependent variable based on one or several independent or explanatory variables. In scenarios where the dependent variable is a censored variable and sample selection may be present, the sample selection model can be an alternative for analyzing this relationship. In the Heckman sample selection model, independent variables may be endogenous, in which case they should be treated as endogenous variables in both the outcome equation and the selection equation instead of as exogenous variables. As a result, including endogenous covariates in the Heckman sample selection model means that the model has more than one equation, which makes it a simultaneous-equation system. To estimate simultaneous equations, simple estimation methods such as the maximum likelihood estimator method are no longer appropriate. In this study, we discuss the estimation of sample selection models with endogenous covariates utilizing the full information maximum likelihood (FIML) approach. The sample selection model with endogenous covariates was then applied to the women's labor supply data from Thomas Mroz's research and compared with several models. Based on the MSE and SSE values obtained from the linear regression model, Tobit regression model, Heckman sample selection model, and sample selection model with endogenous covariates, it was concluded that the Heckman sample selection model is the model that best fits the dataset, since it yields the smallest MSE and SSE values.
... Comprehensive reviews of these methods are available, including Little [1995], Molenberghs and Kenward [2007], and Daniels and Hogan [2008]. These likelihood-based models vary in how they factorize the joint distribution of the outcome and missing data processes, including selection models [Heckman, 1979, Wu and Carroll, 1988a, Diggle and Kenward, 1994], pattern-mixture models [Wu and Bailey, 1989, Little, 1993], shared parameter models [Wu and Carroll, 1988b, De Gruttola and Tu, 1994, Pulkstenis et al., 1998], and mixed effects hybrid models [Little, 2009, Ahn et al., 2013]. ...
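The two equations referred to above are usually written in the following textbook form. The notation ($y_i$, $s_i$, $x_i$, $z_i$, $\beta$, $\gamma$, $\rho$, $\sigma_{\varepsilon}$) is assumed here for illustration and is not taken from the cited paper. The errors $(u_i, \varepsilon_i)$ are taken to be bivariate normal with $\operatorname{Var}(u_i) = 1$, $\operatorname{Var}(\varepsilon_i) = \sigma_{\varepsilon}^{2}$ and correlation $\rho$.

```latex
\begin{align}
s_i^{*} &= z_i^{\top}\gamma + u_i, \qquad s_i = \mathbb{1}\{s_i^{*} > 0\} \\
y_i &= x_i^{\top}\beta + \varepsilon_i \quad \text{(observed only when } s_i = 1\text{)} \\
\mathbb{E}[\,y_i \mid x_i, z_i, s_i = 1\,] &= x_i^{\top}\beta + \rho\,\sigma_{\varepsilon}\,\lambda(z_i^{\top}\gamma), \qquad \lambda(c) = \frac{\phi(c)}{\Phi(c)}
\end{align}
```

The last line shows why OLS on the selected sample is biased whenever $\rho \neq 0$: the omitted term $\lambda(\cdot)$, the inverse Mills ratio, is absorbed into the error unless it is included as a regressor.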
Preprint
In various biomedical studies, the focus of analysis centers on the magnitudes of data, particularly when algebraic signs are irrelevant or lost. To analyze the magnitude outcomes in repeated measures studies, using models with random effects is essential. This is because random effects can account for individual heterogeneity, enhancing parameter estimation precision. However, there are currently no established regression methods that incorporate random effects and are specifically designed for magnitude outcomes. This article bridges this gap by introducing Bayesian regression modeling approaches for analyzing magnitude data, with a key focus on the incorporation of random effects. Additionally, the proposed method is extended to address multiple causes of informative dropout, commonly encountered in repeated measures studies. To tackle the missing data challenge arising from dropout, a joint modeling strategy is developed, building upon the previously introduced regression techniques. Two numerical simulation studies are conducted to assess the validity of our method. The chosen simulation scenarios aim to resemble the conditions of our motivating study. The results demonstrate that the proposed method for magnitude data exhibits good performance in terms of both estimation accuracy and precision, and the joint models effectively mitigate bias due to missing data. Finally, we apply proposed models to analyze the magnitude data from the motivating study, investigating if sex impacts the magnitude change in diaphragm thickness over time for ICU patients.
... As no issue receives 100% of the investments, the upper-bounded nature of our dependent variable does not raise any concern. As for its lower-bound, stemming from the party's decision not to engage with the focal issue in a given election, we computed the inverse Mills' ratio from the selection equation and then added it as a control variable (Heckman, 1979). We used the (square rooted) percentage of uncoded sentences-that is, the text unrelated to any specific political issue-as the exclusion restriction in the first stage. ...
Article
Full-text available
Our research addresses how organizations manage a shift from a single to a hybrid identity, a question that the identity literature still is grappling with. We address this question by reflecting on how organizations develop hybrid identities in response to institutional decline. Identity hybridization, we predict, takes place in stages via strategies that gradually hybridize the identity. We study how British political parties hybridized their identities in response to the decline of social-class politics over the period 1950-2015. Quantitative and qualitative analyses of the identity projections of three political parties in their election manifestos provide support for our hypotheses.
... Roy (1951) makes it clear that estimating an equation on a selectively obtained sub-sample of the population can lead to bias. Heckman (1979) presents the selection bias arising from a wage equation estimated only on working women, whereas activity behaviour is the result of a trade-off in which the wage the individual can obtain on the labour market plays a role. ...
Article
Full-text available
Background: As of March 2018, the newly created Kribi port has entered into competition with the existing Douala port in Cameroon. The purpose of this paper is to assess the induced effects generated by this new port on Douala Port customs activities over the 2019-2021 three-year period. Setting: Annual data spanning 2007-2018 was used. Aims: This research develops the concept of compensatory residual effect in order to fix the matter of biases due to exogenous time-related factors, and apply it to the assessment of induced effects generated by the opening of the said new port. Methods: The paper applies the double exponential smoothing method for estimating counterfactual values and then combines the with-without method with the concept of compensatory residual effect. Results: The commissioning of the Kribi Port has induced a transfer of customs performance from the Douala Port to Kribi Port estimated at 13,5, 92,4 and 174,7 billion FCFA of customs revenue in 2019, 2020 and 2021 respectively, representing 18%, 75% and 88% of Kribi Port’s activities. Conclusion: Based on these findings, the document recommends an in-depth diagnostic study of the determinants of the two ports' attractiveness, to improve port competitiveness to optimize economic performance at the national level. Contribution: As developed in this paper, the compensatory residual effect significantly contributes to the ex-post evaluation of port competitiveness. It can be applied to assessment studies within organizations such as banks, provided the assumptions made in this paper.
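The abstract above estimates counterfactual values with double exponential smoothing. The sketch below assumes Holt's linear-trend variant, since the abstract does not say which formulation is used. The function name, the smoothing parameters `alpha` and `beta`, and the example series are all hypothetical and are not data from the paper.

```python
def holt_forecast(series, alpha, beta, horizon):
    """Forecast `horizon` steps ahead with Holt's double exponential smoothing."""
    if len(series) < 2:
        raise ValueError("need at least two observations to initialise the trend")
    level = series[0]
    trend = series[1] - series[0]  # simple initial trend estimate
    for value in series[1:]:
        previous_level = level
        # Smooth the level towards the new observation.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Smooth the trend towards the latest change in level.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# Hypothetical pre-intervention annual series and a three-year counterfactual.
pre_period = [10.0, 12.0, 14.0, 16.0]
counterfactual = holt_forecast(pre_period, alpha=0.5, beta=0.5, horizon=3)
```

In a with-without comparison, the induced effect for each post-intervention year would then be the observed value minus the corresponding entry of `counterfactual`.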
... The Heckman two-stage model is widely adopted to correct for such self-selection bias through accounting for differences in farmers' tendency for adoption (Bezu and Holden, 2008). Specifically, Heckman (1979) introduced a two-stage model with two separate regressions. The first stage estimates the probability of adoption by using a probit model on our binary adoption variable, which is one if the household adopted a CIMMYT variety on the main plot and zero otherwise. ...
... The second group of variables measures different dimensions related to technology and automation, and the third describes the basic firms' demography. To avoid any selectivity bias, the authors ran parallel regression models utilizing Heckman selection models (Heckman, 1979) using a probit link function (Agresti, 2015), and the results of the models were very similar to the original regression models which are presented in this paper. This has given the authors of this paper further confidence in the robustness of the models in addition to the standard test reported in the Analysis and Results Section. ...
Article
Full-text available
This paper assesses empirically the COVID-19 effect on businesses and the potential dynamic changes regarding post-COVID-19 automation and technology penetration using various logistic regression models. A field survey was used to collect the necessary data for testing various hypotheses. This study demonstrates the severity of the pandemic on businesses and how it has changed their perspectives on technology as a critical aspect of survival and future success. The results showed that capital-intensive firms are more resilient to the crisis. In addition, the firms that were affected severely in terms of employment due to the pandemic believe that technology will significantly impact hiring, investment, and value added. This paper investigates a unique phenomenon represented by COVID-19, its impact on businesses in a resource-rich context and their responsiveness concerning technology deployment and automation.
... These characteristics of the 2PM make it different, for instance, from the sample selection model proposed by Heckman (1979). ...
Chapter
Full-text available
Despite growing numbers of cases around the globe, the literature on judicial decision-making in corruption prosecutions remains underdeveloped. Why do judges convict some defendants in corruption cases and not others? Why do judges apply harsher sentences to some defendants than others? In sum, what explains variation in the severity with which the judicial system judges public figures and private citizens accused of corruption? This chapter draws on the experience of Brazil’s Operation Car Wash ( Lava Jato , in Portuguese) to begin to address these questions.
... However, there are two categories of farm households in the sample: (1) investors in productive assets (64.5 per cent) and (2) non-investors (35.51 per cent). In such situations, sample selection bias could arise, as a result of which a standard regression equation yields biased results (Heckman, 1976 and 1979). Therefore, the Heckman selection model has been used in this study since it assumes that there is an underlying regression relationship between the extent of investment (regression equation) and investment decision of the farmer (selection equation). ...
... Following the suggestion of Wooldridge (1995), I assume that θ_i depends solely on the time-averaged values of z_it and specify the correlation between z_it and θ_i following Mundlak (1978)'s approach. The error terms v_it and u_it follow the given distribution: where The econometric model presented above is estimated using the two-step estimation method developed by Heckman (1979). The first step is to estimate Eq. (2) as a conventional random-effect probit model. ...
Article
Full-text available
This study analyzes the mechanisms by which short-time work (STW) schemes affect firm’s employment adjustments, using establishment-level data during the Great Recession from Japan. The findings show that STW leads to a decrease in both hiring and separations, with no significant positive effect on net employment. The observed curtailed hiring can be explained within the context of how STW promotes labor hoarding. STW encourages firms to maintain redundant employment by subsidizing the costs of labor hoarding. The excess labor surplus in firms adopting STW diminishes the incentive to recruit new workers in anticipation of the recovery period. Furthermore, as firms generally lack the motivation to hoard marginal workers, STW may exacerbate job security disparities between regular and marginal workers. These findings have important implications for policy evaluation, emphasizing the need for a comprehensive understanding of the potential adverse consequences of STW on labor market entrants and marginal workers.
... We confirm many of these findings in Section 4.3, where we find that income, religious attendance, party ID, and age are all predictive of response reluctance. Our approach also shares some similarities with Brehm (1993) who applies methods proposed by Heckman (1979) and Achen (1986) to adjust for nonresponse in regression models using data from the American National Election Studies and other sources. Finally, we note that across studies, the average response rate for panelists recruited to a given survey is 34%. ...
Article
Full-text available
Survey experiments on probability samples are a popular method for investigating population-level causal questions due to their strong internal validity. However, lower survey response rates and an increased reliance on online convenience samples raise questions about the generalizability of survey experiments. We examine this concern using data from a collection of 50 survey experiments which represent a wide range of social science studies. Recruitment for these studies employed a unique double sampling strategy that first obtains a sample of “eager” respondents and then employs much more aggressive recruitment methods with the goal of adding “reluctant” respondents to the sample in a second sampling wave. This approach substantially increases the number of reluctant respondents who participate and also allows for straightforward categorization of eager and reluctant survey respondents within each sample. We find no evidence that treatment effects for eager and reluctant respondents differ substantially. Within demographic categories often used for weighting surveys, there is also little evidence of response heterogeneity between eager and reluctant respondents. Our results suggest that social science findings based on survey experiments, even in the modern era of very low response rates, provide reasonable estimates of population average treatment effects among a deeper pool of survey respondents in a wide range of settings.
... Consistent with this reasoning, one might argue that somehow, firms with a lower likelihood of facing class-action lawsuits end up hiring foreign-sounding CEOs. To resolve this issue, in table 13, we run our main test (equation 1) using the Heckman 2 stage analysis (Heckman, 1979). In table 13 panel A, the first stage models selection by predicting (using a probit model), whether CEO is in one set of firms based on the following variables: high litigation industry indicator, lag volatility, lag leverage, indicator for loss-making firm, CEO age, CEO gender and industry as independent variables. ...
Article
Full-text available
In this paper, we investigate if the perceived name-based ethnicity of CEOs has any relationship with the likelihood of shareholder class-action lawsuits. Using machine learning algorithms on CEO names, we develop an objective proxy of name-based ethnicity and find that firms managed by ‘foreign-sounding’ CEOs exhibit a lower likelihood of class-action lawsuits. Our results are robust to matched sample analysis, Heckman two-stage selection, alternate model specifications as well as use of an alternate proxy. We further find that succession of a foreign-sounding CEO by a non-foreign-sounding CEO increases the likelihood of class-action lawsuits. Our paper has important implications for firms, especially in high litigation industries or high litigation situations.
... often depends on some attributes of the covariates [11,12]. The selection bias in observational data is manifested by the presence of confounders [13], i.e., covariates that affect both the outcome and treatment, resulting in different distributions of covariates between the treated and the control groups [6,14]. Therefore, how to mitigate the impact of selection bias on counterfactual prediction is a challenging problem in TEE. ...
Article
Full-text available
Treatment effect estimation (TEE) is widely adopted in various domains such as machine learning, advertising and marketing, and medicine. During TEE, selection bias normally exists in counterfactual prediction, which results in different distributions of covariates between the treated and control groups. One important challenge in TEE is to mitigate the impact of selection bias, which has attracted a lot of research in recent years. To address this challenge, existing neural network-based methods generally aim to minimize the distribution differences using integral probability metrics. However, minimizing the distribution differences may inadvertently remove outcome-related information during the balancing procedure, which has a negative impact on the accuracy of TEE. In this paper, we propose a novel self-supervised learning approach to conduct TEE. Rather than minimizing the distribution differences, we first introduce the concept of virtual samples, which have identical covariates to observed samples but different treatments. In this way, we aim to simulate the scenario where each sample receives both treatment and control. Next, we propose a self-supervised domain embedding learning (SDEL) approach to conduct TEE. In SDEL, we propose to learn both treated and control embeddings for observed and virtual samples, thereby learning the effects of different treatments. To the best of our knowledge, we are the first to introduce the concept of virtual samples and the first to conduct embedding learning in TEE. Building upon SDEL, we propose a feature extraction counterfactual regression network (FE-CFR), in which we propose a feature extraction module (FEM) to estimate the importance of different covariates. Compared with existing TEE methods, our proposed self-supervised learning approach could improve the accuracy of TEE. Extensive experiments have been conducted on benchmark datasets for TEE, and the results demonstrate that our proposed approach outperforms the compared baseline approaches.
... Then we employed the Heckman two-stage procedure [113]. To account for the probability that firms tend to have the manufacturing industry, criteria for sample selection may cause bias. ...
Article
Full-text available
Demand-supply mismatch is considered one of the most important factors that drives firm risk. Nevertheless, the difficulties in quantifying firms’ capability and ambition to deal with demand-supply mismatch drive external investors to seek valuable signals to guide investment decisions. Drawing on signaling theory, we identified supply chain board members (SCBMs), the presence of directors of customer and supplier organizations in a focal firm’s boardroom, as an effective approach to attenuate demand-supply mismatch, and accordingly indicate lower firm risk. Through a panel dataset of 1681 manufacturing firms listed in the North American market from 2010 to 2020, we empirically analyzed the effects of supply chain board members on firm idiosyncratic volatility. We found that firms with supply chain members in their boardrooms are accompanied by lower idiosyncratic volatility than those without SCBMs. We further discussed the effects of supply chain network architectures as critical signaling environments on SCBMs. The regression results indicated that the signal of SCBMs is strengthened under high eigenvector centrality, but weakened by a high level of structural holes. This study extends conventional risk evaluation literature by theorizing one type of inter-organizational relationship, SCBMs, as an effective signal of collaborative intention and commitment, which proxies a focal firm’s capability and ambition to build collective strength with external investors and stakeholders to lower firm risk. Our findings are robust under several additional tests, e.g., propensity score matching, instrumental variable regression, and Heckman’s two-stage regression.
... For these households, the expenditures expressed in dollars and the quantities purchased expressed in ounces (with standardization) are zero. To deal with this censoring issue, we rely on the sample selection model of Heckman (1976, 1979). Consistent with the first stage of Heckman two-step estimation procedure, we employ a probit model or selection equation (choice model) to develop profiles of U.S. households who purchase Greek or non-Greek yogurt. ...
Article
Full-text available
Using the Heckman framework, we develop profiles of households who purchase Greek yogurt and non-Greek yogurt and estimate own-price, cross-price, and income elasticities of demand. Attention is centered on the impacts of age, race, education, and ethnicity of the household head, household income, household size, region, the presence of children, and prices of Greek yogurt and non-Greek yogurt. This analysis rests on data acquired from Nielsen pertaining to 164,484 households over calendar years 2018–2020. Own-price elasticities are estimated to be −1.36 for Greek yogurt and −0.70 for non-Greek yogurt. Additionally, these yogurt products are not only substitutes but also necessities.
... However, Yotov et al. (2016) note that one major setback of the traditional log-linear approach is that it does not account for zero trade flows values since observations with zero trade flows values are simply dropped from the estimation sample when we take the log-linear version of the trade values, resulting in sample selection bias. When the trade values are transformed into a logarithmic form, zero trade flow values are dropped from the sample, since the logarithm of zero is unspecified, giving rise to missing data points and sample selection bias (Heckman, 1979). In the context of this study, there are some reasons why frequent zero trade flows may occur in coffee exports for Cameroon. ...
Article
Full-text available
Africa’s destiny hinges on accessing the lucrative markets in the northern hemisphere. However, non-tariff measures in particular sanitary and phytosanitary (SPS) measures have become a prominent tool in the regulation of international trade in agricultural and food products and have been increasingly recognized as one of the major determinants of market access, particularly to the European Union. However, there is very limited empirical evidence on the impact of sanitary and phytosanitary measures on Africa and Cameroon’s agricultural exports. This paper contributes to the literature by investigating the impact of changes in sanitary and phytosanitary measures in importing countries on coffee exports from Cameroon at the 6-digit HS level, using the gravity model and the modified Poisson pseudo-maximum likelihood estimators. The analysis is based on trade data between Cameroon and 10 major importing countries in the Organization for Economic Cooperation and Development (OECD) between 2001 and 2020. These results suggest that coffee export from Cameroon is not significantly affected or influenced by sanitary and phytosanitary measures in major importing markets; that is, the standards had weak trade effects on coffee exports. Other factors such as income, language, and labor size were significant in influencing trade flows in the export commodities. These results further point to the low productive capacity of the country’s coffee sub-sector. The supply-side constraints in the coffee sub-sector can be addressed by the government by improving access to high-yield coffee varieties by farmers, educating and training farmers on good agricultural practices through agricultural extension programs, and upgrading the market infrastructure.
... To estimate the labor income returns to health, we used a Mincerian model [20], with an additional correction for the selection bias in terms of labor participation on which labor income depends, using the approach proposed by Heckman [21]. The model estimates the changes in labor income associated with changes in the health indicator, in this case, height. ...
Article
Full-text available
Investment in health has been proposed as a mechanism to promote upward social mobility. Previous analyses have reported inconsistent estimates of the returns to investment in health in Mexico based on different models for different years. We aim to estimate returns for Mexico using data from four time points. Adult height and labor income are drawn from the periodical national health and nutrition surveys (a group of relatively standardized surveys) that are representative of individuals living in the country in 2000, 2006, 2012, and 2018. These surveys collect anthropometric measurements and information on individuals’ labor income. We estimated Mincerian models separately for men and women using OLS, Heckman, instrumental variables, and Heckman with instrumental variables models. Our results indicate significant and positive returns to health for the four surveys, similar in magnitude across years for women and with variations for men. By 2018, returns to health were about 7.4% per additional centimeter in height for females and 9.3% for males. Investments in health and nutrition during childhood and adolescence that increase health capital (measured as adult height) may promote social mobility in Mexico and similar countries to the extent that these investments differentially increase health capital among the poor.
Article
Full-text available
While it is assumed that trade unions may influence the gender wage gap, evidence is scarce on this issue. This study investigates the issue in China using national longitudinal survey data from 2010 to 2020. The results reveal that the union wage premium is greater for women than for men. Furthermore, the union wage premium is more beneficial for women in the public sector compared to the private sector. The gender disparity in endowment return effect among non-union members is the primary factor contributing to the formation of the gender wage gap in both public and private sectors, with the effect being more pronounced in the public sector. Additionally, the gender disparity in unionism reduces the gender wage gap in the public sector while widening the wage gap in the private sector.
Article
Full-text available
This study concentrated on a business report that typically reveals a company’s non-financial information, aiming to uncover its strategic direction. Using text-mining techniques, the research extracted and analyzed the report’s overview sections, identifying key strategic themes categorized into the financial, customer, learning and growth, and internal process perspectives. The empirical analysis applied a two-stage model to assess how shifts in company strategies affect profitability, stability, and growth. This research provided insights into the management strategies and financial metrics within the information security sector, examining how strategic priorities shape financial health. The findings were as follows. Firstly, companies emphasizing financial strategies in their reports tended to exhibit higher profitability. Secondly, those focusing on customer-oriented strategies also reported greater profitability. Thirdly, companies prioritizing internal processes demonstrated increased organizational stability. Fourthly, an emphasis on learning and growth strategies was associated with lower stability but higher growth potential. This paper contributes to the field by offering a method to quantitatively analyze qualitative textual data, providing a more precise approach to understanding management strategies through direct content analysis of business reports. It also highlights the specific financial and strategic characteristics of information security firms, a relatively under-researched area, thereby offering valuable guidance for these companies in terms of strategic planning.
Article
Full-text available
Drawing on data from the School-to-Work Transition Survey (Enquête de Transition Vers la Vie Active, ETVA) in Congo and using a two-stage regression that accounts for the endogeneity of education and for selection bias, this paper analyzes the determinants of job satisfaction in a context of precarious and informal employment. The results indicate that satisfaction is determined not only by the standard economic factors (wage, working hours, etc.) but also by the way the job was obtained. These results point the economic policy implications in two directions: on the one hand, measures concerning the conditions of decent employment, and on the other, guidelines for public employment agencies to improve the matching of skills to jobs. The main limitation of this research concerns the dataset. Because some explanatory variables are not sufficiently documented for informal-sector employees and for self-employed workers, it was not possible to carry out a comparative analysis between these two categories of workers and formal-sector employees.
Article
Background Latin American and Caribbean countries are dealing with the combined challenges of pandemic-induced socioeconomic stress and increasing public debt, potentially leading to reductions in welfare and health-care services, including primary care. We aimed to evaluate the impact of primary health-care coverage on child mortality in Latin America over the past two decades and to forecast the potential effects of primary health-care mitigation during the current economic crisis. Methods This multicountry study integrated retrospective impact evaluations in Brazil, Colombia, Ecuador, and Mexico from 2000 to 2019 with forecasting models covering up to 2030. We estimated the impact of coverage of primary health care on mortality rates in children younger than 5 years (hereafter referred to as under-5 mortality) across different age groups and causes of death, adjusting for all relevant demographic, socioeconomic, and health-care factors, with fixed-effects multivariable negative binomial models in 5647 municipalities with an adequate quality of vital statistics. We also performed several sensitivity and triangulation analyses. We integrated previous longitudinal datasets with validated dynamic microsimulation models and projected trends in under-5 mortality rates under alternative policy response scenarios until 2030. Findings High primary health-care coverage was associated with substantial reductions in post-neonatal mortality rates (rate ratio [RR] 0·72, 95% CI 0·71–0·74), toddler (ie, aged between 1 year and <5 years) mortality rates (0·75, 0·73–0·76), and under-5 mortality rates (0·81, 0·80–0·82), preventing 305 890 (95% CI 251 826–360 517) deaths of children younger than 5 years over the period 2000–19.
High primary health-care coverage was also associated with lower under-5 mortality rates from nutritional deficiencies (RR 0·55, 95% CI 0·52–0·58), anaemia (0·64, 0·57–0·72), vaccine-preventable and vaccine-sensitive conditions (0·70, 0·68–0·72), and infectious gastroenteritis (0·78, 0·73–0·84). Considering a scenario of moderate economic crisis, a mitigation response strategy implemented in the period 2020–30 that increases primary health-care coverage could reduce the under-5 mortality rate by up to 23% (RR 0·77, 95% CI 0·72–0·84) when compared with a fiscal austerity response, and this strategy would avoid 142 285 (95% CI 120 217–164 378) child deaths by 2030 in Brazil, Colombia, Ecuador, and Mexico. Interpretation The improvement in primary health-care coverage in Brazil, Colombia, Ecuador, and Mexico over the past two decades has substantially contributed to improving child survival. Expansion of primary health-care coverage should be considered an effective strategy to mitigate the health effects of the current economic crisis and to achieve Sustainable Development Goals related to child health. Funding UK Medical Research Council. Translations For the Spanish and Portuguese translations of the abstract see Supplementary Materials section.
Article
Full-text available
The major objective of this study is to estimate the earnings losses of adult men (18 to 65 years old) due to the lack of adequate health conditions, based on Luft (1974) and Haveman (1995). Data from the 1989 national nutritional and health survey (PNSN) were used. The estimates of the productivity losses are obtained through the estimation of labor market participation and earnings equations, correcting for possible sample selectivity bias. Individual earnings losses were higher for those in the labor market than for those out of it. In addition, estimates show that the southeastern and northeastern regions and the urban sector present the highest earnings losses. The results indicate that labor earnings losses due to adverse health conditions are high in Brazil, estimated at 1.6 billion dollars per year, or 258 dollars per adult man per year.
Preprint
This paper addresses the sample selection problem in panel dyadic regression analysis. Dyadic data often include many zeros in the main outcomes due to the underlying network formation process. This not only contaminates popular estimators used in practice but also complicates the inference due to the dyadic dependence structure. We extend Kyriazidou (1997)'s approach to dyadic data and characterize the asymptotic distribution of our proposed estimator. The convergence rates are $\sqrt{n}$ or $\sqrt{n^{2}h_{n}}$, depending on the degeneracy of the H\'{a}jek projection part of the estimator, where $n$ is the number of nodes and $h_{n}$ is a bandwidth. We propose a bias-corrected confidence interval and a variance estimator that adapts to the degeneracy. A Monte Carlo simulation shows the good finite performance of our estimator and highlights the importance of bias correction in both asymptotic regimes when the fraction of zeros in outcomes varies. We illustrate our procedure using data from Moretti and Wilson (2017)'s paper on migration.