  1. Variable Selection and Parameter Tuning for BART Modeling in the Fragile Families Challenge

    Our goal for the Fragile Families Challenge was to develop a hands-off approach that could be applied in many settings to identify relationships that theory-based models might miss. Data processing was our first and most time-consuming task, particularly the handling of missing values. Our second task was to reduce the number of variables for modeling, and we compared several variable selection techniques: the least absolute shrinkage and selection operator (LASSO), regression with a horseshoe prior, Bayesian generalized linear models, and Bayesian additive regression trees (BART).
  2. Imputing Data for the Fragile Families Challenge: Identifying Similar Survey Questions with Semiautomated Methods

    The Fragile Families Challenge charged participants with predicting six outcomes for 4,242 children and their families interviewed in the Fragile Families and Child Wellbeing Study: grade point average, grit, material hardship, eviction, layoff, and job training. The provided data set contained longitudinal survey and observational data collected on the families and their children from birth to age 9. The authors used these data to build models that predict outcomes at age 15.
  3. Predicting GPA at Age 15 in the Fragile Families and Child Wellbeing Study

    In this paper, we describe in detail the approaches we used to predict children's GPA at age 15 in the context of the Fragile Families Challenge. Our best prediction reduced mean squared error by about 18 percent relative to a naive baseline prediction and performed less than 5 percent worse than the best prediction in the Fragile Families Challenge. After discussing our predictions, we also discuss the predictors that tend to be robustly associated with GPA. One notable predictor relates to teacher observations at age 9.
  4. Friend Request Pending: A Comparative Assessment of Engineering- and Social Science–Inspired Approaches to Analyzing Complex Birth Cohort Survey Data

    The Fragile Families Challenge is a mass collaboration social science data challenge that aims to learn how various early childhood variables predict children's long-term outcomes. The author describes a two-step approach to the challenge. In step 1, a variety of fully automated approaches are used to predict child academic achievement. In total, 124 models are fit, covering most possible combinations of eight model types, two imputation strategies, two standardization approaches, and two automatic variable selection techniques using two different thresholds.
  5. Data-Specific Functions: A Comment on Kindel et al.

    In this issue, Kindel et al. describe a new approach to managing survey data in service of the Fragile Families Challenge, which they call “treating metadata as data.” Although their approach is a good first step, a more ambitious proposal could improve survey data analysis even more substantially. The author recommends that data collection efforts distribute an open-source set of tools for working with a particular data set, tools the author calls “data-specific functions.”
  6. Successes and Struggles with Computational Reproducibility: Lessons from the Fragile Families Challenge

    Reproducibility is fundamental to science, and an important component of reproducibility is computational reproducibility: the ability of a researcher to recreate the results of a published study using the original author’s raw data and code. Although most people agree that computational reproducibility is important, it is still difficult to achieve in practice. In this article, the authors describe their approach to enabling computational reproducibility for the 12 articles in this special issue of Socius about the Fragile Families Challenge.
  7. Winning Models for Grade Point Average, Grit, and Layoff in the Fragile Families Challenge

    In this article, the authors discuss and analyze their approach to the Fragile Families Challenge. The data consisted of more than 12,000 features (covariates) about the children and their parents, schools, and overall environments from birth to age 9.
  8. Humans in the Loop: Incorporating Expert and Crowd-Sourced Knowledge for Predictions Using Survey Data

    Survey data sets are often wider than they are long. This high ratio of variables to observations raises concerns about overfitting during prediction, making informed variable selection important. Recent applications in computer science have sought to incorporate human knowledge into machine-learning methods to address these problems. The authors implement such a “human-in-the-loop” approach in the Fragile Families Challenge. The authors use surveys to elicit knowledge from experts and laypeople about the importance of different variables to different outcomes.
  9. Talking Your Self into It: How and When Accounts Shape Motivation for Action

    Following Mills, several prominent sociologists have encouraged researchers to analyze actors’ motive talk not as data on the subjective desires that move them to pursue particular ends but as post hoc accounts oriented toward justifying actions already undertaken.
  10. Equifinality and Pathways to Environmental Concern: A Fuzzy-Set Analysis

    Studying how people understand and develop concern for environmental problems is a key area of research within environmental sociology. Previous research shows that numerous social factors have measurable effects on environmental concern. However, results tend to be somewhat inconsistent across studies on this topic. One possible explanation is that these social factors are typically examined independently of one another, even though they are interrelated in complex ways, as shown by research on the moderating effects of race and political ideology on education.
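Item 3 (Predicting GPA at Age 15) reports an improvement of about 18 percent in mean squared error over a naive baseline. The sketch below shows one way to compute such a figure, as the fraction by which the model's MSE falls below the baseline's MSE. It assumes, without confirmation from the abstract, that the naive baseline predicts the training-set mean for every child; the function names and the toy numbers are hypothetical and are not the paper's data.

```python
# Hypothetical sketch: relative MSE improvement over a naive baseline.
# Assumption (not stated in the abstract): the baseline predicts the
# training-set mean outcome for every child in the evaluation set.
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def relative_improvement(y_true, y_model, y_train):
    """Return 1 - MSE(model) / MSE(baseline); 0.18 means 18 percent lower MSE."""
    baseline_value = fmean(y_train)            # naive constant prediction
    y_base = [baseline_value] * len(y_true)
    mse_base = mse(y_true, y_base)
    if mse_base == 0:
        raise ValueError("baseline MSE is zero; improvement is undefined")
    return 1.0 - mse(y_true, y_model) / mse_base


# Toy numbers for illustration only.
print(relative_improvement([2.0, 4.0], [2.5, 3.5], [2.0, 3.0, 4.0]))  # 0.75
```

In the toy case the baseline MSE is 1.0 and the model MSE is 0.25, so the model's error is 75 percent lower. A value of 0.18 under this definition would correspond to the "about 18 percent" reported in item 3, if the paper uses the same baseline.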
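Item 4 (Friend Request Pending) states that its 124 models cover "most possible combinations" of eight model types, two imputation strategies, two standardization approaches, and two variable selection techniques at two thresholds. A quick enumeration makes the arithmetic concrete. This sketch assumes every factor crosses fully with every other, which gives a full grid of 8 × 2 × 2 × 2 × 2 = 128 cells. All labels are hypothetical placeholders, and the abstract does not say which combinations were left out.

```python
# Hypothetical sketch of the full factorial model grid described in item 4.
# Assumption: every factor level is crossed with every other level.
from itertools import product

N_MODEL_TYPES = 8
IMPUTATION = ("imputation_a", "imputation_b")              # hypothetical labels
STANDARDIZATION = ("standardization_a", "standardization_b")
SELECTION = ("selection_a", "selection_b")
THRESHOLDS = ("threshold_1", "threshold_2")

model_types = [f"model_{i}" for i in range(1, N_MODEL_TYPES + 1)]

# Each tuple is one candidate pipeline: (model, imputation, scaling, selection, threshold).
grid = list(product(model_types, IMPUTATION, STANDARDIZATION, SELECTION, THRESHOLDS))

print(len(grid))        # 128 = 8 * 2 * 2 * 2 * 2
print(len(grid) - 124)  # 4 cells not fit, under this full-crossing assumption
```

If the actual design nested the thresholds differently or included a no-selection option, the grid size would change. The sketch only shows that 124 is consistent with "most" of a fully crossed grid.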