Disaggregates In Depth
What are Disaggregates?
Disaggregating data can help decision makers assess disparities, expose hidden trends, and make informed decisions that lead to more equitable outcomes. In the Education-to-Workforce (E-W) Framework, “disaggregates” refer to background or contextual characteristics of individuals and systems by which data should be examined to analyze disparities, monitor progress, and guide action. The framework recommends that E-W systems collect or link data on 25 disaggregates, such as race and ethnicity, age group, disability status, and parental education level.
Why Disaggregate Data?
It is important to disaggregate data by both individual and system characteristics to identify, expose, and act on the structural inequities that cause different outcomes across groups. For example, disaggregating data by K-12 school type (such as whether a school is a public, charter, or private school) can help illuminate the extent to which different types of schools are serving students well. Additionally, some disaggregates will be more or less relevant in different contexts. For example, although all pre-K-to-workforce sectors should disaggregate data by background characteristics such as race and ethnicity and income level, postsecondary systems should also consider disaggregating data by factors specific to the postsecondary sector, such as students’ enrollment intensity (that is, whether they attend part-time or full-time) and field of study.
Avoiding Disaggregation Pitfalls
Data disaggregation presents two potential pitfalls: increased risk of identifying individuals and the possibility of being fooled by randomness.
Potential Pitfall 1: Identification of individuals in the data
Disaggregating data by multiple demographic characteristics (such as race and ability) within a small population (such as a school) could result in very small group sizes. For example, a school might have only three Black students with disabilities. In such cases, people with knowledge about the school could potentially identify specific students, compromising their privacy. To avoid that sort of breach, agencies might set a policy to disaggregate results only for groups with a minimum number of students (perhaps 10) or take other precautions to ensure anonymity of data on individuals (such as applying confidentiality edits). See Data Equity Principle #2 and Data Equity Principle #3 for additional guidance and resources on protecting privacy when disaggregating data.
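To make the minimum-group-size policy concrete, the sketch below shows one way an agency might suppress results for groups below a reporting threshold before publication. The column names, the student records, and the threshold of 10 are illustrative assumptions, not part of the framework.

```python
import pandas as pd

# Hypothetical student-level records; the schools, groups, and scores
# stand in for an agency's own data.
students = pd.DataFrame({
    "school": ["A"] * 3 + ["B"] * 12,
    "group": ["Black students with disabilities"] * 3
             + ["All other students"] * 12,
    "proficient": [1, 0, 1] + [1, 0] * 6,
})

MIN_GROUP_SIZE = 10  # report results only for groups of at least 10 students

summary = (
    students.groupby(["school", "group"])["proficient"]
    .agg(n="count", rate="mean")
    .reset_index()
)

# Suppress (blank out) the rate for any group smaller than the threshold.
summary.loc[summary["n"] < MIN_GROUP_SIZE, "rate"] = float("nan")

print(summary)
```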
Potential Pitfall 2: Being fooled by randomness
The second potential pitfall of data disaggregation is the possibility of being fooled by randomness. Human brains tend to perceive patterns even where none exist, and when we examine data from many disaggregated groups, we risk treating statistical flukes as real trends. The problem is particularly severe for small groups, because reliability declines as group size shrinks: the math proficiency rate for a group of 10 students, for example, can swing widely from one year to the next purely by chance. Testing the statistical significance of trends or differences does not solve this problem, because at the conventional 5 percent significance level we would expect about one of every 20 differences examined to be significant by chance alone.
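The short simulation below illustrates the point under the simplifying assumption that every group has the same true proficiency rate of 50 percent that never changes: the year-to-year swings in the observed rate for a group of 10 are pure sampling noise, while larger groups are far more stable. The group sizes and rate are illustrative, not drawn from any real data.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
TRUE_RATE = 0.50   # assume no real change in underlying proficiency
YEARS = 5

# Observed proficiency rates over five years, by group size, when the only
# thing changing from year to year is random sampling variation.
for n in (10, 100, 1000):
    observed = rng.binomial(n, TRUE_RATE, size=YEARS) / n
    print(f"group of {n:4d}: " + "  ".join(f"{r:.0%}" for r in observed))
```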
Solutions
Fortunately, there is a solution. Bayesian statistical methods can stabilize performance measures and increase their reliability, especially for small groups. Also known as small-area estimation, Bayesian stabilization borrows information from other groups, from the group’s own historical performance, or both, pulling extreme values toward the average and producing estimates that are more reliable and accurate. Research by Mathematica at the Mid-Atlantic Regional Educational Laboratory has shown that Bayesian stabilization can produce reliable results for groups as small as 10 students.
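The framework does not prescribe a particular model, and the example below is not the Regional Educational Laboratory’s method; it is only a minimal empirical Bayes (beta-binomial) sketch of the shrinkage idea, in which a small group’s raw rate is pulled toward a broader average. The prior mean, prior strength, and counts are all assumed for illustration.

```python
def stabilize(successes: int, n: int, prior_mean: float,
              prior_strength: float = 20) -> float:
    """Shrink a small group's proficiency rate toward a broader average
    using a beta-binomial prior. prior_strength is roughly the number of
    'pseudo-students' of weight given to the broader average."""
    alpha = prior_mean * prior_strength
    beta = (1 - prior_mean) * prior_strength
    return (successes + alpha) / (n + alpha + beta)

# A group of 10 students with 2 proficient (raw rate of 20 percent), in a
# district whose overall proficiency rate is 55 percent (assumed values).
raw = 2 / 10
stabilized = stabilize(successes=2, n=10, prior_mean=0.55)
print(f"raw: {raw:.0%}   stabilized: {stabilized:.0%}")  # raw: 20%, stabilized: 43%
```

Because the extreme raw rate is pulled toward the broader average, a single unusual year for a 10-student group no longer produces a wild swing in the reported result.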
Bayesian stabilization can help with privacy concerns, too. Stabilizing group-level results removes the direct connection between the reported result and the individuals in the group. For example, for a group of 10 students, the unadjusted percentage achieving proficiency in math must be a number ending in zero (10 percent, 20 percent, and so on). The stabilized estimate for the group is not constrained in that way, because it blends in information from other groups or years rather than simply counting each student as proficient or not. A stabilized group-level estimate therefore cannot reveal whether any individual student is proficient. Alternative privacy-enhancing statistical techniques for small sample sizes include data suppression (removing information for small subgroups), data masking or perturbation (introducing uncertainty into the dataset), and blurring (reducing data precision).2
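As a rough illustration of two of the alternative techniques named above (suppression is sketched earlier), the snippet below shows what perturbation and blurring might look like applied to a published rate. The noise scale and rounding precision are arbitrary assumptions, not recommended settings.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def perturb(rate: float, scale: float = 0.02) -> float:
    """Perturbation: add random noise so exact counts of individuals
    cannot be recovered from the published figure."""
    return float(np.clip(rate + rng.normal(0, scale), 0, 1))

def blur(rate: float, precision: float = 0.05) -> float:
    """Blurring: report at coarser precision (here, the nearest 5 points)."""
    return round(rate / precision) * precision

print(f"perturbed: {perturb(0.30):.2f}")  # a noisy version of 30 percent
print(f"blurred:   {blur(0.33):.2f}")     # 33 percent rounded to 0.35
```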
Most state and local education agencies lack the statistical know-how to conduct Bayesian stabilization, but help is on the way. By early 2025, the Institute of Education Sciences expects to launch a publicly available tool that will enable any agency to stabilize its results, thereby improving the accuracy of data on small and intersectional subgroups.