The Marin Post

The Voice of the Community



Marin County Life Expectancy Study runs into Simpson's Paradox


This essay is a follow-up to my earlier one on Blue Zones and it reviews a recent Study on Marin life expectancy.

I am attempting to understand the causes of the difference in life expectancy between ethnic groups in Marin County. The mentioned Study provided no conclusive data on the subject.


Ethnic groups (Asian, Black, Hispanic, White) have markedly different life expectancies within Marin County. Asians have the longest, next come Hispanics, then Whites, then Blacks. This ranking holds not only for Marin County but also for the State of California and the US as a whole.


Keep in mind that even though Marin County is deemed the most segregated county within the Bay Area, Blacks in Marin County have a much longer life expectancy than in the vast majority of US counties (on average about 3.5 to 4 years longer than elsewhere). Also, note that Blacks' life expectancy in Marin is now higher than the average White life expectancy in the US. Thus, the mentioned segregation in Marin County does not impair the life expectancy of Blacks compared with just about any other county in the US.


In Marin County, the mentioned life expectancy ranking (Asian/Hispanic/White/Black) is out of line with the ranking by income or by a more encompassing socioeconomic index (White/Asian/Black/Hispanic). Within this framework, Hispanics, given their low socioeconomic status, way outperform on life expectancy. Meanwhile, Whites way underperform. Hispanics have a much longer life expectancy than Whites even though Hispanics' income and socioeconomic status are way lower than Whites'.
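One way to put a number on this mismatch is a rank correlation between the two orderings. The sketch below is only an illustration and is not part of the Study: it takes the two rankings exactly as stated above and applies Spearman's rank correlation, which is my choice of measure. The function name `spearman_rho` is just a label for this example.

```python
def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation for two rankings without ties.

    Both arguments map a group name to its rank (1 = highest).
    """
    n = len(rank_a)
    d_squared = sum((rank_a[g] - rank_b[g]) ** 2 for g in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Rankings as stated in the text (1 = longest life / highest status).
life_expectancy_rank = {"Asian": 1, "Hispanic": 2, "White": 3, "Black": 4}
socioeconomic_rank = {"White": 1, "Asian": 2, "Black": 3, "Hispanic": 4}

rho = spearman_rho(life_expectancy_rank, socioeconomic_rank)
print(rho)
```

With these two orderings the squared rank differences sum to 10, so the coefficient works out to 1 - 60/60 = 0. With only four groups this is a descriptive illustration rather than a statistical test, but it shows that the two rankings are unrelated in rank terms.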

The reviewed Study focused on a socioeconomic index to explain life expectancy (the higher the socioeconomic index score, the longer the life expectancy). Overall, it did find a reasonably strong relationship between the two variables (correlation = 0.58). However, the Study failed to explain the difference in life expectancy between different ethnic groups.

As reviewed, the ethnic groups’ life expectancy did not match up at all with income. Whites, who have far higher socioeconomic status and income than Hispanics, have a much shorter life expectancy than Hispanics. Additionally, Hispanics have a much higher life expectancy than Blacks, even though they have a lower socioeconomic status (check out the section "Comparing socioeconomic conditions for Blacks vs. Hispanics").

The Study’s researchers did not gather or disclose the actual socioeconomic index scores for the different ethnic groups. If they had, they would have uncovered that the data did not support their explanatory narrative that socioeconomic status can differentiate between the four ethnic groups’ life expectancy.

This is a Simpson's paradox situation. There is a positive relationship between socioeconomic status or income and life expectancy. But this relationship disappears when you focus on the ethnic group level.


  1. What is the Simpson Paradox?
  2. The Study's conclusion
  3. The Study's supporting data
  4. Life expectancy for different ethnic groups
  5. Do Blacks suffer from segregation in Marin County?
  6. The relationship between life expectancy and income
  7. The health-related causes of life expectancy truncation
  8. Comparing socioeconomic conditions for Blacks vs. Hispanics
  9. What may explain ethnic groups' differences in life expectancy?

What is the Simpson Paradox?

Simpson's paradox is a very common situation where:

1) Your overall aggregated data demonstrates a linear relationship between two variables in one direction; and

2) When disaggregated into different groups or factors, the data demonstrates linear relationships in the opposite direction or no relationship at all.

In the scatter plot below on the left, there is a positive relationship between strength (X-axis) and sprint time (Y-axis). The stronger a runner, the slower he or she is.

The scatter plot on the right shows that when you disaggregate the runners into two separate groups (probably male/female), the relationship reverses: within each group, the stronger the runner, the faster the running time.
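The reversal can be reproduced with a few lines of Python. This is a minimal sketch on invented numbers (the strength and sprint-time values below are hypothetical and are not taken from the scatter plots): the pooled regression slope is positive, while the slope within each group is negative.

```python
from statistics import mean

def slope(points):
    """Ordinary least-squares slope of y on x for a list of (x, y) pairs."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    num = sum((x - mx) * (y - my) for x, y in points)
    den = sum((x - mx) ** 2 for x, _ in points)
    return num / den

# Hypothetical (strength, sprint time) pairs, invented for illustration only.
group_a = [(1, 11.0), (2, 10.8), (3, 10.6), (4, 10.4)]
group_b = [(6, 13.0), (7, 12.8), (8, 12.6), (9, 12.4)]

pooled_slope = slope(group_a + group_b)  # all runners together
slope_a = slope(group_a)                 # first group only
slope_b = slope(group_b)                 # second group only

print(pooled_slope > 0, slope_a < 0, slope_b < 0)
```

The pooled slope is driven by the gap between the two groups, not by anything happening inside either group. That is the pattern to look for in the Study's data.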


As we will see, the mentioned Study runs into Simpson's paradox. At first, it derives a relatively strong positive relationship between a socioeconomic index and life expectancy. However, when we explore the data on a disaggregated level for various ethnicities, this relationship pretty much falls apart if we accept that household income is a reasonable proxy for their overall socioeconomic index (for which the researchers shared no data whatsoever at the ethnic group level).

The Study's conclusion

Quoting the Study:

"We found inequities in life expectancy by race, ethnicity, neighborhood, and socioeconomic level that can be addressed in order to improve the health outcomes of residents who need it most. These data will continue to be used to shape our priorities, partnerships, and programs."

The underlying assumptions are:

The Study's supporting data

The main supporting data can be visualized within their scatter plot shown below.

The above shows a reasonably strong relationship between the California Healthy Places Index (HPI) score on the X-axis and Life Expectancy on the Y-axis. HPI is a socioeconomic index including numerous variables such as income, housing, access to insurance, etc. The HPI score is a value that appears constrained between -0.4 and +1.4. The higher this score, the better the overall socioeconomic conditions in the specific census tract.

There are several issues with the depicted relationship that a higher HPI is associated with a longer life expectancy.

The first issue is that this relationship is sensitive to just 4 data points out of 50. If you remove these 4 data points, the linear relationship weakens markedly.


The graph on the left is a replication of the graph in the Study shown earlier (including a 95% confidence interval around the mean expected data points on the regression trend line). The relationship between HPI and life expectancy is associated with a correlation of 0.58 and an R Square of 0.336. In other words, HPI explains about one-third of the variance in life expectancy. That's actually pretty good.

When you remove just 4 outliers out of 50, as shown on the graph above on the right, the relationship weakens a lot. Now, the correlation is only 0.39. And, the R Square is 0.15. Thus, given this data set with 46 observations, HPI only explains 15% of the variance in life expectancy.
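The arithmetic linking these figures is simply R Square = correlation squared. The sketch below checks that for the numbers quoted above and then shows, on invented data (not the Study's census tracts), how a handful of points sitting far from the main cluster can prop up a correlation. The helper `pearson_r` and the data points are assumptions made for this illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(points):
    """Pearson correlation for a list of (x, y) pairs."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / sqrt(sxx * syy)

# R Square implied by the correlations quoted in the text.
r2_full = round(0.58 ** 2, 3)     # all 50 observations
r2_trimmed = round(0.39 ** 2, 2)  # 46 observations, 4 outliers removed

# Hypothetical (HPI score, life expectancy) pairs, invented for illustration.
cluster = [(0.6, 84), (0.7, 86), (0.8, 83), (0.9, 85),
           (1.0, 86), (1.1, 84), (1.2, 87), (1.3, 85)]
outliers = [(-0.3, 78), (-0.2, 79), (-0.1, 80), (0.0, 79)]

r_all = pearson_r(cluster + outliers)  # with the 4 distant points
r_cluster = pearson_r(cluster)         # main cluster only

print(r2_full, r2_trimmed)
print(round(r_all, 2), round(r_cluster, 2))
```

In this toy data set the correlation is much higher with the four distant points than without them, which is the same kind of sensitivity described above.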

The next issue with this relationship is that it is not very linear. If we use a LOESS regression to explore the non-linearity in this HPI vs life expectancy relationship, we can see a lot of curves.

When you look at the above graph on the right, you can see that the relationship between the two variables is pretty directionless (it goes down, then up, then flat, then down). That is far from a robust linear relationship. It may not even be a reliable non-linear relationship, but more a representation of near randomness.
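For readers unfamiliar with LOESS, the idea is to fit a small weighted straight line around each point, so the curve can bend wherever the data bends. The following is a simplified, pure-Python sketch of that idea (a degree-1 local fit with tricube weights and no robustness iterations). It is not the software used to produce the graph above, and the `hpi` and `life_exp` values are hypothetical.

```python
def tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def loess_point(x0, xs, ys, frac=0.5):
    """Locally weighted linear fit evaluated at x0."""
    k = max(2, int(round(frac * len(xs))))  # points in the neighborhood
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth: slightly beyond the k-th nearest point, never zero.
    h = max(dists[k - 1], 1e-12) * 1.001
    w = [tricube(abs(x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    if sxx == 0:
        return my  # all weight on one x value: fall back to the local mean
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)

# Hypothetical census-tract values, invented for illustration only.
hpi = [-0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0, 1.2]
life_exp = [80.0, 79.0, 83.0, 85.0, 85.0, 86.0, 85.5, 84.0]

smoothed = [loess_point(x, hpi, life_exp) for x in hpi]
for x, s in zip(hpi, smoothed):
    print(f"{x:5.1f} {s:6.2f}")
```

A smaller `frac` follows local wiggles more closely, while a larger one approaches a single straight line. Much of the apparent shape of a LOESS curve on 50 points therefore depends on that choice, which is one more reason to read such curves cautiously.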

The next issue is that the scatter plot does not disclose any information about the four different ethnic groups. It should have used different colors for such groups. Below is an example of a scatter plot differentiating between three different groups. The Study should have disclosed its scatter plot similarly.


Given the above, the Study and its data visualization provide no information whatsoever regarding the second assumption:

Life expectancy for different ethnic groups

First, let's review the ethnic mix as disclosed by the Study.


Next, let's look at the life expectancy of the different ethnic groups over time.


Source: IHME

Within Marin County, two out of the three ethnic minorities have a life expectancy that is significantly longer than Whites'. More precisely, 90.1% of the minority population has a life expectancy that is much longer than Whites'.

This ranking of ethnicities' life expectancy is very consistent. It holds up not only for Marin, but also for the whole State of California and the US.


Do Blacks suffer from segregation in Marin County?

A recent UC Berkeley study indicates that by 2020, Marin became by far the most segregated county within the Bay Area.


Source: UC Berkeley

It is not entirely surprising how these researchers would reach that conclusion given that Marin County's Black population resides nearly exclusively within the small enclave of Marin City. It is more challenging to explain why Marin County's segregation, as measured by its Divergence Index, more than tripled between 1970 and 2020.

Let's investigate how Blacks within Marin County suffer from such segregation by comparing Black life expectancy in Marin vs. other counties in the US.

The IHME provides informative US maps of life expectancy at the county level for various ethnic groups. Let's pull this US map for Black life expectancy.


The map shows that Marin County is within the light yellow zone, indicating among the longest life expectancy for Blacks nationwide.

Next, let's compare Black life expectancy in Marin County vs California and the US.


As shown above, Blacks in Marin County have a life expectancy that is 3.5 to 4 years longer than in California or the US. That is a huge difference. Thus, one can argue that even though Blacks in Marin are indeed segregated (within the small enclave of Marin City), their life expectancy has not suffered from it, as it is a lot longer than for most other counties and the US as a whole.

The relationship between life expectancy and income

The HPI variable used by the Study is a complex index consisting of numerous socioeconomic variables with no available time series data. Instead, I used median household income as a proxy for the overall HPI, given that it is probably one of the main variables within the HPI, and it is most probably highly correlated with other significant socioeconomic variables within the HPI.

The scatter plot below compares the four ethnic groups along household income (X-axis) and life expectancy (Y-axis). The underlying time series data for both variables runs from 2013 to 2021 on an annual basis. This makes the X-axis a quasi-time variable with 9 data points running from 2013 to 2021.


Source: IHME,

Above graph's main takeaways:

The two graphs below focus on two interesting comparisons: Black vs Hispanic on the left, and Hispanic vs White on the right.

Source: IHME,

If we accept that household income is a reasonable proxy of the overall HPI, the above data very much contradicts the Study's second-mentioned assumption.

The health-related causes of life expectancy truncation

The Study looks at the premature mortality rate (PMR), defined as deaths per 100,000 people per year before the age of 75. You can see that the PMRs for all the causes are way higher for Blacks than for the other ethnic groups. Keep in mind that such Black PMRs are most probably much lower than in most other counties in the US, given that Marin Blacks have a much longer life expectancy.

Source: California Department of Public Health

It is interesting to look at PMR multiples. To explain the concept, let's take an example. The Black PMR for cancer is 83.4 vs 49 for the entire Marin population. The resulting Black PMR multiple for cancer is:

83.4/49 = 1.7.
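The same calculation can be written as a one-line helper. This is just a sketch of the arithmetic above; the function name `pmr_multiple` is my own label, and the only figures used are the two cancer PMRs quoted in the text.

```python
def pmr_multiple(group_pmr, county_pmr):
    """Ratio of a group's premature mortality rate to the county-wide rate."""
    return group_pmr / county_pmr

# Cancer PMRs quoted in the text: Black residents vs the entire Marin population.
black_cancer_multiple = pmr_multiple(83.4, 49.0)
print(round(black_cancer_multiple, 1))  # 1.7
```

A multiple above 1 means the group dies prematurely from that cause more often than the county as a whole. A multiple below 1 means less often.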


Source: California Department of Public Health

Below, the 5 bar plots visualize the same data.


When looking at the PMR multiples, it is interesting to focus on Black vs. Hispanic (bar plots below). The Black ones are very high. The Hispanic ones are close to the lowest, pretty much even with Asians, and way lower than Whites (as shown above).


If the Study's second assumption were confirmed, the above data would suggest that the HPI for Hispanics was way better or higher than for Blacks. As we will see in the next section, this is most unlikely.

Comparing socioeconomic conditions for Blacks vs. Hispanics

I extracted as many socioeconomic variables as I could. Many are most probably part of the HPI. I then compared Blacks vs. Hispanics.


As shown above, on the majority of socioeconomic indicators that are most likely included in the HPI, the Blacks fare better than the Hispanics. The Blacks are:

Income-wise, there is little difference between the two. Hispanics have a higher household income because they have larger households. Blacks have higher per capita income.

There is only one socioeconomic variable where Hispanics stand out. They have a far higher employment ratio at 66.7% vs only 46.3% for Blacks.

What may explain ethnic groups' differences in life expectancy?

If you want to understand the difference in life expectancy between ethnicities, focusing on income or related socioeconomic variables (education, English proficiency, health insurance coverage) is not that informative. As reviewed, Hispanics, who must have the lowest HPI scores, have a much longer life expectancy than Whites, who must have the highest HPI scores.

Something else must explain the difference in life expectancy across ethnic groups.

One potential set of causal factors that may explain the life expectancy differences between ethnic groups is the lifestyle variables associated with the long life expectancy of individuals living in the Blue Zones.

Blue Zone lifestyle factors

These Blue Zone lifestyle factors include:

  1. Plant-Based Diet: People in Blue Zones eat a diet high in fresh produce. This diet is high in vitamins and minerals associated with lower levels of cholesterol, blood sugar, and chronic diseases. While not strict vegetarians, people in Blue Zones eat meat infrequently (about five times per month).
  2. Physical Activity: People in Blue Zones are very active (walking, etc.). This constant low-intensity physical activity contributes to their overall health and longevity.
  3. Social Engagement: Strong social networks and a sense of community play a significant role in the longevity of Blue Zone populations. This includes close relationships with family and friends.
  4. Purpose in Life: Knowing one's sense of purpose is associated with several years of extra life expectancy in Blue Zones.
  5. Stress Reduction: Minimizing chronic stress is important for longevity, as it is associated with chronic inflammation and age-related diseases.
  6. Moderate Caloric Intake: People in Blue Zones tend to eat smaller meals throughout the day, avoiding overeating.
  7. Healthy Relationships: Having a strong support system and prioritizing loved ones contribute to longer and happier lives in Blue Zones.

To the best of my knowledge, none of the above Blue Zone factors are included within the HPI used by the Study. The Blue Zone factors may explain far more of the difference in life expectancy between ethnic groups than HPI or income do.

There is another specific factor that plays a role in Blacks having a shorter life expectancy: prevalent vitamin D deficiency.

Vitamin D deficiency among the Black population

Blacks' epidermis has a greater concentration of melanin, which prevents sufficient vitamin D synthesis from sun exposure. The resulting vitamin D deficiency has broad-based health and lifespan implications. For more on the subject, check out this interesting paper.

An effective way to assist the Black community in improving its health outcomes and raising its life expectancy may be to closely monitor it for vitamin D deficiency and provide this group with vitamin D supplementation. This policy may cause Blacks' life expectancy to converge over time with Hispanics', another group with a similar socioeconomic profile.