Blood pressure data analysis from the NHANES dataset.

The National Health and Nutrition Examination Survey (NHANES) is a study that assesses the health and nutritional status of people in the USA. NHANES began in the early 1960s as a series of surveys and has run as a continuous program since 1999; the data used here cover 1999 to 2018. It is an important source of statistical data about the health of the US population.

NHANES logo.

Age and gender were taken from the demographic dataset, and systolic and diastolic blood pressure (BP) from the Blood Pressure dataset. The datasets used range from the 1999–2000 cycle to the 2017–2018 cycle and can be found here:
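As a rough illustration of this data preparation, the sketch below joins demographic and blood pressure records on a respondent identifier and assigns each person to a ten-year age band. The column names (SEQN, RIAGENDR, RIDAGEYR, BPXSY1, BPXDI1) follow NHANES variable naming, but the toy records, the use of the first BP reading, the half-open band edges and the helper names are assumptions made for the example, not the code used for this article.

```python
# Toy records standing in for the NHANES demographic and blood pressure files.
demographics = [
    {"SEQN": 1, "RIAGENDR": 1, "RIDAGEYR": 9},
    {"SEQN": 2, "RIAGENDR": 2, "RIDAGEYR": 34},
    {"SEQN": 3, "RIAGENDR": 1, "RIDAGEYR": 80},
]
blood_pressure = [
    {"SEQN": 1, "BPXSY1": 98, "BPXDI1": 54},
    {"SEQN": 2, "BPXSY1": 112, "BPXDI1": 70},
    # SEQN 3 has no BP record, so the join below drops it
]


def age_band(age):
    """Label such as '[30-40]'. Assumption: bands are half-open,
    and ages of 80 or more are kept in the last band."""
    lower = min(int(age) // 10 * 10, 70)
    return f"[{lower}-{lower + 10}]"


def join_on_seqn(demo_rows, bp_rows):
    """Inner join of the two record lists on the respondent id (SEQN)."""
    bp_by_id = {row["SEQN"]: row for row in bp_rows}
    merged = []
    for demo in demo_rows:
        bp = bp_by_id.get(demo["SEQN"])
        if bp is None:
            continue
        merged.append({
            # RIAGENDR is coded 1 for male and 2 for female
            "sex": "male" if demo["RIAGENDR"] == 1 else "female",
            "age_band": age_band(demo["RIDAGEYR"]),
            "systolic": bp["BPXSY1"],
            "diastolic": bp["BPXDI1"],
        })
    return merged


merged = join_on_seqn(demographics, blood_pressure)
```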

The Python code used for Chart 1 and the summary table is available here:

The chart below (Chart 1) shows kernel density estimates (KDE) of diastolic and systolic BP. Female distributions are drawn in red and male distributions in blue. The left column shows diastolic BP and the right column shows systolic BP. Each row corresponds to a ten-year age band ([0–10 years old], [10–20 years old], [20–30 years old], [30–40 years old], [40–50 years old], [50–60 years old], [60–70 years old] and [70–80 years old]). The vertical dashed lines in each panel mark the mean of each distribution, in blue for males and in red for females. In each panel, the X axis represents BP in mmHg and the Y axis the estimated density (relative frequency) of BP values.
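For readers unfamiliar with KDE, the following minimal sketch shows the idea with a Gaussian kernel: each observation contributes a small bell curve, and the curves are averaged into one smooth density. The readings and the bandwidth of 5 mmHg are made up for the example; plotting libraries normally choose the bandwidth automatically, and this is not the code behind Chart 1.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at x with Gaussian kernels."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical diastolic readings in mmHg
diastolic = [58, 62, 64, 66, 66, 68, 70, 72, 75, 80]
density = gaussian_kde(diastolic, bandwidth=5.0)

# Evaluate on a 1 mmHg grid; the area under a density curve should be close to 1
grid = range(20, 121)
area = sum(density(x) for x in grid)  # Riemann sum with a step of 1 mmHg
peak = max(grid, key=density)         # BP value where the estimated density is highest
```

A smaller bandwidth gives a bumpier curve that follows individual readings, while a larger one gives a smoother curve that may hide real structure.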

Several t-values are displayed in the chart. The “t diastolic” values compare male and female diastolic BP. The “t systolic” values compare male and female systolic BP. The “t female” values compare systolic and diastolic BP in females. The “t male” values compare systolic and diastolic BP in males. The “t age” values on the left side compare the diastolic BP of males (in blue) or females (in red) in a given age band with the previous age band. For example, the “t age” values in the age band [10–20 years] compare the [10–20 years] group with the [0–10 years] group. The age band [0–10 years] has no “t age” values because it has no previous age band for comparison. The “t age” values on the right side follow the same idea, but compare systolic BP.
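The “t age” comparisons can be pictured as pairing each age band with the one before it. The sketch below does this for made-up systolic readings of a single sex, using a pooled two-sample t statistic; the data, the band labels and the function name are illustrative assumptions rather than the code behind Chart 1.

```python
import math
import statistics


def pooled_t(group_1, group_2):
    """Two-sample t statistic with pooled variance (group_1 minus group_2)."""
    n1, n2 = len(group_1), len(group_2)
    var1, var2 = statistics.variance(group_1), statistics.variance(group_2)
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / std_error


# Hypothetical systolic readings (mmHg) per age band, in ascending age order
systolic_by_band = {
    "[0-10]": [94, 98, 100, 102, 96],
    "[10-20]": [106, 110, 108, 112, 104],
    "[20-30]": [116, 120, 114, 118, 122],
}

bands = list(systolic_by_band)
t_age = {}
# The first band has no predecessor, so it gets no "t age" value
for previous, current in zip(bands, bands[1:]):
    t_age[current] = pooled_t(systolic_by_band[current], systolic_by_band[previous])
```

A positive value means the mean BP in the current band is higher than in the previous one.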

Chart 1, representing blood pressure distribution across age and gender.
Summary table of systolic and diastolic blood pressure by gender and age band (Std = standard deviation, n = sample size).

Before discussing the results, it is worth recalling how the t-value is calculated with the pooled approach.

t-value formula for pooled approach (x = mean, n = sample size, s = standard deviation).
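In the notation of this caption, the standard pooled-variance t statistic for two independent groups can be written as follows. This is the textbook form, assumed here to match the formula in the figure; the symbol s_p for the pooled standard deviation is introduced only for this illustration.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```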

The t-value, the same statistic used in Student’s t-test, is the difference between the means of two groups divided by the standard error of that difference. Holding everything else constant, a larger sample size increases the absolute t-value and a larger standard deviation reduces it. Hypothesis testing was not the purpose of this analysis, but as a rule of thumb for large samples, any t-value greater than or equal to 1.96, or less than or equal to -1.96, is considered a significant difference at alpha = 0.05 (two-sided). The main purpose of using t-values here was to quantify differences between genders, between diastolic and systolic BP, and between age bands in a way that takes the mean, standard deviation and sample size of each group into account.
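A small numerical sketch makes the effect of sample size and standard deviation concrete. It computes the pooled t-value directly from summary statistics (mean, standard deviation, sample size); the numbers are invented for the illustration and are not taken from the summary table.

```python
import math


def pooled_t_from_summary(mean_1, std_1, n_1, mean_2, std_2, n_2):
    """Pooled two-sample t-value from means, standard deviations and sample sizes."""
    pooled_var = ((n_1 - 1) * std_1 ** 2 + (n_2 - 1) * std_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_1 - mean_2) / math.sqrt(pooled_var * (1 / n_1 + 1 / n_2))


# The difference between means is 2 mmHg in every call (hypothetical values)
baseline = pooled_t_from_summary(120, 10, 100, 118, 10, 100)
larger_sample = pooled_t_from_summary(120, 10, 1000, 118, 10, 1000)
larger_spread = pooled_t_from_summary(120, 20, 100, 118, 20, 100)

for label, t in [("baseline", baseline),
                 ("larger n", larger_sample),
                 ("larger std", larger_spread)]:
    verdict = "beyond" if abs(t) >= 1.96 else "within"
    print(f"{label}: t = {t:.2f} ({verdict} the +/-1.96 threshold)")
```

With these invented numbers, only the larger-sample case exceeds 1.96, even though the difference between the means is the same 2 mmHg in all three calls.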

Also, before discussing the results, it is worth recalling the concepts of systole and diastole in the cardiac cycle:

Overall, the lower BP marks the diastole phase, when the ventricles relax, while the systole phase is marked by ventricular contraction, which increases the pressure exerted by the blood on the arteries. Below is a simple distribution chart showing the KDE and histogram of diastolic and systolic BP for all groups combined. The mean diastolic BP is 66.41 mmHg and the mean systolic BP is 119.05 mmHg; each mean is shown as a solid vertical line and each median as a dashed vertical line. The code used for this graph is available here:

On sex differences in BP, I highly recommend the following article for a further understanding of the data:

For further explanation of the effects of aging on BP in both sexes, I recommend this article:

Overall, the analysis of the NHANES dataset showed that the difference between males and females in diastolic BP (t = -2.47) and systolic BP (t = 3.1) in the age band [0–10 years] is small compared to other age bands. One likely reason, among many, is the low sexual differentiation in this pre-pubertal period. The difference between males and females in systolic BP reaches its maximum in the age band [20–30 years] (t = 37.31), and in diastolic BP in the age band [30–40 years] (t = 19.68), with BP being higher in males in both cases (this is why the t-value is positive: group 1 is always the male group and group 2 the female group).

After peaking in the age band [20–30 years], the difference in systolic BP declines with age, reaching a negative t-value of -1.9 in the age band [60–70 years]. The negative t-value means that, in our sample, the mean systolic BP of females is higher than that of males. The diastolic BP difference between males and females also shrinks with age. Many factors contribute to this result. One of them may be the decline in sex hormones in females as they get older.

Using the “t age” values for systolic BP, we can see that males show a steep increase in BP, especially in the age bands [10–20 years] (t = 33.16) and [20–30 years] (t = 34.71), which slows down in older groups, for example [70–80 years] (t = 4.17). Females, on the other hand, start with a slower increase in systolic BP in the age bands [10–20 years] (t = 23.85) and [20–30 years] (t = 17.47). The t-values for females then rise in the age band [40–50 years] (t = 20.9), and females reach a higher mean systolic BP than males in the age band [60–70 years] (speaking about this sample only).

Using the “t age” values for diastolic BP, I would highlight one brief observation. It is interesting to note the inverted “U” shape of the data across age bands for both sexes: diastolic BP increases until the age band [40–50 years] and decreases after it.

The purpose of calculating “t male” and “t female” was to observe how the gap between diastolic and systolic BP varies across age bands in males and females, respectively. In the first age band [0–10 years] we observe a “t female” of 104.14 and a “t male” of 103.36; both increase in the age band [10–20 years] and fall as age increases. This was not an expected result, since the difference between the means increases after 40 years of age. However, we also see a continuous increase in the standard deviation of the variables as the groups get older, especially for systolic BP, and a larger standard deviation lowers the t-value.

It is important to highlight that this analysis may contain many biases. For example, smoking, HDL, cholesterol, obesity, nutritional behaviour, alcohol consumption and diabetes were not examined across age bands and genders, and they might influence BP.

In summary, the differences in BP between males and females are minimal between 0–10 years old. The differences become clear between 10–40 years old, with males having higher systolic and diastolic BP. Those differences shrink between 40–70 years old; between 70–80 years old the difference in systolic BP reverses, while the difference in diastolic BP is reduced. Both genders show an increase in systolic BP as they get older. Both genders also show an increase in diastolic BP until 40–50 years old; after this point, diastolic BP decreases.

Bachelor in Biomedical Sciences, Master in Pharmacology and PhD student in neuroscience. Currently seeking experience in Data Science.