---
title: "Social Media and Mental Health"
output:
flexdashboard::flex_dashboard:
theme:
version: 4
bootswatch: journal
navbar-bg: blue
orientation: columns
vertical_layout: fill
source_code: embed
---
```{r setup, include=FALSE}
#Introduce packages
pacman::p_load(dplyr, DT, flexdashboard, plotly, tidyverse)
#Load Data and manipulate
totaldata<- read_csv("C:/Users/ajbaz/OneDrive/Documents/MTH 369/Final Project Data.csv")
totaldata <- totaldata %>%
rename(
TotalTime = `8. What is the average time you spend on social media every day?`,
NoPurpose = `9. How often do you find yourself using Social media without a specific purpose?`,
Comparison = `15. On a scale of 1-5, how often do you compare yourself to other successful people through the use of social media?`,
Validation = `17. How often do you look to seek validation from features of social media?`,
MentalHealth = `18. How often do you feel depressed or down?`
)
cleandata <- totaldata %>%
select(TotalTime, NoPurpose, Comparison, Validation, MentalHealth)
cleandata <- cleandata %>%
mutate(
TotalTime = recode(
TotalTime,
"Less than an Hour" = "1",
"Between 1 and 2 hours" = "2",
"Between 2 and 3 hours" = "3",
"Between 3 and 4 hours" = "4",
"Between 4 and 5 hours" = "5",
"More than 5 hours" = "6"
)
)
cleandata$TotalTime <- as.numeric(as.character(cleandata$TotalTime))
```
Introduction
===
Column {.tabset data-width=600}
---
<style>
h1 {text-align: center;}
</style>
<h1>
<font size="4">
***An Analysis of Social Media Use and Influence on Mental Health***
</font>
</h1>
### Abstract
This study examines how different patterns of social media use relate to self-reported mental health outcomes. The data set was originally collected for a data science project investigating whether the amount and type of social media use are associated with changes in mental health. Two linear regression models were compared: a four-predictor model and an alternative model excluding the variable Validation. The full model was retained due to similar R² values and the statistical significance of all predictors. The final model explains about 27% of the variation in mental health scores, which is typical for social science research. All predictors show statistically significant positive associations with worse mental health. Comparison and NoPurpose exhibit the strongest relationships, while TotalTime and Validation contribute smaller but meaningful effects. Because TotalTime is ordinal, each one-category increase corresponds to a measurable rise in symptom severity. Overall, the results suggest that both the amount and nature of social media use relate to poorer mental health outcomes.
My project attempts to answer the following research questions:
- Does regular social media use affect mental health?
- Is there a correlation between social media perception and mental health?
### Background/Significance
The rise of social media has brought up important questions about how it affects people’s mental and emotional health, especially for young adults and frequent users. Since these platforms are now a big part of daily life, many people spend large amounts of time online. Previous research shows that certain social media habits can be linked to stress, anxiety, and lower overall well-being. Because of this, it’s important to look at how different kinds of social media use might relate to mental health.
In this project, mental health is measured using a 1–5 scale, which reflects how participants rated their own mental health at the time of the survey. This scale gives a helpful numerical measure, but it doesn’t capture the full picture of well-being. Broader ideas of well-being, like emotional stability, life satisfaction, and overall happiness, are more complex and go beyond what the data set can directly measure. Making this distinction is important so it’s clear that the project focuses on a specific mental-health score rather than every aspect of someone’s psychological well-being.
By looking at factors such as time spent online, purposeless scrolling, social comparison, and seeking validation, this project aims to understand how certain digital habits might relate to the data set’s mental health measure and what that might suggest about well-being in general. Understanding these relationships matters for research, public health, and everyday life. It also matters to me personally. As a youth ministry volunteer, I’ve noticed many high school students spending more and more time and mental energy on social media. I hope the results of this project will not only add to what we know about social media and mental health but also help me better understand how these habits might affect myself and the students I work with.
### Total Data Table
```{r}
DT::datatable(cleandata)
```
Column {data-width=400}
---
### Variable Description
- **TotalTime**: Average time spent on social media daily (Categorical: <1 hour = 1, 1-2 hours = 2, 2-3 hours = 3, 3-4 hours = 4, 4-5 hours = 5, >5 hours = 6)
- **NoPurpose**: How often do you use Social media without a specific purpose (Categorical: 1-5)
- **Comparison**: How often do you compare yourself to other successful people through social media (Categorical: 1-5)
- **Validation**: How often they look to seek validation from features of social media (Categorical: 1-5)
- **MentalHealth**: Response variable, how often they feel depressed or down (Categorical: 1-5)
In this data set, we have 481 total surveys completed by respondents between ages 13 and 91, with 75% between 21 and 26. The data had already been extensively cleaned, so little additional preparation was needed. The key variables listed above were selected and renamed from the original data set for clarity. Each survey item is answered on a categorical, ordinal scale from 1 to 5 (1 being the lowest/worst and 5 being the highest/best), except TotalTime, which is coded 1 to 6 as described above. The variables chosen from the larger data set were best suited for exploring trends related to the research questions.
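A quick numeric overview of the five selected variables can be produced directly from the cleaned data (a sketch, assuming `cleandata` from the setup chunk):

```r
# Five-number summaries and means for each selected variable
summary(cleandata)

# Counts per TotalTime category, to see where responses concentrate
table(cleandata$TotalTime)
```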
Data Exploration
===
Column {.tabset data-width=550}
---
### Mental Health
```{r, MentalHealth_hist}
ggplot(cleandata, aes(x = MentalHealth)) +
geom_histogram(binwidth = 1, fill = "#FFA07A", color = "black") +
labs(
title = "Histogram of Mental Health",
x = "Mental Health Score",
y = "Count"
) +
theme_minimal()
```
### Total Time
```{r, TotalTime_hist}
ggplot(cleandata, aes(x = TotalTime)) +
geom_histogram(binwidth = 1, fill = "#FF8C69", color = "black") +
labs(
title = "Histogram of Total Time Spent on Social Media",
x = "Total Time Spent",
y = "Count"
) +
theme_minimal()
```
### No Purpose
```{r, NoPurpose_hist}
ggplot(cleandata, aes(x = NoPurpose)) +
geom_histogram(binwidth = 1, fill = "#FF7F50", color = "black") +
labs(
title = "Histogram of No Purpose Social Media Use",
x = "NoPurpose Score",
y = "Count"
) +
theme_minimal()
```
### Comparison
```{r, Comparison_hist}
ggplot(cleandata, aes(x = Comparison)) +
geom_histogram(binwidth = 1, fill = "salmon", color = "black") +
labs(
title = "Histogram of Social Comparison",
x = "Comparison Score",
y = "Count"
) +
theme_minimal()
```
### Validation
```{r, Validation_hist}
ggplot(cleandata, aes(x = Validation)) +
geom_histogram(binwidth = 1, fill = "#FF6347", color = "black") +
labs(
title = "Histogram of Validation Seeking",
x = "Validation Score",
y = "Count"
) +
theme_minimal()
```
### MH vs TT
```{r}
ggplot(cleandata, aes(x = TotalTime, y = MentalHealth)) +
geom_jitter(width = 0.1, height = 0, alpha = 0.6, color = "black") + # scatter with jitter
stat_summary(fun = mean, geom = "line", aes(group = 1), color = "#FF4500", linewidth = 1) + # mean trend line
stat_summary(fun = mean, geom = "point", color = "#FF4500", size = 2) + # points at mean
labs(
title = "Mental Health vs Total Time on Social Media",
x = "Total Time on Social Media (coded 1–6)",
y = "Mental Health Score"
) +
theme_minimal()
```
Column {data-width=450}
---
### EDA Summaries
**Mental Health** - The mean of MentalHealth is 3.256 and it is slightly left-skewed. More than 50% of individuals reported between 3-5 on the scale, signifying that depressed or down feelings are experienced more often than not among the sample. This histogram shows that MentalHealth is worth studying further to see which predictors impact it.
**Total Time** - The mean of TotalTime is 3.91 and the median is 4. Most respondents report a score of 3 or higher, corresponding to more than two hours of social media use per day. The spike at a score of 6 (more than five hours) is important to note.
**No Purpose** - The NoPurpose histogram is slightly left-skewed with a median of 4 and a mean of 3.55. This suggests many people report using social media without a clear purpose in mind.
**Comparison** - The Comparison histogram is slightly right-skewed but shows a mostly even distribution across the sample. This suggests an even spread of people comparing themselves to others on social media.
**Validation** - The histogram of Validation is right-skewed with a median of 2 and a mean of 2.46. This suggests that relatively few people consider themselves to be seeking validation from others through social media. Still, the presence of individuals reporting high values is enough reason to examine this variable further.
**Mental Health vs. Total Time** - In the initial comparison of MentalHealth against TotalTime, the slope of the trend line is about 0.15. The positive relationship between the two variables gives us reason to study this further and to examine additional predictors.
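The slope quoted above can be verified with a one-predictor fit (a sketch, assuming `cleandata` from the setup chunk):

```r
# Simple linear fit of MentalHealth on TotalTime alone;
# the TotalTime coefficient is the slope of the trend line (about 0.15)
simple_fit <- lm(MentalHealth ~ TotalTime, data = cleandata)
coef(simple_fit)
```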
Methods
===
Column {data-width=550}
---
### Model Comparison
The response variable, MentalHealth, was treated as a continuous outcome representing overall mental health scores. Prior to model fitting, the distribution of the response variable was visually inspected using histograms and summary statistics. The distribution showed no extreme skew, allowing the use of linear regression without transformation. No outliers needed to be removed, since responses were adequately distributed across each point of the 1-5 scale.
**Model 1**:
MentalHealth = β0 + β1 * TotalTime + β2 * NoPurpose + β3 * Comparison + β4 * Validation + ϵ
**Model 2**:
MentalHealth = β0 + β1 * TotalTime + β2 * NoPurpose + β3 * Comparison + ϵ
**Reasons for considering Model 2**:
As seen in the histogram for Validation in the Data Exploration section, the right-skewed shape suggests that few responses fall at the higher end of the scale compared to the other predictors. This could mean that the variable does not provide enough meaningful explanation for the model at the higher points. Additionally, Validation had the highest p-value in the initial model, making it a likely candidate to be dropped during model refinement.
**Why Model 1 was selected**:
We tested whether the presence of Validation is necessary because of its higher p-value compared to the other predictors. The adjusted R² of Model 1 indicates that about 27% of the variation in mental health scores is explained by the four predictors combined, whereas the adjusted R² of Model 2 is roughly 26.4%. Additionally, Validation had a statistically significant p-value in the full model, so dropping it would sacrifice explanatory power. We therefore retain Model 1 over Model 2 because of its slightly higher adjusted R² and because Validation was significant in Model 1.
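This comparison can be formalized with a partial F-test on the nested models (a sketch, assuming both models are fit from `cleandata`; with a single dropped predictor, this F-test is equivalent to the t-test on Validation in the full model):

```r
# Fit both candidate models and compare them with a partial F-test;
# a small p-value indicates Validation adds significant explanatory power
model1 <- lm(MentalHealth ~ TotalTime + NoPurpose + Comparison + Validation, data = cleandata)
model2 <- lm(MentalHealth ~ TotalTime + NoPurpose + Comparison, data = cleandata)
anova(model2, model1)
```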
Column {data-width=450}
---
### Treating Likert Scales as Continuous and Numeric
As stated previously, the variables in this study are ordinal on a scale of 1-5. By treating the variables as numeric, the analysis can interpret changes in the predictors as representing gradual increases or decreases in the underlying attitudes or behaviors measured by the scale. This allows for clearer estimation of relationships between social media usage patterns and the mental-health score in the data set. However, it is still important to acknowledge that the numeric values are approximations of underlying attitudes, not precise measurements.
Although many variables in this study are measured on 1–5 Likert scales, it is common practice in social science research to treat such scales as approximately continuous when the data show reasonably even distribution across categories. Prior literature demonstrates that linear regression remains feasible when Likert scale items have at least five response options, as the intervals between points can be assumed to be roughly equal for practical, analytical purposes. Treating these variables as continuous allows for more interpretable effect estimates, simplifies model specification, and aligns with standard methodological approaches in psychology.
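One way to check that treating the 1–5 response as numeric is not driving the conclusions is an ordinal (proportional-odds) regression as a robustness check. This model was not part of the analysis above; the sketch below assumes `cleandata` from the setup chunk:

```r
# Proportional-odds logistic regression treating MentalHealth as ordered;
# the signs of the coefficients can be compared with the linear model's
library(MASS)
ord_fit <- polr(factor(MentalHealth, ordered = TRUE) ~ TotalTime + NoPurpose + Comparison + Validation,
                data = cleandata, Hess = TRUE)
summary(ord_fit)
```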
Results/Discussion
===
Column {.tabset data-width=550}
---
### Model 1 Regression Results
```{r, model1}
model1 <- lm(MentalHealth ~ TotalTime + NoPurpose + Comparison + Validation, data = cleandata)
summary(model1)
```
### Model 1 Diagnostics
```{r, diagnostics}
par(mfrow = c(2, 2))
plot(model1)
```
Column {.tabset data-width=450}
---
### Diagnostic Analysis
Model 1 consisting of all key predictors and the response variable of MentalHealth was used. The data values of TotalTime were converted from categorical to numerical according to the variable descriptor.
**Residuals vs Fitted** - The flat, horizontal line of residuals indicates the linearity assumption is satisfied.
It also suggests that there is no major pattern in the residuals, meaning the model is capturing the relationship without systematic bias.
**Q-Q Residuals** - Most of the residuals follow a normal distribution, so the normality assumption can be assumed. There are slightly deviated tails at the end but are not a problem for the majority of the data.
**Scale-Location** - The flat horizontal line suggests no overall trend in variance. The striped pattern of the points may look odd, but it is expected because the responses are discrete categories. We can therefore roughly assume constant variance across the numbered categories.
**Residuals vs Leverage** - The flat, horizontal line at zero suggests the points are relatively independent, with none exceeding a concerning Cook's distance. There do not appear to be any high-leverage or influential points in the data, which follows from the bounded, numbered categories.
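The influence claim can be checked directly (a sketch, assuming `model1` from the regression chunk; the 4/n cutoff used here is one common rule of thumb, not the only choice):

```r
# Flag observations whose Cook's distance exceeds a 4/n rule of thumb
cd <- cooks.distance(model1)
which(cd > 4 / length(cd))
max(cd)
```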
### Discussion
**1. Significant Predictors and R² Value**
- The adjusted R² of Model 1 indicates that about **27%** of the variation in mental health scores is explained by the four predictors combined. While this is not extremely high, it is typical for social science research, where mental health outcomes are influenced by many unmeasured factors.
- All variables have statistically significant positive coefficients, meaning that higher scores on each predictor are associated with worse reported mental health. The t-values and small p-values indicate that these relationships are unlikely to be due to chance, though the evidence for Validation (p = 0.029) is weaker than for the other three predictors.
**2. Interpretation of Coefficients**
- **Comparison (0.293)** and **NoPurpose (0.218)** show the strongest associations with mental health. These findings suggest that social comparison and purposeless social media use are particularly influential in predicting poorer mental health outcomes.
- **TotalTime (0.153)** and **Validation (0.099)** also contribute meaningfully, though with smaller effect sizes. Importantly, the TotalTime variable is ordinal, so its coefficient represents the expected change in mental health score for a one-category increase (for example, moving from “1–2 hours” to “2–3 hours”). Thus, each step up the time-use scale corresponds to an estimated 0.153-point increase in depressive symptom scores. Although the numerical values may seem modest, on a 1–5 mental health scale even small shifts can represent practically meaningful changes, especially when accumulated across individuals or when interpreted in the context of daily, habitual behaviors.
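To illustrate the practical size of these coefficients, the fitted model can compare predicted scores for a light-use and a heavy-use profile (a sketch, assuming `model1` from the regression chunk; the two profiles are hypothetical):

```r
# Predicted MentalHealth for a hypothetical light user vs. heavy user
profiles <- data.frame(
  TotalTime  = c(2, 6),   # 1-2 hours vs. more than 5 hours per day
  NoPurpose  = c(2, 4),
  Comparison = c(2, 4),
  Validation = c(2, 4)
)
predict(model1, newdata = profiles)
```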
**3. Model Fit and Residuals**
- Residuals appear reasonably balanced around zero, indicating no major violations of linearity or symmetry.
- The residual standard error of approximately 1.12 suggests the average prediction deviates from the observed mental health score by about one point, consistent with expectations for psychological survey data.
**4. Practical Significance**
Beyond statistical significance, the model highlights meaningful behavioral patterns: even moderate increases in social comparison, purposeless scrolling, or time spent online correspond to measurable declines in mental health. Given the large role social media plays in daily life, these seemingly minor effects may accumulate in ways that are impactful at both personal and population levels.
**5. Conclusion**
Although the adjusted R² is moderate, the model provides clear evidence that total time on social media, purposeless use, comparison behaviors, and validation seeking each relate to poorer mental health outcomes. These findings align with our research goals and contribute valuable insight into how specific patterns of social media use are connected to mental health in the sample.
Limitations/Future
===
Column {data-width=600}
---
### Limitations/Future Direction
Unfortunately, the only usable mental-health-related variable pertained to depressed feelings. There are many other aspects of mental health that could be explored for relationships to social media use, and I hope future surveys will ask more broadly about mental health and how social media affects daily life. Additionally, time and information constraints limited my exploration of this data; many comparisons and analyses that could be made with these variables remain unaddressed. The categorical values based on ranges are also quite ambiguous. For instance, we do not know exactly what a recorded 5 for "validation" means other than a greater feeling compared to lesser values. This is subjective in some regards, so a more definitive, quantifiable measure of some variables would increase confidence in the comparisons and trends found here. Lastly, 27% explained variance means there are many other factors influencing mental health that are not captured by this model. Considering other variables in the data set, or variables not listed in the survey at all, will be important for explaining the rest of the variation.
Column {data-width=400}
---
### References
**Main Data Set**:
Souvik Ahmed, and Muhesena Nasiha Syeda. (2023). Social Media and Mental Health [Data set]. Kaggle. https://www.kaggle.com/datasets/souvikahmed071/social-media-and-mental-health
Bio
===
Column {data-width=600}
---
### About the Author
My name is Andrew Jones and I am a senior undergraduate student at the University of Dayton. I am currently pursuing a Bachelor of Science degree in Mathematics with a minor in Data Analytics. I plan on graduating in May 2026.
I am looking forward to a career in the data analytics field. My professional experience includes the Data Analytics Co-op position I held in the summer of 2025 at Crown Equipment located in New Bremen, Ohio.
Column {data-width=400}
---
### My Picture
```{r Picture, fig.width=5, echo= FALSE, fig.height= 5}
knitr::include_graphics("IMG_5009.jpeg")
```