UCLA Environmental Services Department Questions
ANSWER
Step 1: Data Description and Curiosity Questions
Let’s start by loading the data, exploring its structure, and answering some basic questions about it.
# Load necessary libraries
library(readxl)
library(dplyr)
# Load the data
rawdata <- read_excel("FY20-22 Direct Discharge Report.xlsx", sheet = "FY21-22", col_names = TRUE)
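# Drop the first row, which is not part of the data table
indata <- rawdata[-1,]
# Use the next row as the column names, replacing spaces with underscores
colnames(indata) <- gsub(" ", "_", indata[1,])
# Drop the row that held the column names
indata <- indata[-1,]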
# Convert Visit_Date to a Date format
indata$Visit_Date <- as.Date(as.numeric(indata$Visit_Date), origin = "1899-12-30")
# Explore the data
str(indata)
This code reads the data, cleans it, and provides basic information about its structure.
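The assignment's data description also asks for the size of the data, the variable types, missing values, and duplicate rows. A minimal sketch of those checks follows. It runs on a small made-up data frame (`toy`) so that it is self-contained; the column names `Site_ID`, `Visit_Date`, and `Trash_Volume` are assumptions and should be replaced with the real columns of `indata`.

```r
# Toy stand-in for indata; all values are invented for illustration only.
toy <- data.frame(
  Site_ID      = c("A", "A", "B", "C", "C"),
  Visit_Date   = as.Date(c("2021-07-01", "2021-09-15", "2021-08-03",
                           "2021-10-20", "2021-10-20")),
  Trash_Volume = c(120, 80, NA, 45, 45),
  stringsAsFactors = FALSE
)

# Size: number of observations (rows) and variables (columns)
n_obs  <- nrow(toy)
n_vars <- ncol(toy)

# Count numeric and character (categorical) variables; dates count as neither
n_numeric     <- sum(sapply(toy, is.numeric))
n_categorical <- sum(sapply(toy, is.character))

# Missing values per column, and rows that exactly repeat an earlier row
missing_per_col <- colSums(is.na(toy))
n_duplicates    <- sum(duplicated(toy))

print(c(rows = n_obs, cols = n_vars,
        numeric = n_numeric, categorical = n_categorical))
print(missing_per_col)
print(n_duplicates)
```

Because the import step above takes the column names from a data row, the columns of `indata` may all be stored as text. If so, measurement columns need `as.numeric()` before they are counted as numeric here.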
Step 2: Descriptive Statistics and Visualization
Now, let’s perform some descriptive statistics and visualizations to gain insights into the data. We will focus on trash volumes collected from sites cleaned multiple times in a fiscal year.
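# Trash_Volume is an assumed column name; check names(indata) for the actual one.
# Columns imported this way may be stored as text, so convert to numeric first.
indata$Trash_Volume <- as.numeric(indata$Trash_Volume)
# Summary statistics
summary_stats <- summary(indata$Trash_Volume)
summary_stats
# Visualize the distribution of Trash_Volume (adjust binwidth to the scale of the data)
library(ggplot2)
ggplot(indata, aes(x = Trash_Volume)) +
  geom_histogram(binwidth = 100, fill = "blue", color = "black") +
  labs(title = "Distribution of Trash Volume",
       x = "Trash Volume",
       y = "Frequency")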
# Relationship between variables (if relevant variables are available)
# Trend analysis (if relevant time-based variables are available)
# Comparison of summary statistics across categories (if relevant categorical variables are available)
In this step, we calculate summary statistics for the Trash_Volume variable and create a histogram to visualize its distribution.
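The client's guidance is about sites that are cleaned multiple times in a fiscal year, so it helps to isolate those sites before summarizing. The sketch below shows one way to do that in base R on made-up data. `Site_ID` is a hypothetical column name, and the real data may identify sites differently (for example by location or address).

```r
# Toy stand-in for indata; values and the Site_ID column name are invented.
toy <- data.frame(
  Site_ID      = c("A", "A", "A", "B", "C", "C"),
  Trash_Volume = c(120, 80, 100, 60, 40, 50),
  stringsAsFactors = FALSE
)

# Number of clean-up visits per site
visits <- table(toy$Site_ID)

# Keep only the sites that were cleaned more than once
repeat_ids   <- names(visits)[visits > 1]
repeat_sites <- toy[toy$Site_ID %in% repeat_ids, ]

# Per-site summary: visit count, total, mean, median, and standard deviation
per_site <- aggregate(
  Trash_Volume ~ Site_ID,
  data = repeat_sites,
  FUN  = function(v) c(visits = length(v), total = sum(v),
                       mean = mean(v), median = median(v), sd = sd(v))
)
# aggregate() stores the statistics in one matrix column; flatten it
per_site <- do.call(data.frame, per_site)
print(per_site)
```

The same grouping can be written with `group_by()` and `summarise()` from dplyr, which is already loaded above. Base R is used here only so the sketch runs without extra packages.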
Step 3: Generate and Test Hypothesis
Let’s generate a hypothesis related to trash accumulation at different sites and perform a hypothesis test to evaluate it. For example, we can test whether there is a significant difference in trash volume between sites that receive outreach efforts and those that do not.
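# Hypothesis: Sites that receive outreach efforts have lower trash volumes than sites that do not.
# Outreach_Efforts is an assumed column name; substitute the actual grouping column.
outreach_sites <- indata %>% filter(Outreach_Efforts == "Yes")
no_outreach_sites <- indata %>% filter(Outreach_Efforts == "No")
# Perform a t-test. t.test() runs Welch's two-sided test by default;
# pass alternative = "less" to test the directional hypothesis above.
t_test_result <- t.test(outreach_sites$Trash_Volume, no_outreach_sites$Trash_Volume)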
# Print the t-test result
t_test_result
In this step, we formulate a hypothesis and perform a t-test to test it. A p-value below the chosen significance level (commonly 0.05) indicates a statistically significant difference in mean trash volume between the two groups.
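The hypothesis in this step is directional (outreach sites have lower volumes), while `t.test()` is two-sided unless told otherwise. The sketch below runs both versions on made-up data to show the difference. The values are invented, and the column names `Outreach_Efforts` and `Trash_Volume` follow the code above, so they may not match the real data.

```r
# Toy data; all values are invented for illustration only.
toy <- data.frame(
  Outreach_Efforts = rep(c("Yes", "No"), each = 6),
  Trash_Volume     = c(40, 55, 38, 60, 47, 52,    # outreach sites
                       70, 85, 66, 90, 78, 81),   # no-outreach sites
  stringsAsFactors = FALSE
)

yes <- toy$Trash_Volume[toy$Outreach_Efforts == "Yes"]
no  <- toy$Trash_Volume[toy$Outreach_Efforts == "No"]

# Two-sided Welch test: is there any difference in means?
two_sided <- t.test(yes, no)

# One-sided Welch test: is the outreach mean lower than the no-outreach mean?
one_sided <- t.test(yes, no, alternative = "less")

alpha <- 0.05  # conventional significance level (a choice, not a rule)
cat("two-sided p-value:", two_sided$p.value, "\n")
cat("one-sided p-value:", one_sided$p.value, "\n")
cat("reject H0 at alpha (one-sided):", one_sided$p.value < alpha, "\n")
```

The t-test assumes roughly normal group means. If the histogram from Step 2 shows heavily skewed volumes, `wilcox.test()` is a rank-based alternative worth considering.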
Step 4: Summarize Observations
Summarize your findings, including the descriptive analysis, visualizations, and the results of the hypothesis test. Interpret the results and provide insights that can help the Environmental Services Department in developing re-encampment prevention strategies.
Remember to submit your Jupyter Notebook or RMarkdown Notebook (with its knitted file) along with the dataset you used.
QUESTION
Description
Please perform data analysis on the data provided by the Environmental Services Department. More context on the department can be found on the [Service Learning] Resources page. You can choose to complete this assignment in either R or Python.
Guidance from the client:
“My recommendation is to do an analysis on trash volumes collected from sites that are cleaned multiple times in a Fiscal Year (FY), and from FY to FY. This analysis may help CSJ develop re-encampment prevention strategies/programs (focus outreach at this site?), structural barriers, illegal dumping surveillance, and other BMPs to reduce/prevent trash accumulation at these sites.”
Data: Q1 & Q2 FY 22 – Clean ups; FY20-22 Direct Discharge Report.xlsx
Code to import:
library(readxl)
library(tidyverse)
rawdata <- read_excel("FY20-22 Direct Discharge Report.xlsx", sheet = "FY21-22", col_names = TRUE)
indata <- rawdata[-1,]
colnames(indata) <- gsub(" ", "_", indata[1,])
indata <- indata[-1,]
indata$Visit_Date <- as.Date(as.numeric(indata$Visit_Date), origin = "1899-12-30")
Submission Formats: (Jupyter Notebook OR RMarkdown Notebook (and knitted file)) AND dataset you used.
1. Data Description and Curiosity Questions about the data:
background or the context of the data selected – sources, description of how it was collected, time period it represents, context in which it was collected if available, [Service Learning] Resources
reason(s) why you selected it?
Description of the data:
how big is it (number of observations, variables),
how many numeric variables,
how many categorical variables,
description of the variables, if available
Are there any missing values?
Any duplicate rows?
Compute summary statistics (mean, median, mode, standard deviation, variance, range).
Select one categorical variable, compute these statistics on a numeric variable by grouping on a categorical variable
Record your observations. What did you find most fascinating in your descriptive analysis?
2. Descriptive Statistics and Visualization (at least two of those listed below)
Relationship between variables
Trend
Distribution of the variable(s)
Spatial data representation
Comparison of summary statistics across categories
3. Generate at least one hypothesis and perform hypothesis test.
4. Summarize your observations