CSUGC Programming Linear Regression Modeling Questions
ANSWER
- Load the R package MASS and its Cars93 data set:
library(MASS)
data("Cars93")
- How many rows does the Cars93 data set have?
You can use the nrow() function to determine the number of rows:
num_rows <- nrow(Cars93)
num_rows
- What are the names of the variables in the Cars93 data set?
You can use the names() function to get the variable names:
variable_names <- names(Cars93)
variable_names
- What are the summary statistics of the variables EngineSize and MPG.city?
You can use the summary() function to obtain summary statistics for specific variables:
summary_engine_size <- summary(Cars93$EngineSize)
summary_mpg_city <- summary(Cars93$MPG.city)
summary_engine_size
summary_mpg_city
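As an alternative sketch, summary() also accepts a data frame, so both variables can be summarized in a single call. The name two_vars is introduced here only for illustration and is not part of the original answer.

```r
library(MASS)
data("Cars93")

# Select the two columns of interest (two_vars is an illustrative name)
two_vars <- Cars93[, c("EngineSize", "MPG.city")]

# summary() on a data frame reports one block of statistics per column
summary(two_vars)
```

This prints the same statistics as the two separate calls, arranged side by side.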
- Subset the Cars93 data set into a smaller data set as described:
# Calculate the median of EngineSize
median_engine_size <- median(Cars93$EngineSize)
# Subset the data
smaller_data <- subset(Cars93, EngineSize >= median_engine_size, select = c("EngineSize", "MPG.city"))
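The same subset can also be written with bracket indexing, which makes the row filter and the column selection explicit. This is a minimal sketch rather than the required approach; keep_rows and smaller_data_alt are illustrative names, and it assumes EngineSize contains no missing values.

```r
library(MASS)
data("Cars93")

median_engine_size <- median(Cars93$EngineSize)

# Logical row filter: TRUE for vehicles at or above the median engine size
keep_rows <- Cars93$EngineSize >= median_engine_size

# Keep the matching rows and only the two named columns
# (smaller_data_alt is an illustrative name, not from the original answer)
smaller_data_alt <- Cars93[keep_rows, c("EngineSize", "MPG.city")]

# Sanity checks: no engine below the median, and only two columns remain
min(smaller_data_alt$EngineSize) >= median_engine_size
ncol(smaller_data_alt) == 2
```

Both checks should print TRUE if the subset was built as described in the question.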
- How many rows are in the smaller data set?
You can use the nrow() function as before:
num_rows_smaller <- nrow(smaller_data)
num_rows_smaller
- What are the names of the variables in the smaller data set?
Use the names() function on the smaller data set:
variable_names_smaller <- names(smaller_data)
variable_names_smaller
- What are the summary statistics of the variables EngineSize and MPG.city in the smaller data set?
summary_engine_size_smaller <- summary(smaller_data$EngineSize)
summary_mpg_city_smaller <- summary(smaller_data$MPG.city)
summary_engine_size_smaller
summary_mpg_city_smaller
- Display a scatter plot of the relationship between the two variables EngineSize and MPG.city:
# Install and load the ggplot2 package if you haven't already
# install.packages("ggplot2")
library(ggplot2)
# Create the scatter plot
ggplot(smaller_data, aes(x = EngineSize, y = MPG.city)) +
  geom_point() +
  labs(x = "EngineSize", y = "MPG.city") +
  ggtitle("Scatter Plot of EngineSize vs. MPG.city")
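If ggplot2 is not available, a comparable scatter plot can be drawn with base R graphics. This is a sketch of an alternative, not the method the answer relies on; it rebuilds the subset first so the block runs on its own.

```r
library(MASS)
data("Cars93")

# Rebuild the subset so this block is self-contained
median_engine_size <- median(Cars93$EngineSize)
smaller_data <- subset(Cars93, EngineSize >= median_engine_size,
                       select = c("EngineSize", "MPG.city"))

# Base R scatter plot with the same axis labels and title
plot(smaller_data$EngineSize, smaller_data$MPG.city,
     xlab = "EngineSize", ylab = "MPG.city",
     main = "Scatter Plot of EngineSize vs. MPG.city")
```

plot() needs no extra packages, which can be convenient when only a quick screenshot of the relationship is required.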
Explanation:
- We start by loading the MASS package and the Cars93 dataset.
- We determine the number of rows in the Cars93 dataset using nrow().
- We obtain the variable names using the names() function.
- We calculate and display summary statistics for the EngineSize and MPG.city variables using summary().
- We create a smaller dataset by subsetting Cars93 to the rows where EngineSize is at or above its median, keeping only the EngineSize and MPG.city columns.
- We determine the number of rows in the smaller dataset.
- We obtain the variable names in the smaller dataset.
- We calculate and display summary statistics for the EngineSize and MPG.city variables in the smaller dataset.
- We create a scatter plot using the ggplot2 package to visualize the relationship between EngineSize and MPG.city in the smaller dataset. This step requires ggplot2; if it is not already installed in your R environment, install it with install.packages("ggplot2").
Question Description
I’m working on a programming multi-part question and need the explanation and answer to help me learn.
Load the R package MASS, and its Cars93 data set, using the R command:
library(MASS)
and answer the following questions using R code:
- How many rows does the Cars93 data set have?
- What are the names of the variables in the Cars93 data set?
- What are the summary statistics of the variables EngineSize and MPG.city?
- Subset the Cars93 data set into a smaller data set that excludes vehicles with EngineSize less than the median of EngineSize and excludes all variables except the EngineSize and MPG.city variables.
- How many rows are in the smaller data set?
- What are the names of the variables in the smaller data set?
- What are the summary statistics of the variables EngineSize and MPG.city in the smaller data set?
- Display a scatter plot of the relationship between the two variables of EngineSize and MPG.city.
- Explain your work and include a screenshot.