Competency

In this project, you will demonstrate your mastery of the following competency:

Apply statistical techniques to address research problems

Perform hypothesis testing to address an authentic problem

Overview

In this project, you will apply inference methods for means to test your hypotheses about the housing sales market for a region of the United States. You will use appropriate sampling and statistical methods.

Scenario

You have been hired by your regional real estate company to determine if your region’s housing prices and housing square footage are significantly different from those of the national market. The regional sales director has three questions that they want to see addressed in the report:

Are housing prices in your regional market higher than the national market average?

Is the square footage for homes in your region different from the average square footage for homes in the national market?

For your region, what is the range of values for the 95% confidence interval of square footage for homes in your market?

You are given a real estate data set that has houses listed for every county in the United States. In addition, you have been given national statistics and graphs that show the national averages for housing prices and square footage. Your job is to analyze the data, complete the statistical analyses, and provide a report to the regional sales director. You will do so by completing the Project Two Template located in the What to Submit area below.

Directions

Introduction

Purpose: What was the purpose of your analysis, and what is your approach?

Define a random sample and two hypotheses (means) to analyze.

Sample: Define your sample. Take a random sample of 100 observations for your region.

Describe what is included in your sample (i.e., states, region, years or months).

Questions and type of test: For your selected sample, define two hypothesis questions and the appropriate type of hypothesis test for each. Address the following for each hypothesis:

Describe the population parameter for the variable you are analyzing.

Describe your hypothesis in your own words.

Describe the inference test you will use.

Identify the test statistic.

Level of confidence: Discuss how you will use estimation and confidence intervals to help you solve the problem.
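The sampling step described in the Introduction could be sketched as follows. This is only an illustration: the record layout (`region`, `price`, `sqft`), the helper name `draw_sample`, and the stand-in data are assumptions, not part of the provided data set.

```python
import random

def draw_sample(listings, region, n=100, seed=42):
    """Return a simple random sample of n listings from one region."""
    in_region = [row for row in listings if row["region"] == region]
    if len(in_region) < n:
        raise ValueError("region has fewer listings than the requested sample size")
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(in_region, n)  # sampling without replacement

# Hypothetical stand-in records; replace with rows read from the real data set.
listings = [
    {"region": "Northeast" if i % 2 else "South",
     "price": 200_000 + 500 * i,
     "sqft": 1_200 + i}
    for i in range(1_000)
]

sample = draw_sample(listings, "Northeast")
print(len(sample), "listings sampled")
```

Fixing the seed is a design choice that makes the report reproducible; a different seed gives a different, equally valid random sample.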

1-Tail Test

Hypothesis: Define your hypothesis.

Define the population parameter.

Write null (H0) and alternative (Ha) hypotheses.

Specify your significance level.

Data analysis: Analyze the data and confirm assumptions have not been violated to complete this hypothesis test.

Summarize your sample data using appropriate graphical displays and summary statistics.

Provide at least one histogram of your sample data.

In a table, provide summary statistics including sample size, mean, median, and standard deviation.

Summarize your sample data, describing the center, spread, and shape in comparison to the national information.

Check the conditions.

Determine if the normal condition has been met.

Determine if there are any other conditions that you should check and whether they have been met.

Hypothesis test calculations: Complete hypothesis test calculations, providing the appropriate statistics and graphs.

Calculate the hypothesis statistics.

Determine the appropriate test statistic (t).

Calculate the probability (p value).

Interpretation: Interpret your hypothesis test results using the p value method to reject or not reject the null hypothesis.

Relate the p value and significance level.

Make the correct decision (reject or fail to reject).

Provide a conclusion in the context of your hypothesis.
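The 1-tail calculation steps above (test statistic, p value, decision) could be sketched as follows. The prices, the national mean of 300 (in thousands of dollars), and all function names are hypothetical. The tail area is found by numerically integrating the t density so that the sketch needs only the standard library; a statistics package or Excel would normally be used instead.

```python
import math
import statistics

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on the density from 0 to |t| (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3
    return tail if t >= 0 else 1 - tail

def right_tailed_t_test(sample, mu0, alpha=0.05):
    """Test H0: mu = mu0 against Ha: mu > mu0 with a one-sample t test."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)           # sample standard deviation
    t = (xbar - mu0) / (s / math.sqrt(n))  # test statistic
    p = t_upper_tail(t, n - 1)             # area to the right of t
    return t, p, p < alpha                 # reject H0 when p < alpha

# Hypothetical regional listing prices, in thousands of dollars.
prices = [320 + (i % 11 - 5) * 10 for i in range(100)]
t, p, reject = right_tailed_t_test(prices, mu0=300)
print(f"t = {t:.2f}, p = {p:.4g}, reject H0: {reject}")
```

Because the alternative is "greater than," only the right tail counts toward the p value; a left-tailed test would use the opposite tail.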

2-Tail Test

Hypotheses: Define your hypothesis.

Define the population parameter.

Write null and alternative hypotheses.

State your significance level.

Data analysis: Analyze the data and confirm assumptions have not been violated to complete this hypothesis test.

Summarize your sample data using appropriate graphical displays and summary statistics.

Provide at least one histogram of your sample data.

In a table, provide summary statistics including sample size, mean, median, and standard deviation.

Summarize your sample data, describing the center, spread, and shape in comparison to the national information.

Check the assumptions.

Determine if the normal condition has been met.

Determine if there are any other conditions that you should check and whether they have been met.

Hypothesis test calculations: Complete hypothesis test calculations, providing the appropriate statistics and graphs.

Calculate the hypothesis statistics.

Determine the appropriate test statistic (t).

Determine the probability (p value).

Interpretation: Interpret your hypothesis test results using the p value method to reject or not reject the null hypothesis.

Relate the p value and significance level.

Make the correct decision (reject or fail to reject).

Provide a conclusion in the context of your hypothesis.

Comparison of the test results: See Question 3 from the Scenario section.

Calculate a 95% confidence interval. Show or describe your method of calculation.

Interpret a 95% confidence interval.
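As a rough sketch of the 2-tail test and the 95% confidence interval, the example below uses the critical-value form of the decision. The square footage values and the national mean of 1,950 are hypothetical, and the critical value 1.984 is the t table entry for 99 degrees of freedom, so it applies only to a sample of 100. The two-tailed p value would be twice the tail area beyond |t|; it is not computed here.

```python
import math
import statistics

T_CRIT_95_DF99 = 1.984  # t* for 95% confidence with 99 degrees of freedom (t table)

def two_tailed_summary(sample, mu0, t_crit=T_CRIT_95_DF99):
    """Return the t statistic, the confidence interval, and the two-tailed decision."""
    n = len(sample)
    xbar = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    t = (xbar - mu0) / se
    margin = t_crit * se                          # margin of error
    return {
        "t": t,
        "ci": (xbar - margin, xbar + margin),
        "reject": abs(t) > t_crit,  # same decision as p < 0.05 for a two-tailed test
    }

# Hypothetical regional square footage for 100 sampled homes.
sqft = [1_900 + (i % 11 - 5) * 60 for i in range(100)]
result = two_tailed_summary(sqft, mu0=1_950)
low, high = result["ci"]
print(f"t = {result['t']:.2f}; 95% CI = ({low:.1f}, {high:.1f}); reject H0: {result['reject']}")
```

The interval and the test agree by construction: the null value falls outside the 95% interval exactly when the two-tailed test rejects at the 0.05 level.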

Final Conclusions

Summarize your findings: Refer back to the Introduction section above and summarize your findings of the sample you selected.

Discuss: Discuss whether you were surprised by the findings. Why or why not?

What to Submit

To complete this project, you must submit the following:

Project Two Template: Use this template to structure your report, and submit the finished version as a Word document.

Supporting Materials

The following resources may help support your work on the project:

Data Set: House Listing Price by Region

Use this data for input in your project report.

Document: National Statistics and Graphs

Use this data for input in your project report.

## The B&K Real Estate Company

The B&K Real Estate Company sells homes and is currently serving the Southeast region. It has recently expanded to cover the Northeast states. The B&K realtors are excited to now cover the entire East Coast and are working to prepare their southern agents to expand their reach to the Northeast.

B&K has hired your company to analyze the Northeast home listing prices in order to give information to their agents about the mean listing price at 95% confidence. Your company offers three analysis packages: one based on a sample size of 100 listings, one based on 1,000 listings, and another based on a sample size of 4,000 listings. Because there is an additional cost for data collection, your company charges more for the package with 4,000 listings than for the package with 100 listings.

Bronze Package – Sample size of 100 listings:

95% confidence interval for the mean of the Northeast house listing price has a margin of error of $24,500

Cost for service to B&K: $2,000

Silver Package – Sample size of 1,000 listings:

95% confidence interval for the mean of the Northeast house listing price has a margin of error of $7,750

Cost for service to B&K: $10,000

Gold Package – Sample size of 4,000 listings:

95% confidence interval for the mean of the Northeast house listing price has a margin of error of $3,900

Cost for service to B&K: $25,000

The B&K management team does not understand the tradeoff between confidence level, sample size, and margin of error. B&K would like you to come back with your recommendation of the sample size that would provide the sales agents with the best understanding of Northeast home prices at the lowest cost for service to B&K.

In other words, which option is preferable?

Spending more on data collection and having a smaller margin of error

Spending less on data collection and having a larger margin of error

Choosing an option somewhere in the middle

For your initial post:

Formulate a recommendation and write a confidence statement in the context of this scenario. For the purposes of writing your confidence statement, assume the sample mean house listing price is $310,000 for all packages. “I am [#] % confident the true mean . . . [in context].”

Explain the factors that went into your recommendation, including a discussion of the margin of error.
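To make the tradeoff concrete, the sketch below builds the interval for each package from the assumed sample mean of $310,000 and checks the quoted margins against the square-root rule: with the confidence level and sample standard deviation held fixed, the margin of error shrinks in proportion to 1/sqrt(n). The helper names are illustrative only.

```python
import math

# Package figures as given in the scenario.
packages = {
    "Bronze": {"n": 100, "margin": 24_500, "cost": 2_000},
    "Silver": {"n": 1_000, "margin": 7_750, "cost": 10_000},
    "Gold": {"n": 4_000, "margin": 3_900, "cost": 25_000},
}
SAMPLE_MEAN = 310_000  # sample mean assumed for all packages in the prompt

def interval(mean, margin):
    """Confidence interval as (lower, upper)."""
    return mean - margin, mean + margin

def scaled_margin(base_margin, base_n, n):
    """Margin of error rescaled by sqrt(base_n / n), other factors held fixed."""
    return base_margin * math.sqrt(base_n / n)

for name, p in packages.items():
    low, high = interval(SAMPLE_MEAN, p["margin"])
    predicted = scaled_margin(24_500, 100, p["n"])
    print(f"{name}: ${low:,} to ${high:,}; "
          f"sqrt-rule margin about ${predicted:,.0f}; cost ${p['cost']:,}")
```

The rule shows why precision gets expensive: a sample 40 times larger (Gold versus Bronze) narrows the margin by only a factor of about 6.3.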

For your response posts to your peers, choose two different confidence intervals for your responses. Do you think the agents would prefer a different confidence interval than their management? What advantages and disadvantages would there be in having different confidence intervals for the agents? Explain your thought process and reasoning in your response.

Directions

For each discussion, you must create one initial post and follow up with at least two response posts.

For your initial post, do the following:

Write a post of 1 to 2 paragraphs.

## Project One

Competencies

In this project, you will demonstrate your mastery of the following competencies:

Apply statistical techniques to address research problems

Perform regression analysis to address an authentic problem

Overview

The purpose of this project is to have you complete all of the steps of a real-world linear regression research project starting with developing a research question, then completing a comprehensive statistical analysis, and ending with summarizing your research conclusions.

Scenario

You have been hired by the D. M. Pan National Real Estate Company to develop a model to predict housing prices for homes sold in 2019. The CEO of D. M. Pan wants to use this information to help their real estate agents better determine the use of square footage as a benchmark for listing prices on homes. Your task is to provide a report predicting housing prices based on square footage. To complete this task, use the provided real estate data set for all U.S. home sales as well as the national descriptive statistics and graphs provided.

Directions

Using the Project One Template located in the What to Submit section, generate a report including your tables and graphs to determine if the square footage of a house is a good indicator for what the listing price should be. Reference the National Statistics and Graphs document for national comparisons and the Real Estate Data spreadsheet (both found in the Supporting Materials section) for your statistical analysis.

Note: Present your data in clearly labeled tables and graphs.

Specifically, include the following in your report:

Introduction

Describe the report: Give a brief description of the purpose of your report.

Define the question your report is trying to answer.

Explain when using linear regression is most appropriate.

When using linear regression, what would you expect the scatterplot to look like?

Explain the difference between response and predictor variables in a linear regression to justify the selection of variables.

Data Collection

Sampling the data: Select a random sample of 50 houses.

Identify your response and predictor variables.

Scatterplot: Create a scatterplot of your response and predictor variables to ensure they are appropriate for developing a linear model.

Data Analysis

Histogram: For your two variables, create histograms.

Summary statistics: For your two variables, create a table to show the mean, median, and standard deviation.

Interpret the graphs and statistics:

Based on your graphs and sample statistics, interpret the center, spread, shape, and any unusual characteristic (outliers, gaps, etc.) for the two variables.

Compare and contrast the shape, center, spread, and any unusual characteristic for your sample of house sales with the national population. Is your sample representative of national housing market sales?

Develop Your Regression Model

Scatterplot: Provide a graph of the scatterplot of the data with a line of best fit.

Explain if a regression model is appropriate to develop based on your scatterplot.

Discuss associations: Based on the scatterplot, discuss the association (direction, strength, form) in the context of your model.

Identify any possible outliers or influential points and discuss their effect on the correlation.

Discuss keeping or removing outlier data points and what impact your decision would have on your model.

Find r: Find the correlation coefficient (r).

Explain how the r value you calculated supports what you noticed in your scatterplot.

Determine the Line of Best Fit. Clearly define your variables. Find and interpret the regression equation. Assess the strength of the model.

Regression equation: Write the regression equation (i.e., line of best fit) and clearly define your variables.

Interpret regression equation: Interpret the slope and intercept in context.

Strength of the equation: Provide and interpret R-squared.

Determine the strength of the linear regression equation you developed.

Use regression equation to make predictions: Use your regression equation to predict how much you should list your home for based on the square footage of your home.
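The regression steps above (correlation coefficient, line of best fit, R-squared, prediction) can be sketched with the least-squares formulas. The six data points are hypothetical and far fewer than the 50 houses the project calls for; the function names are illustrative.

```python
import math

def fit_line(x, y):
    """Least-squares slope, intercept, and correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

def predict(sqft, slope, intercept):
    """Predicted listing price for a home of the given square footage."""
    return intercept + slope * sqft

# Hypothetical sample: predictor is square footage, response is listing price.
sqft = [1_200, 1_500, 1_800, 2_100, 2_400, 3_000]
price = [190_000, 228_000, 262_000, 301_000, 335_000, 410_000]

slope, intercept, r = fit_line(sqft, price)
print(f"price = {intercept:,.0f} + {slope:.2f} * sqft")
print(f"r = {r:.3f}, R-squared = {r * r:.3f}")
print(f"predicted price at 2,000 sq ft: ${predict(2_000, slope, intercept):,.0f}")
```

In simple linear regression, R-squared is the square of r, so the two measures always tell a consistent story about strength. Predictions are only trustworthy for square footage within the range of the sampled homes.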

Conclusions

Summarize findings: In one paragraph, summarize your results in clear, concise, plain language that the CEO can understand.

Did you see the results you expected, or was anything different from your expectations or experiences?

What changes could support different results, or help to solve a different problem?

Provide at least one question that would be interesting for follow-up research.

What to Submit

To complete this project, you must submit the following:

Project One Template: Use this template to structure your report, and submit the finished version as a Word document.

## The Woodmill Company

Overview

The Woodmill Company makes windows and door trim products. The first step in the process is to rip dimension (2 × 8, 2 × 10, etc.) lumber into narrower pieces. Currently, the company uses a manual process in which an experienced operator quickly looks at a board and determines what rip widths to use. The decision is based on the knots and defects in the wood.

A company in Oregon has developed an optical scanner that can be used to determine the rip widths. The scanner is programmed to recognize defects and to determine rip widths that will optimize the value of the board. A test run of 100 boards was put through the scanner and the rip widths were identified. However, the boards were not actually ripped. A lumber grader determined the resulting values for each of the 100 boards, assuming that the rips determined by the scanner had been made. Next, the same 100 boards were manually ripped using the normal process. The grader then determined the value for each board after the manual rip process was completed. The resulting data, in the file, Woodmill Data, consists of manual rip values and scanner rip values for each of the 100 boards.

Instructions

You are a process manager at the Woodmill Company tasked with determining if an optical scanner would be beneficial. Write a 4–5 page report to your supervisor (including a cover page and a Source List page) in which you:

Summarize the Woodmill Company’s problem of ripping dimension lumber into narrower pieces.

Develop a frequency distribution for the board values for the scanner and the manual process.

Generate appropriate descriptive statistics for both manual and scanner values.

Analyze the frequency distribution and descriptive statistics for both manual and scanner processes. Use Excel to create your charts.

Determine which process (manual or scanner) generates more values that are more than 2 standard deviations from the mean.
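Although the report itself calls for Excel, the two calculations above (a frequency distribution and a count of values more than 2 standard deviations from the mean) can be sketched in a few lines for checking purposes. The board values and bin width below are hypothetical, not taken from the Woodmill Data file.

```python
import statistics
from collections import Counter

def beyond_two_sd(values):
    """Values lying more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > 2 * sd]

def frequency_distribution(values, width):
    """Counts per bin, keyed by each bin's lower bound: [k*width, (k+1)*width)."""
    counts = Counter(int(v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical board values in dollars for the two processes.
manual = [14.2, 15.1, 13.8, 16.0, 14.9, 15.5, 9.1, 15.2, 14.7, 15.8]
scanner = [16.3, 17.0, 16.8, 15.9, 17.4, 16.6, 16.1, 17.2, 16.9, 16.5]

for name, values in (("manual", manual), ("scanner", scanner)):
    print(name, frequency_distribution(values, width=2), beyond_two_sd(values))
```

Each process is compared against its own mean and standard deviation, so a process with a wide spread can still have few values flagged.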

## Deliverable 1: Descriptive Statistics

Competency

Describe the data using the measures of central tendency and measures of variability.

Student Success Criteria

View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Instructions

Scenario (information repeated for deliverable 01, 03, and 04)

A major client of your company is interested in the salary distributions of jobs in the state of Minnesota that range from $30,000 to $200,000 per year. As a Business Analyst, your boss asks you to research and analyze the salary distributions. You are given a spreadsheet that contains the following information:

A listing of the jobs by title

The salary (in dollars) for each job

Deliverable 1 – Descriptive Statistics.xlsx

The client needs the preliminary findings by the end of the day. Your boss asks you to first compute some basic statistics and then analyze the results in the four questions given in the Excel spreadsheet.

Background information on the Data

The data set in the spreadsheet consists of 364 records that you will be analyzing from the Bureau of Labor Statistics. The data set contains a listing of several job titles with yearly salaries ranging from approximately $30,000 to $200,000 for the state of Minnesota.

What to Submit

Your boss wants you to submit the spreadsheet with the completed calculations, answers, and analysis.

## Deliverable 2: Tutoring on the Normal Distribution

Competency

Apply the normal distribution, standard normal distribution, and central limit theorem.

Student Success Criteria

View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Scenario

Frank had only a brief introduction to statistics in high school 12 years ago, and it did not cover inferential statistics. He is not confident in his ability to answer some of the problems posed in the course.

As Frank’s tutor, you need to provide Frank with guidance and instruction on a spreadsheet he has partially filled out. Your job is to help him understand the material. You should not simply provide him with an answer, as this will not help him when it is time to take the test. Instead, you will provide a step-by-step breakdown of the problems, including an explanation of why you did each step, using proper terminology.

What to Submit

To complete this assignment, you must first download the spreadsheet, and then complete it by including the following items on the spreadsheet:

Deliverable 2 – Tutoring on the Normal Distribution.xlsx

Incorrect Answers – Correct any wrong answers. You must also explain the error performed in the problem in your own words.

Partially Finished Work – Complete any partially completed work. Make sure to provide step-by-step instructions including explanations.

Blank Questions – Show how to complete any blank questions by providing step-by-step instructions including explanations.

Your step-by-step breakdown of the problems, including explanations and calculations performed, should be present within the Excel spreadsheet provided.
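As an example of the kind of step-by-step breakdown expected, the sketch below works one normal-distribution problem and one central limit theorem problem. The numbers (mean 100, standard deviation 15, sample size 25) are hypothetical and are not taken from the spreadsheet.

```python
import math
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Step 1: standardize the value, z = (x - mu) / sigma."""
    return (x - mu) / sigma

def prob_below(x, mu, sigma):
    """Step 2: area to the left of x under the normal curve."""
    return NormalDist(mu, sigma).cdf(x)

def prob_sample_mean_below(xbar, mu, sigma, n):
    """CLT: the sample mean has standard deviation sigma / sqrt(n)."""
    return NormalDist(mu, sigma / math.sqrt(n)).cdf(xbar)

# Single value: P(X < 115) when X is normal with mean 100 and SD 15.
print(z_score(115, 100, 15), round(prob_below(115, 100, 15), 4))

# Sample mean: P(mean of 25 values < 103) for the same population.
print(round(prob_sample_mean_below(103, 100, 15, 25), 4))
```

Both problems land on z = 1, which illustrates a common error worth explaining to Frank: a question about a sample mean must use the standard error, sigma / sqrt(n), rather than sigma itself.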

## Deliverable 3: Confidence Intervals

Assignment Content

Competency

Develop a confidence interval for a population parameter.

Student Success Criteria

View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Instructions

Scenario (information repeated for deliverable 01, 03, and 04)

A major client of your company is interested in the salary distributions of jobs in the state of Minnesota that range from $30,000 to $200,000 per year. As a Business Analyst, your boss asks you to research and analyze the salary distributions. You are given a spreadsheet that contains the following information:

A listing of the jobs by title

The salary (in dollars) for each job

Deliverable 3 – Confidence Intervals.xlsx

You have already explained some of the basic statistics to your client, and he really liked your work. Now he wants you to analyze the confidence intervals.

Background information on the Data

The data set in the spreadsheet consists of 364 records that you will be analyzing from the Bureau of Labor Statistics. The data set contains a listing of several job titles with yearly salaries ranging from approximately $30,000 to $200,000 for the state of Minnesota.

What to Submit

Your boss wants you to submit the spreadsheet with the completed calculations, answers, and analysis.

## Deliverable 4: Hypothesis Tests

Assignment Content

Competency

Evaluate hypothesis tests for population parameters from one population.

Student Success Criteria

View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Instructions

Scenario (information repeated for deliverable 01, 03, and 04)

A major client of your company is interested in the salary distributions of jobs in the state of Minnesota that range from $30,000 to $200,000 per year. As a Business Analyst, your boss asks you to research and analyze the salary distributions. You are given a spreadsheet that contains the following information:

A listing of the jobs by title

The salary (in dollars) for each job

Deliverable 4 – Hypothesis Tests.xlsx

In prior engagements, you explained the basic statistics to your client and discussed the importance of constructing confidence intervals for the population mean. Your client says that he remembers a little about hypothesis testing, but he is a little fuzzy on the details. He asks you for a full explanation of all the steps in hypothesis testing and wants your conclusion about two claims concerning the average salary for all jobs in the state of Minnesota.

Background information on the Data

The data set in the spreadsheet consists of 364 records that you will be analyzing from the Bureau of Labor Statistics. The data set contains a listing of several job titles with yearly salaries ranging from approximately $30,000 to $200,000 for the state of Minnesota.

What to Submit

Your boss wants you to submit the spreadsheet with the completed calculations, answers, and analysis.

## Hypothesis Tests for Two Populations

Assignment Content

Competency

Evaluate hypothesis tests for population parameters from two populations.

Dealing with Two Populations

Inferential statistics involves forming conclusions about a population parameter. We do so by constructing confidence intervals and testing claims about a population mean and other statistics. Typically, these methods deal with a sample from one population. We can extend the methods to situations involving two populations (and there are many such applications). This deliverable looks at two scenarios.

Concept being Studied

Your focus is on hypothesis tests and confidence intervals for two populations using two samples, some of which are independent and some of which are dependent. These concepts are an extension of hypothesis testing and confidence intervals which use statistics from one sample to make conclusions about population parameters.
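The distinction between dependent and independent samples changes the test statistic. The sketch below contrasts a paired t statistic (dependent samples) with Welch's t statistic, one common choice for independent samples; a pooled-variance test is another option. All data and function names are hypothetical.

```python
import math
import statistics

def paired_t(before, after):
    """Dependent samples: one-sample t on the paired differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic and degrees of freedom

def welch_t(x, y):
    """Independent samples: Welch's t, which does not assume equal variances."""
    vx = statistics.variance(x) / len(x)
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

# Dependent: the same four subjects measured twice.
print(paired_t(before=[10, 12, 14, 16], after=[11, 14, 15, 18]))

# Independent: two unrelated groups.
print(welch_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```

Pairing removes subject-to-subject variation, which is why the paired statistic can be large even when the two sets of measurements overlap heavily.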

Student Success Criteria

View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

What to Submit

Your research and analysis should be presented on the spreadsheet provided.

## Deliverable 6: Analysis with Correlation and Regression

Assignment Content

Competency

Determine the linear correlation and regression equation between two variables to make predictions for the dependent variable.

Student Success Criteria

View the grading rubric for this deliverable by selecting the “This item is graded with a rubric” link, which is located in the Details & Information pane.

Scenario

According to the U.S. Geological Survey (USGS), the probability of a magnitude 6.7 or greater earthquake in the Greater Bay Area is 63%, about 2 out of 3, in the next 30 years. In April 2008, scientists and engineers released a new earthquake forecast for the State of California called the Uniform California Earthquake Rupture Forecast (UCERF).

As a junior analyst at the USGS, you are tasked with determining whether there is sufficient evidence to support the claim of a linear correlation between the magnitudes and depths of the earthquakes. Your deliverables will be a PowerPoint presentation summarizing your findings and an Excel document showing your work.

Concepts Being Studied

Correlation and regression

Creating scatterplots

Constructing and interpreting a Hypothesis Test for Correlation using r as the test statistic

You are given a spreadsheet that contains the following information:

Magnitude measured on the Richter scale

Depth in km

Deliverable 6 – Analysis with Correlation and Regression.xlsx

Using the spreadsheet, you will answer the problems below in a PowerPoint presentation.

What to Submit

The PowerPoint presentation should answer and explain the following questions based on the spreadsheet provided above.

Slide 1: Title slide

Slide 2: Introduce your scenario and data set including the variables provided.

Slide 3: Construct a scatterplot of the two variables provided in the spreadsheet. Include a description of what you see in the scatterplot.

Slide 4: Find the value of the linear correlation coefficient r and the critical value of r using α = 0.05. Include an explanation of how you found those values.

Slide 5: Determine whether there is sufficient evidence to support the claim of a linear correlation between the magnitudes and the depths from the earthquakes. Explain.

Slide 6: Find the regression equation. Let the predictor (x) variable be the magnitude. Identify the slope and the y-intercept within your regression equation.

Slide 7: Is the equation a good model? Explain. What would be the best predicted depth of an earthquake with a magnitude of 2.0? Include the correct units.

Slide 8: Conclude by summarizing the information presented in the context of the scenario.

Along with your PowerPoint presentation, you should include your Excel document which shows all calculations.
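The calculations behind Slides 4 and 5 can be sketched as follows. The ten magnitude and depth values are hypothetical, not the spreadsheet data. The critical value of r is derived from the two-tailed critical t with n - 2 degrees of freedom; the value 2.306 is the t table entry for α = 0.05 and 8 degrees of freedom, so it applies only to a sample of 10.

```python
import math

def pearson_r(x, y):
    """Linear correlation coefficient r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r(t_crit, n):
    """Critical r from the critical t: r = t / sqrt(t^2 + n - 2)."""
    return t_crit / math.sqrt(t_crit ** 2 + n - 2)

# Hypothetical earthquake data: magnitude and depth (km).
magnitude = [1.2, 2.0, 2.5, 0.8, 1.9, 3.1, 1.5, 2.2, 1.0, 2.8]
depth_km = [5.1, 9.8, 4.3, 12.0, 7.7, 6.4, 10.2, 3.9, 8.8, 6.0]

r = pearson_r(magnitude, depth_km)
r_crit = critical_r(t_crit=2.306, n=len(magnitude))
significant = abs(r) > r_crit  # evidence of linear correlation only if |r| exceeds r_crit
print(f"r = {r:.3f}, critical r = {r_crit:.3f}, significant: {significant}")
```

If |r| does not exceed the critical value, there is not sufficient evidence of a linear correlation, and the regression line should not be used for prediction; the best predicted depth is then simply the mean depth.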