
STA2300 Assignment 1 Data Analysis

Assignment 1

Due Date: 03 September, 2020

Weighting: 15%

Full Marks: 100 (final marks to be converted to 15%)

  • Answering the questions in this assignment should not be your first attempt at these types of questions. It is essential that you work through practice exercises from the tutorial sheets in the Study Book and/or Text Book first.
  • This assignment is important in checking your knowledge, providing feedback and helping to establish competency in essential skills.
  • Answer all the questions. The questions are not of equal weight; some questions are worth much more than the others.
  • The questions relate to materials in Modules 1 to
  • Before starting this assignment read Notes Concerning Assignments under the Introductory Material link in the ‘Getting started’ tab on the StudyDesk.
  • When you are asked to comment on a finding, usually a short paragraph is all that is required.
  • Do not copy/paste SPSS output into your assignment unless specifically asked to do so. In many cases the SPSS output contains much more information than is required for a correct and complete answer. In those cases just reproducing the output may not attract any marks. Make sure you report only the information from the SPSS output relevant to your answer.
  • In order to obtain full marks for any question you must show all working. No working, no marks.
  • Convert your Word document to PDF before submitting your assignment via the link on the StudyDesk. See the Introductory Material (Section 5, Assignments) for information about how to do this.
  • This assignment consists of 6 questions.
  • It is vitally important that you understand USQ policies and procedures, in particular those related to communication, assessment, academic integrity and plagiarism. Details are under the Assessment link on the StudyDesk.
  • You will need to download the data set Body.sav from the course StudyDesk. Detailed information on the variables in the data set is found in the Body.txt file, accessible from the StudyDesk.

Question 1 (17 marks)

This question uses information from the data file Body.sav found under the Assessment tab on the StudyDesk (also see Body.txt for more details about the source and the variables reported in the dataset). Make sure the Variable View in SPSS is set up properly, with all ‘labels’ correctly defined (with units), all ‘values’ assigned correctly for categorical variables and the correct ‘measure’ selected for all variables.

The participants in the Body.sav dataset were taken randomly from dozens of California health and fitness clubs and the measurements were taken by technicians under the supervision of one of the researchers.

Use SPSS to find the answers to the following questions, but do not copy and paste SPSS output into your answers for parts (c) and (d).

  • (a) (5 marks) Using SPSS, produce an appropriate graph to display the distribution of the body height of the respondents. In the graph, label the axes correctly, include units of measure and provide an appropriate title which includes your name.
  • (b) (4 marks) Using the graph produced in part (a) only (don’t refer to SPSS summary statistics), describe, in no more than 60 words, the distribution of the height of respondents. Include comments on shape, centre and spread of the distribution and the existence of outliers and/or gaps, if any. Do not perform any calculations; use the graph only.
  • (c) (2 marks) What are the mean and standard deviation of the distribution of the height of respondents? (You can use SPSS to calculate the descriptive statistics but do not copy/paste SPSS output.)
  • (d) (3 marks) Using SPSS, find the median and IQR of the distribution of the height of respondents. (Do not copy/paste SPSS output.)
  • (e) (3 marks) For the distribution of the height of respondents, which statistics are appropriate to measure its centre and spread? Give a reasonable explanation for your answer.

Question 2 (15 marks)

A little fertilizer at or near planting time can help jumpstart wheat toward a successful crop, but producers have to be careful to apply it correctly, said an associate professor of agronomy at Kansas State University, USA. In general, wheat is considered a highly responsive crop to starter fertilizers, particularly phosphorus and nitrogen, he said. When applying a starter fertilizer for wheat, application methods and rates are much more flexible with phosphorus than nitrogen.

An agricultural scientist at USQ conducted a study on the effect of the amount of starter fertilizers on the yield of wheat. Of the 24 plots of one-acre sized homogeneous land available for the study, 8 were randomly assigned to fertilizer A (30kg of phosphorus), another 8 were randomly assigned to fertilizer B (20kg of phosphorus), and the remaining 8 were assigned to fertilizer C (10kg of phosphorus). At the end of the farming season, yields of wheat were measured from all the plots in metric tons.

  • (a) (2 marks) Is this an experimental or observational study? In fewer than 50 words, clearly explain your choice based on the extract given.
  • (b) (6 marks) For the above study identify, if appropriate, the
    1. response variable(s),
    2. factor(s) and treatments, and
    3. experimental unit and how many of these there are.
  • (c) (4 marks) Are the four principles of experimental design used in this study? Explain each of them in the context of the study.
  • (d) (3 marks) Explain explicitly what a confounding variable is. Identify one plausible confounding variable in this study and explain why it is a confounding variable.

Question 3 (14 marks)

The data set Body.sav contains information that was collected as part of a comprehensive study on 247 male and 260 female respondents.

A researcher is interested to know if the Age Category of the respondents is associated with Gender.

  • (a) (4 marks) Use SPSS to produce a contingency table displaying the relationship between the Age Category and Gender of the respondents. The title for this table should reflect its content. (Note that by convention, a table title should appear above the table.) Include your name in the title.
  • (b) (2 marks) What proportion of respondents are in the age category of ‘20 years and under’ and are female?
  • (c) (2 marks) Of the male respondents, what proportion are in the age category of ‘20 years and under’?
  • (d) (6 marks) Does there appear to be an association between the Age Category and Gender? Explain in fewer than 100 words, using a numerical example from a conditional distribution table (produced by SPSS) to support your answer.

Question 4 (12 marks)

A local farmer produces high quality rockmelons to supply to the supermarket. From long experience the farmer knows that the weights of the rockmelons are normally distributed with a mean of 2.9kg and a standard deviation of 0.45kg. Answer the following questions based on the above information:

  • (a) (2 marks) Identify the variable of interest and the unit of measurement of this variable.
  • (b) (3 marks) Based on this distribution, what percentage of the rockmelons will weigh more than 5kg?
  • (c) (4 marks) Based on this distribution, what percentage of rockmelons will weigh between 9kg and 3.9kg?
  • (d) (3 marks) Based on this distribution, what is the weight that 99% of the rockmelons will exceed?
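
The normal-model calculations that Question 4 asks for can be sketched in Python with the standard library. This is a minimal illustration, not the method the course prescribes (which may expect z-tables or SPSS). The mean and standard deviation come from the question; the names `weights`, `percent_above`, `percent_between` and `cutoff_99` are hypothetical helpers introduced here.

```python
from statistics import NormalDist

# Model stated in the question: weights ~ Normal(mean 2.9 kg, sd 0.45 kg).
weights = NormalDist(mu=2.9, sigma=0.45)


def percent_above(x_kg):
    """Percentage of rockmelons heavier than x_kg under the model."""
    return 100 * (1 - weights.cdf(x_kg))


def percent_between(a_kg, b_kg):
    """Percentage of rockmelons weighing between the two bounds (any order)."""
    low, high = sorted((a_kg, b_kg))
    return 100 * (weights.cdf(high) - weights.cdf(low))


# Weight exceeded by 99% of rockmelons: the 1st percentile of the model.
cutoff_99 = weights.inv_cdf(0.01)

print(f"Weight exceeded by 99% of rockmelons: {cutoff_99:.2f} kg")
# Sanity check: about 68% of values lie within one sd of the mean.
print(f"Within one sd of the mean: {percent_between(2.45, 3.35):.1f}%")
```

Any threshold from the question can be passed to `percent_above` or `percent_between`. The same values can be obtained by standardising to a z-score, z = (x − 2.9) / 0.45, and reading a standard normal table.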

Question 5 (24 marks)

Consider the data in the file Body.sav again. The sports health specialist thinks that there is a relationship between the shoulder girth and forearm (extended) of the respondents. The specialist is determined to examine the relationship.

  • (a) (2 marks) What are the two variables the sports health specialist will need to include in the analysis? What type of variables are they and what are their units of measurement?
  • (b) (4 marks) Use an appropriate graph to display the relationship between the two variables identified in part (a). Label the axes correctly, include units of measurement and provide an appropriate title which includes your name.
  • (c) (4 marks) From the graph in part (b), describe (in no more than 30 words) the form, direction and scatter of this relationship, and comment on the existence of any outliers.
  • (d) (4 marks) Calculate an appropriate statistic to measure the strength and direction of the relationship between the two variables you identified in part (a). Justify your choice of this statistic and interpret what it tells you about the relationship.
  • (e) (6 marks) Use SPSS output to write the equation of the regression line which could be used to predict the shoulder girth from the forearm of the respondents, and then plot the regression line on the graph produced in part (b).
  • (f) (3 marks) Using the regression equation from part (e), predict the shoulder girth of a respondent whose forearm is 35cm. Would you consider this to be an accurate prediction? Why or why not?
  • (g) (1 mark) What proportion of the variability in the shoulder girth has been explained by the forearm?
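
The quantities Question 5 refers to (the correlation coefficient, the least-squares regression line, a prediction and the proportion of variability explained) can be sketched in Python as below. The assignment expects these to come from SPSS; this is only an illustration of the underlying arithmetic. The function names are hypothetical, and the five data pairs are made-up toy values, not taken from Body.sav.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# TOY VALUES ONLY (hypothetical, not from Body.sav): forearm and shoulder in cm.
forearm_cm = [24.0, 25.5, 26.0, 27.5, 29.0]
shoulder_cm = [100.0, 104.0, 108.0, 113.0, 118.0]

r = pearson_r(forearm_cm, shoulder_cm)
b0, b1 = least_squares_line(forearm_cm, shoulder_cm)

print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
print(f"predicted shoulder = {b0:.2f} + {b1:.2f} * forearm")
print(f"prediction at forearm 35 cm: {b0 + b1 * 35:.1f} cm")
```

For simple linear regression the proportion of variability explained is the square of the correlation coefficient, which is why `r * r` is reported as R². Whether a prediction is trustworthy depends on where the x-value sits relative to the observed data range.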

Question 6 (18 marks)

The COVID-19 virus has become a real threat to many human lives. It is particularly life threatening to people over the age of 70 years. Research has shown that this virus kills 15% of COVID-19 infected people over the age of 70 years in nursing homes. To study the number of deaths among the COVID-19 infected people over the age of 70 years, an epidemiologist selects a random sample of 20 residents of age over 70 years who are infected with COVID-19 from nursing homes.

Based on the above information answer the following questions:

  • (a) (1 mark) What is the variable of interest here?
  • (b) (3 marks) What is an appropriate model to represent the distribution of the variable of interest? Identify the parameters of the model and state the values of these parameters in this study.
  • (c) (4 marks) Discuss how the conditions of the appropriate model are satisfied in the current study. Explain the conditions in the context of the problem.
  • (d) (2 marks) Find the mean and standard deviation of the variable using the parameters of the model.
  • (e) (3 marks) Find the probability that fewer than 3 of the 20 randomly selected residents over 70 years of age infected with COVID-19 will die.
  • (f) (5 marks) In a random sample of 150 residents of age over 70 years who are infected with COVID-19 from the nursing homes, determine the probability that 30 or more will die. State and check relevant assumptions, conditions or rules of thumb that should be considered before performing the calculations to determine this probability.
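
The calculations Question 6 points to can be sketched in Python, assuming a binomial model in which each infected resident dies independently with probability 0.15 (the 15% figure above). The helper `binom_pmf` and all variable names are illustrative. The check that both n·p and n·(1 − p) are at least 10 is one common rule of thumb for the normal approximation; the course text may state a different threshold. No continuity correction is applied here.

```python
import math
from statistics import NormalDist


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Sample of 20 residents, death probability 0.15 (both from the question).
n, p = 20, 0.15
mean = n * p
sd = math.sqrt(n * p * (1 - p))

# "Fewer than 3" means X = 0, 1 or 2.
p_fewer_than_3 = sum(binom_pmf(k, n, p) for k in range(3))

# Sample of 150 residents: check the rule of thumb, then approximate.
n_large = 150
expected_deaths = n_large * p            # 22.5
expected_survivors = n_large * (1 - p)   # 127.5
normal_ok = expected_deaths >= 10 and expected_survivors >= 10

approx = NormalDist(mu=n_large * p, sigma=math.sqrt(n_large * p * (1 - p)))
p_30_or_more_normal = 1 - approx.cdf(30)

# Exact binomial tail, for comparison with the approximation.
p_30_or_more_exact = sum(binom_pmf(k, n_large, p) for k in range(30, n_large + 1))

print(f"mean = {mean:.2f}, sd = {sd:.4f}")
print(f"P(X < 3) = {p_fewer_than_3:.4f}")
print(f"normal approximation usable: {normal_ok}")
print(f"P(X >= 30), normal approx = {p_30_or_more_normal:.4f}")
print(f"P(X >= 30), exact binomial = {p_30_or_more_exact:.4f}")
```

The exact tail is included only to show how close the approximation is; a hand calculation would normally use the z-score (30 − 22.5) / √(150 × 0.15 × 0.85) and a standard normal table.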
