AFE134 Business Statistics
Computer Online Tutoring Problems
Answer 1
- The researcher will use the questionnaire method of surveying to collect data on the negative impacts of television. The questionnaire will include both open-ended and closed-ended questions. Questionnaires reduce the amount of personal interaction between respondents and the data collector, which speeds up data collection. The faster pace not only shortens the time needed to gather the statistics but also increases the number of people the survey can cover. The questionnaire method is therefore well suited to this research.
- The researcher can use stratified sampling. The population contains several distinct groups of people, and stratified sampling takes their different characteristics into account. For example, the information collected may differ substantially between villagers and city dwellers, and stratifying by place of residence ensures that both groups are represented. Similarly, several other variables are expected to affect the distress caused by television viewing, and stratification helps ensure that each of them is represented appropriately.
- The categorical variables would include being a TV viewer, watching violent shows, being obese, having financial problems, being influenced by commercials and an increase in shopping. Each of these categorical variables takes the value ‘yes’ or ‘no’. The numerical variables would include the number of hours that a family watches television, the amount of monthly expenditure, the debt on the family and the weight of the people in the family. Their values would be numbers measured in the units of the corresponding variable.
- The basic issue in data collection is straying from the basic purpose of the collection. A population can be described by many variables, so the researcher, while collecting data, may be unsure which additional variables to include and which to leave out. Further, the researcher may find it difficult to identify an appropriate sample within each stratum after stratification.
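The stratified sampling idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sampling frame, the stratum names ("city", "village"), the 10% sampling fraction and the function name `stratified_sample` are all hypothetical and do not come from the survey itself.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=0):
    """Draw the same fraction of units at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Keep at least one unit so that no stratum is left out entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical sampling frame: 300 city families and 100 village families.
frame = [("city", i) for i in range(300)] + [("village", i) for i in range(100)]
chosen = stratified_sample(frame, stratum_of=lambda unit: unit[0], fraction=0.1)

# Count how many families were drawn from each stratum.
counts = {name: sum(1 for s, _ in chosen if s == name) for name in ("city", "village")}
print(counts)
```

Because each stratum is sampled separately, the smaller group (villagers in this made-up frame) cannot be missed by chance, which is the property the answer relies on.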
Answer 2
- Generally it is sufficient to keep the number of classes between 6 and 15, which keeps the data manageable. The researcher has collected information about 395 families, so it is reasonable to use 12 classes. The minimum number of hours of television watched by a family is 6 and the maximum is 57, which can be rounded outward to 1 and 60. On this basis the class width would be 5, and the distribution on the basis of the no. of hours would have classes 0 – 5, 6 – 10, 11 – 15, 16 – 20, 21 – 25, 26 – 30, 31 – 35, 36 – 40, 41 – 45, 46 – 50, 51 – 55, and 56 – 60. Similarly, the minimum debt of the families is 20,516 and the maximum debt is 277,234, which can be rounded outward to 0 and 300,000. To keep twelve classes, the class width comes to 25,000 (300,000/12). Thus the distribution on the basis of debt would have classes 0 – 25000, 25001 – 50000, 50001 – 75000, 75001 – 100000, 100001 – 125000, 125001 – 150000, 150001 – 175000, 175001 – 200000, 200001 – 225000, 225001 – 250000, 250001 – 275000, 275001 – 300000. The two distributions can be presented as:
Distribution on the basis of no. of hours | |
Class | Frequency |
0-5 | 0 |
6-10 | 6 |
11-15 | 22 |
16-20 | 37 |
21-25 | 64 |
26-30 | 75 |
31-35 | 66 |
36-40 | 57 |
41-45 | 43 |
46-50 | 17 |
51-55 | 6 |
56-60 | 2 |
TOTAL | 395 |
Distribution on the basis of debt | |
Class | Frequency |
0-25000 | 3 |
25001-50000 | 15 |
50001-75000 | 31 |
75001-100000 | 70 |
100001-125000 | 74 |
125001-150000 | 97 |
150001-175000 | 48 |
175001-200000 | 32 |
200001-225000 | 21 |
225001-250000 | 3 |
250001-275000 | 0 |
275001-300000 | 1 |
TOTAL | 395 |
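As a rough sketch of how raw observations are tallied into classes like those in the tables above, the following Python function counts values into classes with inclusive upper limits (0-5, 6-10, and so on). The raw survey data are not reproduced in this answer, so the `sample_hours` list is a made-up example and the function name `frequency_table` is an assumption.

```python
import math


def frequency_table(values, width, n_classes):
    """Count values into classes 0-width, (width+1)-2*width, ... (upper limits inclusive)."""
    counts = [0] * n_classes
    for v in values:
        # ceil(v / width) - 1 puts 5 in class 0-5 and 6 in class 6-10;
        # a value of 0 is placed in the first class.
        index = max(0, math.ceil(v / width) - 1)
        if index >= n_classes:
            raise ValueError(f"value {v} lies above the last class")
        counts[index] += 1

    labels = [f"0-{width}"] + [
        f"{i * width + 1}-{(i + 1) * width}" for i in range(1, n_classes)
    ]
    return dict(zip(labels, counts))


# Hypothetical weekly TV hours for seven families (not the survey data).
sample_hours = [6, 10, 11, 27, 30, 31, 57]
table = frequency_table(sample_hours, width=5, n_classes=12)
for label, count in table.items():
    print(label, count)
```

The same function would produce the debt classes with `width=25000`, since those classes follow the same pattern of limits.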
- Histogram on the basis of TV hours
Histogram on the basis of debt on family
Shape of the two distributions
The distribution of the hours that families spend on television is roughly symmetrical, with its peak in the 26-30 class. Frequencies increase towards the peak and decrease away from it. Since the frequencies are not identical on both sides of the peak, the distribution is not perfectly symmetrical. The distribution of the level of debt on the families has a similar shape: frequencies rise up to the 125001-150000 class and fall after it, with a single family in the 275001-300000 class stretching the upper tail.
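One way to check the shape of a distribution numerically is the moment coefficient of skewness computed from the grouped frequencies. The sketch below is an addition rather than part of the original answer: it assumes each family sits at the middle value of its class (2.5, 7.5, ... for hours and 12500, 37500, ... for debt), and the function name `grouped_skewness` is hypothetical. Values near zero suggest a roughly symmetrical distribution, positive values a longer upper tail and negative values a longer lower tail.

```python
def grouped_skewness(midpoints, freqs):
    """Moment coefficient of skewness m3 / m2**1.5 for grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Second and third central moments, weighted by class frequency.
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    m3 = sum(f * (x - mean) ** 3 for x, f in zip(midpoints, freqs)) / n
    return m3 / m2 ** 1.5


# Frequencies copied from the two frequency tables in Answer 2.
hours_mid = [2.5 + 5 * i for i in range(12)]
hours_freq = [0, 6, 22, 37, 64, 75, 66, 57, 43, 17, 6, 2]
debt_mid = [12500 + 25000 * i for i in range(12)]
debt_freq = [3, 15, 31, 70, 74, 97, 48, 32, 21, 3, 0, 1]

hours_skew = grouped_skewness(hours_mid, hours_freq)
debt_skew = grouped_skewness(debt_mid, debt_freq)
print(round(hours_skew, 3), round(debt_skew, 3))
```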
- The plot to investigate the relationship would have the no. of hours the family watches television on the x-axis and the amount of debt on the y-axis.
Average debt by class of television hours | |
Television hours (class) | Average debt |
0-5 | 0 |
6-10 | 93355.33 |
11-15 | 93964.36 |
16-20 | 89390.81 |
21-25 | 109489.8 |
26-30 | 112640.1 |
31-35 | 132680.3 |
36-40 | 150596.6 |
41-45 | 155903.7 |
46-50 | 168411.8 |
51-55 | 205144.8 |
56-60 | 213694 |
The linear trend line rises and runs from its y-intercept as follows | ||
up (rise) | = | 225000-50000 |
= | 175000 | |
across (run) | = | 60-5 |
= | 55 | |
Equation of the trend line is y = α + βx | ||
α | = | y-intercept |
= | 50,000 | |
β | = | gradient |
= | up/across | |
= | 175000/55 | |
= | 3181.818 | |
Thus the equation of the trend line is: y = 50,000 + 3181.82x | ||
And the gradient (slope coefficient): β = 3181.82 | ||
The x-axis represents the independent variable and the y-axis represents the dependent variable. In this analysis, the amount of debt on a family is treated as depending on the no. of hours that the family watches television. The amount of debt is therefore the dependent variable and the no. of hours spent on television is the independent variable, so the no. of hours is taken on the x-axis and the amount of debt on the y-axis.
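The trend line above was read off the plot by rise over run. As a cross-check, the sketch below fits a line by ordinary least squares to the class averages from the table. This is a different technique from the one used in the answer, so its intercept and gradient need not match the hand-drawn line. It also assumes each class is represented by its middle value and leaves out the empty 0-5 class; the function name `least_squares` is hypothetical. The r square it reports describes how well the line fits the eleven class averages, not the 395 individual families.

```python
def least_squares(xs, ys):
    """Return (alpha, beta, r_squared) for the line y = alpha + beta * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # gradient
    alpha = y_bar - beta * x_bar     # y-intercept
    # r square = 1 - (unexplained variation / total variation)
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return alpha, beta, 1 - ss_res / ss_tot


# Middle values of classes 6-10 ... 56-60 and the average debt in each class.
midpoints = [7.5 + 5 * i for i in range(11)]
avg_debt = [93355.33, 93964.36, 89390.81, 109489.8, 112640.1, 132680.3,
            150596.6, 155903.7, 168411.8, 205144.8, 213694]

alpha, beta, r_squared = least_squares(midpoints, avg_debt)
print(round(alpha, 2), round(beta, 2), round(r_squared, 3))
```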
Answer 3
- Summary Measures
Distribution on the basis of no. of hours | ||||||
Class | Frequency, f | Middle value(x) | f*x | x-mean | (x-mean)² | f*(x-mean)² |
0-5 | 0 | 2.5 | 0 | -27.42 | 751.73 | 0.00 |
6-10 | 6 | 7.5 | 45 | -22.42 | 502.55 | 3015.33 |
11-15 | 22 | 12.5 | 275 | -17.42 | 303.38 | 6674.29 |
16-20 | 37 | 17.5 | 647.5 | -12.42 | 154.20 | 5705.39 |
21-25 | 64 | 22.5 | 1440 | -7.42 | 55.02 | 3521.45 |
26-30 | 75 | 27.5 | 2062.5 | -2.42 | 5.85 | 438.40 |
31-35 | 66 | 32.5 | 2145 | 2.58 | 6.67 | 440.10 |
36-40 | 57 | 37.5 | 2137.5 | 7.58 | 57.49 | 3276.98 |
41-45 | 43 | 42.5 | 1827.5 | 12.58 | 158.31 | 6807.49 |
46-50 | 17 | 47.5 | 807.5 | 17.58 | 309.14 | 5255.32 |
51-55 | 6 | 52.5 | 315 | 22.58 | 509.96 | 3059.76 |
56-60 | 2 | 57.5 | 115 | 27.58 | 760.78 | 1521.56 |
TOTAL | 395 | | 11817.5 | | | 39716.08 |
Mean | = | Σf*x/n |
= | 11817.5/395 | |
= | 29.92 | |
Median | = | Middle value of the class containing the (n/2)th value |
= | 197.5th value is in 26-30 class | |
= | Middle Value of class (26-30) | |
= | 27.5 | |
Smallest Value (lowest class limit) | = | 0 |
Largest Value (highest class limit) | = | 60 |
Range | = | Largest value – Smallest value |
= | 60-0 | |
= | 60 | |
Q1 | = | (n/4)th value |
= | (395/4)th value | |
= | 98.75th value | |
= | 98.75th value is in 21-25 class | |
= | 22.5 | |
Q2 | = | (n/2)th value |
= | (395/2)th value | |
= | 197.5th value | |
= | 197.5th value is in 26-30 class | |
= | 27.5 | |
Q3 | = | (3n/4)th value |
= | (3*395/4)th value | |
= | (1185/4)th value | |
= | 296.25th value | |
= | 296.25th value is in 36-40 class | |
= | 37.5 | |
Interquartile Range | = | Q3-Q1 |
= | 37.5-22.5 | |
= | 15 | |
Variance | = | (Σf*(x-mean)²)/n |
= | 39716.08/395 | |
= | 100.55 | |
Standard Deviation | = | √Variance |
= | √100.55 | |
= | 10.03 |
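The grouped-data formulas for the mean, variance and standard deviation can be checked with a short Python sketch. It uses the class middle values and frequencies from the table of television hours and weights each squared deviation by its class frequency, as the formula Σf(x-mean)²/n requires. The function name `grouped_stats` is an assumption made for illustration.

```python
import math


def grouped_stats(midpoints, freqs):
    """Mean, population variance and standard deviation of grouped data."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(midpoints, freqs)) / n
    # Each squared deviation counts once for every family in the class.
    variance = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    return mean, variance, math.sqrt(variance)


# Middle values 2.5, 7.5, ..., 57.5 and frequencies from the hours table.
hours_mid = [2.5 + 5 * i for i in range(12)]
hours_freq = [0, 6, 22, 37, 64, 75, 66, 57, 43, 17, 6, 2]

mean, variance, sd = grouped_stats(hours_mid, hours_freq)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```

Leaving out the frequency weights would treat every class as if it held a single family, which understates the spread considerably when most classes hold dozens of families.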
Distribution on the basis of debt | ||||||
Class | Frequency, f | Middle value(x) | f*x | x-mean | (x-mean)² | f*(x-mean)² |
0-25000 | 3 | 12500 | 37500 | -113037.97 | 12777583720.56 | 38332751161.67 |
25001-50000 | 15 | 37500 | 562500 | -88037.97 | 7750684986.38 | 116260274795.71 |
50001-75000 | 31 | 62500 | 1937500 | -63037.97 | 3973786252.20 | 123187373818.30 |
75001-100000 | 70 | 87500 | 6125000 | -38037.97 | 1446887518.03 | 101282126261.82 |
100001-125000 | 74 | 112500 | 8325000 | -13037.97 | 169988783.85 | 12579170004.81 |
125001-150000 | 97 | 137500 | 13337500 | 11962.03 | 143090049.67 | 13879734818.14 |
150001-175000 | 48 | 162500 | 7800000 | 36962.03 | 1366191315.49 | 65577183143.73 |
175001-200000 | 32 | 187500 | 6000000 | 61962.03 | 3839292581.32 | 122857362602.15 |
200001-225000 | 21 | 212500 | 4462500 | 86962.03 | 7562393847.14 | 158810270789.94 |
225001-250000 | 3 | 237500 | 712500 | 111962.03 | 12535495112.96 | 37606485338.89 |
250001-275000 | 0 | 262500 | 0 | 136962.03 | 18758596378.79 | 0.00 |
275001-300000 | 1 | 287500 | 287500 | 161962.03 | 26231697644.61 | 26231697644.61 |
TOTAL | 395 | | 49587500 | | | 816604430379.75 |
Mean | = | Σf*x/n |
= | 49587500/395 | |
= | 125537.97 | |
Median | = | Middle value of the class containing the (n/2)th value |
= | 197.5th value is in 125001-150000 class | |
= | Middle Value of class (125001-150000) | |
= | 137500 | |
Smallest Value (lowest class limit) | = | 0 |
Largest Value (highest class limit) | = | 300000 |
Range | = | Largest value – Smallest value |
= | 300000-0 | |
= | 300000 | |
Q1 | = | (n/4)th value |
= | (395/4)th value | |
= | 98.75th value | |
= | 98.75th value is in 75001-100000 class | |
= | 87500 | |
Q2 | = | (n/2)th value |
= | (395/2)th value | |
= | 197.5th value | |
= | 197.5th value is in 125001-150000 class | |
= | 137500 | |
Q3 | = | (3n/4)th value |
= | (3*395/4)th value | |
= | (1185/4)th value | |
= | 296.25th value | |
= | 296.25th value is in 150001-175000 class | |
= | 162500 | |
Interquartile Range | = | Q3-Q1 |
= | 162500-87500 | |
= | 75000 | |
Variance | = | (Σf*(x-mean)²)/n |
= | 816604430379.75/395 | |
= | 2067352988.30 | |
Standard Deviation | = | √Variance |
= | √2067352988.30 | |
= | 45468.15 |
- Box Plot
Distribution on the basis of | ||
Measure | no. of hours | amount of debt |
Q1 | 22.5 | 87500 |
Q2 | 27.5 | 137500 |
Q3 | 37.5 | 162500 |
Smallest Value | 0 | 0 |
Largest Value | 60 | 300000 |
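The quartile values used for the box plot come from locating the class that contains the (n/4)th, (n/2)th and (3n/4)th value and taking its middle value. A minimal sketch of that lookup, using the television-hours frequencies, is given below. The function name `quartile_midpoint` is hypothetical, and taking the class middle value (rather than interpolating within the class) is the simplification this answer uses.

```python
def quartile_midpoint(midpoints, freqs, fraction):
    """Middle value of the class that contains the (fraction * n)th value."""
    target = fraction * sum(freqs)
    cumulative = 0
    for x, f in zip(midpoints, freqs):
        cumulative += f
        # The first class whose cumulative frequency reaches the target
        # is the class that holds the requested value.
        if cumulative >= target:
            return x
    raise ValueError("fraction must lie between 0 and 1")


# Middle values and frequencies from the hours table.
hours_mid = [2.5 + 5 * i for i in range(12)]
hours_freq = [0, 6, 22, 37, 64, 75, 66, 57, 43, 17, 6, 2]

q1, q2, q3 = (quartile_midpoint(hours_mid, hours_freq, p) for p in (0.25, 0.5, 0.75))
iqr = q3 - q1
print(q1, q2, q3, iqr)
```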
Answer 4
- The dependent variable in this analysis is the amount of debt on the family and the independent variable is the number of hours the family spends on television, because the amount of debt is assumed to depend on the number of hours the family watches television. This is the basis of our analysis.
- The regression model is the line of best fit identified in the plot provided above. It is y = 50,000 + 3181.82x
In the equation, the value 50,000 represents the y-intercept, which is where the line crosses the y-axis. In the present relationship the line crosses the y-axis at 50,000, so the y-intercept is 50,000. The value 3181.82 is the gradient. It is computed as how many units the line goes up for every one unit across. In this relationship the line goes up 175,000 over 55 units across, which makes 3181.82 units up for one unit across. Thus the gradient, or beta, of the line is 3181.82.
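- The beta value is the gradient of the line, not the coefficient of determination. Beta measures how much the dependent variable increases for a one-unit increase in the independent variable: here, each additional hour of television is associated with about 3181.82 more debt. The coefficient of determination (r square) is the proportion of the variation in the dependent variable that the regression line explains. It always lies between 0 and 1, so it cannot be obtained by squaring the gradient; it has to be computed from the observed and fitted values of debt.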
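For reference, the standard definition of the coefficient of determination for a fitted line can be written as below. The symbols ŷ (fitted debt), ȳ (mean debt) and r (correlation between hours and debt) are notation introduced here for illustration and are not used elsewhere in this answer.

```latex
\[
  \hat{y}_i = \alpha + \beta x_i,
  \qquad
  R^2 = 1 - \frac{\sum_i \left(y_i - \hat{y}_i\right)^2}{\sum_i \left(y_i - \bar{y}\right)^2},
  \qquad
  0 \le R^2 \le 1 .
\]
\[
  \text{For a least-squares line with one independent variable, } R^2 = r^2 .
\]
```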
Answer 5
(a) | The probabilities in any probability distribution sum to one. |
Thus the probabilities for the number of stores that mall customers actually enter also sum to one. | |
Therefore, | |
0.04 + 0.16 + 0.22 + 0.28 + 2k + 0.09 + k = 1 | |
0.79 + 3k = 1 | |
3k = 1 – 0.79 | |
3k = 0.21 | |
k = 0.21/3 | |
k = 0.07 |
Updating probability distribution table | |||
Original | Now | ||
X | p(x) | p(x) | |
0 | 0.04 | 0.04 | |
1 | 0.16 | 0.16 | |
2 | 0.22 | 0.22 | |
3 | 0.28 | 0.28 | |
4 | 2k | =2×0.07= | 0.14 |
5 | 0.09 | 0.09 | |
6 | k | =0.07= | 0.07 |
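The value of k can be confirmed with exact arithmetic. The sketch below uses Python's `fractions` module so that the decimals do not pick up rounding error; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Known probabilities for X = 0, 1, 2, 3 and 5.
known = [Fraction("0.04"), Fraction("0.16"), Fraction("0.22"),
         Fraction("0.28"), Fraction("0.09")]

# p(4) = 2k and p(6) = k, so the unknown part of the total is 3k.
k_multiples = 2 + 1
k = (1 - sum(known)) / k_multiples

# Completed distribution for X = 0, 1, ..., 6.
distribution = [Fraction("0.04"), Fraction("0.16"), Fraction("0.22"),
                Fraction("0.28"), 2 * k, Fraction("0.09"), k]

print(float(k), float(sum(distribution)))
```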
(b) | X | p(x) | X*p(x) |
0 | 0.04 | 0 | |
1 | 0.16 | 0.16 | |
2 | 0.22 | 0.44 | |
3 | 0.28 | 0.84 | |
4 | 0.14 | 0.56 | |
5 | 0.09 | 0.45 | |
6 | 0.07 | 0.42 | |
TOTAL | 2.87 | ||
Mean | = | ΣX*p(x) | |
= | 2.87 |
(c) | X | p(x) | X-Mean | (X-Mean)² | (X-Mean)²*p(x) |
0 | 0.04 | -2.87 | 8.2369 | 0.329476 | |
1 | 0.16 | -1.87 | 3.4969 | 0.559504 | |
2 | 0.22 | -0.87 | 0.7569 | 0.166518 | |
3 | 0.28 | 0.13 | 0.0169 | 0.004732 | |
4 | 0.14 | 1.13 | 1.2769 | 0.178766 | |
5 | 0.09 | 2.13 | 4.5369 | 0.408321 | |
6 | 0.07 | 3.13 | 9.7969 | 0.685783 | |
TOTAL | 2.3331 | ||||
Variance | = | Σ((X-Mean)²*p(x)) |||
= | 2.3331 | ||||
Standard Deviation | = | √Variance |||
= | √2.3331 | ||||
= | 1.527449 | ||||
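The mean, variance and standard deviation of the completed probability distribution can be reproduced with a short Python sketch. The function name `discrete_moments` is an assumption made for illustration; the probabilities are those from the updated table with k = 0.07.

```python
import math


def discrete_moments(values, probs):
    """Mean, variance and standard deviation of a discrete distribution."""
    mean = sum(x * p for x, p in zip(values, probs))
    # Variance is the probability-weighted average of squared deviations.
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, variance, math.sqrt(variance)


stores = [0, 1, 2, 3, 4, 5, 6]
probs = [0.04, 0.16, 0.22, 0.28, 0.14, 0.09, 0.07]

mean, variance, sd = discrete_moments(stores, probs)
print(round(mean, 2), round(variance, 4), round(sd, 4))
```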