1. This is a question about systematic variance: how much of the total variance is systematically related to the factor in which you're interested (extroversion vs. introversion). You should think: "How much of the variance in drinking is due to this one personality factor (systematic variance), and how much of the variance is due to the millions of ways that people can differ from one another (error variance)?"
To answer the question, you first need to compute the total variance. This tells you how much variance there is in all of these numbers.
You then need to compute the variance for extroverts only. This tells you how much these people differ from one another, holding one personality factor (extroversion) constant. Remember, all of the ways that people differ from each other contribute to the error variance, so the variance within the extrovert group alone makes up part of the error variance.
You then need to compute the variance for introverts only. This tells you how much these people differ from one another, holding one personality factor (introversion) constant. The variance of this group, computed separately from the extroversion group, will also give you part of the error variance.
Here are all the descriptive stats so you can check your intermediate steps:
Total Data Set (not broken down into two separate groups)
Extroverts
Introverts
Now, from all of those numbers you can get:
You need to combine the variance for extroverts with the variance for introverts. Because the two groups are the same size, you can simply average the two numbers. The average is 1.68, which is your ERROR VARIANCE.
Subtracting the error variance from the total variance gives you 0.22, which is the SYSTEMATIC VARIANCE.
To finish things off, you calculate R², which is the systematic variance divided by the total variance: 0.22/1.90 = 0.11. If you multiply this by 100 you get the percentage of the variance that is systematic (11%). Using Cohen's criteria, this would be between a medium and large effect. Thus, knowing whether someone is an extrovert or an introvert allows you to explain 11% of the variability in drinking. Compare this to all of the other things that could cause variability in drinking (e.g., tolerance to alcohol, religious beliefs, age, etc.). When you say that one personality factor (extroversion/introversion) accounts for 11% of the variability, that's not bad given all of the other things that can cause variability.
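The last few steps are simple enough to check with a short Python sketch. It assumes the rounded values quoted above (total variance 1.90, error variance 1.68) rather than recomputing anything from raw scores, and the function name is made up for this illustration.

```python
def partition_variance(total_variance, error_variance):
    """Split total variance into systematic variance and R-squared."""
    # Systematic variance is what is left after removing the error variance.
    systematic = total_variance - error_variance
    # R-squared is the systematic share of the total variance.
    r_squared = systematic / total_variance
    return systematic, r_squared


# Rounded values quoted in the answer above (assumed, not recomputed from raw data).
systematic, r_squared = partition_variance(total_variance=1.90, error_variance=1.68)
print(f"systematic variance = {systematic:.2f}")
print(f"R-squared = {r_squared:.3f}")
```

Because the inputs are already rounded, the R-squared printed here can differ in the last digit from the value in the worked answer.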
2. Finally, the presence of two naturally occurring, continuous variables that you don't manipulate should scream "correlate me, baby."
Start with your scatterplot:
Here are the values you need for each of the two variables. The bottom row has the sum of the column above it.
Salary (X)   Teach. Rating (Y)   Product (X*Y)   Salary²   Teach. Rating²
32.5         6                   195             1056      36
44.0         5                   220             1936      25
51.0         1                   51              2601      1
38.0         8                   304             1444      64
39.4         8                   315.2           1552      64
36.0         5                   180             1296      25
42.1         6                   252.6           1772      36
35.9         7                   251.3           1289      49
319          46                  1769            12947     300    <-- these are the sums
Using the equation from class should give you the following for the numerator:
1769-((319*46)/8) --> first multiply 319 and 46; then divide by 8; then subtract this number from 1769.
This reduces to: -65.25 (which indicates that you'll get a negative correlation).
CAUTION: watch those negative signs. They'll kill you if you're not careful.
For the denominator you should get:
Take the square root of: [(12947)-(319 squared/8)]*[(300)-(46 squared/8)]
This gives you the square root of 8054, which is 89.74 (roughly). This will be your denominator. NOTE: You may get slightly different values depending on how you rounded. The main thing is that you're in the ballpark of the values that you see here.
HINT: The denominator will always be positive because each bracketed term is a sum of squares (a measure of variability), which cannot be negative, and you take the positive square root of their product.
From all of this you should get a correlation coefficient of -65.25/89.74 = -.73.
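If you want to check the hand calculation, here is a small Python sketch of the same steps. It assumes the "equation from class" is the usual computational (raw-score) formula for Pearson's r, which is what the numerator and denominator above follow, and it uses the rounded column sums from the table. The function name is hypothetical.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r via the computational (raw-score) formula."""
    # Numerator: sum of products minus (sum of X times sum of Y) over n.
    numerator = sum_xy - (sum_x * sum_y) / n
    # Each bracket in the denominator is a sum of squares for one variable.
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_y = sum_y2 - sum_y ** 2 / n
    denominator = math.sqrt(ss_x * ss_y)
    return numerator, denominator, numerator / denominator


# Column sums taken from the bottom row of the table (already rounded there).
numerator, denominator, r = pearson_r_from_sums(
    n=8, sum_x=319, sum_y=46, sum_xy=1769, sum_x2=12947, sum_y2=300
)
print(f"numerator = {numerator:.2f}")
print(f"denominator = {denominator:.2f}")
print(f"r = {r:.2f}")
```

Feeding in the unrounded salaries instead of the rounded sums will shift the result a little, which is the rounding effect mentioned in the NOTE above.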
How do you interpret this? You can say that there is a negative relationship and that it's a fairly strong relationship. If you want to describe the relationship in terms of variance, square the correlation coefficient (r) to get R²: .53 (53%). This tells you that you can explain 53% of the variance in teaching performance by knowing the salary. You can then use Cohen's criteria to know whether this is a small, medium, or large effect. (It's a whopper of an effect.)
Of course, there may be problems with these data. First, look at your scatterplot and determine if there are any outliers and if they are "on-line" or "off-line" outliers. If you suspect any point, it should be the prof who makes $51,000 and has a teaching rating of 1. This person represents an on-line outlier and might be inflating the correlation coefficient. (Notice that the rest of the data don't show such a strong linear trend--so this one person might be driving the whole correlation.)
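One rough way to see how much that single point matters is to compute r with and without it. The sketch below does this in Python using the salary and rating values from the table; the helper name is made up. Because it works from the unrounded salaries rather than the rounded column sums, its full-sample r may differ slightly from the hand-computed -.73.

```python
import math


def pearson_r(xs, ys):
    """Pearson r from raw scores, using the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    return (sum_xy - sum_x * sum_y / n) / math.sqrt(ss_x * ss_y)


# Salary (in thousands of dollars) and teaching rating, copied from the table.
salary = [32.5, 44.0, 51.0, 38.0, 39.4, 36.0, 42.1, 35.9]
rating = [6, 5, 1, 8, 8, 5, 6, 7]

# Drop the suspected outlier: salary 51.0 with a rating of 1.
kept = [(x, y) for x, y in zip(salary, rating) if (x, y) != (51.0, 1)]
salary_kept = [x for x, _ in kept]
rating_kept = [y for _, y in kept]

r_all = pearson_r(salary, rating)
r_without = pearson_r(salary_kept, rating_kept)
print(f"r with all 8 professors: {r_all:.2f}")
print(f"r without the outlier:   {r_without:.2f}")
```

If the second value is much closer to zero than the first, that supports the suspicion that one professor is driving the correlation.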
3. What does knowing years at UI tell you? This is a third variable that may be causing the correlation between salary and teaching rating. You could do a partial correlation in which you remove the effects of this third variable (years at UI). Then you could re-examine your R² values to see if the amount of variance explained has changed.
The partial correlation gives you the following:
1. Numerator: -.73 - (-.843*.872) = 0.005
2. Denominator: Square root of ((1-.711)*(1-.76)) = square root of .069 = .263
3. Result: .005/.263 = .019 <-- this is the new correlation coefficient
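The same three steps can be sketched in Python. This assumes the standard first-order partial correlation formula, which matches the numerator and denominator shown above. The names r_xy, r_xz and r_yz are labels chosen for this example: x and y stand for salary and teaching rating, and z for years at UI. The result is the same whichever of the two z-correlations you call r_xz.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z partialled out."""
    # Numerator: original correlation minus the product of the two z-correlations.
    numerator = r_xy - r_xz * r_yz
    # Denominator: square root of the variance in x and y NOT shared with z.
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Correlations quoted in the steps above.
r_partial = partial_correlation(r_xy=-0.73, r_xz=-0.843, r_yz=0.872)
print(f"partial r = {r_partial:.3f}")
print(f"partial R-squared = {r_partial ** 2:.4f}")
```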
You can compare the R² values between the original result (r = -.73) and the partial correlation you just computed (r = .019). These give R² of .53 versus .0004. This tells you that the third variable, number of years at UI, accounts for essentially all of the relationship between salary and teaching ratings. If you take out the effect of years at UI, the relationship between salary and teaching rating disappears (R² is almost 0).
4. This is a confidence interval (CI) problem. Compute the 95% CI around the ADD sample and see if this interval includes the non-ADD population mean of 100. If the 95% CI does contain 100, then the ADD group also may have a true population mean of 100. This would imply that ADD does not cause a reduction in IQ. However, if the 95% CI doesn't contain 100, then your best guess about the ADD population mean would be that it probably wouldn't be 100 (or close to 100). So, you would conclude that ADD may reduce IQ.
Here are the computations to give you the 95% CI:
1. Find standard error (which is standard deviation divided by square root of n): 20/5 = 4
2. Find critical value from t table with n-1 degrees of freedom: 2.064
3. Multiply standard error by critical value: 4*2.064 = 8.256.
4. Add and subtract this value from the sample mean of 92.6: 92.6 - 8.256 = 84.344 and 92.6 + 8.256 = 100.856
So, you would need to conclude that you are 95% sure that the ADD population's mean IQ lies between 84.344 and 100.856. Since 100 is in this interval, these data do not show that the ADD population has a lower IQ than the non-ADD population.
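Here is a short Python sketch of the confidence interval steps, in case you want to check your arithmetic. The critical value is typed in from the t table rather than computed, and n = 25 is inferred from the square root of 5 used in the standard error step. The function name is made up for this example.

```python
import math


def t_confidence_interval(sample_mean, sd, n, t_critical):
    """Confidence interval around a sample mean using a t critical value."""
    # Standard error: standard deviation divided by the square root of n.
    standard_error = sd / math.sqrt(n)
    # Margin: how far the interval extends on each side of the mean.
    margin = t_critical * standard_error
    return sample_mean - margin, sample_mean + margin


# Values from the steps above; n = 25 is inferred (sqrt(25) = 5, df = 24).
lower, upper = t_confidence_interval(sample_mean=92.6, sd=20, n=25, t_critical=2.064)
print(f"95% CI: {lower:.3f} to {upper:.3f}")
print("contains 100:", lower <= 100 <= upper)
```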