SAT Scores vs. Acceptance Rates
The experiment must fulfill two goals: (1) to produce a professional report of your experiment, and (2) to show your understanding of the topics related to least squares regression as described in Moore & McCabe, Chapter 2.
In this experiment, I will determine whether there is a relationship between the average SAT scores of incoming freshmen and the acceptance rates of applicants at top universities in the country. The cases are 12 of the very best universities in the country according to US News & World Report.
The average SAT score of incoming freshmen is the explanatory variable. The response variable is the acceptance rate of the university.
I used the September 16, 1996 issue of US News & World Report as my source.
I started out by choosing the top fourteen "Best National Universities". Next,
I graphed the fourteen schools on a scatterplot and cut the set down to
12 universities by discarding the two schools whose data looked unusual.
A scatterplot of the 12 universities' data is on the following page (page 2).
The linear regression equation is:
ACCEPTANCE = 212.5 - 0.134 * SAT_SCORE
R = -0.632, R^2 = 0.399
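As a minimal sketch of how the fitted line can be used, the snippet below plugs SAT averages into the reported equation and squares R to recover R^2. The coefficients come from the report; the two SAT values and the function name `predict_acceptance` are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the linear regression reported above.
INTERCEPT = 212.5
SLOPE = -0.134
R = -0.632


def predict_acceptance(sat_score: float) -> float:
    """Predicted acceptance rate (percent) from the linear model."""
    return INTERCEPT + SLOPE * sat_score


# Hypothetical SAT averages, used only to illustrate the calculation.
for sat in (1300, 1400):
    print(sat, round(predict_acceptance(sat), 1))

# R^2 is the square of the correlation coefficient R.
print(round(R ** 2, 3))
```

The negative slope means each additional SAT point lowers the predicted acceptance rate by about 0.134 percentage points.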
I entered the data into my calculator and ran the various regressions. Of the non-linear transformations, the power regression had the strongest correlation. A scatterplot of the transformation can be seen on page 4.
The power regression equation is:
ACCEPTANCE RATE = (2.475 x 10^23) * (SAT SCORE)^(-7.002)
R = -0.683, R^2 = 0.466
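A power model y = a * x^b becomes a straight line after taking logarithms: ln(y) = ln(a) + b * ln(x). The sketch below assumes the calculator fits the power regression by ordinary least squares on the logged data, which is a common approach but is not confirmed by the report. The SAT values are hypothetical, and the y values are generated from the reported equation, so the fit should simply recover the reported coefficients.

```python
import math

# Coefficients copied from the power regression reported above.
A = 2.475e23
B = -7.002


def predict_power(sat_score: float) -> float:
    """Predicted acceptance rate (percent) from the power model."""
    return A * sat_score ** B


def fit_power(xs, ys):
    """Least-squares line through (ln x, ln y); returns (a, b) for y = a * x**b."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


# Hypothetical SAT averages; responses are generated from the reported model.
xs = [1250, 1300, 1350, 1400, 1450]
ys = [predict_power(x) for x in xs]
a, b = fit_power(xs, ys)
print(f"a = {a:.3e}, b = {b:.3f}")
```

Because the exponent is negative, the predicted acceptance rate falls steeply as the SAT average rises, and the curve flattens at the high end.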
The power regression seems to be the better model for this experiment. The correlation is stronger in the power transformation than in the linear regression model: R is -0.632 for the linear model and -0.683 for the power transformation. R^2, which measures the fraction of the variation in the values of y that is explained by the least-squares regression of y on x, is also higher for the power transformation model: 0.466 compared to 0.399. The residual plot for the linear regression is on page 5 and the residual plot for the power regression is on page 6. The two residual plots look very similar, and neither offers any helpful observations. Outliers were not a factor in choosing the best model. Both models showed one distinct outlier in the graphs, the University of Chicago, which had an unusually high acceptance rate among the universities in this experiment. This school is a very good school academically which means the average SAT scores of
Above is my original data. In the graph, it can be seen that there are
Our predicted points for our data are (13, -88.57) and (-2, -29.84). These points show the
I do not predict that all of my results will follow a line of best fit
The results of this experiment are shown in the compiled student data in Table 1 below.
In conclusion, Table 10-1 on page 292 lists the three types of models. These models provide
You did a wonderful job of using the plot to support your argument without simply giving a summary. There are just a few grammatical and structural errors. It might help to review the paper again or have a friend go over it with you.
Lastly, Figure 2 and Figure 3 present data collected from the students in class. To assess the relationship between two variables, we used the “coefficient of determination”, also known as r-squared. In Figure 2, the r-squared value was 0.292, which indicates at most a weak relationship between muscle size and maximum muscle force. In Figure 3, the r-squared value was 0.038, which indicates essentially no relationship between muscle size and half-maximum fatigue
because of the line of best fit. Using line of best fit means I can
Based on the EViews method, the output reports the following values: the correlation coefficient, R-squared, the probability of the F-statistic, and the p-value of the t-test
slope. I think that out of all the variables, this is the one which is
Alpha = 0.05 and df = 10 - 2 = 8, so the critical value of r is 0.632. The observed r = 0.5654 (this is the effect size). Since r < r-crit, we have insufficient evidence to reject the null hypothesis.
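The comparison above can be checked with a short sketch. It assumes a two-tailed test and takes the critical t value for df = 8 at alpha = 0.05 (2.306) from a standard t table; the variable names are illustrative. The critical r follows from r_crit = t_crit / sqrt(t_crit^2 + df).

```python
import math

r = 0.5654   # observed correlation, from the text
n = 10       # sample size implied by df = 10 - 2
df = n - 2

# Assumption: two-tailed test; 2.306 is the tabled critical t for alpha = 0.05, df = 8.
T_CRIT = 2.306

# Convert the critical t into a critical r, and the observed r into a t statistic.
r_crit = T_CRIT / math.sqrt(T_CRIT ** 2 + df)
t_stat = r * math.sqrt(df / (1 - r ** 2))

reject_null = abs(r) > r_crit
print(f"r_crit = {r_crit:.3f}, t = {t_stat:.3f}, reject H0: {reject_null}")
```

Comparing r with r_crit and comparing the t statistic with the critical t are equivalent, so both lead to the same decision.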
The two columns in the graph represent the mean values, and the error bars represent the standard deviations, for the tested grasshopper and human subjects. The jumping distance of the grasshoppers was greater than the jumping distance of the humans, and the t-test p-value was less than 0.05.
R-squared is always between 0 and 100%. 0% means the model explains none of the variability of the response data around its mean. 100% indicates that the model explains all the variability of the response data around its mean.
R^2 can assume a value between 0 and 1; the closer R^2 is to 1, the better the regression model explains the observed data.
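To make the two extremes concrete, here is a minimal sketch that computes R^2 as 1 - SS_res / SS_tot. The response values are hypothetical, and `r_squared` is an illustrative helper rather than a function from any particular package.

```python
def r_squared(observed, predicted):
    """Fraction of the variation in the observed values explained by the predictions."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Hypothetical response values, used only for illustration.
ys = [10.0, 20.0, 30.0, 40.0]

perfect_fit = list(ys)        # predictions match every observation
mean_only = [25.0] * len(ys)  # predictions ignore x and always return the mean

print(r_squared(ys, perfect_fit))  # explains all of the variation
print(r_squared(ys, mean_only))    # explains none of the variation
```

A model that reproduces every observation scores 1, while a model that only ever predicts the mean of y scores 0.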