Various labs and projects from Math 4672 - Computational Statistics. Most projects are done in SAS, but some are in R. Every project includes a short write-up and the original code.
Memo 4 - This lab is based on the paper "Approximate Is Better than 'Exact' for Interval Estimation of Binomial Proportions" by Alan Agresti and Brent A. Coull (The American Statistician, American Statistical Association, May 1998, Vol. 52, No. 2, pp. 119-126). The coverage probability of a confidence interval (CI) is the actual probability that the interval contains the true value of the parameter of interest.
Memo 5 - Ordinary sample proportion for point estimation: pb = S/n. Adjusted Wald approximation for point estimation: pe = (S + 2)/(n + 4). Here S is binomial, S ∼ Bin(n, p). Which is better: exact or approximate coverage?
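As a rough illustration of the comparison in Memo 5 (not the SAS/R code from the lab), the exact coverage probability of the Wald interval and of the adjusted Wald interval can be computed by summing the binomial probabilities of every outcome S whose interval contains p. The function name `coverage` and its arguments are assumptions made for this sketch. The +2/+4 adjustment is used as given above, which is intended for a 95% level.

```python
from math import comb, sqrt
from statistics import NormalDist


def coverage(n, p, adjusted=False, level=0.95):
    """Exact coverage probability of the Wald or adjusted Wald interval.

    Hypothetical helper for illustration: sums P(S = s) over every s
    whose interval contains the true proportion p.
    """
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    total = 0.0
    for s in range(n + 1):
        if adjusted:
            est, m = (s + 2) / (n + 4), n + 4   # pe = (S + 2)/(n + 4)
        else:
            est, m = s / n, n                   # pb = S/n
        half = z * sqrt(est * (1 - est) / m)
        if est - half <= p <= est + half:
            total += comb(n, s) * p**s * (1 - p) ** (n - s)
    return total


if __name__ == "__main__":
    for p in (0.1, 0.3, 0.5):
        print(p, round(coverage(10, p), 4), round(coverage(10, p, adjusted=True), 4))
```

For small n and p near 0 or 1, the ordinary Wald interval tends to fall well short of the nominal 95%, which is the behavior the memo investigates.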
Memo 6 - Macro to generate random data from a uniform distribution
Memo 7 - A procedure for finding the best-fitting curve to a given set of points by minimizing the sum of the squares of the errors (or offsets or residuals) of the points from the curve.
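For the simplest case of Memo 7, a straight line, the least-squares fit has a closed form. The sketch below is in Python with a hypothetical helper `fit_line`; it is not the lab's own code and only illustrates minimizing the sum of squared residuals for y = intercept + slope * x.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points (xs[i], ys[i]).

    Returns (slope, intercept) minimizing sum((y - (intercept + slope*x))**2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Points lying exactly on y = 2x + 1, so the fit recovers that line.
    print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```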
Memo 8 - Find the best-fitting curve for fish data, with linear, quadratic, cubic and spline models.
Memo 9 - The t-test assesses whether the means of two groups are statistically different from each other. The t-test assumes that the underlying distribution of the samples is normal, and it is known to be relatively robust to moderate departures from that assumption. To determine whether the means of two groups are significantly different, we set up a null hypothesis (H0). It states that both groups would have the same mean if we repeated the experiment a large (infinite) number of times. The alternative hypothesis (H1) is either that one particular mean is greater (or lower) than the other (a one-tailed test) or that the two means are different (a two-tailed test).
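A minimal sketch of the two-sample t statistic described in Memo 9, assuming equal variances in the two groups (the pooled form). The function name `pooled_t` is hypothetical. The p-value is left out because the Python standard library has no t distribution; in SAS or R it would come from the built-in t-test procedure.

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled estimate of the common variance.
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (fmean(a) - fmean(b)) / sqrt(sp2 * (1 / na + 1 / nb))
    return t, df


if __name__ == "__main__":
    t, df = pooled_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
    print(t, df)
```

Under H0 the statistic is compared with a t distribution on `df` degrees of freedom, using one tail or two depending on the alternative hypothesis.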
Memo 14 - Least Squares Curve Fitting with Scatter Plot and Regression Line
Memo 19 - We use statistical sampling to determine the value of a parameter of a population. We sample a population, measure a statistic of this sample, and then use this statistic to say something about the corresponding parameter of the population.
Memo 20 - We look at the Wild and Studentized Bootstrap.
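To make the studentized variant concrete, here is a small sketch of a studentized (bootstrap-t) confidence interval for a mean. It is an illustration under assumptions, not the memo's code: the function name, the default of 2000 resamples, and the fixed seed are all choices made for this example, and the wild bootstrap is not shown.

```python
import random
from math import sqrt
from statistics import fmean, stdev


def studentized_bootstrap_ci(data, n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap-t confidence interval for the mean of `data`."""
    rng = random.Random(seed)
    n = len(data)
    mean = fmean(data)
    se = stdev(data) / sqrt(n)
    t_stars = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)   # sample with replacement
        s = stdev(resample)
        if s == 0:                          # degenerate resample, skip it
            continue
        t_stars.append((fmean(resample) - mean) / (s / sqrt(n)))
    t_stars.sort()
    m = len(t_stars)
    t_lo = t_stars[int((alpha / 2) * m)]
    t_hi = t_stars[int((1 - alpha / 2) * m) - 1]
    # The quantiles swap sides when the pivot is inverted.
    return mean - t_hi * se, mean - t_lo * se


if __name__ == "__main__":
    sample = [4.1, 5.3, 2.8, 6.0, 5.5, 3.9, 4.7, 5.1, 6.4, 3.2]
    print(studentized_bootstrap_ci(sample))
```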
Memo 21 - The question that ANOVA answers is whether the variation among the sample means is due to true differences among the population means or just to sampling variability. To answer this question, ANOVA calculates a test statistic called the F-statistic, which compares the variation among sample means (among different groups) to the variation within groups.
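The ratio described in Memo 21 can be sketched in a few lines. This is an illustrative Python version with a hypothetical function name `one_way_f`, not the lab's SAS code: it divides the between-group mean square by the within-group mean square.

```python
from statistics import fmean


def one_way_f(groups):
    """F statistic for a one-way ANOVA on a list of groups (lists of numbers)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    # Variation among the group means (between groups).
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean (within groups).
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


if __name__ == "__main__":
    print(one_way_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

A large F suggests that the group means differ by more than sampling variability alone would explain.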
Memo 23 - Two-Way ANOVA. Difference in means with respect to two variables.
Memo 24 - One-Way ANOVA Data Reading
Memo 25 - Repeated-measures ANOVA applied to the blood glucose data
Logistic Regression - Logistic regression uses a transformation (called a logit) which forces the prediction equation to predict values between 0 and 1. A logistic regression equation predicts the natural log (ln) of the odds for a subject being in one category or another. This project analyzes heart attack data considering a number of factors.
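The logit transformation mentioned above, and its inverse, can be written out directly. The sketch below is illustrative only: the function names and the coefficients passed to `predict_prob` are hypothetical and are not fitted values from the heart attack data.

```python
from math import exp, log


def logit(p):
    """Natural log of the odds p / (1 - p), for 0 < p < 1."""
    return log(p / (1 - p))


def inv_logit(x):
    """Inverse of the logit: maps any real number into (0, 1)."""
    return 1 / (1 + exp(-x))


def predict_prob(intercept, coefs, xs):
    """Predicted probability from a linear predictor on the log-odds scale."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, xs))
    return inv_logit(log_odds)


if __name__ == "__main__":
    # Made-up coefficients, purely to show the shape of the calculation.
    print(predict_prob(-2.0, [0.05, 0.8], [50, 1]))
```

The model is linear in the log odds, so the fitted equation can take any real value while the predicted probability always stays between 0 and 1.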
Project2 - Extended bootstrapping project using the plain, wild, studentized, and other bootstrap variants.
Project3 - Extended ANOVA project involving medical treatment data.