This document contains abridged sections from Discovering Statistics Using R and RStudio by Andy Field, so there are some copyright considerations. You can use this material for teaching and non-profit activities, but please do not meddle with it or claim it as your own work. See the full license terms at the bottom of the page.

Task 6.1

Using the notebook.csv data from Chapter 5, create and interpret a Q-Q plot for the two films (ignore sex).

Load the data directly from the discovr package (the code throughout assumes the tidyverse is loaded, for %>%, aes() and so on):

    notebook_tib <- discovr::notebook

    notebook_tib %>%
      ggplot2::ggplot(., aes(sample = arousal)) +
      qqplotr::stat_qq_band(alpha = 0.3) +  # assumed layer: confidence band for the line
      qqplotr::stat_qq_line() +             # assumed layer: the normal reference line
      qqplotr::stat_qq_point(alpha = 0.2, size = 1) +
      labs(x = "Theoretical quantiles", y = "Sample quantiles") +
      facet_wrap(~film, ncol = 1, scales = "free")

For both films the observed quantiles are close, on the whole, to those that would be expected from a normal distribution (i.e. the dots fall within the confidence interval for the line).

Task 6.2

The file r_exam.csv contains data on students’ performance on an SPSS exam. Four variables were measured: exam (first-year SPSS exam scores as a percentage), computer (measure of computer literacy in percent), lecture (percentage of statistics lectures attended) and numeracy (a measure of numerical ability out of 15). There is a variable called uni indicating whether the student attended Sussex University (where I work) or Duncetown University. Compute and interpret summary statistics for exam, computer, lecture and numeracy for the sample as a whole.

The code below assumes the data are in long format in rexam_tidy_tib, with the four measures stacked in a column called score and labelled by a column called measure:

    dplyr::group_by(rexam_tidy_tib, measure) %>%
      dplyr::summarize(
        mean = mean(score, na.rm = TRUE),
        ci_lower = ggplot2::mean_cl_normal(score)$ymin,
        ci_upper = ggplot2::mean_cl_normal(score)$ymax,
        skew = moments::skewness(score, na.rm = TRUE),
        kurtosis = moments::kurtosis(score, na.rm = TRUE)
      )

The output shows the table of descriptive statistics for the four variables in this example. From this table, we can see that, on average, students attended nearly 60% of lectures, obtained 58% in their exam, scored only 51% on the computer literacy test, and only 5 out of 15 on the numeracy test. In addition, the confidence interval for computer literacy was relatively narrow compared to those for the percentage of lectures attended and exam scores.

Task 6.3

Produce histograms for each of the four measures in the previous task and interpret them.

    ggplot2::ggplot(rexam_tidy_tib, aes(score)) +
      geom_histogram() +                                 # assumed layer: draws the histograms
      facet_wrap(~measure, ncol = 2, scales = "free") +  # assumed: one panel per measure
      theme(plot.title = element_text(hjust = 0.5))

It looks as though computer literacy is fairly normally distributed (a few people are very good with computers and a few are very bad, but the majority of people have a similar degree of knowledge), as is the lecture attendance. The exam scores are very interesting because this distribution is quite clearly not normal; in fact, it looks suspiciously bimodal (there are two peaks, indicative of two modes). Finally, the numeracy test has produced very positively skewed data (the majority of people did very badly on this test and only a few did well).

Descriptive statistics and histograms are a good way of getting an instant picture of the distribution of your data. This snapshot can be very useful: for example, the bimodal distribution of exam scores instantly indicates a trend that students are typically either very good at statistics or struggle with it (there are relatively few who fall in between these extremes). Intuitively, this finding fits with the nature of the subject: statistics is very easy once everything falls into place, but before that enlightenment occurs it all seems hopelessly difficult!

Task 6.4

Produce Q-Q plots for the four measures in Task 2.

    rexam_tidy_tib %>%
      ggplot2::ggplot(., aes(sample = score)) +
      qqplotr::stat_qq_band(alpha = 0.3) +  # assumed layers, mirroring Task 6.1
      qqplotr::stat_qq_line() +
      qqplotr::stat_qq_point(alpha = 0.2, size = 1) +
      facet_wrap(~measure, ncol = 2, scales = "free")

Task 6.5

Use Task 2 and 4 to interpret skew and kurtosis for each of the four measures.

The skewness values are calculated in the table for Task 2. For most of the measures, skewness is fairly close to zero, the value expected in a normal distribution. The exception is numeracy scores, for which skew is about 1. Looking at the Q-Q plots, none of them show the characteristic upward or downward bend associated with skew.

For kurtosis, computer literacy and lecture attendance have kurtosis around the expected value of 3. The Q-Q plots confirm that these measures show the characteristic patterns of normality.

Exam scores have kurtosis below 2, suggesting lighter tails than you’d expect in a normal distribution. This is also shown in the Q-Q plot, with observed quantiles at the low end being higher than expected and observed quantiles at the high end being lower than expected.

Numeracy scores have kurtosis close to 4, indicating heavier tails than expected in a normal distribution, but the Q-Q plot looks relatively normal (although observed quantiles rise above the normal line at the low end).
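The kurtosis interpretation above treats 3 (not 0) as the value expected for a normal distribution, which corresponds to raw rather than excess kurtosis. A minimal base-R sketch of the moment-based formulas may make that concrete. The helper names central_moment(), skew_manual() and kurtosis_manual() and the toy vectors are illustrative assumptions, not part of the original solutions or of any package. To my understanding these are the same divisor-n formulas that moments::skewness() and moments::kurtosis() use, but treat that as an assumption to check against the package documentation.

```r
# Moment-based skewness and kurtosis using population moments (divisor n).
# All helper names here are hypothetical, chosen only for illustration.
central_moment <- function(x, k) {
  x <- x[!is.na(x)]          # drop missing values, like na.rm = TRUE
  mean((x - mean(x))^k)      # k-th central moment
}

skew_manual <- function(x) {
  # Third central moment scaled by the variance to the power 3/2
  central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
}

kurtosis_manual <- function(x) {
  # Raw kurtosis: about 3 for normal data (excess kurtosis would subtract 3)
  central_moment(x, 4) / central_moment(x, 2)^2
}

symmetric  <- c(1, 2, 3, 4, 5)    # toy data, symmetric about 3
right_tail <- c(1, 1, 1, 2, 10)   # toy data with a long right tail

skew_manual(symmetric)            # 0: no skew
kurtosis_manual(symmetric)        # 1.7: below 3, i.e. lighter tails than normal
skew_manual(right_tail) > 0       # TRUE: positive skew
```

On this scale, values below 3 (like the exam scores) point to lighter tails and values above 3 (like numeracy) to heavier tails, which matches how the table from Task 2 is read.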