You are given information about 15 Titanic passengers.
PassengerId | Sex | Age | Class | Survived |
---|---|---|---|---|
773 | female | 57 | 2 | no |
698 | female | NA | 3 | yes |
652 | female | 18 | 2 | yes |
548 | male | NA | 2 | yes |
890 | male | 26 | 1 | yes |
875 | female | 28 | 2 | yes |
392 | male | 21 | 3 | yes |
788 | male | 8 | 3 | no |
330 | female | 16 | 1 | yes |
183 | male | 9 | 3 | no |
680 | male | 36 | 1 | yes |
560 | female | 36 | 3 | yes |
104 | male | 33 | 3 | no |
136 | male | 23 | 2 | no |
37 | male | NA | 3 | yes |
Compute the contingency table for Sex and Class variables.
Compute marginal frequencies for Sex and Class variables.
Compute the joint probabilities P(Sex = …, Class = …). Six probabilities in total.
Compute the conditional probabilities P(Class = … | Sex = …). Six probabilities in total.
Draw a stacked barplot. Do you think there is an association between these two variables?
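The steps above can be sketched in R as follows. This is only one possible approach: the vectors `sex` and `pclass` are hypothetical names, typed in by hand from the Sex and Class columns of the 15 passengers listed earlier.

```r
# Sex and Class of the 15 passengers, in the order they appear in the table
# (hypothetical variable names, entered by hand).
sex <- c("female", "female", "female", "male", "male", "female", "male", "male",
         "female", "male", "male", "female", "male", "male", "male")
pclass <- c(2, 3, 2, 2, 1, 2, 3, 3, 1, 3, 1, 3, 3, 2, 3)

# Contingency table: rows are Sex, columns are Class.
tab <- table(Sex = sex, Class = pclass)

# Marginal frequencies are the row and column totals.
with_margins <- addmargins(tab)

# Joint probabilities P(Sex, Class): each cell divided by the grand total.
joint <- prop.table(tab)

# Conditional probabilities P(Class | Sex): each row divided by its row total.
cond <- prop.table(tab, margin = 1)

# Stacked barplot: one bar per Sex, stacked by Class.
barplot(t(tab), legend.text = TRUE, xlab = "Sex", ylab = "Count")
```

Printing `with_margins`, `joint`, and `cond` shows the three tables. If the class shares within the two bars look similar, that suggests little association; clearly different shares suggest the opposite.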
You are given the Class vs. Sex contingency table for all Titanic passengers, and you want to test whether these two variables are independent.
library(knitr)  # provides kable()
tab = table(data$Class, data$Sex)  # rows: Class, columns: Sex
kable(tab)
Class | female | male |
---|---|---|
1 | 94 | 122 |
2 | 76 | 108 |
3 | 144 | 347 |
Find the observed counts and the expected counts under independence for this table.
Find the chi-square test statistic.
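A minimal sketch of the calculation, assuming the test in question is Pearson's chi-square test of independence and that the three rows of the table are classes 1, 2, and 3 in that order (the order `table(data$Class, data$Sex)` would produce). The names `obs`, `expected`, and `chi_sq` are illustrative.

```r
# Observed counts, typed in from the table (rows assumed to be Class 1, 2, 3).
obs <- matrix(c(94, 76, 144, 122, 108, 347), nrow = 3,
              dimnames = list(Class = c("1", "2", "3"),
                              Sex = c("female", "male")))

# Expected count under independence: (row total * column total) / grand total.
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson chi-square statistic: sum over cells of (O - E)^2 / E.
chi_sq <- sum((obs - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1).
df <- (nrow(obs) - 1) * (ncol(obs) - 1)

# Upper-tail p-value of the chi-square distribution.
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)
```

The same statistic can be cross-checked with the built-in `chisq.test(obs)`.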
The iris data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris (150 flowers in total). The species are Iris setosa, versicolor, and virginica.
Here are summary statistics for Sepal and Petal lengths.
mean(Sepal.Length)
## [1] 5.843333
mean(Petal.Length)
## [1] 3.758
sd(Sepal.Length)
## [1] 0.8280661
sd(Petal.Length)
## [1] 1.765298
cor(Sepal.Length, Petal.Length)
## [1] 0.8717538
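State the equation of the least-squares regression line relating Sepal.Length and Petal.Length, making clear which variable is the response.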
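The line can be obtained from the summary statistics alone. The sketch below assumes Petal.Length is the response and Sepal.Length the predictor; the exercise does not say which way round, so swap the roles if the other direction is intended. Variable names are illustrative.

```r
# Summary statistics copied from the output above.
x_mean <- 5.843333   # mean(Sepal.Length), assumed predictor
y_mean <- 3.758      # mean(Petal.Length), assumed response
x_sd   <- 0.8280661  # sd(Sepal.Length)
y_sd   <- 1.765298   # sd(Petal.Length)
r      <- 0.8717538  # cor(Sepal.Length, Petal.Length)

# Least-squares slope: correlation times the ratio of standard deviations.
slope <- r * y_sd / x_sd

# The fitted line passes through the point of means.
intercept <- y_mean - slope * x_mean

# Fitted equation: Petal.Length = intercept + slope * Sepal.Length
cat("Petal.Length =", round(intercept, 3), "+", round(slope, 3), "* Sepal.Length\n")
```

The result can be compared with `coef(lm(Petal.Length ~ Sepal.Length, data = iris))`, which fits the same line directly from the data.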