7 Independence & Association
Suppose we design a clinical trial with a treatment arm and a placebo arm. Imagine we enroll 100 participants into each arm of the trial, for a total of 200 participants. The outcome is recovery status (Yes, No). Suppose that in the placebo arm, only 20 participants recover.
| | Treatment | Placebo |
|---|---|---|
| Recovered | ? | 20 |
| Unrecovered | ? | 80 |
| Total | 100 | 100 |
If the treatment does not work, how many participants in the treatment arm do you expect to recover?
Answer: About 20, matching the number in the placebo arm. Why?
If the treatment is effective, how many participants in the treatment arm do you expect to recover?
Answer: A number larger than 20. Why?
If the treatment is harmful, how many participants in the treatment arm do you expect to recover?
Answer: A number less than 20. Why?
This clinical trial example illustrates the concepts of association and independence.
When we supposed that the treatment does not work, we were able to say that the expected number of recovered participants in the treatment arm would be close to the number in the placebo arm. How does our answer change if the number of participants in the treatment arm is four times the number in the placebo arm?
| | Treatment | Placebo |
|---|---|---|
| Recovered | ? | 20 |
| Unrecovered | ? | 80 |
| Total | 400 | 100 |
The answer: something close to 80. We noted that
\[ P(\text{Recovered}\mid\text{Treatment}) = P(\text{Recovered}\mid\text{Placebo}) = \frac{20}{100} \]
\[ \frac{20}{100} \times 400 = 80 \]
In general, if the treatment does not work, we’d say
\[ P(\text{Recovered}\mid\text{Treatment}) = P(\text{Recovered}\mid\text{Placebo}) = P(\text{Recovered}) \]
indicating that the treatment arm provides no additional information about recovery status. The marginal probability of recovery is just as informative as the conditional probabilities.
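The expected-count reasoning above can be sketched in a few lines of Python. This is only an illustration: the function name `expected_recovered` is made up for this example, and the calculation assumes the treatment has no effect, so the placebo recovery rate applies to both arms.

```python
# Placebo arm counts taken from the tables above.
PLACEBO_RECOVERED = 20
PLACEBO_TOTAL = 100


def expected_recovered(treatment_total: int) -> float:
    """Expected recoveries in the treatment arm if the treatment has no effect.

    Under no effect, the treatment arm recovers at the placebo rate,
    so we scale that rate by the size of the treatment arm.
    """
    return PLACEBO_RECOVERED * treatment_total / PLACEBO_TOTAL


print(expected_recovered(100))  # equal arm sizes
print(expected_recovered(400))  # treatment arm four times larger
```

With equal arm sizes this gives 20, and with 400 treated participants it gives 80, matching the two answers above.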
7.1 Independence
We say two random outcomes \(A\) and \(B\) are independent if, for all values \(a\) and \(b\),
\[ P(A=a\mid B=b) = P(A=a) \]
which is equivalent to
\[ P(A=a\ \&\ B=b) = P(A=a)P(B=b). \]
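To see why the two statements are equivalent, recall the definition of conditional probability, assuming \(P(B=b) > 0\):
\[ P(A=a\mid B=b) = \frac{P(A=a\ \&\ B=b)}{P(B=b)}. \]
If \(P(A=a\mid B=b) = P(A=a)\), multiplying both sides of the definition by \(P(B=b)\) gives
\[ P(A=a\ \&\ B=b) = P(A=a\mid B=b)P(B=b) = P(A=a)P(B=b). \]
Conversely, if the joint probability factors as \(P(A=a)P(B=b)\), dividing by \(P(B=b)\) gives
\[ P(A=a\mid B=b) = \frac{P(A=a)P(B=b)}{P(B=b)} = P(A=a). \]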
When predictions are based on conditional probabilities, we would say that \(A\) and \(B\) are independent when \(B\) is not predictive of \(A\) and vice versa.
7.1.1 Check independence
We can check independence by comparing the joint probabilities to those that would arise under independence. Suppose we wanted to check whether treatment arm and recovery status are independent in the following joint distribution.
| | Treatment | Placebo | Margin |
|---|---|---|---|
| Recovered | .15 | .1 | 0.25 |
| Unrecovered | .35 | .4 | 0.75 |
| Margin | .5 | .5 | 1 |
To build the table under independence, all we need are the margins.
| | Treatment | Placebo | Margin |
|---|---|---|---|
| Recovered | | | 0.25 |
| Unrecovered | | | 0.75 |
| Margin | .5 | .5 | 1 |
Under independence, \(P(\text{Recovered}\ \&\ \text{Treatment}) = P(\text{Recovered})P(\text{Treatment})\). Using this formula, we get:
| | Treatment | Placebo | Margin |
|---|---|---|---|
| Recovered | 0.25 \(\times\) 0.5 = 0.125 | 0.25 \(\times\) 0.5 = 0.125 | 0.25 |
| Unrecovered | 0.75 \(\times\) 0.5 = 0.375 | 0.75 \(\times\) 0.5 = 0.375 | 0.75 |
| Margin | .5 | .5 | 1 |
Note that in this case
\[ 0.15 = P(\text{Recovered}\ \&\ \text{Treatment}) \neq P(\text{Recovered})P(\text{Treatment}) = 0.125, \]
so treatment arm and recovery status are not independent in this joint distribution.
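The check above can be sketched in Python as follows. The helper names (`margins`, `independence_table`, `is_independent`) are invented for this illustration, and the tolerance `tol` is an assumption added to absorb floating-point rounding.

```python
import math

# Joint distribution from the first table in this section.
# Keys are (recovery status, trial arm).
joint = {
    ("Recovered", "Treatment"): 0.15,
    ("Recovered", "Placebo"): 0.10,
    ("Unrecovered", "Treatment"): 0.35,
    ("Unrecovered", "Placebo"): 0.40,
}


def margins(joint):
    """Return (row margins, column margins) by summing the joint table."""
    row, col = {}, {}
    for (r, c), p in joint.items():
        row[r] = row.get(r, 0.0) + p
        col[c] = col.get(c, 0.0) + p
    return row, col


def independence_table(joint):
    """Joint probabilities that the margins would imply under independence."""
    row, col = margins(joint)
    return {(r, c): row[r] * col[c] for r in row for c in col}


def is_independent(joint, tol=1e-9):
    """True if every cell equals the product of its margins (up to tol)."""
    expected = independence_table(joint)
    return all(math.isclose(joint[k], expected[k], abs_tol=tol) for k in joint)


for cell, p in independence_table(joint).items():
    print(cell, "observed:", joint[cell], "under independence:", round(p, 3))

print("Independent?", is_independent(joint))
```

For this table the check reports that the outcomes are not independent, since the observed 0.15 differs from the 0.125 implied by the margins.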
Exercise: Find a joint distribution for which treatment arm is independent of recovery status.
| | Treatment | Placebo | Margin |
|---|---|---|---|
| Recovered | | .05 | |
| Unrecovered | | .2 | |
| Margin | .75 | .25 | 1 |
7.2 Association
When two random outcomes are not independent, we say that the outcomes are associated. They may be positively associated or negatively associated. In the clinical trial example above, we noted that we’d expect more than 20 recovered participants if the treatment worked. This is an example of a positive association. Likewise, if the treatment is harmful, we’d expect fewer than 20 recovered participants, which is an example of negative association.
To determine if outcomes are positively or negatively associated, one can compare the joint probabilities to those that would be observed under independence. For example, if the following are the joint probabilities of recovery status and treatment arm,
| | Treatment | Placebo | Margin |
|---|---|---|---|
| Recovered | .25 | .05 | .3 |
| Unrecovered | .5 | .2 | .7 |
| Margin | .75 | .25 | 1 |
then the joint probabilities under independence would be
| | Treatment | Placebo | Margin |
|---|---|---|---|
| Recovered | .3 \(\times\) .75 = 0.225 | .3 \(\times\) .25 = 0.075 | .3 |
| Unrecovered | .7 \(\times\) .75 = 0.525 | .7 \(\times\) .25 = 0.175 | .7 |
| Margin | .75 | .25 | 1 |
Note that
\[ P(\text{Recovered}\ \&\ \text{Treatment}) = 0.25 > 0.225 = P(\text{Recovered})P(\text{Treatment}) \]
As the joint probability is larger than what would be observed under independence, recovery and treatment are positively associated. In contrast, because
\[ P(\text{Recovered}\ \&\ \text{Placebo}) = 0.05 < 0.075 = P(\text{Recovered})P(\text{Placebo}) \]
recovery and placebo are negatively associated.
In general,
\[ \begin{array}{ll} P(A=a\ \&\ B=b) > P(A=a)P(B=b)& a \text{ and } b \text{ are positively associated}\\ P(A=a\ \&\ B=b) < P(A=a)P(B=b)& a \text{ and } b \text{ are negatively associated} \end{array} \]
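As a rough sketch, the general rule can be applied to every cell of a joint table by comparing it with the product of its margins. The function name `association_signs` and the tolerance `tol` are assumptions made for this illustration; the numbers come from the joint table in this section.

```python
# Joint distribution from the association example in this section.
# Keys are (recovery status, trial arm).
joint = {
    ("Recovered", "Treatment"): 0.25,
    ("Recovered", "Placebo"): 0.05,
    ("Unrecovered", "Treatment"): 0.50,
    ("Unrecovered", "Placebo"): 0.20,
}


def association_signs(joint, tol=1e-9):
    """Label each cell by comparing P(a & b) with P(a) * P(b)."""
    row, col = {}, {}
    for (r, c), p in joint.items():
        row[r] = row.get(r, 0.0) + p
        col[c] = col.get(c, 0.0) + p

    signs = {}
    for (r, c), p in joint.items():
        diff = p - row[r] * col[c]  # observed minus independence value
        if diff > tol:
            signs[(r, c)] = "positive"
        elif diff < -tol:
            signs[(r, c)] = "negative"
        else:
            signs[(r, c)] = "none"
    return signs


for cell, sign in association_signs(joint).items():
    print(cell, sign)
```

In a 2 \(\times\) 2 table the signs come in pairs: here recovery is positively associated with treatment and negatively associated with placebo, while the pattern reverses for the unrecovered row.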
Positive and negative association can also be expressed in terms of conditional probabilities:
\[ \begin{array}{ll} P(A=a\mid B=b) > P(A=a)& a \text{ and } b \text{ are positively associated}\\ P(A=a\mid B=b) < P(A=a)& a \text{ and } b \text{ are negatively associated.} \end{array} \]
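The conditional-probability version of the comparison can be illustrated the same way. The helper names `conditional` and `marginal` are hypothetical, and the joint table is the one from the example earlier in this section.

```python
# Joint distribution from the association example in this section.
# Keys are (recovery status, trial arm).
joint = {
    ("Recovered", "Treatment"): 0.25,
    ("Recovered", "Placebo"): 0.05,
    ("Unrecovered", "Treatment"): 0.50,
    ("Unrecovered", "Placebo"): 0.20,
}


def marginal(joint, status):
    """P(status): sum the joint probabilities across both arms."""
    return sum(p for (r, _), p in joint.items() if r == status)


def conditional(joint, status, arm):
    """P(status | arm) = P(status & arm) / P(arm)."""
    p_arm = sum(p for (_, c), p in joint.items() if c == arm)
    return joint[(status, arm)] / p_arm


p_rec = marginal(joint, "Recovered")
p_rec_given_trt = conditional(joint, "Recovered", "Treatment")
p_rec_given_plc = conditional(joint, "Recovered", "Placebo")

print("P(Recovered)             =", round(p_rec, 3))
print("P(Recovered | Treatment) =", round(p_rec_given_trt, 3))
print("P(Recovered | Placebo)   =", round(p_rec_given_plc, 3))
```

Here \(P(\text{Recovered}\mid\text{Treatment}) = .25/.75 \approx 0.333\) exceeds \(P(\text{Recovered}) = 0.3\), and \(P(\text{Recovered}\mid\text{Placebo}) = .05/.25 = 0.2\) falls below it, which agrees with the joint-probability comparison.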
Exercise: Find a joint distribution for which treatment is positively associated with recovery.
| | Treatment | Placebo | Margin |
|---|---|---|---|
| Recovered | | .05 | |
| Unrecovered | | .2 | |
| Margin | .75 | .25 | 1 |
Exercise: Calculate the conditional probabilities for your table from the previous exercise. Show that the conditional probability definition of positive association is satisfied.