Fisher’s exact test failure can lead to biased results

Robersy Sanchez

Department of Biology, Pennsylvania State University, University Park, PA 16802

Email: rus547@psu.edu

Fisher’s exact test is a statistical significance test used in the analysis of contingency tables. Although this test is routinely used, it has been surrounded by controversy for over 80 years. Herein, its application is scrutinized with specific examples.

Overview

The statistical significance of the difference between two bisulfite sequences from control and treatment groups at each CG site can be evaluated with Fisher’s exact test. This statistical test is used to determine whether there are nonrandom associations between two categorical variables.

Let $X$ and $Y$ be two such categorical variables, where $X$ denotes the group of individuals (control or treatment) and $Y$ is a two-state variable denoting the methylation status, recording the number of times that a cytosine site is found methylated (${}^{m}CG$) or non-methylated ($CG$).

This information can be summarized in a $2 \times 2$ table, that is, a $2 \times 2$ matrix in which each entry $a_{ij}$ is the number of observations with $X=i$ and $Y=j$. The row and column sums, $R_i$ and $C_j$ respectively, are then calculated, together with the total sum:

$N=\sum_iR_i=\sum_jC_j$

of the matrix:

| | $Y = {}^{m}CG$ | $Y = CG$ | $R_i$ |
|---|---|---|---|
| Control | $a_{11}$ | $a_{12}$ | $a_{11}+a_{12}$ |
| Treatment | $a_{21}$ | $a_{22}$ | $a_{21}+a_{22}$ |
| $C_j$ | $a_{11}+a_{21}$ | $a_{12}+a_{22}$ | $a_{11}+a_{12}+a_{21}+a_{22} = N$ |
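
The bookkeeping behind the table can be sketched in a few lines of Python. This is only an illustration: the read counts and the helper name `margins` are hypothetical assumptions, not data or code from this study.

```python
# Hypothetical counts, for illustration only (not data from this study):
# rows = (control, treatment), columns = (methylated, non-methylated)
table = [[8, 2],
         [1, 5]]

def margins(a):
    """Return the row sums R_i, the column sums C_j and the grand total N."""
    row_sums = [sum(row) for row in a]
    col_sums = [sum(col) for col in zip(*a)]
    total = sum(row_sums)
    # Both sets of margins must add up to the same total N.
    assert total == sum(col_sums)
    return row_sums, col_sums, total

R, C, N = margins(table)
print("R_i =", R, "C_j =", C, "N =", N)
```

The internal check mirrors the identity $N=\sum_iR_i=\sum_jC_j$ stated above.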

Then the conditional probability of obtaining the observed matrix, given the particular row and column sums, is:

$P_{cutoff}=\frac{R_1!\,R_2!\,C_1!\,C_2!}{N!\,\prod_{i,j}a_{ij}!}$
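
As a minimal sketch of how this formula can be evaluated, the Python function below computes $P_{cutoff}$ for a single $2 \times 2$ table using exact rational arithmetic. The function name `table_probability` and the example counts are hypothetical and serve only as an illustration. Note that this is the probability of one table with fixed margins, not the full p-value of the test.

```python
from fractions import Fraction
from math import factorial

def table_probability(a):
    """Probability of a 2x2 table given its row and column sums.

    Implements P = R1! R2! C1! C2! / (N! * prod_ij a_ij!).
    """
    (a11, a12), (a21, a22) = a
    r1, r2 = a11 + a12, a21 + a22   # row sums R_1, R_2
    c1, c2 = a11 + a21, a12 + a22   # column sums C_1, C_2
    n = r1 + r2                     # total sum N
    numerator = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    denominator = factorial(n)
    for count in (a11, a12, a21, a22):
        denominator *= factorial(count)
    # Fraction keeps the result exact; convert with float() if needed.
    return Fraction(numerator, denominator)

# Hypothetical counts, for illustration only (not data from this study).
p = table_probability([[8, 2], [1, 5]])
print(p, float(p))
```

Exact fractions avoid rounding error for small counts. For large read counts, a log-factorial formulation (for example via `math.lgamma`) would be a more practical choice.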