############################################
A worked example of the general linear model
############################################

See: `introduction to the general linear model`_.

Here we go through the matrix formulation of the general linear model with
the simplest possible example |--| a t-test for the difference of the sample
mean from zero.

Let us say we have the hypothesis that men are more psychopathic than women.
We find 10 male-female couples, and we give them each a psychopathy
questionnaire. We score the questionnaires, and then, for each couple, we
subtract the woman's score from the man's score, to get a difference score.

We have 10 difference scores:

.. nbplot::

    >>> #: Our standard imports
    >>> import numpy as np
    >>> import numpy.linalg as npl
    >>> # Only show 4 decimals when printing arrays
    >>> np.set_printoptions(precision=4)

.. nbplot::

    >>> differences = np.array([ 1.5993, -0.13 , 2.3806, 1.3761, -1.3595,
    ...     1.0286, 0.8466, 1.6669, -0.8241, 0.4469])

Our hypothesis is that the woman in each couple will have a lower psychopathy
score than the man. We therefore predict that, in the whole population of
male-female couples, the difference measure will, on average, be positive
(men have higher scores than women).

We test this hypothesis by asking whether the sample mean is far enough from
zero to make it unlikely that the population mean is actually zero.

******************************
The standard error of the mean
******************************

One way to test this is to compare the mean value to the `standard error of
the mean`_. The `standard error`_ of a statistic (such as a mean) is the
standard deviation of the sampling distribution of the statistic. The
sampling distribution of a statistic is the distribution of values we expect
for that statistic, over a very large number of samples. For example, the
sampling distribution of the mean is the expected distribution of sample
means generated by taking a very large number of samples.

As usual, define the *mean* of a vector of values $\vec{x}$ as:

.. math::

    \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i

Define the *unbiased estimate of the standard deviation* of $\vec{x}$ as:

.. math::

    s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}

Notice the $\frac{1}{n-1}$. We have previously used $n$ rather than $n-1$ as
the divisor for the *sample standard deviation*. Using $n$, our *sample
standard deviation* is a biased estimate of the population standard
deviation; using $n-1$ as the divisor gives an unbiased estimate.

The standard error of the mean is:

.. math::

    SE_{\bar{x}} = \frac{s_x}{\sqrt{n}}

where $n$ is the number of samples, 10 in our case.

A t statistic is given by a value divided by the *standard error* of that
value. The t statistic for $\bar{x}$ is:

.. math::

    t = \frac{\bar{x}}{SE_{\bar{x}}}

In our case:

.. nbplot::

    >>> x = differences
    >>> n = len(x)
    >>> x_bar = np.mean(x)
    >>> x_bar
    0.70314...
    >>> unbiased_var_x = 1. / (n - 1) * np.sum((x - x_bar) ** 2)
    >>> s_x = np.sqrt(unbiased_var_x)
    >>> s_x
    1.17718...
    >>> SEM = s_x / np.sqrt(n)
    >>> SEM
    0.37225...
    >>> t = x_bar / SEM
    >>> t
    1.88884...
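As an aside, we can cross-check these calculations against standard library
routines. This is a sketch, not part of the main thread; it assumes ``scipy``
is installed, which nothing else on this page needs:

.. nbplot::

    >>> # ddof=1 gives the n - 1 divisor, matching s_x above
    >>> np.std(x, ddof=1)
    1.17718...
    >>> # Scipy's one-sample t-test of x against a population mean of zero
    >>> from scipy.stats import ttest_1samp
    >>> result = ttest_1samp(x, 0)
    >>> result.statistic
    1.88884...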
**********************************
The standard error as an estimator
**********************************

Imagine that we repeat the following procedure many times, collecting each
value into a vector $\vec{m}$:

* take $n$ samples from some distribution;
* take the mean of these $n$ samples.

The standard error of the mean is an estimate of the standard deviation of
the values in $\vec{m}$.

We can simulate this process when we know the exact distribution of the
values in the population. For example, let us imagine that we know that the
population distribution is a normal distribution with population mean 5
($\mu = 5$) and population standard deviation 1.5 ($\sigma = 1.5$). We can
use ``numpy.random`` to draw samples from this distribution:

.. nbplot::
    :include-source: false

    >>> np.random.seed(42)

.. nbplot::

    >>> # 10 samples from normal distribution with mean 5 and sd 1.5
    >>> sample = np.random.normal(5, 1.5, size=10)
    >>> sample
    array([ 5.7451,  4.7926,  5.9715,  7.2845,  4.6488,  4.6488,  7.3688,
            6.1512,  4.2958,  5.8138])

We know the exact mean ($\mu$) and standard deviation ($\sigma$). If we drew
a near-infinite number of samples like this, the standard deviation of the
means of these samples would be $\frac{\sigma}{\sqrt{n}}$:

.. nbplot::

    >>> 1.5 / np.sqrt(n)
    0.474341...

We can compare this number to an estimate of the same thing, from a very
large number of means of samples of size 10:

.. nbplot::

    >>> n_samples = 100000
    >>> pop_mean = 5
    >>> pop_sd = 1.5
    >>> means_of_samples = np.zeros(n_samples)
    >>> for i in range(n_samples):
    ...     sample = np.random.normal(pop_mean, pop_sd, size=n)
    ...     means_of_samples[i] = sample.mean()
    >>> sq_deviations = (means_of_samples - pop_mean) ** 2
    >>> # With lots of samples, this value is close to the exact number
    >>> means_std_dev = np.sqrt(1. / n_samples * np.sum(sq_deviations))
    >>> means_std_dev
    0.474735...

Normally we do not know the exact standard deviation of the population; the
standard error of the mean is an estimate for that situation. First we use
the sample that we have to get an *unbiased estimate* of the population
standard deviation:

.. nbplot::

    >>> n = 10
    >>> y = sample
    >>> y_bar = y.mean()
    >>> unbiased_var_estimate = 1. / (n - 1) * np.sum((y - y_bar) ** 2)
    >>> unbiased_sd_est = np.sqrt(unbiased_var_estimate)

Of course, this estimate will not be exactly the same as the population
standard deviation ($\sigma = 1.5$):

.. nbplot::

    >>> unbiased_sd_est
    1.433659...

We use the unbiased standard deviation estimate to give an unbiased estimate
for the standard error of the mean. This estimate will be close to the
standard deviation of means from many samples of size $n$:

.. nbplot::

    >>> # Standard deviation of means from population
    >>> 1.5 / np.sqrt(n)
    0.474341...
    >>> # Our estimate for the standard error of the mean
    >>> unbiased_sd_est / np.sqrt(n)
    0.453362...

**************************************
Testing using the general linear model
**************************************

It is overkill for us to use the general linear model for this problem, but
doing so shows how the machinery works in the simplest case.

:math:`\newcommand{\Xmat}{\boldsymbol X} \newcommand{\bvec}{\vec{\beta}}`

:math:`\newcommand{\yvec}{\vec{y}} \newcommand{\xvec}{\vec{x}} \newcommand{\evec}{\vec{\varepsilon}}`

The matrix expression of the general linear model is:

.. math::

    \yvec = \Xmat \bvec + \evec

Define our design matrix $\Xmat$ to have a single column of ones:

.. nbplot::

    >>> X = np.ones((n, 1))
    >>> X
    array([[ 1.],
           [ 1.],
           [ 1.],
           [ 1.],
           [ 1.],
           [ 1.],
           [ 1.],
           [ 1.],
           [ 1.],
           [ 1.]])

:math:`\newcommand{\bhat}{\hat{\bvec}} \newcommand{\yhat}{\hat{\yvec}}`

$\bhat$ is the least squares estimate for $\bvec$, and is given by:

.. math::

    \bhat = (\Xmat^T \Xmat)^{-1} \Xmat^T \yvec

Because $\Xmat$ is just a column of ones, $\Xmat^T \yvec = \sum_i{y_i}$.
$\Xmat^T \Xmat = n$, so $(\Xmat^T \Xmat)^{-1} = \frac{1}{n}$. Thus:

.. math::

    \bhat = (\Xmat^T \Xmat)^{-1} \Xmat^T \yvec \\
    = \frac{1}{n} \sum_i{y_i} \\
    = \bar{y}
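We can check this numerically. The sketch below uses ``npl.pinv`` (imported
at the top of the page) to calculate $(\Xmat^T \Xmat)^{-1} \Xmat^T$ in one
step; the variable names are ours:

.. nbplot::

    >>> y = differences
    >>> # X transpose y is the sum of the y values
    >>> X.T @ y
    array([ 7.0314])
    >>> # The estimate B is therefore the mean of the y values
    >>> B = npl.pinv(X) @ y
    >>> B
    array([ 0.7031])
    >>> y.mean()
    0.70314...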
The Student's t statistic from the general linear model is:

.. math::

    t = \frac{c^T \bhat}{\sqrt{\hat{\sigma}^2 c^T (\Xmat^T \Xmat)^+ c}}

where $\hat{\sigma}^2$ is our estimate of the variance of the residuals, $c$
is a contrast vector to select some combination of the design columns, and
$(\Xmat^T \Xmat)^+$ is the *pseudoinverse* of $\Xmat^T \Xmat$.

In our case we have only one design column, so $c = [1]$ and we can omit it.
$\hat{\sigma}^2 = s_x^2$ for $s_x$ as defined above. $\Xmat^T \Xmat$ is
invertible, and we know the inverse already: $\frac{1}{n}$. Therefore:

.. math::

    t = \frac{\bar{y}}{s_x \sqrt{\frac{1}{n}}} \\
    = \bar{y} \Big/ \frac{s_x}{\sqrt{n}} \\
    = \frac{\bar{y}}{SE_{\bar{y}}}
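Finally, here is the whole t calculation from the general linear model,
following the formula above. This is a sketch with our own variable names,
using only the imports from the top of the page:

.. nbplot::

    >>> # Fitted values and residuals
    >>> fitted = X @ B
    >>> residuals = y - fitted
    >>> # Degrees of freedom: n observations minus 1 design column
    >>> df = n - X.shape[1]
    >>> # Unbiased estimate of residual variance; here equal to s_x ** 2
    >>> sigma_2 = np.sum(residuals ** 2) / df
    >>> # Contrast vector selecting our single design column
    >>> c = np.array([1])
    >>> t = c @ B / np.sqrt(sigma_2 * c @ npl.pinv(X.T @ X) @ c)
    >>> t
    1.88884...

This is the same t value that we calculated directly from the standard error
of the mean.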