Discrete Fourier Transform notes
Fast Fourier Transform
Assume we have $n$ data points $x_0, x_1, \ldots, x_{n-1}$. For convenience, assume $n$ is a power of two; if it isn’t, we can just pad with zeroes. We want to efficiently compute the transformed sequence $y_k = x_0 + w^kx_1 + w^{2k}x_2 + \ldots + w^{(n-1)k}x_{n-1}$, where $w$ is a primitive $n$th root of unity. We can represent these equations in matrix notation as $\bar y = V_n \bar x$, where $V_n$ is the Vandermonde matrix of $1, w, \ldots, w^{n-1}$.
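As a quick sanity check of the matrix form, here's a small NumPy sketch (not an FFT, just the $O(n^2)$ matrix product), assuming the sign convention $w = e^{-2\pi i/n}$ so that it matches `numpy.fft.fft`:

```python
import numpy as np

def dft_vandermonde(x):
    """Naive O(n^2) DFT: y = V_n x, where V_n is the Vandermonde matrix of 1, w, ..., w^(n-1)."""
    n = len(x)
    w = np.exp(-2j * np.pi / n)      # a primitive nth root of unity (NumPy's sign convention)
    k = np.arange(n)
    V = w ** np.outer(k, k)          # V[k, j] = w^(jk)
    return V @ x

x = np.random.rand(8)                # n = 8, a power of two
print(np.allclose(dft_vandermonde(x), np.fft.fft(x)))   # True
```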
Notes on linear regression
So suppose we have a set of points $(x_1, y_1), \ldots, (x_n, y_n)$ and we want to find $\beta_0$, $\beta_1$ such that $y_i = \beta_1 x_i + \beta_0$ for each $i$, or in matrix notation $Y = X\bar \beta$, where $X$ is n-by-2 and has the $x_i$’s in its first column and $1$’s in its second. Well, unfortunately, this may not have a solution, since $Y$ may not lie in the span of the two columns of $X$. So let’s instead try to find another vector $Y'$ which is in the span of $X$ and is as close to $Y$ as possible.
This can clearly be found by projection. Let $v_1, v_2$ be the columns of $X$. Then we can uniquely write $Y = av_1 + bv_2 + u$, where $u$ is orthogonal to $v_1, v_2$. Thus, multiplying through by $X^T$, we see that $X^TY = aX^Tv_1 + bX^Tv_2 + 0$, since the rows of $X^T$ are just the columns of $X$, which are perpendicular to $u$. This can now be rewritten as $X^TY = X^T X \begin{pmatrix} a \\ b \end{pmatrix}$. So we have $\begin{pmatrix} a \\ b \end{pmatrix} = (X^TX)^{-1}X^TY$. But by definition $a$ and $b$ are the coordinates of the projection of $Y$ onto the column space of $X$. So we’ve found $\bar \beta$, assuming that $X^TX$ is invertible!
If $X^TX$ is not invertible, we can find some nonzero vector $v$ with $X^TXv = 0$. But then $X^Tu = 0$, where $u = Xv$. Each entry in the vector $X^Tu$ is the dot product of a column of $X$ with $u$. As these are all 0, it follows that $u$ is perpendicular to all of $X$’s columns. But $u = Xv$ is also a linear combination of $X$’s columns, so it must be 0. Since $v \neq 0$ and $Xv = 0$, there is some linear dependence between the columns of $X$. Therefore, if the columns of $X$ are linearly independent, then $X^TX$ is invertible and the above argument gives us $\bar \beta = (X^TX)^{-1}X^TY$.
$Y' = X\bar\beta$ being the closest vector to $Y$ in the column space of $X$ means we’ve minimized $\|Y - X\beta\|$, hence minimized $\|Y - X\beta\|^2 = (Y - X\beta)^T(Y - X\beta)$. This expands out to $f(\beta) = Y^TY - \beta^TX^TY - Y^TX\beta + \beta^T(X^TX)\beta$. Taking the matrix derivative, $\nabla f(\beta) = -2X^TY + 2(X^TX)\beta$, and setting it equal to $0$ also gives $\beta = (X^TX)^{-1}X^TY$.
The same logic applies if we have more than one predictor variable, in which case the matrix $X$ has more than two columns and $\beta$ has more than two rows. Also, if $W$ is a diagonal matrix of weights and we minimize $(Y - X\beta)^TW(Y - X\beta)$ instead, the same differentiation argument gives $\beta = (X^TWX)^{-1}X^TWY$.
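Here's a quick sketch of the formula in practice (made-up data; in real code something like `np.linalg.lstsq` is preferred for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=50)   # true beta_1 = 3, beta_0 = 2

X = np.column_stack([x, np.ones_like(x)])   # x_i's in the first column, 1's in the second
beta = np.linalg.solve(X.T @ X, X.T @ y)    # solves the normal equations (X^T X) beta = X^T Y
print(beta)                                 # roughly [3.0, 2.0] = [beta_1, beta_0]
```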
Notes on the t-distribution
Let $X_1, X_2, \ldots, X_n$ be IID normal variables with mean $\mu$ and variance $\sigma^2$. Let $\overline{X}$ be the mean of these variables, and define the deviations $D_i = \overline{X} - X_i$. Let $Y = [X_1 \ X_2 \ \ldots \ X_n]^T$ and $Z = [\overline{X} \ D_2 \ \ldots \ D_n]^T$. Let $S = \frac{1}{n-1} \sum D_i^2$ be the sample variance.
Independence of Sample Variance and Sample Mean
By definition, $Y \sim N(\mu \mathbf{1}, \sigma^2 \mathrm I)$. Also, $Z = A_n Y$ where, for example, $A_3$ looks like this:
$$
A_3 = \frac{1}{3}
\begin{bmatrix}
1 & 1 & 1 \\
1 & 1-3 & 1 \\
1 & 1 & 1-3 \\
\end{bmatrix}.
$$
Thus, $Z \sim N(A_n (\mu \mathbf{1}), A_n (\sigma^2 \mathrm I) A_n^T) = N(\mu A_n \mathbf{1}, \sigma^2 A_n A_n^T)$. But multiplying out $A_n A_n^T$, you can see that it has the form
$$ \frac{1}{n^2}
\begin{bmatrix}
n & \mathbf 0^T \\
\mathbf 0 & B
\end{bmatrix}
$$
for some matrix $B$ (try multiplying it out with $A_3$ to see this). This implies the covariance between $\overline{X}$ and each of $D_2, \ldots, D_n$ is 0; and since they’re jointly normally distributed, it follows that $\overline X$ is independent of $(D_2, \ldots, D_n)$. And since $\sum D_i = 0$, $D_1$ is a function of $D_2, \ldots, D_n$, so $\overline X$ is also independent of $D_1$. Since $S$ is a function of the deviations, it’s also independent of $\overline X$.
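A quick simulation sketch (made-up parameters) that checks this conclusion empirically, at least at the level of correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000
X = rng.normal(mu, sigma, size=(reps, n))   # each row is one sample X_1, ..., X_n

xbar = X.mean(axis=1)                       # sample means
S = X.var(axis=1, ddof=1)                   # sample variances, with the 1/(n-1) factor
print(np.corrcoef(xbar, S)[0, 1])           # close to 0, consistent with independence
```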
Distribution of the Sample Variance and t-statistic
Note that
$$
\begin{eqnarray}
\sum (X_i - \mu)^2 &=& \sum ((X_i - \overline X) + (\overline X - \mu))^2 \\
&=& \sum D_i^2 + 2\sum (\overline X - \mu) (X_i - \overline X) + \sum (\overline X - \mu)^2 \\
&=& \sum D_i^2 + 2(\overline X - \mu) \sum(X_i - \overline X) + n(\overline X - \mu)^2 \\
&=& (n - 1)S + 0 + n(\overline X - \mu)^2 \\
&=& (n - 1)S + (\sqrt n (\overline X - \mu))^2.
\end{eqnarray}
$$
Divide through by $\sigma^2$ to yield
$$
\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n - 1)}{\sigma^2}S + \left(\sqrt n \frac{\overline X - \mu}{\sigma}\right)^2.
$$
The left side has $\chi^2_n$ distribution. The second term on the right-hand side is $\chi^2_1$. Since the two terms on the right are independent by the above section, we can take the MGF of both sides to yield $(1 - 2t)^{-\frac{n}{2}} = M(t) (1-2t)^{-\frac{1}{2}}$, where $M(t)$ is the MGF of $\frac{(n - 1)}{\sigma^2}S$. Thus, $M(t) = (1 - 2t)^{-\frac{n-1}{2}}$, which is the MGF of the $\chi^2_{n-1}$ distribution.
So $\frac{1}{\sigma^2}S \sim \frac{1}{n - 1}\chi^2_{n-1}$. Let $T = \sqrt n \frac{\overline X - \mu}{\sqrt S}$. Then we can write
$$
\begin{eqnarray}
T &=& \frac{ \sqrt n \frac{\overline X - \mu}{\sigma} } {\frac{\sqrt S}{\sigma}} \\
&=& \frac{ \sqrt n \frac{\overline X - \mu}{\sigma} } {\sqrt {\frac{S}{\sigma^2}}}
\end{eqnarray}
$$
where the numerator and denominator are independent, the numerator has distribution $N(0, 1)$, and the denominator is the square root of a random variable with distribution $\frac{1}{n-1}\chi^2_{n-1}$. This is exactly the definition of the $t$-distribution with $n - 1$ degrees of freedom.
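A simulation sketch (made-up parameters) comparing the empirical quantiles of $T$ with those of SciPy's $t_{n-1}$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 3.0, 8, 200_000
X = rng.normal(mu, sigma, size=(reps, n))

# T = sqrt(n) (Xbar - mu) / sqrt(S) for each replication
T = np.sqrt(n) * (X.mean(axis=1) - mu) / X.std(axis=1, ddof=1)
qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(T, qs))
print(stats.t.ppf(qs, df=n - 1))   # the two rows should roughly agree
```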
Complex analysis notes
Cauchy-Riemann
Let $f(z) = f(x + iy) = g(x, y) + ih(x, y)$. If $f$ is holomorphic, we can differentiate along the real direction and along the imaginary direction:
$f' = \lim_{a \to 0}\frac{f(x + a, y) - f(x, y)}{a} = \frac{\partial g}{\partial x} + i\frac{\partial h}{\partial x}$
$f' = \lim_{a \to 0}\frac{f(x, y + a) - f(x, y)}{ia} = \frac{\partial h}{\partial y} - i\frac{\partial g}{\partial y}$
So we have $\frac{\partial g}{\partial x} = \frac{\partial h}{\partial y}$ and $\frac{\partial h}{\partial x} = -\frac{\partial g}{\partial y}$.
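A quick numerical sanity check of these equations, using the toy example $f(z) = z^2$ (so $g = x^2 - y^2$, $h = 2xy$) and central finite differences:

```python
import numpy as np

f = lambda x, y: (x + 1j * y) ** 2   # holomorphic: g = x^2 - y^2, h = 2xy
x0, y0, eps = 0.7, -1.3, 1e-6

dg_dx = (f(x0 + eps, y0).real - f(x0 - eps, y0).real) / (2 * eps)
dh_dy = (f(x0, y0 + eps).imag - f(x0, y0 - eps).imag) / (2 * eps)
dh_dx = (f(x0 + eps, y0).imag - f(x0 - eps, y0).imag) / (2 * eps)
dg_dy = (f(x0, y0 + eps).real - f(x0, y0 - eps).real) / (2 * eps)

print(np.isclose(dg_dx, dh_dy), np.isclose(dh_dx, -dg_dy))   # True True
```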
Power series
Theorem. Let $f(z) = \sum a_n z^n$. The radius of convergence is then given by $R = \frac{1}{\limsup \sqrt[n]{|a_n|}}$.
Theorem. Let $f(z) = \sum a_n z^n$ have radius of convergence $R$. Then $f$ has a derivative $f'(z) = \sum n a_n z^{n-1}$ with identical radius of convergence.
Proof: First, set $g(z) = \sum n a_n z^{n-1}$. By the above rule for computing the radius of convergence (and since $\sqrt[n]{n} \to 1$), you can show $g$ has the same radius of convergence as $f$.
Next, pick $z, h$ with $|z| < r < R$ and $|z + h| < r$. Write $f(z) = S_n(z) + E_{n+1}(z)$, where $S_n$ is the $n$-th partial sum and $E_{n+1}$ is the remainder.
$$
\begin{eqnarray}
\dfrac{f(z+h) - f(z)}{h} - g(z) &=& \left(\dfrac{S_n(z+h) - S_n(z)}{h} - S_n'(z)\right) \\
&+& (S_n'(z) - g(z)) \\
&+& \dfrac{E_{n+1}(z+h) - E_{n+1}(z)}{h}
\end{eqnarray}
$$
The first term clearly converges to 0 as $h \to 0$ by the definition of the derivative, and the second term converges to 0 as $n \to \infty$ by the definition of $g$. Now,
$$
D_{n,h}(z) = \frac{E_{n+1}(z+h) - E_{n+1}(z)}{h} = \frac{1}{h}\displaystyle\sum_{k=n+1}^\infty a_k ((z+h)^k - z^k).
$$
Note that $x^{m+1} - y^{m+1} = (x - y)(x^m + x^{m-1}y + \dots + xy^{m-1} + y^m)$. So each term in the sum becomes $a_k h ((z+h)^{k-1} + \dots + z^{k-1})$. Since $|z|, |z+h| < r$, in magnitude this quantity is bounded by $k|h||a_k|r^{k-1}$. So $|D_{n,h}(z)| \leq \sum_{k=n+1}^\infty k |a_k| r^{k-1} < \infty$; as the tail of a convergent series (remember that the series for $g$ is absolutely convergent on the same disk), this bound approaches zero as $n$ approaches infinity, uniformly in $h$. So, given $\varepsilon > 0$, first choose $n$ so that the second and third terms are each less than $\varepsilon/3$, then choose $h$ small enough that the first term is too.
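A small numerical illustration of term-by-term differentiation (toy example $f(z) = \sum z^n/n!$, where the root test gives $R = \infty$ and the derivative series is again $e^z$):

```python
import numpy as np
from math import factorial

z = 1.2 - 0.7j
n = np.arange(1, 40)
a = np.array([1.0 / factorial(k) for k in n])   # a_n = 1/n!
deriv_series = np.sum(n * a * z ** (n - 1))     # truncated sum of n a_n z^(n-1)
print(np.isclose(deriv_series, np.exp(z)))      # True: matches f'(z) = e^z
```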
Convergence of holomorphic functions
Theorem. Let $R$ be an open region, and suppose $f_n$ is a sequence of holomorphic functions on $R$ which converge uniformly to $f$. Then $f_n'$ converges uniformly to $f'$ on every compact subset of $R$.
Proof: By uniform convergence, we can find $M_n > 0$ such that $\sup_{z\in R} |f_n(z) - f(z)| < M_n$ and $M_n \to 0$.
Pick $\delta > 0$, and let $A_{\delta}$ be the set of all $z$ in $R$ which are at distance greater than $\delta$ from the boundary. Since every compact subset of $R$ is contained in some $A_{\delta}$, we only need to show $f_n' \to f'$ uniformly on each $A_{\delta}$.
For any $z \in A_{\delta}$, the disk $D$ of radius $\delta$ centered at $z$ lies in $R$. Then by Cauchy’s integral formula, we have
$$
\begin{eqnarray}
|f'(z) - f_n'(z)| &=& \left| \frac{1}{2\pi i} \int_{C_D} \frac{f(w) - f_n(w)}{(z - w)^2} dw \right| \\
&\leq& \frac{1}{2\pi} \cdot \frac{M_n \cdot 2\pi\delta}{\delta^2} \\
&=& \frac{M_n}{\delta} \to 0
\end{eqnarray}
$$
This estimate depends only on $n$ and not on $z$, so the convergence is uniform.
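A numerical sketch of the theorem with a toy example: the partial sums $f_n$ of the geometric series converge uniformly to $f(z) = 1/(1-z)$ on the closed disk $|z| \le 1/2$, and the sup of $|f_n' - f'|$ over a grid in that disk shrinks as well:

```python
import numpy as np

r = 0.5
theta = np.linspace(0, 2 * np.pi, 200)
rad = np.linspace(0, r, 50)
z = (rad[:, None] * np.exp(1j * theta)[None, :]).ravel()   # grid filling |z| <= r

f_prime = 1 / (1 - z) ** 2                                 # f'(z) for f(z) = 1/(1-z)
for n in (5, 10, 20):
    k = np.arange(1, n)                                    # f_n(z) = sum_{k < n} z^k
    fn_prime = np.sum(k[:, None] * z[None, :] ** (k - 1)[:, None], axis=0)
    print(n, np.max(np.abs(fn_prime - f_prime)))           # decreasing sup over the grid
```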
Invariant Subspaces
Let $M$ be a matrix, and suppose $v_1, \ldots, v_n$ form a basis for an invariant subspace of $M$ (a subspace mapped into itself by $M$). This means
$$
Mv_1 = a_{11}v_1 + a_{12}v_2 + \ldots + a_{1n}v_n \\
Mv_2 = a_{21}v_1 + a_{22}v_2 + \ldots + a_{2n}v_n \\
\vdots \\
Mv_n = a_{n1}v_1 + a_{n2}v_2+ \ldots + a_{nn}v_n
$$
or in other words
$$
M [ v_1 v_2 \ldots v_n] = [v_1 v_2 \ldots v_n]
\begin{bmatrix}
a_{11} & a_{21} & \ldots & a_{n1} \\
a_{12} & a_{22} & \ldots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \ldots & a_{nn} \\
\end{bmatrix}
$$
or in other words $MV = VA$ (where $V$ and $A$ are what you expect). Let $U$ be a matrix such that the columns of $V$ and $U$ form a complete basis for the space. Then
$$
M[V \ U] = [V \ U]
\begin{bmatrix}
A & B \\
0 & C \\
\end{bmatrix}
$$
for some $B$ and $C$. This is because the first block column just restates $MV = VA$, while the second block column requires $MU = VB + UC$; since the columns of $V$ and $U$ form a basis, each column of $MU$ can be written as such a linear combination, which determines $B$ and $C$.
So, if $X$ is an invertible matrix whose first $n$ column vectors span an invariant subspace of $M$, we can write
$$
X^{-1}MX =
\begin{bmatrix}
A & B \\
0 & C \\
\end{bmatrix}
$$
i.e., M is block triangular with respect to X.
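A concrete (made-up) 2-by-2 numerical check: the span of an eigenvector is a one-dimensional invariant subspace, so putting that eigenvector in the first column of $X$ should produce the block-triangular form:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])         # eigenvector of M with eigenvalue 3, so span(v) is invariant
u = np.array([1.0, 0.0])         # any vector completing v to a basis
X = np.column_stack([v, u])

print(np.linalg.inv(X) @ M @ X)  # [[3., 1.], [0., 1.]]: zeros below the 1x1 block A = [3]
```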
Now suppose $\mathbb{R}^m$ decomposes into a direct sum of invariant subspaces of $M$, and let $C_1, C_2, \ldots, C_n$ be matrices whose columns form bases for these subspaces. As before, we have $MC_i = C_iA_i$ for some matrix $A_i$, so
$$
\begin{eqnarray}
M[C_1 \ C_2 \ \ldots \ C_n] &=& [MC_1 \ MC_2 \ \ldots \ MC_n] \\
&=& [C_1A_1 \ C_2A_2 \ \ldots \ C_nA_n] \\
&=& [C_1 \ C_2 \ \ldots \ C_n]
\begin{bmatrix}
A_1 & & & \\
& A_2 & & \\
& & \ddots & \\
& & & A_n \\
\end{bmatrix}.
\end{eqnarray}
$$
So M is block diagonal with respect to $[C_1 \ C_2 \ \ldots \ C_n]$.
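And a small sketch of the block-diagonal case, built backwards from made-up blocks $A_1, A_2$ and a random basis, then verified:

```python
import numpy as np

rng = np.random.default_rng(1)
A1 = np.array([[2.0, 1.0],
               [0.0, 2.0]])
A2 = np.array([[0.0, -1.0],
               [1.0,  0.0]])
C = rng.normal(size=(4, 4))                    # columns form a basis [C_1 C_2] of R^4
D = np.block([[A1, np.zeros((2, 2))],
              [np.zeros((2, 2)), A2]])
M = C @ D @ np.linalg.inv(C)                   # M is block diagonal with respect to [C_1 C_2]

C1 = C[:, :2]
print(np.allclose(M @ C1, C1 @ A1))            # True: span(C_1) is an invariant subspace
print(np.round(np.linalg.inv(C) @ M @ C, 8))   # recovers the block-diagonal matrix D
```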