Discrete Fourier Transform notes
Fast Fourier Transform
Assume we have $n$ data points $x_0, x_1, \ldots, x_{n-1}$. For convenience, assume $n$ is a power of two; if it isn’t, we can just pad with zeroes. We want to efficiently compute the transformed sequence $y_k = x_0 + w^kx_1 + w^{2k}x_2 + \ldots + w^{(n-1)k}x_{n-1}$, where $w$ is a primitive $n$th root of unity. We can represent these equations in matrix notation as $\bar y = V_n \bar x$, where $V_n$ is the Vandermonde matrix of $1, w, \ldots, w^{n-1}$.
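As a quick sanity check of the matrix form, here's a small NumPy sketch (not an FFT, just the $O(n^2)$ matrix product), assuming the sign convention $w = e^{-2\pi i/n}$ so that it matches `numpy.fft.fft`:

```python
import numpy as np

def dft_vandermonde(x):
    """Naive O(n^2) DFT: y = V_n x, where V_n is the Vandermonde matrix of 1, w, ..., w^(n-1)."""
    n = len(x)
    w = np.exp(-2j * np.pi / n)      # a primitive nth root of unity (NumPy's sign convention)
    k = np.arange(n)
    V = w ** np.outer(k, k)          # V[k, j] = w^(jk)
    return V @ x

x = np.random.rand(8)                # n = 8, a power of two
print(np.allclose(dft_vandermonde(x), np.fft.fft(x)))   # True
```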
Notes on linear regression
So suppose we have a set of points $(x_1, y_1), \ldots, (x_n, y_n)$ and we want to find $\beta_0$, $\beta_1$ such that $y_i = \beta_1 x_i + \beta_0$ for each $i$, or in matrix notation $Y = X\bar \beta$, where $X$ is n-by-2 and has the $x_i$’s in its first column and $1$’s in its second. Well, unfortunately, this may not have a solution, since $Y$ may not lie in the span of the two columns of $X$. So let’s instead try to find another vector $Y'$ which is in the span of $X$ and is as close to $Y$ as possible.
This can clearly be found by projection. Let $v_1, v_2$ be the columns of $X$. Then we can uniquely write $Y = av_1 + bv_2 + u$, where $u$ is orthogonal to $v_1, v_2$. Thus, multiplying through by $X^T$, we see that $X^TY = aX^Tv_1 + bX^Tv_2 + 0$, since the rows of $X^T$ are just the columns of $X$, which are perpendicular to $u$. This can now be rewritten as $X^TY = X^T X \begin{pmatrix} a \\ b \end{pmatrix}$. So we have $\begin{pmatrix} a \\ b \end{pmatrix} = (X^TX)^{-1}X^TY$. But by definition $a$ and $b$ are the coordinates of the projection of $Y$ onto the column space of $X$. So we’ve found $\bar \beta$, assuming that $X^TX$ is invertible!
If $X^TX$ is not invertible, we can find some nonzero vector $v$ with $X^TXv = 0$. But then $X^Tu = 0$, where $u = Xv$. Each entry in the vector $X^Tu$ is the dot product of a column of $X$ with $u$. As these are all 0, it follows that $u$ is perpendicular to all of $X$’s columns. But $u = Xv$ is also a linear combination of $X$’s columns, so it must be 0. Since $v \neq 0$ and $Xv = 0$, there is some linear dependence between the columns of $X$. Therefore, if the columns of $X$ are linearly independent, then $X^TX$ is invertible and the above argument gives us $\bar \beta = (X^TX)^{-1}X^TY$.
$Y' = X\bar\beta$ being the closest vector to $Y$ in the column space of $X$ means we’ve minimized $\|Y - X\beta\|$, hence minimized $\|Y - X\beta\|^2 = (Y - X\beta)^T(Y - X\beta)$. This expands out to $f(\beta) = Y^TY - \beta^TX^TY - Y^TX\beta + \beta^T(X^TX)\beta$. Taking the matrix derivative, $\nabla f(\beta) = -2X^TY + 2(X^TX)\beta$, and setting it equal to $0$ also gives $\beta = (X^TX)^{-1}X^TY$.
The same logic applies if we have more than one predictor variable, in which case the matrix $X$ has more than two columns and $\beta$ has more than two rows. Also, if $W$ is a diagonal matrix of weights and we minimize $(Y - X\beta)^TW(Y - X\beta)$ instead, the same differentiation argument gives $\beta = (X^TWX)^{-1}X^TWY$.
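Here's a quick sketch of the formula in practice (made-up data; in real code something like `np.linalg.lstsq` is preferred for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=50)   # true beta_1 = 3, beta_0 = 2

X = np.column_stack([x, np.ones_like(x)])   # x_i's in the first column, 1's in the second
beta = np.linalg.solve(X.T @ X, X.T @ y)    # solves the normal equations (X^T X) beta = X^T Y
print(beta)                                 # roughly [3.0, 2.0] = [beta_1, beta_0]
```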
Notes on the t-distribution
Let $X_1, X_2, \ldots, X_n$ be IID normal variables with mean $\mu$ and variance $\sigma^2$. Let $\overline{X}$ be the mean of these variables, and define the deviations $D_i = \overline{X} - X_i$. Let $Y = [X_1 \ X_2 \ \ldots \ X_n]^T$ and $Z = [\overline{X} \ D_2 \ \ldots \ D_n]^T$. Let $S = \frac{1}{n-1} \sum D_i^2$ be the sample variance.
Independence of Sample Variance and Sample Mean
By definition, $Y \sim N(\mu \mathbf{1}, \sigma^2 \mathrm I)$. Also, $Z = A_n Y$ where, for example, $A_3$ looks like this:
$$
A_3 = \frac{1}{3}
\begin{bmatrix}
1 & 1 & 1 \\
1 & 1-3 & 1 \\
1 & 1 & 1-3 \\
\end{bmatrix}.
$$
Thus, $Z \sim N(A_n (\mu \mathbf{1}), A_n (\sigma^2 \mathrm I) A_n^T) = N(\mu A_n \mathbf{1}, \sigma^2 A_n A_n^T)$. But multiplying out $A_n A_n^T$, you can see that it has the form
$$ \frac{1}{n^2}
\begin{bmatrix}
n & \mathbf 0^T \\
\mathbf 0 & B
\end{bmatrix}
$$
for some matrix $B$ (try multiplying it out with $A_3$ to see this). This implies the covariance between $\overline{X}$ and each of $D_2, \ldots, D_n$ is 0; and since they’re jointly normally distributed, it follows that $\overline X$ is independent of $(D_2, \ldots, D_n)$. And since $\sum D_i = 0$, $D_1$ is a function of $D_2, \ldots, D_n$, so $\overline X$ is also independent of $D_1$. Since $S$ is a function of the deviations, it’s also independent of $\overline X$.
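A quick simulation sketch (made-up parameters) that checks this conclusion empirically, at least at the level of correlation:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000
X = rng.normal(mu, sigma, size=(reps, n))   # each row is one sample X_1, ..., X_n

xbar = X.mean(axis=1)                       # sample means
S = X.var(axis=1, ddof=1)                   # sample variances, with the 1/(n-1) factor
print(np.corrcoef(xbar, S)[0, 1])           # close to 0, consistent with independence
```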
Distribution of the Sample Variance and t-statistic
Note that
$$
\begin{eqnarray}
\sum (X_i - \mu)^2 &=& \sum ((X_i - \overline X) + (\overline X - \mu))^2 \\
&=& \sum D_i^2 + 2\sum (\overline X - \mu) (X_i - \overline X) + \sum (\overline X - \mu)^2 \\
&=& \sum D_i^2 + 2(\overline X - \mu) \sum(X_i - \overline X) + n(\overline X - \mu)^2 \\
&=& (n - 1)S + 0 + n(\overline X - \mu)^2 \\
&=& (n - 1)S + (\sqrt n (\overline X - \mu))^2.
\end{eqnarray}
$$
Divide through by $\sigma^2$ to yield
$$
\sum \left(\frac{X_i - \mu}{\sigma}\right)^2 = \frac{(n - 1)}{\sigma^2}S + \left(\sqrt n \frac{\overline X - \mu}{\sigma}\right)^2.
$$
The left side has $\chi^2_n$ distribution. The second term on the right-hand side is $\chi^2_1$. Since the two terms on the right are independent by the above section, we can take the MGF of both sides to yield $(1 - 2t)^{-\frac{n}{2}} = M(t) (1-2t)^{-\frac{1}{2}}$, where $M(t)$ is the MGF of $\frac{(n - 1)}{\sigma^2}S$. Thus, $M(t) = (1 - 2t)^{-\frac{n-1}{2}}$, which is the MGF of the $\chi^2_{n-1}$ distribution.
So $\frac{1}{\sigma^2}S \sim \frac{1}{n - 1}\chi^2_{n-1}$. Let $T = \sqrt n \frac{\overline X - \mu}{\sqrt S}$. Then we can write
$$
\begin{eqnarray}
T &=& \frac{ \sqrt n \frac{\overline X - \mu}{\sigma} } {\frac{\sqrt S}{\sigma}} \\
&=& \frac{ \sqrt n \frac{\overline X - \mu}{\sigma} } {\sqrt {\frac{S}{\sigma^2}}}
\end{eqnarray}
$$
where the numerator and denominator are independent, the numerator has distribution $N(0, 1)$, and the denominator is the square root of a random variable with distribution $\frac{1}{n-1}\chi^2_{n-1}$. This is exactly the definition of the $t$-distribution with $n - 1$ degrees of freedom.
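A simulation sketch (made-up parameters) comparing the empirical quantiles of $T$ with those of SciPy's $t_{n-1}$ distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 3.0, 8, 200_000
X = rng.normal(mu, sigma, size=(reps, n))

# T = sqrt(n) (Xbar - mu) / sqrt(S) for each replication
T = np.sqrt(n) * (X.mean(axis=1) - mu) / X.std(axis=1, ddof=1)
qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(T, qs))
print(stats.t.ppf(qs, df=n - 1))   # the two rows should roughly agree
```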
Complex analysis notes
Cauchy-Riemann
Let $f(z) = f(x + iy) = g(x, y) + ih(x, y)$. If $f$ is holomorphic, we can differentiate along the real direction and along the imaginary direction:
$f' = \lim_{a \to 0}\frac{f(x + a, y) - f(x, y)}{a} = \frac{\partial g}{\partial x} + i\frac{\partial h}{\partial x}$
$f' = \lim_{a \to 0}\frac{f(x, y + a) - f(x, y)}{ia} = \frac{\partial h}{\partial y} - i\frac{\partial g}{\partial y}$
So we have $\frac{\partial g}{\partial x} = \frac{\partial h}{\partial y}$ and $\frac{\partial h}{\partial x} = -\frac{\partial g}{\partial y}$.
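A quick numerical sanity check of these equations, using the toy example $f(z) = z^2$ (so $g = x^2 - y^2$, $h = 2xy$) and central finite differences:

```python
import numpy as np

f = lambda x, y: (x + 1j * y) ** 2   # holomorphic: g = x^2 - y^2, h = 2xy
x0, y0, eps = 0.7, -1.3, 1e-6

dg_dx = (f(x0 + eps, y0).real - f(x0 - eps, y0).real) / (2 * eps)
dh_dy = (f(x0, y0 + eps).imag - f(x0, y0 - eps).imag) / (2 * eps)
dh_dx = (f(x0 + eps, y0).imag - f(x0 - eps, y0).imag) / (2 * eps)
dg_dy = (f(x0, y0 + eps).real - f(x0, y0 - eps).real) / (2 * eps)

print(np.isclose(dg_dx, dh_dy), np.isclose(dh_dx, -dg_dy))   # True True
```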
Power series
Theorem. Let $f(z) = \sum a_n z^n$. The radius of convergence is then given by $R = \frac{1}{\limsup \sqrt[n]{|a_n|}}$.
Theorem. Let $f(z) = \sum a_n z^n$ have radius of convergence $R$. Then $f$ has a derivative $f'(z) = \sum n a_n z^{n-1}$ with identical radius of convergence.
Proof: First, set $g(z) = \sum n a_n z^{n-1}$. By the above rule for computing the radius of convergence (and since $\sqrt[n]{n} \to 1$), you can show $g$ has the same radius of convergence as $f$.
Next, pick $z, h$ with $|z| < r < R$ and $|z + h| < r$. Write $f(z) = S_n(z) + E_{n+1}(z)$, where $S_n$ is the $n$-th partial sum and $E_{n+1}$ is the remainder.
$$
\begin{eqnarray}
\dfrac{f(z+h) - f(z)}{h} - g(z) &=& \left(\dfrac{S_n(z+h) - S_n(z)}{h} - S_n'(z)\right) \\
&+& (S_n'(z) - g(z)) \\
&+& \dfrac{E_{n+1}(z+h) - E_{n+1}(z)}{h}
\end{eqnarray}
$$
The first term clearly converges to 0 as $h \to 0$ by the definition of the derivative, and the second term converges to 0 as $n \to \infty$ by the definition of $g$. Now,
$$
D_{n,h}(z) = \frac{E_{n+1}(z+h) - E_{n+1}(z)}{h} = \frac{1}{h}\displaystyle\sum_{k=n+1}^\infty a_k ((z+h)^k - z^k).
$$
Note that $x^{m+1} - y^{m+1} = (x - y)(x^m + x^{m-1}y + \dots + xy^{m-1} + y^m)$. So each term in the sum becomes $a_k h ((z+h)^{k-1} + \dots + z^{k-1})$. Since $|z|, |z+h| < r$, in magnitude this quantity is bounded by $k|h||a_k|r^{k-1}$. So $|D_{n,h}(z)| \leq \sum_{k=n+1}^\infty k |a_k| r^{k-1} < \infty$; as the tail of a convergent series (remember that the series for $g$ is absolutely convergent on the same disk), this bound approaches zero as $n$ approaches infinity, uniformly in $h$. So, given $\varepsilon > 0$, first choose $n$ so that the second and third terms are each less than $\varepsilon/3$, then choose $h$ small enough that the first term is too.
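A small numerical illustration of term-by-term differentiation (toy example $f(z) = \sum z^n/n!$, where the root test gives $R = \infty$ and the derivative series is again $e^z$):

```python
import numpy as np
from math import factorial

z = 1.2 - 0.7j
n = np.arange(1, 40)
a = np.array([1.0 / factorial(k) for k in n])   # a_n = 1/n!
deriv_series = np.sum(n * a * z ** (n - 1))     # truncated sum of n a_n z^(n-1)
print(np.isclose(deriv_series, np.exp(z)))      # True: matches f'(z) = e^z
```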
Convergence of holomorphic functions
Theorem. Let $R$ be an open region, and suppose $f_n$ is a sequence of holomorphic functions on $R$ which converge uniformly to $f$. Then $f_n'$ converges uniformly to $f'$ on every compact subset of $R$.
Proof: By uniform convergence, we can find $M_n > 0$ such that $\sup_{z\in R} |f_n(z) - f(z)| < M_n$ and $M_n \to 0$.
Pick $\delta > 0$, and let $A_{\delta}$ be the set of all $z$ in $R$ which are at distance greater than $\delta$ from the boundary. Since every compact subset of $R$ is contained in some $A_{\delta}$, we only need to show $f_n' \to f'$ uniformly on each $A_{\delta}$.
For any $z \in A_{\delta}$, the disk $D$ of radius $\delta$ centered at $z$ lies in $R$. Then by Cauchy’s integral formula, we have
$$
\begin{eqnarray}
|f'(z) - f_n'(z)| &=& \left| \frac{1}{2\pi i} \int_{C_D} \frac{f(w) - f_n(w)}{(z - w)^2} dw \right| \\
&\leq& \frac{1}{2\pi} \cdot \frac{M_n \cdot 2\pi\delta}{\delta^2} \\
&=& \frac{M_n}{\delta} \to 0
\end{eqnarray}
$$
This estimate depends only on $n$ and not on $z$, so the convergence is uniform.
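A numerical sketch of the theorem with a toy example: the partial sums $f_n$ of the geometric series converge uniformly to $f(z) = 1/(1-z)$ on the closed disk $|z| \le 1/2$, and the sup of $|f_n' - f'|$ over a grid in that disk shrinks as well:

```python
import numpy as np

r = 0.5
theta = np.linspace(0, 2 * np.pi, 200)
rad = np.linspace(0, r, 50)
z = (rad[:, None] * np.exp(1j * theta)[None, :]).ravel()   # grid filling |z| <= r

f_prime = 1 / (1 - z) ** 2                                 # f'(z) for f(z) = 1/(1-z)
for n in (5, 10, 20):
    k = np.arange(1, n)                                    # f_n(z) = sum_{k < n} z^k
    fn_prime = np.sum(k[:, None] * z[None, :] ** (k - 1)[:, None], axis=0)
    print(n, np.max(np.abs(fn_prime - f_prime)))           # decreasing sup over the grid
```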
Invariant Subspaces
Let $M$ be a matrix, and suppose $v_1, \ldots, v_n$ form a basis for an invariant subspace of $M$ (a subspace mapped into itself by $M$). This means
$$
Mv_1 = a_{11}v_1 + a_{12}v_2 + \ldots + a_{1n}v_n \\
Mv_2 = a_{21}v_1 + a_{22}v_2 + \ldots + a_{2n}v_n \\
\vdots \\
Mv_n = a_{n1}v_1 + a_{n2}v_2+ \ldots + a_{nn}v_n
$$
or in other words
$$
M [ v_1 v_2 \ldots v_n] = [v_1 v_2 \ldots v_n]
\begin{bmatrix}
a_{11} & a_{21} & \ldots & a_{n1} \\
a_{12} & a_{22} & \ldots & a_{n2} \\
\vdots & \vdots & \ddots & \vdots \\
a_{1n} & a_{2n} & \ldots & a_{nn} \\
\end{bmatrix}
$$
or in other words $MV = VA$ (where $V$ and $A$ are what you expect). Let $U$ be a matrix such that the columns of $V$ and $U$ form a complete basis for the space. Then
$$
M[V \ U] = [V \ U]
\begin{bmatrix}
A & B \\
0 & C \\
\end{bmatrix}
$$
for some $B$ and $C$. This is because the first block column just restates $MV = VA$, while the second block column requires $MU = VB + UC$; since the columns of $V$ and $U$ form a basis, each column of $MU$ can be written as such a linear combination, which determines $B$ and $C$.
So, if $X$ is an invertible matrix whose first $n$ column vectors span an invariant subspace of $M$, we can write
$$
X^{-1}MX =
\begin{bmatrix}
A & B \\
0 & C \\
\end{bmatrix}
$$
i.e., M is block triangular with respect to X.
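A concrete (made-up) 2-by-2 numerical check: the span of an eigenvector is a one-dimensional invariant subspace, so putting that eigenvector in the first column of $X$ should produce the block-triangular form:

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])
v = np.array([1.0, 1.0])         # eigenvector of M with eigenvalue 3, so span(v) is invariant
u = np.array([1.0, 0.0])         # any vector completing v to a basis
X = np.column_stack([v, u])

print(np.linalg.inv(X) @ M @ X)  # [[3., 1.], [0., 1.]]: zeros below the 1x1 block A = [3]
```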
Now suppose $\mathbb{R}^m$ decomposes into a direct sum of invariant subspaces of $M$, and let $C_1, C_2, \ldots, C_n$ be matrices whose columns form bases for these subspaces. As before, we have $MC_i = C_iA_i$ for some matrix $A_i$, so
$$
\begin{eqnarray}
M[C_1 \ C_2 \ \ldots \ C_n] &=& [MC_1 \ MC_2 \ \ldots \ MC_n] \\
&=& [C_1A_1 \ C_2A_2 \ \ldots \ C_nA_n] \\
&=& [C_1 \ C_2 \ \ldots \ C_n]
\begin{bmatrix}
A_1 & & & \\
& A_2 & & \\
& & \ddots & \\
& & & A_n \\
\end{bmatrix}.
\end{eqnarray}
$$
So M is block diagonal with respect to $[C_1 \ C_2 \ \ldots \ C_n]$.
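And a small sketch of the block-diagonal case, built backwards from made-up blocks $A_1, A_2$ and a random basis, then verified:

```python
import numpy as np

rng = np.random.default_rng(1)
A1 = np.array([[2.0, 1.0],
               [0.0, 2.0]])
A2 = np.array([[0.0, -1.0],
               [1.0,  0.0]])
C = rng.normal(size=(4, 4))                    # columns form a basis [C_1 C_2] of R^4
D = np.block([[A1, np.zeros((2, 2))],
              [np.zeros((2, 2)), A2]])
M = C @ D @ np.linalg.inv(C)                   # M is block diagonal with respect to [C_1 C_2]

C1 = C[:, :2]
print(np.allclose(M @ C1, C1 @ A1))            # True: span(C_1) is an invariant subspace
print(np.round(np.linalg.inv(C) @ M @ C, 8))   # recovers the block-diagonal matrix D
```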