Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. If there are n observations with p variables, then the number of distinct principal components is \(\min(n-1, p)\). This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors are an uncorrelated orthogonal basis set. PCA is sensitive to the relative scaling of the original variables.
More …
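As a quick illustration (my own sketch, not part of the excerpt above), here is a minimal PCA example using scikit-learn; the toy data, the standardization step, and the choice of two components are assumptions made purely for demonstration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 observations of 3 variables, two of which are strongly correlated.
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
X = np.hstack([z + 0.1 * rng.normal(size=(100, 1)),
               2 * z + 0.1 * rng.normal(size=(100, 1)),
               rng.normal(size=(100, 1))])

# PCA is sensitive to relative scaling, so variables are often standardized first.
X = (X - X.mean(axis=0)) / X.std(axis=0)

pca = PCA(n_components=2)              # keep the first two principal components
scores = pca.fit_transform(X)          # uncorrelated component scores
print(pca.explained_variance_ratio_)   # share of variance captured by each component
```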
The assumption that a given time series is stationary ergodic is one of the most general assumptions used in statistics; in particular, it allows for arbitrary long-range serial dependence, and subsumes most of the nonparametric as well as modelling assumptions used in the literature on clustering time series, such as i.i.d., (hidden) Markov, or mixing time series.
More …
In numerical optimization, the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm is an iterative method for solving unconstrained nonlinear optimization problems.
More …
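For a concrete (illustrative) use of BFGS, SciPy exposes it through scipy.optimize.minimize; the Rosenbrock test function and starting point below are standard examples I've chosen, not anything from the excerpt.

```python
from scipy.optimize import minimize, rosen, rosen_der

# Minimize the Rosenbrock function with BFGS, supplying the analytic gradient.
x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
result = minimize(rosen, x0, method="BFGS", jac=rosen_der)

print(result.x)    # should be close to the minimizer (1, 1, ..., 1)
print(result.nit)  # number of BFGS iterations taken
```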
Below is a list of the most interesting data sources I’ve come across:
Big Data: 33 Brilliant And Free Data Sources Anyone Can Use
Awesome Public Datasets – GitHub
More …
In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information. In Bayesian statistics, the asymptotic distribution of the posterior mode depends on the Fisher information and not on the prior (according to the Bernstein–von Mises theorem, which was anticipated by Laplace for exponential families). The Fisher information is also used in the calculation of the Jeffreys prior, which is used in Bayesian statistics. The Fisher information matrix is used to calculate the covariance matrices associated with maximum-likelihood estimates. It can also be used in the formulation of test statistics, such as the Wald test. The Fisher information has been used to find bounds on the accuracy of neural codes, and it appears in machine learning techniques such as elastic weight consolidation, which reduces catastrophic forgetting in artificial neural networks.
More …
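As a small worked sketch (my own example, not from the text): for a single Bernoulli(θ) observation, the Fisher information is \(1/(\theta(1-\theta))\), and one can check numerically that this matches the variance of the score.

```python
import numpy as np

theta = 0.3
rng = np.random.default_rng(0)
x = rng.binomial(1, theta, size=1_000_000)      # samples from Bernoulli(theta)

# Score function: d/dθ log p(x | θ) = x/θ - (1 - x)/(1 - θ)
score = x / theta - (1 - x) / (1 - theta)

print(score.var())                 # empirical variance of the score
print(1 / (theta * (1 - theta)))   # closed-form Fisher information ≈ 4.76
```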