distribution-is-all-you-need is a basic tutorial on the probability distributions most commonly used in deep learning, with examples written using Python libraries.

Overview of probability distributions

In Bayesian probability theory, if the posterior distributions p(θ | x) are in the same probability distribution family as the prior probability distribution p(θ), the prior and posterior are then called conjugate distributions, and the prior is called a conjugate prior for the likelihood function.
Conjugate prior, Wikipedia
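
As a minimal sketch of what conjugacy means in practice, the example below updates a Beta prior with Bernoulli observations. The posterior stays in the Beta family, so the update is just addition on the two parameters. The function names and the coin-flip data are assumptions made for illustration; they are not taken from this repository's code.

```python
def update_beta_prior(alpha, beta, observations):
    """Return Beta posterior parameters after observing 0/1 outcomes.

    With a Beta(alpha, beta) prior and a Bernoulli likelihood, the
    posterior is Beta(alpha + successes, beta + failures).
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: three coin flips, all heads.
prior = (1.0, 1.0)  # Beta(1, 1), i.e. a uniform prior on [0, 1]
flips = [1, 1, 1]

posterior = update_beta_prior(*prior, flips)

# The maximum likelihood estimate ignores the prior entirely.
mle = sum(flips) / len(flips)

# The posterior mean is pulled toward the prior.
posterior_mean = beta_mean(*posterior)

print("posterior parameters:", posterior)
print("maximum likelihood estimate:", mle)
print("posterior mean:", posterior_mean)
```

With so few observations, the maximum likelihood estimate commits to a probability of 1 for heads, while the posterior mean remains below 1 because the prior still carries weight.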

  • Multi-Class means that the random variable has more than two possible outcomes.

  • N Times means that the trial is repeated n times, where the number of trials n is specified in advance.

  • To learn more about probability, I recommend reading [Pattern Recognition and Machine Learning, Bishop 2006].

Probability distributions and their features

  1. Uniform distribution(continuous), code

    • The uniform distribution has the same probability density everywhere on [a, b], which makes it the simplest distribution to work with.
  2. Bernoulli distribution(discrete), code

    • Maximum likelihood estimation for the Bernoulli distribution does not take a prior probability into account, so it is vulnerable to overfitting, especially when there are few observations.
    • Binary cross entropy is the usual loss for binary classification. It has the same form as the negative log of the Bernoulli distribution.
  3. Binomial distribution(discrete), code

    • Binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments.
    • The number of trials n is specified in advance. It is a parameter of the distribution, not a prior in the Bayesian sense.
  4. Multi-Bernoulli distribution, Categorical distribution(discrete), code

    • The multi-Bernoulli distribution, also called the categorical distribution, extends the Bernoulli distribution to more than two classes.
    • Cross entropy has the same form as the negative log of the multi-Bernoulli distribution.
  5. Multinomial distribution(discrete), code

    • The multinomial distribution relates to the categorical distribution in the same way that the binomial distribution relates to the Bernoulli distribution.
  6. Beta distribution(continuous), code

    • Beta distribution is conjugate to the binomial and Bernoulli distributions.
    • Using conjugacy, we can obtain the posterior distribution more easily from a prior distribution we already know.
    • The uniform distribution on [0, 1] is the special case of the beta distribution with alpha=1, beta=1.
  7. Dirichlet distribution(continuous), code

    • The Dirichlet distribution is conjugate to the multinomial distribution.
    • If k=2, it reduces to the beta distribution.
  8. Gamma distribution(continuous), code

    • The gamma distribution is related to the beta distribution: if X ~ Gamma(a,1) and Y ~ Gamma(b,1) are independent, then X / (X + Y) ~ Beta(a,b).
    • The exponential distribution and chi-squared distribution are special cases of the gamma distribution.
  9. Exponential distribution(continuous), code

    • The exponential distribution is the special case of the gamma distribution with alpha = 1.
  10. Gaussian distribution(continuous), code

    • The Gaussian distribution is a very common continuous probability distribution.
  11. Normal distribution(continuous), code

    • Here, normal distribution refers to the standard normal distribution: a standardized Gaussian distribution with mean 0 and standard deviation 1.
  12. Chi-squared distribution(continuous), code

    • Chi-square distribution with k degrees of freedom is the distribution of a sum of the squares of k independent standard normal random variables.
    • The chi-squared distribution is a special case of the gamma distribution (shape k/2, scale 2).
  13. Student-t distribution(continuous), code

    • The t-distribution is symmetric and bell-shaped, like the normal distribution, but has heavier tails, meaning that it is more prone to producing values that fall far from its mean.
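
Several of the relationships stated in the list above can be checked numerically. The sketch below uses only the Python standard library and writes each density out by hand; the helper names are illustrative assumptions and are not the code linked from each item.

```python
import math
import random


def gamma_pdf(x, shape, scale=1.0):
    """Density of Gamma(shape, scale) at x > 0."""
    return (x ** (shape - 1) * math.exp(-x / scale)
            / (math.gamma(shape) * scale ** shape))


def exponential_pdf(x, rate):
    """Density of Exponential(rate) at x >= 0."""
    return rate * math.exp(-rate * x)


def chi2_pdf(x, k):
    """Density of the chi-squared distribution with k degrees of freedom."""
    return (x ** (k / 2 - 1) * math.exp(-x / 2)
            / (2 ** (k / 2) * math.gamma(k / 2)))


def beta_pdf(x, a, b):
    """Density of Beta(a, b) at 0 < x < 1."""
    norm = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return norm * x ** (a - 1) * (1 - x) ** (b - 1)


def bernoulli_pmf(y, p):
    """Probability of outcome y in {0, 1} under Bernoulli(p)."""
    return p ** y * (1 - p) ** (1 - y)


def binary_cross_entropy(y, p):
    """Binary cross entropy for a single label y and prediction p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def gamma_ratio_mean(a, b, n_samples=20000, seed=0):
    """Monte Carlo mean of X / (X + Y), X ~ Gamma(a, 1), Y ~ Gamma(b, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = rng.gammavariate(a, 1.0)
        y = rng.gammavariate(b, 1.0)
        total += x / (x + y)
    return total / n_samples


# Exponential(rate) matches Gamma(shape=1, scale=1/rate).
print(exponential_pdf(0.7, 2.0), gamma_pdf(0.7, 1.0, 0.5))

# Chi-squared with k degrees of freedom matches Gamma(shape=k/2, scale=2).
print(chi2_pdf(3.0, 4), gamma_pdf(3.0, 2.0, 2.0))

# Beta(1, 1) has constant density on (0, 1), like the uniform distribution.
print(beta_pdf(0.3, 1.0, 1.0))

# Binary cross entropy equals the negative log of the Bernoulli pmf.
print(binary_cross_entropy(1, 0.8), -math.log(bernoulli_pmf(1, 0.8)))

# The sample mean of the gamma ratio should be close to the Beta(a, b)
# mean, a / (a + b).
print(gamma_ratio_mean(2.0, 5.0), 2.0 / 7.0)
```

The first four comparisons are exact identities up to floating-point error. The last is a sampling check, so it only agrees approximately, and matching means is a consistency check rather than a proof that the two distributions are the same.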