Friday, May 23, 2014

Inferential statistics @ Khanacademy


Inferential statistics

https://www.khanacademy.org/math/probability/statistics-inferential



Confidence interval 1

Estimating the probability that the true population mean lies within a range around a sample mean.
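As a rough illustration of the idea, the sketch below builds a 95% interval around a sample mean using the normal approximation. The function name `z_confidence_interval` and the simulated data are assumptions made for this example and do not come from the course. The approximation is only reasonable for fairly large samples.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval for the population mean."""
    n = len(sample)
    center = mean(sample)
    standard_error = stdev(sample) / sqrt(n)   # s / sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return center - z * standard_error, center + z * standard_error

# Hypothetical data: 100 simulated draws, used only for illustration.
rng = random.Random(0)
sample = [rng.gauss(50, 10) for _ in range(100)]
low, high = z_confidence_interval(sample)
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
```

With small samples the critical value 1.96 is too optimistic, which is where the t-distribution comes in.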



Confidence interval example


Small sample size confidence intervals

Constructing small sample size confidence intervals

Z-statistics vs. T-statistics


T-Statistic Confidence Interval




Student's t-distribution


Probability density function

Student's t-distribution has the probability density function
f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
where \nu is the number of degrees of freedom and \Gamma is the gamma function.
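The density can be evaluated directly from this formula. The following is a minimal sketch using only the Python standard library; the function name `t_pdf` is chosen for this example. It works with the logarithm of the gamma function so that large values of ν do not overflow.

```python
from math import exp, lgamma, log, log1p, pi

def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    # log of Gamma((nu+1)/2) / (sqrt(nu*pi) * Gamma(nu/2))
    log_norm = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    # log of (1 + t^2/nu)^(-(nu+1)/2)
    log_kernel = -(nu + 1) / 2 * log1p(t * t / nu)
    return exp(log_norm + log_kernel)

print(t_pdf(0.0, 1))    # nu = 1 is the Cauchy density, whose peak is 1/pi
print(t_pdf(0.0, 30))   # for larger nu the peak approaches 1/sqrt(2*pi)
```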

CDF


F(x) = \frac{1}{2} + x\,\Gamma\left(\frac{\nu+1}{2}\right) \frac{{}_2F_1\left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}
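The hypergeometric form is awkward to evaluate by hand. As an alternative, the sketch below integrates the density numerically with Simpson's rule, which is a substitute technique and not the formula above. The names `t_pdf` and `t_cdf` and the step count are assumptions of this example. For ν = 1 the result can be compared with the known closed form 1/2 + arctan(x)/π.

```python
from math import atan, exp, lgamma, log, log1p, pi

def t_pdf(t, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    log_norm = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_norm - (nu + 1) / 2 * log1p(t * t / nu))

def t_cdf(x, nu, steps=2000):
    """P(T <= x), using Simpson's rule on the density over [0, x]."""
    if steps % 2:
        steps += 1               # Simpson's rule needs an even number of steps
    # The density is symmetric about 0, so F(x) = 1/2 + integral_0^x f(t) dt.
    h = x / steps                # signed step, so negative x also works
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

# Numerical value next to the closed form for nu = 1.
print(t_cdf(12.71, 1), 0.5 + atan(12.71) / pi)
# 2.228 is the tabulated one-sided 97.5% point for nu = 10.
print(t_cdf(2.228, 10))
```

A fixed step count is adequate for moderate x. For very large x with small ν, more steps or a change of variable would be needed.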





Critical values of t by degrees of freedom ν (first column) and confidence level:

One-sided   75%     80%     85%     90%     95%     97.5%   99%     99.5%   99.75%  99.9%   99.95%
Two-sided   50%     60%     70%     80%     90%     95%     98%     99%     99.5%   99.8%   99.9%
1           1.000   1.376   1.963   3.078   6.314   12.71   31.82   63.66   127.3   318.3   636.6
2           0.816   1.061   1.386   1.886   2.920   4.303   6.965   9.925   14.09   22.33   31.60
3           0.765   0.978   1.250   1.638   2.353   3.182   4.541   5.841   7.453   10.21   12.92
4           0.741   0.941   1.190   1.533   2.132   2.776   3.747   4.604   5.598   7.173   8.610
5           0.727   0.920   1.156   1.476   2.015   2.571   3.365   4.032   4.773   5.893   6.869
6           0.718   0.906   1.134   1.440   1.943   2.447   3.143   3.707   4.317   5.208   5.959
7           0.711   0.896   1.119   1.415   1.895   2.365   2.998   3.499   4.029   4.785   5.408
8           0.706   0.889   1.108   1.397   1.860   2.306   2.896   3.355   3.833   4.501   5.041
9           0.703   0.883   1.100   1.383   1.833   2.262   2.821   3.250   3.690   4.297   4.781
10          0.700   0.879   1.093   1.372   1.812   2.228   2.764   3.169   3.581   4.144   4.587
11          0.697   0.876   1.088   1.363   1.796   2.201   2.718   3.106   3.497   4.025   4.437
12          0.695   0.873   1.083   1.356   1.782   2.179   2.681   3.055   3.428   3.930   4.318
13          0.694   0.870   1.079   1.350   1.771   2.160   2.650   3.012   3.372   3.852   4.221
14          0.692   0.868   1.076   1.345   1.761   2.145   2.624   2.977   3.326   3.787   4.140
15          0.691   0.866   1.074   1.341   1.753   2.131   2.602   2.947   3.286   3.733   4.073
16          0.690   0.865   1.071   1.337   1.746   2.120   2.583   2.921   3.252   3.686   4.015
17          0.689   0.863   1.069   1.333   1.740   2.110   2.567   2.898   3.222   3.646   3.965
18          0.688   0.862   1.067   1.330   1.734   2.101   2.552   2.878   3.197   3.610   3.922
19          0.688   0.861   1.066   1.328   1.729   2.093   2.539   2.861   3.174   3.579   3.883
20          0.687   0.860   1.064   1.325   1.725   2.086   2.528   2.845   3.153   3.552   3.850
21          0.686   0.859   1.063   1.323   1.721   2.080   2.518   2.831   3.135   3.527   3.819
22          0.686   0.858   1.061   1.321   1.717   2.074   2.508   2.819   3.119   3.505   3.792
23          0.685   0.858   1.060   1.319   1.714   2.069   2.500   2.807   3.104   3.485   3.767
24          0.685   0.857   1.059   1.318   1.711   2.064   2.492   2.797   3.091   3.467   3.745
25          0.684   0.856   1.058   1.316   1.708   2.060   2.485   2.787   3.078   3.450   3.725
26          0.684   0.856   1.058   1.315   1.706   2.056   2.479   2.779   3.067   3.435   3.707
27          0.684   0.855   1.057   1.314   1.703   2.052   2.473   2.771   3.057   3.421   3.690
28          0.683   0.855   1.056   1.313   1.701   2.048   2.467   2.763   3.047   3.408   3.674
29          0.683   0.854   1.055   1.311   1.699   2.045   2.462   2.756   3.038   3.396   3.659
30          0.683   0.854   1.055   1.310   1.697   2.042   2.457   2.750   3.030   3.385   3.646
40          0.681   0.851   1.050   1.303   1.684   2.021   2.423   2.704   2.971   3.307   3.551
50          0.679   0.849   1.047   1.299   1.676   2.009   2.403   2.678   2.937   3.261   3.496
60          0.679   0.848   1.045   1.296   1.671   2.000   2.390   2.660   2.915   3.232   3.460
80          0.678   0.846   1.043   1.292   1.664   1.990   2.374   2.639   2.887   3.195   3.416
100         0.677   0.845   1.042   1.290   1.660   1.984   2.364   2.626   2.871   3.174   3.390
120         0.677   0.845   1.041   1.289   1.658   1.980   2.358   2.617   2.860   3.160   3.373
∞           0.674   0.842   1.036   1.282   1.645   1.960   2.326   2.576   2.807   3.090   3.291
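One way to use the table is to look up the critical value for ν = n − 1 and build a small-sample confidence interval as mean ± t* · s/√n. The sketch below copies the two-sided 95% column for ν = 1 to 10. The dictionary name `T_CRIT_95`, the function name, and the measurements are hypothetical and serve only as illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% column of the table above, keyed by degrees of freedom.
T_CRIT_95 = {1: 12.71, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def t_confidence_interval_95(sample):
    """95% interval for the mean of a small sample (2 to 11 values)."""
    n = len(sample)
    t_star = T_CRIT_95[n - 1]          # raises KeyError outside the copied rows
    half_width = t_star * stdev(sample) / sqrt(n)
    center = mean(sample)
    return center - half_width, center + half_width

# Hypothetical measurements: n = 5, so nu = 4 and t* = 2.776.
sample = [5.1, 4.9, 5.6, 5.2, 4.7]
low, high = t_confidence_interval_95(sample)
print(f"95% interval for the mean: ({low:.3f}, {high:.3f})")
```

Using 1.960 from the ∞ row here would give a noticeably narrower interval, one that understates the uncertainty of a five-point sample.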

How the t-distribution arises

Sampling distribution

Let x1, ..., xn be the numbers observed in a sample 
from a continuously distributed population with expected value μ. 
The sample mean and sample variance are given by:

\begin{align}
\bar{x} &= \frac{x_1+\cdots+x_n}{n} \\
s^2 &= \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2
\end{align}
The resulting t-value is
 t = \frac{\bar{x} - \mu}{s/\sqrt{n}}.
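The three formulas translate almost line by line into code. This is a small sketch in which the function name `t_value`, the sample, and the hypothesized mean are assumptions made for illustration. `statistics.variance` divides by n − 1, matching the definition of s² above.

```python
from math import sqrt
from statistics import mean, variance

def t_value(sample, mu):
    """t = (x_bar - mu) / (s / sqrt(n)) for a hypothesized population mean mu."""
    n = len(sample)
    x_bar = mean(sample)          # sample mean
    s2 = variance(sample)         # sample variance with the n - 1 divisor
    return (x_bar - mu) / sqrt(s2 / n)

# Hypothetical sample and hypothesized mean.
sample = [2.1, 1.9, 2.4, 2.0, 2.3, 1.8, 2.2]
t = t_value(sample, mu=2.0)
print(f"t = {t:.4f} with {len(sample) - 1} degrees of freedom")
```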


Suppose X1, ..., Xn are independent realizations
of the normally distributed random variable X,
which has expected value μ and variance σ².

Let
        \overline{X}_n = \frac{1}{n}(X_1+\cdots+X_n)

        S_n^{\;2} = \frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}_n\right)^2

Then

Z = \left(\overline{X}_n-\mu\right)\frac{\sqrt{n}}{\sigma}
is normally distributed with mean 0 and variance 1,

V = (n-1)\frac{S_n^2}{\sigma^2}
has a chi-squared distribution with ν = n−1 degrees of freedom,
and Z and V are independent.

Consequently

T \equiv \frac{Z}{\sqrt{V/\nu}} = \left(\overline{X}_n-\mu\right)\frac{\sqrt{n}}{S_n}
has a Student's t-distribution with ν = n−1 degrees of freedom.
The unknown σ cancels in the ratio, so T depends only on the sample and μ.
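A quick simulation can make this plausible. The sketch below draws many normal samples of size n = 5, computes T for each, and counts how often |T| exceeds 2.776, the tabulated two-sided 95% value for ν = 4. The parameters μ = 10 and σ = 3, the seed, and the trial count are arbitrary choices for this illustration.

```python
import random
from math import sqrt

def simulate_t(n, mu, sigma, rng):
    """Draw one normal sample of size n and return T = (mean - mu) * sqrt(n) / S."""
    xs = [rng.gauss(mu, sigma) for _ in range(n)]
    x_bar = sum(xs) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    return (x_bar - mu) * sqrt(n) / s

rng = random.Random(42)
n, trials = 5, 20000
ts = [simulate_t(n, mu=10.0, sigma=3.0, rng=rng) for _ in range(trials)]

# Share of simulated T values beyond the t cutoff (nu = 4) and the normal cutoff.
frac_t = sum(abs(t) > 2.776 for t in ts) / trials
frac_z = sum(abs(t) > 1.960 for t in ts) / trials
print(f"|T| > 2.776: {frac_t:.3f}   |T| > 1.960: {frac_z:.3f}")
```

The first share should come out near 5%, while the normal cutoff 1.960 is exceeded clearly more often, reflecting the heavier tails of T for small n.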

Parameters: ν = n − 1 > 0 degrees of freedom (real)
Mean: 0 for ν > 1, otherwise undefined
Variance: \frac{\nu}{\nu-2} for ν > 2, ∞ for 1 < ν ≤ 2, otherwise undefined
pdf: f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1+\frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}
CDF: F(x) = \frac{1}{2} + x\,\Gamma\left(\frac{\nu+1}{2}\right) \frac{{}_2F_1\left(\frac{1}{2},\frac{\nu+1}{2};\frac{3}{2};-\frac{x^2}{\nu}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)}
where {}_2F_1 is the hypergeometric function

Density of the t-distribution (red) for 1, 5, and 30 degrees of freedom 
compared to the standard normal distribution (blue).
Previous plots shown in green.

Panels: 1 df, 5 df, 30 df

Normal distribution for the Z value:

[Figure: NormalDist1.96.png]


t-distribution for the T value, with ν = ∞, ν = 10, ν = 1

Two-sided 95% critical values:

blue:  Normal,     95% ==> 1.96
red:   T, ν = 10,  95% ==> 2.228
green: T, ν = 1,   95% ==> 12.71


