Reading 8

Quantitative Methods · Hypothesis Testing

MODULE 8.1: HYPOTHESIS TESTING BASICS

LOS 8.a

Explain hypothesis testing and its components, including null and alternative hypotheses, test statistics, significance levels, and Type I and Type II errors; and explain and interpret p-values.

A hypothesis is a statement about the value of a population parameter, developed to test a theory or belief. Hypotheses are stated in terms of the population parameter being tested (e.g., the population mean $\mu$).

Hypothesis testing procedures use sample statistics and probability theory to decide whether a hypothesis is reasonable (not rejected) or unreasonable (rejected).

中文翻譯

假說（hypothesis）是針對母體參數所作的陳述，目的是檢驗某個理論或信念。假說以被測試的母體參數來表述（例如母體平均數 $\mu$）。

假說檢定（hypothesis testing）程序利用樣本統計量與機率理論，判斷某個假說是否合理（不拒絕）或不合理（拒絕）。

The hypothesis-testing process

1. State the hypotheses.
2. Select the appropriate test statistic.
3. Specify the level of significance.
4. State the decision rule.
5. Collect the sample and calculate the sample statistic.
6. Make a decision regarding the hypothesis (reject or fail to reject the null).
7. Make a practical decision based on the results of the test.

中文翻譯

假說檢定七步驟：

陳述假說（虛無假說與對立假說）。
選擇合適的檢定統計量（test statistic）。
指定顯著水準（significance level）。
建立決策規則（decision rule）。
收集樣本並計算樣本統計量。
對假說作出決定（拒絕或不拒絕）。
根據檢定結果作出實際決策。

Null and alternative hypotheses

Null hypothesis ($H_0$) — what the researcher wants to reject. It is the hypothesis actually tested and is the basis for selecting the test statistic. The null always includes the "equal to" condition.
Alternative hypothesis ($H_a$) — what is concluded if there is sufficient evidence to reject the null. Usually what the researcher is really trying to assess. Statistics can never prove anything; when $H_0$ is discredited, the implication is that $H_a$ is valid.

For a two-tailed test of $H_0: \mu = \mu_0$, the alternative is $H_a: \mu \ne \mu_0$. The two hypotheses are mutually exclusive and exhaustive.

中文翻譯

虛無假說（null hypothesis，$H_0$）——研究者希望拒絕的假說，也是實際被檢定的假說，並作為選取檢定統計量的依據。虛無假說一定包含「等於」條件。
對立假說（alternative hypothesis，$H_a$）——若有足夠證據拒絕虛無假說時所得出的結論，通常才是研究者真正想評估的命題。統計永遠無法「證明」任何事；當 $H_0$ 被推翻，意即 $H_a$ 成立。

對於 $H_0: \mu = \mu_0$，對立假說為 $H_a: \mu \ne \mu_0$。兩個假說互斥且完全窮舉（mutually exclusive and exhaustive）。

Decision rule (two-tailed z-test at α = 0.05)

At α = 0.05, the test statistic is compared with critical z-values of ±1.96 (corresponding to $\pm z_{\alpha/2} = \pm z_{0.025}$, the range within which 95% of probability lies).

\[\text{Reject } H_0 \text{ if test statistic} < -1.96 \text{ or test statistic} > 1.96\]

Each tail of the distribution beyond ±1.96 contains 0.05 / 2 = 0.025 probability.

Commonly used critical z-values

Critical value	Use for
1.65	2-tailed test with 10% significance, or 1-tailed test with 5% significance
1.96	2-tailed test with 5% significance
2.33	1-tailed test with 1% significance
2.58	2-tailed test with 1% significance

中文翻譯

在 α = 0.05 時，檢定統計量與臨界 z 值 ±1.96 比較（對應 $\pm z_{\alpha/2} = \pm z_{0.025}$，涵蓋 95% 機率的區間）。

決策規則：若檢定統計量 < −1.96 或 > 1.96，則拒絕 $H_0$。

±1.96 之外的每一尾各包含 0.05 / 2 = 0.025 的機率。

常用臨界 z 值對照：

1.65：雙尾 10% 顯著水準，或單尾 5% 顯著水準
1.96：雙尾 5% 顯著水準
2.33：單尾 1% 顯著水準
2.58：雙尾 1% 顯著水準
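
The critical values in the table are quantiles of the standard normal distribution, so they can be reproduced from the inverse CDF. The following is a minimal sketch using Python's standard library; the helper name `critical_z` is an assumption made for illustration. Note that the exact 5% one-tailed value is about 1.645, which the table rounds to 1.65.

```python
from statistics import NormalDist


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Return the positive critical z-value for significance level alpha."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_prob = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_prob)


cases = [(0.10, True), (0.05, False), (0.05, True), (0.01, False), (0.01, True)]
for alpha, two_tailed in cases:
    label = "two-tailed" if two_tailed else "one-tailed"
    print(f"alpha = {alpha:.2f}, {label}: {critical_z(alpha, two_tailed):.3f}")
```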

The test statistic

\[\text{test statistic} = \frac{\text{sample statistic} - \text{hypothesized value}}{\text{standard error of the sample statistic}}\]

For the sample mean $\bar{x}$, the standard error is:

\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \quad \text{(when population } \sigma \text{ is known)}\]

\[s_{\bar{x}} = \frac{s}{\sqrt{n}} \quad \text{(when population } \sigma \text{ is unknown; use sample } s\text{)}\]

Test statistics can follow one of four distributions: t, z (standard normal), chi-square, and F. The critical value depends on the distribution.

中文翻譯

檢定統計量（test statistic）公式：

檢定統計量 = （樣本統計量 − 假設值）/ 樣本統計量的標準誤（standard error）

對於樣本平均數 $\bar{x}$，標準誤為：

已知母體 σ 時：$\sigma_{\bar{x}} = \sigma / \sqrt{n}$
未知母體 σ 時（使用樣本 s 代替）：$s_{\bar{x}} = s / \sqrt{n}$

檢定統計量可服從四種分配之一：t 分配、z 分配（標準常態）、卡方（chi-square）分配、F 分配。臨界值依所用分配而定。

Type I and Type II errors

	$H_0$ is true	$H_0$ is false
Do not reject $H_0$	Correct decision	Type II error (incorrect)
Reject $H_0$	Type I error (incorrect) — significance level α = P(Type I error)	Correct decision — power = 1 − P(Type II error)

Significance level $\alpha$ = P(Type I error) = probability of rejecting a true null.
Power of a test = P(correctly rejecting a false null) = 1 − P(Type II error).

中文翻譯

兩類錯誤決策矩陣：

第一類錯誤（Type I error）：拒絕了一個實際上為真的虛無假說。發生機率即顯著水準 α。
第二類錯誤（Type II error）：未拒絕一個實際上為假的虛無假說。

顯著水準（significance level）α = P(第一類錯誤) = 拒絕真虛無假說的機率。

檢定力（power of a test） = P(正確拒絕假的虛無假說) = 1 − P(第二類錯誤)。
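
One way to see that α is the probability of a Type I error is to simulate many samples from a population in which the null is true and count how often a two-tailed z-test rejects. The sketch below is illustrative only: the function name, the sample size of 30, the trial count, and the seed are all assumptions, and the population is taken to be standard normal with known σ = 1.

```python
import math
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=30, trials=10_000, seed=0):
    """Estimate how often a true null (mu = 0, sigma = 1) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    std_error = 1.0 / math.sqrt(n)  # sigma is known and equal to 1
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (fmean(sample) - 0.0) / std_error
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


print(f"Simulated Type I error rate: {type_i_error_rate():.3f}")
```

The simulated rate should land close to the chosen α, with some sampling noise.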

Relationships and trade-offs

Decreasing α (e.g., 5% → 1%) increases the probability of Type II error and reduces the power.
For a given sample size, increasing power increases the probability of Type I error.
For a given α, the only way to decrease P(Type II error) and increase power is to increase the sample size.

We reject or fail to reject the null — never "accept" it; a null can only be supported or rejected.

中文翻譯

兩類錯誤的取捨關係：

降低 α（例如 5% → 1%），會提高第二類錯誤機率，並降低檢定力。
在樣本數固定的情況下，提高檢定力會提高第一類錯誤機率。
在 α 固定的情況下，唯一能同時降低第二類錯誤機率（提高檢定力）的方法是增加樣本數。

我們只能拒絕（reject）或不能拒絕（fail to reject）虛無假說，永遠不說「接受」它；虛無假說只能被支持或被推翻。
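
The trade-offs above can be checked numerically for a two-tailed z-test with known σ, where power has a closed form. This is a sketch under stated assumptions: the function name `power_two_tailed_z` and the inputs (a true mean of 0.5 against a null of 0, with σ = 1) are hypothetical and do not come from the reading.

```python
import math
from statistics import NormalDist


def power_two_tailed_z(true_mean, null_mean, sigma, n, alpha):
    """Probability of rejecting H0: mu = null_mean when mu = true_mean."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under the true mean, the test statistic is normal with this mean shift.
    shift = (true_mean - null_mean) / (sigma / math.sqrt(n))
    upper = 1 - std_normal.cdf(z_crit - shift)   # rejects in the upper tail
    lower = std_normal.cdf(-z_crit - shift)      # rejects in the lower tail
    return upper + lower


for alpha, n in [(0.05, 25), (0.01, 25), (0.01, 100)]:
    power = power_two_tailed_z(0.5, 0.0, 1.0, n, alpha)
    print(f"alpha = {alpha:.2f}, n = {n:3d}: power = {power:.3f}, "
          f"P(Type II) = {1 - power:.3f}")
```

With these inputs, lowering α from 5% to 1% at n = 25 reduces power, and raising n to 100 at α = 1% restores it.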

p-value

The p-value is the probability, assuming the null is true, of obtaining a test statistic at least as extreme as the one actually observed. It is the smallest significance level at which the null can be rejected: reject $H_0$ if the p-value is less than α; otherwise, fail to reject $H_0$.

中文翻譯

p 值（p-value）是「在虛無假說為真的前提下，獲得至少如觀察值一樣極端之檢定統計量的機率」。它也是能拒絕虛無假說的最小顯著水準。

決策規則：若 p 值 < α，則拒絕 $H_0$；若 p 值 ≥ α，則不能拒絕 $H_0$。
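
For a z-test, the p-value follows directly from the standard normal CDF. The sketch below is a minimal illustration; the helper names `p_value_z` and `decide` are assumptions, and the one-tailed branch assumes the observed statistic lies in the direction of the alternative.

```python
from statistics import NormalDist


def p_value_z(z: float, two_tailed: bool = True) -> float:
    """p-value for an observed z-statistic under the standard normal."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail if two_tailed else upper_tail


def decide(p_value: float, alpha: float) -> str:
    """Apply the decision rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


for z in (1.0, 1.96, 2.5):
    p = p_value_z(z)
    print(f"z = {z:.2f}: p-value = {p:.4f}, {decide(p, 0.05)} at alpha = 0.05")
```

A statistic of 1.96 gives a two-tailed p-value of almost exactly 0.05, which matches the ±1.96 critical values used at the 5% level.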

Module Quiz 8.1

1. For a hypothesis test with a probability of Type II error of 60% and a probability of Type I error of 5%, which statement is most accurate?

A. The power of the test is 40%, and there is a 5% probability that the test statistic will exceed the critical value(s).
B. There is a 95% probability that the test statistic will be between the critical values, if this is a two-tailed test.
C. There is a 5% probability that the null hypothesis will be rejected when actually true, and the probability of rejecting the null when it is false is 40%.

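C — A Type I error is rejecting the null when it is true, so the 5% significance level is the probability of rejecting a true null. The probability of rejecting a false null is the power of the test: 1 − P(Type II error) = 1 − 0.60 = 40%. Choices A and B are not necessarily true: the 5% and 95% probabilities they cite hold only when the null is true. If the null is false, the probability that the test statistic falls beyond the critical value(s) is the 40% power, not 5%. (LOS 8.a)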

2. If the significance level of a test is 0.05 and the probability of a Type II error is 0.15, what is the power of the test?

A. 0.850
B. 0.950
C. 0.975

A — Power = 1 − P(Type II error) = 1 − 0.15 = 0.85. The significance level (0.05) is not used in this calculation. (LOS 8.a)

MODULE 8.2: TYPES OF HYPOTHESIS TESTS

LOS 8.b

Construct hypothesis tests and determine their statistical significance, the associated Type I and Type II errors, and power of the test given a significance level.

Summary of Hypothesis Tests

Hypothesis test of	Test statistic	Degrees of freedom
One population mean	t-test (or z-test if sample is large enough)	n − 1
Two population means (independent samples)	t-test (difference in means)	n₁ + n₂ − 2
Two population means (dependent samples)	t-test (paired comparisons)	n − 1
One population variance	chi-square (χ²) test	n − 1
Two population variances	F-test	n₁ − 1, n₂ − 1

中文翻譯

各種假說檢定類型總覽：

單一母體平均數：用 t 檢定（樣本夠大時可用 z 檢定）；自由度（degrees of freedom, df）= n − 1。
兩母體平均數差（獨立樣本）：用 t 檢定（差異均值法）；自由度 = n₁ + n₂ − 2。
兩母體平均數差（相依樣本）：用 t 檢定（配對比較，paired comparisons）；自由度 = n − 1。
單一母體變異數：用卡方（chi-square, χ²）檢定；自由度 = n − 1。
兩母體變異數：用 F 檢定；自由度 = n₁ − 1（分子）、n₂ − 1（分母）。

Test 1 — Value of a Population Mean

Example — Test 1

Daily Returns on a Portfolio of Call Options

Daily returns on a portfolio of call options over 250 days had a mean of 0.1% and sample standard deviation of 0.25%. Test whether the mean daily return is different from zero.

Hypotheses:

\[H_0: \mu = 0 \quad \text{versus} \quad H_a: \mu \ne 0\]

With n = 250, the sample is large enough that the z-distribution is acceptable. At α = 0.05, critical z-values are ±1.96.

Standard error:

\[s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{0.25\%}{\sqrt{250}} = 0.0158\%\]

Test statistic:

\[z = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{0.001 - 0}{0.000158} = 6.33\]

Because 6.33 > 1.96, reject $H_0$. The mean daily option return is significantly different from zero.

中文翻譯

【例一】選擇權投資組合日報酬率檢定

某選擇權投資組合連續 250 個交易日的日報酬率，樣本平均數為 0.1%，樣本標準差為 0.25%。檢定其日均報酬率是否顯著不為零。

假說：$H_0: \mu = 0$，$H_a: \mu \ne 0$（雙尾）。

n = 250 已夠大，可用 z 分配；α = 0.05 時臨界值為 ±1.96。

標準誤：$s_{\bar{x}} = 0.25\% / \sqrt{250} = 0.0158\%$。

檢定統計量：0.001 / 0.000158 = 6.33。

因 6.33 > 1.96，拒絕 $H_0$。日均報酬率在 5% 顯著水準下顯著不為零。
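
The arithmetic in this example can be reproduced in a few lines. The sketch below uses the figures from the example; the variable names are illustrative. Carrying the unrounded standard error through gives a statistic of about 6.32 rather than 6.33, because the worked example rounds the standard error to 0.000158 before dividing. The decision is unchanged.

```python
import math

# Inputs from the example, expressed as decimals
mean_return = 0.001       # sample mean of 0.1%
std_dev = 0.0025          # sample standard deviation of 0.25%
n = 250                   # number of daily observations
hypothesized_mean = 0.0   # H0: mu = 0

standard_error = std_dev / math.sqrt(n)
z_stat = (mean_return - hypothesized_mean) / standard_error

critical_z = 1.96         # two-tailed, alpha = 0.05
reject = abs(z_stat) > critical_z

print(f"standard error = {standard_error:.6f}")
print(f"test statistic = {z_stat:.4f}")
print("reject H0" if reject else "fail to reject H0")
```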

Test 2 — Difference Between Means (Independent Samples)

Used when two samples are independent and both populations are normally distributed. With unknown variances assumed equal, the pooled-variance t-test is used:

\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_p^2}{n_1} + \dfrac{s_p^2}{n_2}}}\]

where the pooled variance is:

\[s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\]

Degrees of freedom: $n_1 + n_2 - 2$.

Intuition: if sample means are close, the t-statistic is small and we do not reject equality; if they are far apart, the t-statistic is large and we reject.

Example — Test 2

Horizontal vs. Vertical Mergers — Abnormal Returns

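Smith samples abnormal returns for two merger types: horizontal mergers (mean 1.0%, standard deviation 1.0%) and vertical mergers (mean 2.5%, standard deviation 2.0%). She computes t = −5.474 with df = 120. At α = 0.05, the two-tailed critical t-values are ±1.980 (the p = 0.025 column at df = 120 in the excerpt below).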

t-table excerpt (right-tail probabilities)

df	p = 0.10	p = 0.05	p = 0.025
110	1.289	1.659	1.982
120	1.289	1.658	1.980
200	1.286	1.653	1.972

Because −5.474 < −1.980, reject $H_0$. Mean abnormal returns differ between horizontal and vertical mergers.

中文翻譯

當兩組樣本獨立且兩母體均服從常態分配時使用此檢定。若假設兩母體變異數相等（但未知），採用合併方差 t 檢定（pooled-variance t-test）：

\[t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2/n_1 + s_p^2/n_2}}\]

其中合併方差：$s_p^2 = [(n_1-1)s_1^2 + (n_2-1)s_2^2] / (n_1+n_2-2)$；自由度 = $n_1 + n_2 - 2$。

直觀理解：若兩樣本均值接近，t 值小，不拒絕相等假設；若相差大，t 值大，拒絕相等。

【例二】水平併購與垂直併購之異常報酬

Smith 計算得 t = −5.474，df = 120。在 α = 0.05 雙尾檢定下，查 t 表臨界值為 1.980。

因 −5.474 < −1.980，拒絕 $H_0$。兩類合併的平均異常報酬（abnormal returns）有顯著差異。
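
The pooled-variance formulas can be sketched as a small function. Because the merger example does not report its sample sizes, the inputs below are hypothetical values chosen only to show the mechanics; the function name `pooled_t` is likewise an assumption.

```python
import math


def pooled_t(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Return (t, df, pooled variance) for the equal-variance test."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    std_error = math.sqrt(pooled_var / n1 + pooled_var / n2)
    t = ((mean1 - mean2) - hypothesized_diff) / std_error
    return t, df, pooled_var


# Hypothetical summary statistics, not taken from the reading
t_stat, df, pooled_var = pooled_t(5.0, 2.0, 10, 3.0, 2.5, 12)
print(f"t = {t_stat:.3f}, df = {df}, pooled variance = {pooled_var:.4f}")
```

The resulting statistic would then be compared with the critical t-value for the computed degrees of freedom from a t-table.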

Test 3 — Paired Comparisons (Means of Dependent Samples)

Used when the samples are not independent, that is, when the observations in both samples depend on a common factor (e.g., market returns, industry conditions, or measurements before and after the same event).

Hypotheses (two-tailed, general form):

\[H_0: \mu_d = \mu_{dz} \quad \text{versus} \quad H_a: \mu_d \ne \mu_{dz}\]

where $\mu_{dz}$ is the hypothesized mean difference (commonly zero).

Test statistic with $n - 1$ degrees of freedom:

\[t = \frac{\bar{d} - \mu_{dz}}{s_{\bar{d}}}\]

where:

\[\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad s_{\bar{d}} = \frac{s_d}{\sqrt{n}}, \qquad s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}\]

Example — Test 3

Telecom Betas Before and After Deregulation

Andrews computes t = 10.26 with n = 39 (so df = 38). At α = 0.05, the two-tailed critical t is ±2.024.

t-table excerpt (right-tail probabilities)

df	p = 0.10	p = 0.05	p = 0.025
38	1.304	1.686	2.024
39	1.304	1.685	2.023
40	1.303	1.684	2.021

Because 10.26 > 2.024, reject $H_0$. Mean firm betas differ before and after deregulation.

PROFESSOR'S NOTE — Rule of Thumb

Independent samples → difference-in-means test.
Dependent samples → paired-comparisons test.

中文翻譯

當兩組樣本非獨立（相依）時使用——例如觀察值同時受共同因素影響（市場報酬、產業環境、同一事件前後等）。

假說（雙尾）：$H_0: \mu_d = \mu_{dz}$，$H_a: \mu_d \ne \mu_{dz}$，其中 $\mu_{dz}$ 為假設的平均差（通常為零）。

檢定統計量（自由度 n − 1）：$t = (\bar{d} - \mu_{dz}) / s_{\bar{d}}$。

$\bar{d}$：各對差值（$d_i$）的樣本平均數。
$s_{\bar{d}} = s_d / \sqrt{n}$：差值樣本均值的標準誤。
$s_d$：差值的樣本標準差。

【例三】電信業除管制前後的 Beta 值

Andrews 計算得 t = 10.26，n = 39（df = 38），α = 0.05 雙尾臨界值 = ±2.024。

因 10.26 > 2.024，拒絕 $H_0$。除管制前後各公司 Beta 值有顯著差異。

經驗法則：樣本獨立 → 差異均值 t 檢定；樣本相依 → 配對比較（paired comparisons）t 檢定。
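
The paired-comparisons statistic works on the differences within each pair rather than on the two samples separately. The following sketch shows the calculation; the beta values are hypothetical (the example in the reading reports only the final statistic), and the function name `paired_t` is an assumption.

```python
import math
from statistics import mean, stdev


def paired_t(first, second, hypothesized_diff=0.0):
    """Return (t, df) for a paired-comparisons test on matched observations."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)                    # sample mean difference
    s_d = stdev(diffs)                     # sample std dev (n - 1 divisor)
    std_error = s_d / math.sqrt(n)         # standard error of the mean difference
    return (d_bar - hypothesized_diff) / std_error, n - 1


# Hypothetical betas for five firms before and after an event
before = [1.2, 0.9, 1.1, 1.4, 1.0]
after = [1.0, 0.8, 1.0, 1.1, 0.9]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f}, df = {df}")
```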

Test 4 — Value of a Population Variance (chi-square test)

Used for hypothesis tests on the variance of a normally distributed population.

Hypotheses (two-tailed):

\[H_0: \sigma^2 = \sigma_0^2 \quad \text{versus} \quad H_a: \sigma^2 \ne \sigma_0^2\]

Test statistic with $n - 1$ degrees of freedom:

\[\chi_{n-1}^2 = \frac{(n - 1) s^2}{\sigma_0^2}\]

The chi-square distribution is asymmetric and bounded below by zero — values cannot be negative. It approaches normal as df increases.

Chi-square table excerpt — right-tail probabilities

df	0.975	0.95	0.90	0.10	0.05	0.025
9	2.700	3.325	4.168	14.684	16.919	19.023
10	3.247	3.940	4.865	15.987	18.307	20.483
11	3.816	4.575	5.578	17.275	19.675	21.920
30	16.791	18.493	20.599	40.256	43.773	46.979

For a two-tailed test at α = 0.05 with df = 30, the critical values are 16.791 (lower, 0.975 column) and 46.979 (upper, 0.025 column).

Example — Test 4

High-Return Equity Fund

Historical claim: standard deviation of monthly returns = 4% (so $\sigma_0^2 = 0.0016$). Recent 24-month sample: standard deviation = 3.8%. With df = 24 − 1 = 23, the test statistic is $\chi^2 = (23)(0.038)^2 / 0.0016 = 20.76$.

Hypotheses:

\[H_0: \sigma^2 = 0.0016 \quad \text{versus} \quad H_a: \sigma^2 \ne 0.0016\]

Critical values at df = 23 (α = 0.05, two-tailed): 11.689 (lower, 0.975 column) and 38.076 (upper, 0.025 column).

Since 11.689 < 20.76 < 38.076, fail to reject $H_0$. The recent standard deviation is not significantly different from 4% at the 5% level.

中文翻譯

用於對常態分配母體的變異數（variance）進行假說檢定。

假說（雙尾）：$H_0: \sigma^2 = \sigma_0^2$；$H_a: \sigma^2 \ne \sigma_0^2$。

檢定統計量（自由度 n − 1）：$\chi^2 = (n-1)s^2 / \sigma_0^2$。

卡方分配（chi-square distribution）不對稱且有下界零——數值不可為負；隨自由度增加逐漸趨近常態分配。

在 α = 0.05、df = 30 的雙尾檢定中，臨界值為下界 16.791（查 0.975 欄）、上界 46.979（查 0.025 欄）。

【例四】高報酬股票基金月報酬率方差檢定

歷史宣稱月報酬率標準差為 4%（即 $\sigma_0^2 = 0.0016$）。最近 24 個月樣本標準差為 3.8%，計算得 $\chi^2 = 20.76$，df = 23，臨界值為 11.689 與 38.076。

因 11.689 < 20.76 < 38.076，不能拒絕 $H_0$。近期標準差在 5% 顯著水準下與 4% 無顯著差異。
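
The chi-square statistic in this example can be verified directly. The sketch below uses the figures from the example and takes the two critical values from the text rather than computing them; the variable names are illustrative.

```python
# Inputs from the example
n = 24                      # months in the recent sample
sample_std = 0.038          # sample standard deviation of 3.8%
hypothesized_var = 0.0016   # sigma_0 squared, from the 4% claim

df = n - 1
chi_sq = df * sample_std**2 / hypothesized_var

# Two-tailed critical values at alpha = 0.05 and df = 23, as given in the text
lower_crit, upper_crit = 11.689, 38.076
reject = chi_sq < lower_crit or chi_sq > upper_crit

print(f"chi-square = {chi_sq:.2f} with df = {df}")
print("reject H0" if reject else "fail to reject H0")
```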

Test 5 — Comparing Two Population Variances (F-test)

Used to test the equality of the variances of two normally distributed populations, based on two independent samples.

Hypotheses (two-tailed):

\[H_0: \sigma_1^2 = \sigma_2^2 \quad \text{versus} \quad H_a: \sigma_1^2 \ne \sigma_2^2\]

Test statistic (always put the larger variance in the numerator):

\[F = \frac{s_1^2}{s_2^2}\]

Degrees of freedom: $n_1 - 1$ (numerator) and $n_2 - 1$ (denominator).

Properties of the F-distribution

Right-skewed, bounded below by zero.
When sample variances are equal, F = 1.
The lower critical value is the reciprocal of the upper critical value obtained with the numerator and denominator degrees of freedom interchanged.
By putting the larger variance in the numerator, we only need to consider the upper-tail critical value.

Example — Test 5

Textile vs. Paper Industries — Earnings Variance

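Cower samples 31 textile firms (standard deviation of earnings = $4.30) and 41 paper firms (standard deviation of earnings = $3.80). With the larger variance in the numerator, F = 4.30² / 3.80² = 18.49 / 14.44 = 1.2805, with df₁ = 30 and df₂ = 40. For a two-tailed test at α = 0.05, the upper-tail critical F at 2.5% is 1.94.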

Decision rule: Reject $H_0$ if F > 1.94.

Since 1.2805 < 1.94, fail to reject $H_0$. Earnings variances are not significantly different between the two industries.

中文翻譯

用於檢定兩個常態分配母體的變異數是否相等，要求兩組樣本獨立。

假說（雙尾）：$H_0: \sigma_1^2 = \sigma_2^2$；$H_a: \sigma_1^2 \ne \sigma_2^2$。

檢定統計量（較大的樣本方差放分子）：$F = s_1^2 / s_2^2$；自由度：分子 $n_1 - 1$，分母 $n_2 - 1$。

F 分配（F-distribution）特性：

右偏（right-skewed），下界為零。
當兩樣本方差相等時，F = 1。
下臨界值為上臨界值的倒數。
將較大方差置於分子，雙尾檢定時只需查上尾臨界值。

【例五】紡織業 vs 造紙業盈餘方差

Cower 抽取 31 家紡織廠（std dev = $4.30）與 41 家造紙廠（std dev = $3.80）。計算得 F = 1.2805，df₁ = 30，df₂ = 40，上尾 2.5% 臨界值 = 1.94。

因 1.2805 < 1.94，不能拒絕 $H_0$。兩產業的盈餘方差在 5% 顯著水準下無顯著差異。
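
The convention of placing the larger variance in the numerator is easy to encode, so the statistic does not depend on the order in which the samples are passed. The sketch below uses the figures from the example; the function name `f_statistic` is an assumption, and the critical value is taken from the text rather than computed.

```python
def f_statistic(std_a, n_a, std_b, n_b):
    """Return (F, numerator df, denominator df), larger variance on top."""
    var_a, var_b = std_a**2, std_b**2
    if var_a >= var_b:
        return var_a / var_b, n_a - 1, n_b - 1
    return var_b / var_a, n_b - 1, n_a - 1


# Textile: 31 firms, std dev 4.30; paper: 41 firms, std dev 3.80
f_stat, df_num, df_den = f_statistic(4.30, 31, 3.80, 41)

critical_f = 1.94   # upper 2.5% tail for df (30, 40), as given in the text
reject = f_stat > critical_f

print(f"F = {f_stat:.4f} with df = ({df_num}, {df_den})")
print("reject H0" if reject else "fail to reject H0")
```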

LOS 8.c

Explain parametric and nonparametric tests and describe the situations in which the use of nonparametric tests may be appropriate.

Parametric tests rely on assumptions about the distribution of the population and are specific to population parameters. The z-test, for example, requires a defined mean and standard deviation, and either a large sample (via the central limit theorem) or a normally distributed population.

Nonparametric tests either don't consider a particular population parameter, or make few assumptions about the population.

When nonparametric tests are appropriate

Distributional assumptions fail — e.g., testing a mean for a small sample from a non-normal distribution, where neither the t-test nor z-test is appropriate.
Data are ranks (ordinal measurement scale) rather than values.
The hypothesis doesn't concern distribution parameters — e.g., testing whether a variable is normally distributed, or whether a sequence is random. The runs test estimates the probability that a series of changes (+, +, −, −, +, −, …) is random.

中文翻譯

參數檢定（parametric tests）：依賴對母體分配的假設，且針對特定母體參數。例如 z 檢定需要已知均值與標準差，且要求大樣本（透過中央極限定理，central limit theorem）或母體服從常態分配。

非參數檢定（nonparametric tests）：不針對特定母體參數，或對母體分配幾乎不作假設。

適合使用非參數檢定的三種情況：

分配假設不成立：例如小樣本且母體非常態，此時 t 檢定與 z 檢定均不適用。
資料為排序（ranks）而非數值（序位量尺，ordinal measurement scale）。
假說與分配參數無關：例如檢定某變數是否服從常態分配，或某序列是否隨機。連串檢定（runs test）可估計一系列符號變動（+, +, −, −, +, −, …）是否隨機。
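
The reading names the runs test without giving its mechanics. As a rough illustration, the sketch below uses the standard large-sample normal approximation of the Wald-Wolfowitz runs test, which is an assumption about the variant intended: a run is a maximal streak of identical signs, and the observed number of runs is compared with the number expected under randomness. The function name and the input sequences are hypothetical, and the approximation is unreliable for very short series.

```python
import math
from statistics import NormalDist


def runs_test(signs: str):
    """Return (runs, z, two-tailed p-value) for a string of '+' and '-'."""
    n_pos = signs.count("+")
    n_neg = signs.count("-")
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one '+' and one '-'")
    n = n_pos + n_neg
    # A new run starts at every change of sign
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    expected = 2 * n_pos * n_neg / n + 1
    variance = (2 * n_pos * n_neg * (2 * n_pos * n_neg - n)) / (n**2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return runs, z, p_value


# A perfectly alternating series has far more runs than randomness implies
runs, z, p_value = runs_test("+-" * 10)
print(f"runs = {runs}, z = {z:.3f}, p-value = {p_value:.5f}")
```

Too many runs (as here) or too few runs both count as evidence against randomness, which is why the p-value is two-tailed.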

Module Quiz 8.2

1. Which of the following assumptions is least likely required for the difference-in-means test based on two samples?

A. The two samples are independent.
B. The two populations are normally distributed.
C. The two populations have known variances.

C — The difference-in-means test does not require known population variances; the pooled-variance t-test is used when variances are unknown (but assumed equal). (LOS 8.b)

2. The appropriate test statistic for a test of the equality of variances for two normally distributed random variables, based on two independent random samples, is the:

A. t-test.
B. F-test.
C. χ² test.

B — The F-test is used for equality of two population variances from independent samples. (LOS 8.b)

3. The appropriate test statistic to test the hypothesis that the variance of a normally distributed population is equal to 13 is the:

A. t-test.
B. F-test.
C. χ² test.

C — A test of a single population variance uses the chi-square (χ²) test. (LOS 8.b)

KEY CONCEPTS

LOS 8.a

The hypothesis-testing process: state hypotheses → select test statistic → set significance level → state decision rule → calculate sample statistic → make a decision → act on the result.

$H_0$ is what the researcher wants to reject and always contains the "equal to" condition; $H_a$ is supported when $H_0$ is rejected.

Type I error — rejecting a true null; probability = α (the significance level). Type II error — failing to reject a false null.

Power = 1 − P(Type II error). Decreasing α increases P(Type II error). For a fixed α, increasing sample size is the only way to increase power.

p-value = smallest α at which the null would be rejected. Reject $H_0$ if p-value < α.

LOS 8.b

Hypothesis test	Statistic	df
One population mean	t (or z for large n)	n − 1
Two means — independent samples	t (pooled variance)	n₁ + n₂ − 2
Two means — dependent samples	t (paired comparisons)	n − 1
One population variance	χ²	n − 1
Two population variances	F	n₁ − 1, n₂ − 1

For the F-test, always place the larger sample variance in the numerator. The chi-square distribution is asymmetric and bounded below by zero.

LOS 8.c

Parametric tests (t-test, F-test, chi-square) make distributional assumptions. Nonparametric tests are used when distributional assumptions cannot be supported, when data are ranked (ordinal), or when the hypothesis does not concern distribution parameters (e.g., the runs test for randomness).

中文翻譯（重點整理）

LOS 8.a

假說檢定流程：陳述假說 → 選取檢定統計量 → 設定顯著水準 → 建立決策規則 → 計算樣本統計量 → 作出決定 → 採取行動。

$H_0$（虛無假說）是研究者想拒絕的，且一定含「等於」條件；$H_a$（對立假說）是 $H_0$ 被推翻後成立的命題。

第一類錯誤：拒絕真實的虛無假說，機率 = α；第二類錯誤：未拒絕假的虛無假說。

檢定力 = 1 − P(第二類錯誤)；降低 α 會提高第二類錯誤機率；在 α 固定下，唯有增加樣本數才能提高檢定力。

p 值 = 能拒絕虛無假說的最小 α 值；p 值 < α 時拒絕 $H_0$。

LOS 8.b

各類假說檢定彙整（參見上方表格）：F 檢定時較大方差放分子；卡方分配不對稱且下界為零。

LOS 8.c

參數檢定（t、F、χ²）需滿足分配假設；非參數檢定適用於：(1) 分配假設不成立；(2) 資料為序位（排序）資料；(3) 假說不涉及分配參數（如連串檢定用於檢定隨機性）。
