Reading 7

Quantitative Methods · Estimation and Inference

MODULE 7.1: SAMPLING TECHNIQUES AND THE CENTRAL LIMIT THEOREM

LOS 7.a

Compare and contrast simple random, stratified random, cluster, convenience, and judgmental sampling and their implications for sampling error in an investment problem.

Probability sampling refers to selecting a sample when we know the probability that each member of the population will be included in the sample. With random sampling, each item is assumed to have the same probability of being selected. If we have a population of data and select our sample by using a computer to randomly select a number of observations from the population, each data point has an equal probability of being selected; we call this simple random sampling. If we want to estimate the mean profitability for a population of firms, this may be an appropriate method.

Nonprobability sampling selects data items either because they are inexpensive and easy to access or because the researcher judges them to be appropriate. Less randomness in selection may lead to greater sampling error.

Probability Sampling Methods

Simple random sampling is a method of selecting a sample in such a way that each item or person in the population being studied has the same likelihood of being included in the sample. As an example of simple random sampling, assume that you want to draw a sample of 5 items out of a group of 50 items. This can be accomplished by numbering each of the 50 items, placing them in a hat, and shaking the hat. Next, one number can be drawn randomly from the hat. Repeating this process (experiment) four more times results in a set of five numbers. The five drawn numbers (items) comprise a simple random sample from the population.

In applications like this one, a random-number table or a computer random-number generator is often used to create the sample. Another way to form an approximately random sample is systematic sampling: selecting every \(k\)th member of a population list, starting from a randomly chosen point.
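
The selection procedures just described can be sketched with Python's standard `random` module. This is a minimal illustration rather than a prescribed implementation: the function names are hypothetical, the 50 integers stand in for the numbered slips in the hat, and the systematic version assumes a sampling interval of `len(population) // k` with a randomly chosen starting point.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement; every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, k)


def systematic_sample(population, k, seed=None):
    """Hypothetical helper: take every step-th item after a random start."""
    rng = random.Random(seed)
    step = len(population) // k      # sampling interval (assumed rule)
    start = rng.randrange(step)      # random starting point within the first interval
    return population[start::step][:k]


items = list(range(1, 51))  # the 50 numbered items from the hat example
print(simple_random_sample(items, 5, seed=42))
print(systematic_sample(items, 5, seed=42))
```

Fixing `seed` makes a draw reproducible, which helps when an analyst needs to document exactly which observations were sampled.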

Stratified random sampling uses a classification system to separate the population into smaller groups based on one or more distinguishing characteristics. From each subgroup, or stratum, a random sample is taken and the results are pooled. The size of the samples from each stratum is based on the size of the stratum relative to the population.

Stratified sampling is often used in bond indexing because of the difficulty and cost of completely replicating the entire population of bonds. In this case, bonds in a population are categorized (stratified) according to major bond risk factors including, but not limited to, duration, maturity, and coupon rate. Then, samples are drawn from each separate category and combined to form a final sample.

To see how this works, suppose you want to construct a portfolio of 100 bonds that is indexed to a major municipal bond index of 1,000 bonds, using a stratified random sampling approach. First, the entire population of 1,000 municipal bonds in the index can be classified on the basis of maturity and coupon rate. Then, cells (strata) can be created for different maturity/coupon combinations, and random samples can be drawn from each of the maturity/coupon cells. To sample from a cell containing 50 bonds with 2-to-4-year maturities and coupon rates less than 5%, we would select five bonds. The number of bonds drawn from a given cell corresponds to the cell's weight relative to the population (index), or \((50/1{,}000) \times 100 = 5\) bonds. This process is repeated for all the maturity/coupon cells, and the individual samples are combined to form the portfolio.

By using stratified sampling, we guarantee that we sample exactly five bonds from this cell. With simple random sampling, there would be no guarantee that we would sample any of the bonds in the cell, and we might instead select more than five.
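
The cell-by-cell allocation in the bond example can be sketched as follows. Only the first cell (50 bonds with 2-to-4-year maturities and coupons below 5%) comes from the text; the other three cell labels and sizes are hypothetical, chosen so that the cells total 1,000 bonds and every allocation is a whole number. With arbitrary cell sizes, the rounded-down allocations may not sum exactly to 100, and a rounding rule would be needed.

```python
import random


def proportional_allocation(cell_sizes, sample_size):
    """Bonds to draw from each cell: (cell size / population size) x sample size, rounded down."""
    total = sum(cell_sizes.values())
    return {cell: size * sample_size // total for cell, size in cell_sizes.items()}


def stratified_sample(cells, sample_size, seed=None):
    """cells maps each maturity/coupon label to the list of bond IDs in that cell."""
    rng = random.Random(seed)
    quota = proportional_allocation({c: len(b) for c, b in cells.items()}, sample_size)
    portfolio = []
    for cell, bonds in cells.items():
        portfolio.extend(rng.sample(bonds, quota[cell]))  # simple random sample within the cell
    return portfolio


# Hypothetical index of 1,000 bonds split into four maturity/coupon cells
cell_sizes = {"2-4y, <5%": 50, "2-4y, >=5%": 150, "5-10y, <5%": 300, "5-10y, >=5%": 500}
cells, next_id = {}, 0
for label, size in cell_sizes.items():
    cells[label] = list(range(next_id, next_id + size))  # bond IDs for this cell
    next_id += size

print(proportional_allocation(cell_sizes, 100))  # 5, 15, 30 and 50 bonds from the four cells
portfolio = stratified_sample(cells, 100, seed=7)
print(len(portfolio))  # 100
```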

Cluster sampling is also based on subsets of a population, but in this case, we are assuming that each subset (cluster) is representative of the overall population with respect to the item we are sampling. For example, we may have data on personal incomes for a state's residents by county. The data for each county is a cluster.

In one-stage cluster sampling, a random sample of clusters is selected, and all the data in those clusters comprise the sample. In two-stage cluster sampling, random samples from each of the selected clusters comprise the sample. Contrast this with stratified random sampling, in which random samples are selected from every subgroup.

To the extent that the subgroups do not have the same distribution of the characteristic of interest as the entire population, cluster sampling will have greater sampling error than simple random sampling. Two-stage cluster sampling can be expected to have greater sampling error than one-stage cluster sampling. Lower cost and less time required to assemble the sample are the primary advantages of cluster sampling, and it may be most appropriate for a smaller pilot study.
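
One-stage and two-stage cluster sampling can be contrasted in a short sketch. The county names and incomes below are hypothetical, generated at random purely for illustration, and the function names are assumptions rather than a standard API.

```python
import random


def one_stage_cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose clusters and keep every observation in the chosen clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [x for name in chosen for x in clusters[name]]


def two_stage_cluster_sample(clusters, num_clusters, per_cluster, seed=None):
    """Randomly choose clusters, then take a simple random sample within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [x for name in chosen for x in rng.sample(clusters[name], per_cluster)]


# Hypothetical personal incomes for 40 residents in each of eight counties
gen = random.Random(0)
counties = {f"County {c}": [round(gen.gauss(60_000, 15_000)) for _ in range(40)]
            for c in "ABCDEFGH"}

print(len(one_stage_cluster_sample(counties, 3, seed=1)))      # 120 = 3 counties x 40 residents
print(len(two_stage_cluster_sample(counties, 3, 10, seed=1)))  # 30 = 3 counties x 10 residents
```

Counties that are not chosen contribute no observations at all, which is why cluster sampling carries more sampling error when the counties differ from one another in the characteristic being measured.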

Nonprobability Sampling Methods

Convenience sampling refers to selecting sample data based on ease of access, using data that are readily available. Because such a sample is typically not random, sampling error will be greater. Convenience sampling may still be useful for an initial look at the data before the analyst adopts a sampling method with less sampling error.

Judgmental sampling refers to samples for which each observation is selected from a larger dataset by the researcher, based on one's experience and judgment. As an example, a researcher interested in assessing company compliance with accounting standards may have experience suggesting that evidence of noncompliance is typically found in certain ratios derived from the financial statements. The researcher may select only data on these items. Researcher bias (or simply poor judgment) may lead to samples that have excessive sampling error. In the absence of bias or poor judgment, judgmental sampling may produce a more representative sample or allow the researcher to focus on a sample that offers good data on the characteristic or statistic of interest.

An important consideration when sampling is ensuring that the distribution of the data of interest is constant across the whole population being sampled. For example, judging a characteristic of U.S. banks using data from 2005 to 2015 may not be appropriate. Regulatory reform of the banking industry after the financial crisis of 2007–2008 may have resulted in significant changes in banking practices, so that the mean of a statistic across the population of banks before the crisis is quite different from its mean after the crisis. If this is the case, pooling the data over the entire period from 2005 to 2015 would not be appropriate, and the sample mean calculated from these data would not be a good estimate of either the precrisis or the postcrisis mean.
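
A small numerical illustration of the pooling problem, using made-up values for a bank statistic (they are not actual bank data): the pooled mean lands between the two regime means and matches neither.

```python
from statistics import fmean

# Hypothetical values of a bank statistic, for illustration only
pre_crisis = [1.2, 1.3, 1.1, 1.2]                  # 4 precrisis observations
post_crisis = [0.6, 0.7, 0.5, 0.6, 0.6, 0.7, 0.5]  # 7 postcrisis observations
pooled = pre_crisis + post_crisis

print(round(fmean(pre_crisis), 3))   # 1.2
print(round(fmean(post_crisis), 3))  # 0.6
print(round(fmean(pooled), 3))       # 0.818, which estimates neither regime's mean
```

Because the pooled value depends on how many observations happen to fall in each regime, choosing a different sample period would change it, another sign that it does not estimate a single stable population mean.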

LOS 7.b

Explain the central limit theorem and its importance for the distribution and standard error of the sample mean.

The central limit theorem states that for simple random samples of size \(n\) from a population with a mean \(\mu\) and a finite variance \(\sigma^2\), the sampling distribution of the sample mean \(\bar{x}\) approaches a normal probability distribution with mean \(\mu\) and a variance equal to \(\dfrac{\sigma^2}{n}\) as the sample size becomes large.

The central limit theorem is extremely useful because the normal distribution is relatively easy to apply to hypothesis testing and to the construction of confidence intervals. Specific inferences about the population mean can be made from the sample mean, regardless of the population's distribution, as long as the sample size is sufficiently large, which usually means \(n \geq 30\).

Important properties of the central limit theorem include the following:

If the sample size \(n\) is sufficiently large (\(n \geq 30\)), the sampling distribution of the sample means will be approximately normal. Remember what's going on here: random samples of size \(n\) are repeatedly being taken from an overall larger population. Each of these random samples has its own mean, which is itself a random variable, and this set of sample means has a distribution that is approximately normal.
The mean of the population, \(\mu\), and the mean of the distribution of all possible sample means are equal.
The variance of the distribution of sample means is \(\dfrac{\sigma^2}{n}\), the population variance divided by the sample size.
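
The properties listed above can be checked by simulation. This sketch assumes a strongly right-skewed exponential population with mean 1 and standard deviation 1 (an arbitrary choice), draws 20,000 samples of size 30, and compares the resulting sample means with the theory: their average should be close to \(\mu\), their variance close to \(\sigma^2/n\), and about 95% of them should fall within 1.96 standard errors of \(\mu\) if their distribution is approximately normal.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, seed=0):
    """Repeatedly draw samples of size n from an exponential(1) population; return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]


mu, sigma = 1.0, 1.0  # an exponential(1) population has mean 1 and standard deviation 1
n = 30
means = simulate_sample_means(n, 20_000, seed=2024)
se = sigma / n ** 0.5  # sigma / sqrt(n)
share = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(statistics.fmean(means))      # close to mu
print(statistics.pvariance(means))  # close to sigma**2 / n
print(share)                        # roughly 0.95 if the sample means are approximately normal
```

Even though the population itself is far from normal, the simulated sample means behave as the central limit theorem predicts.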

The standard error of the sample mean is the standard deviation of the distribution of the sample means.

When the standard deviation of the population, \(\sigma\), is known, the standard error of the sample mean is calculated as:

\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}\]

where:

\(\sigma_{\bar{x}}\) = standard error of the sample mean
\(\sigma\) = standard deviation of the population
\(n\) = size of the sample

However, practically speaking, the population's standard deviation is almost never known. Instead, the standard error of the sample mean must be estimated by dividing the standard deviation of the sample by \(\sqrt{n}\):

\[s_{\bar{x}} = \frac{s}{\sqrt{n}}\]
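
Both standard error formulas translate directly into code. A minimal sketch using Python's `statistics` module; the function names and the six monthly returns are hypothetical. Note that `statistics.stdev` divides by \(n - 1\), which gives the sample standard deviation \(s\) used in the formula above.

```python
import math
import statistics


def standard_error_known_sigma(sigma, n):
    """Standard error of the sample mean when the population standard deviation is known."""
    return sigma / math.sqrt(n)


def standard_error_estimated(sample):
    """Estimated standard error s / sqrt(n); statistics.stdev uses the n - 1 divisor."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


returns = [0.04, -0.01, 0.03, 0.02, -0.02, 0.05]  # hypothetical monthly returns
print(standard_error_estimated(returns))
print(standard_error_known_sigma(0.20, 30))  # if sigma were known to be 20%
```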

Example

Standard error of sample mean (unknown population variance) — n = 30

Suppose a sample contains the past 30 monthly returns for McCreary, Inc. The mean return is 2%, and the sample standard deviation is 20%. Calculate and interpret the standard error of the sample mean.

Answer:

Because \(\sigma\) is unknown, this is the standard error of the sample mean:

\[s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{20\%}{\sqrt{30}} \approx 3.65\%\]

This implies that if we took all possible samples of size 30 from McCreary's monthly returns and formed the sampling distribution of their means, that distribution would be centered on the population mean return (which our sample estimates at 2%) and would have a standard deviation, the standard error, of about 3.65%.

Example

Standard error of sample mean (unknown population variance) — n = 200

Continuing with our example, suppose that instead of a sample size of 30, we take a sample of the past 200 monthly returns for McCreary, Inc. To highlight the effect of sample size on the sample standard error, let's assume that the mean return and standard deviation of this larger sample remain at 2% and 20%, respectively. Now, calculate the standard error of the sample mean for the 200-return sample.

Answer:

The standard error of the sample mean is computed as follows:

\[s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{20\%}{\sqrt{200}} = 1.4\%\]

The preceding two examples illustrate an important property of sampling distributions. Notice that the standard error of the sample mean decreased from about 3.65% to about 1.41% as the sample size increased from 30 to 200, a reduction by a factor of \(\sqrt{200/30} \approx 2.58\). This is because as the sample size increases, the sample mean gets closer, on average, to the true mean of the population. In other words, the sample means cluster more tightly around the population mean, so the standard error of the sample mean falls in proportion to \(1/\sqrt{n}\).
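
The arithmetic in the two McCreary examples can be reproduced in a few lines; the helper name is hypothetical. The unrounded standard errors are about 3.65% and 1.41%, and their ratio equals \(\sqrt{200/30}\), which shows that the standard error shrinks with the square root of the sample size rather than in proportion to it.

```python
import math


def standard_error_from_sd(s, n):
    """Estimated standard error of the sample mean, s / sqrt(n)."""
    return s / math.sqrt(n)


se_30 = standard_error_from_sd(20.0, 30)    # s = 20% (in percent), n = 30
se_200 = standard_error_from_sd(20.0, 200)  # same s, n = 200
print(round(se_30, 2), round(se_200, 2))    # 3.65 1.41
print(round(se_30 / se_200, 2))             # 2.58, which equals sqrt(200 / 30) to two decimals
```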

LOS 7.c

Describe the use of resampling (bootstrap, jackknife) to estimate the sampling distribution of a statistic.

Previously, we used the sample standard deviation to calculate the standard error of our estimate of the mean. This formula-based standard error describes the distribution of sample means well when the sample is unbiased and the distribution of sample means is approximately normal.

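Two alternative methods of estimating the standard error of the sample mean involve resampling of the data. The first of these, termed the jackknife, calculates \(n\) sample means, each with a different one of the observations removed from the sample. Because these leave-one-out means overlap heavily, their dispersion is scaled up to estimate the standard error: the jackknife estimate is \(\sqrt{\dfrac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}_{(i)} - \bar{x}_{(\cdot)}\right)^2}\), where \(\bar{x}_{(i)}\) is the mean with observation \(i\) removed and \(\bar{x}_{(\cdot)}\) is the average of the \(n\) leave-one-out means. (For the sample mean, this reproduces \(s/\sqrt{n}\) exactly.) The jackknife is a computationally simple tool and can be used when the number of observations available is relatively small. It can also be used to reduce the bias of statistical estimates.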
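
A minimal sketch of the jackknife standard error for the sample mean, using hypothetical returns. The scaling factor \((n-1)/n\) follows the usual jackknife variance formula: because the leave-one-out means share most of their observations, their raw spread understates the standard error. For the sample mean, the scaled result equals \(s/\sqrt{n}\) exactly, so the jackknife's practical value lies in statistics that lack such a simple formula.

```python
import math
import statistics


def jackknife_standard_error(sample):
    """Jackknife estimate of the standard error of the sample mean."""
    n = len(sample)
    total = sum(sample)
    # Leave-one-out means: mean of the sample with observation i removed (each of size n - 1)
    loo_means = [(total - x) / (n - 1) for x in sample]
    center = statistics.fmean(loo_means)
    # Scale the squared deviations by (n - 1) / n, per the standard jackknife variance formula
    return math.sqrt((n - 1) / n * sum((m - center) ** 2 for m in loo_means))


returns = [0.04, -0.01, 0.03, 0.02, -0.02, 0.05]  # hypothetical monthly returns
print(jackknife_standard_error(returns))
print(statistics.stdev(returns) / math.sqrt(len(returns)))  # agrees up to floating-point rounding
```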

The jackknife (so named because it is a handy and readily available tool) was developed when computing power was less available and more costly than it is today. The bootstrap is more computationally demanding, but it has some advantages. To estimate the standard error of the sample mean, we draw many samples of size \(n\) from the full dataset with replacement, so that each observation is returned to the pool after it is drawn and may appear more than once in a given resample. The standard deviation of these resampled means is then our estimate of the standard error of the sample mean.
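
A minimal bootstrap sketch for the standard error of the mean, again with hypothetical returns. Each resample has the same size \(n\) as the data and is drawn with replacement using `random.Random.choices`, so a given observation can appear several times in one resample. The number of resamples (10,000) and the seed are arbitrary choices.

```python
import random
import statistics


def bootstrap_standard_error(sample, num_resamples=10_000, seed=None):
    """Bootstrap estimate of the standard error of the sample mean."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample has size n and is drawn WITH replacement from the original data
    boot_means = [statistics.fmean(rng.choices(sample, k=n)) for _ in range(num_resamples)]
    return statistics.stdev(boot_means)


returns = [0.04, -0.01, 0.03, 0.02, -0.02, 0.05]  # hypothetical monthly returns
print(bootstrap_standard_error(returns, seed=1))
```

For the mean, the result should land close to the formula-based \(s/\sqrt{n}\), slightly below it, because resampling treats the data as the population and so effectively divides by \(n\) rather than \(n - 1\).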

Because it does not rely on an analytic formula, the bootstrap can give more accurate estimates than formula-based methods when their assumptions (such as approximate normality) are doubtful, and it can be used to construct confidence intervals for statistics other than the mean, such as the median. It can also be used to estimate the distributions of complex statistics, including those whose sampling distributions have no analytic form.
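
To illustrate the bootstrap for a statistic other than the mean, the sketch below builds a 95% confidence interval for the median by the percentile method, one of several ways to form bootstrap intervals (the choice of method is an assumption here, not something the reading specifies). The data, seed, and number of resamples are hypothetical.

```python
import random
import statistics


def bootstrap_percentile_ci(sample, stat=statistics.median, level=0.95,
                            num_resamples=10_000, seed=None):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Distribution of the statistic across resamples drawn with replacement
    boot = sorted(stat(rng.choices(sample, k=n)) for _ in range(num_resamples))
    alpha = 1 - level
    lower = boot[round(alpha / 2 * num_resamples)]
    upper = boot[round((1 - alpha / 2) * num_resamples) - 1]
    return lower, upper


data = [0.021, -0.013, 0.034, 0.008, 0.015, -0.027, 0.041, 0.012, -0.004, 0.019, 0.006]  # hypothetical
print(bootstrap_percentile_ci(data, seed=3))                         # interval for the median
print(bootstrap_percentile_ci(data, stat=statistics.fmean, seed=3))  # same approach for the mean
```

Because `stat` is a parameter, the same function can be reused for other statistics; with a sample this small, percentile intervals are rough, so the output is best read as an illustration.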

MODULE QUIZ 7.1

1. A simple random sample is a sample drawn in such a way that each member of the population has:

A. some chance of being selected in the sample.
B. an equal chance of being included in the sample.
C. a 1% chance of being included in the sample.

B — In a simple random sample, each element of the population has an equal probability of being selected. The 1% option describes an equal chance only in the special case where the sample is 1% of the population (for example, 10 items drawn from 1,000); it is not a defining feature of simple random sampling. (LOS 7.a)

2. To apply the central limit theorem to the sampling distribution of the sample mean, the sample is usually considered to be large if \(n\) is at least:

A. 20.
B. 25.
C. 30.

C — Sample sizes of 30 or greater are typically considered large. (LOS 7.b)

3. Which of the following techniques to improve the accuracy of confidence intervals on a statistic is most computationally demanding?

A. Jackknife resampling.
B. Systematic resampling.
C. Bootstrap resampling.

C — Bootstrap resampling, repeatedly drawing samples of equal size from a large dataset, is more computationally demanding than the jackknife. We have not defined systematic resampling as a specific technique. (LOS 7.c)

KEY CONCEPTS

LOS 7.a

Simple random sampling is a method of selecting a sample in such a way that each item or person in the population being studied has the same probability of being included in the sample.

Stratified random sampling involves randomly selecting samples proportionally from subgroups that are formed based on one or more distinguishing characteristics of the data, so that random samples from the subgroups will have the same distribution of these characteristics as the overall population.

Cluster sampling is also based on subgroups (not necessarily based on data characteristics) of a larger dataset. In one-stage cluster sampling, the sample is formed from randomly chosen clusters (subsets) of the overall dataset. In two-stage cluster sampling, random samples are taken from each of the randomly chosen clusters (subgroups).

Convenience sampling refers to selecting sample data based on ease of access, using data that are readily available. Judgmental sampling refers to samples for which each observation is selected from a larger dataset by the researcher, based on the researcher's experience and judgment. Both are examples of nonprobability sampling and are nonrandom.

LOS 7.b

The central limit theorem states that for a population with a mean \(\mu\) and a finite variance \(\sigma^2\), the sampling distribution of the sample mean of all possible samples of size \(n\) (for \(n \geq 30\)) will be approximately normally distributed with a mean equal to \(\mu\) and a variance equal to \(\sigma^2/n\).

The standard error of the sample mean is the standard deviation of the distribution of the sample means and is calculated as \(\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}\) (where \(\sigma\), the population standard deviation, is known) and as \(s_{\bar{x}} = \dfrac{s}{\sqrt{n}}\) (where \(s\), the sample standard deviation, is used because the population standard deviation is unknown).

LOS 7.c

Two resampling techniques to improve our estimates of the distribution of sample statistics are the jackknife and the bootstrap. With the jackknife, we calculate \(n\) sample means, each of size \(n - 1\) with a different observation removed, and base our estimate of the standard error on the (scaled) dispersion of these leave-one-out means. The jackknife can also be used to reduce bias in estimates.

With bootstrap resampling, we use the distribution of sample means (or other statistics) from a large number of samples of size \(n\), drawn with replacement from the original dataset. Bootstrap resampling can improve our estimates of the distribution of various sample statistics and can provide such estimates when analytical methods cannot.
