Reading 11
MODULE 11.1: INTRODUCTION TO FINTECH
LOS 11.a: Describe aspects of "fintech" that are directly relevant for the gathering and analyzing of financial data.
The term fintech refers to developments in technology that can be applied to the financial services industry. Companies that are in the business of developing technologies for the finance industry are often referred to as fintech companies.
Some of the primary areas where fintech is developing include the following:
- Increasing functionality to handle large sets of data that may come from many sources and exist in various forms
- Tools and techniques such as artificial intelligence for analyzing very large datasets
LOS 11.b: Describe Big Data, artificial intelligence, and machine learning.
Big Data is a widely used expression that refers to all the potentially useful information that is generated in the economy. This includes not only data from traditional sources, such as financial markets, company financial reports, and government economic statistics, but also alternative data from nontraditional sources. Some of these nontraditional sources are as follows:
- Individuals generate usable data such as social media posts, online reviews, email, and website visits.
- Businesses generate potentially useful information such as bank records and retail scanner data. These kinds of data are referred to as corporate exhaust.
- Sensors, such as radio frequency identification chips, are embedded in numerous devices such as smartphones and smart buildings. The broad network of such devices is referred to as the Internet of Things.
Characteristics of Big Data
Characteristics of Big Data include its volume, velocity, and variety.
The volume of data continues to grow by orders of magnitude. The units in which data can be measured have increased from megabytes and gigabytes to terabytes (1,000 gigabytes) and even petabytes (1,000 terabytes).
Velocity refers to how quickly data are communicated. Real-time data such as stock market price feeds are said to have low latency. Data that are only communicated periodically or with a lag are said to have high latency.
The variety of data refers to the varying degrees of structure in which data may exist. These range from structured forms (e.g., spreadsheets and databases), to semistructured forms (e.g., photos and web page code), to unstructured forms (e.g., video).
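The unit arithmetic behind the volume figures can be sketched in a few lines. This is only an illustration: it assumes the decimal convention used above (each unit is 1,000 times the previous one), and the `convert` helper is a hypothetical name, not part of any library.

```python
# Decimal units as used in the text: each step up is a factor of 1,000.
UNITS = ["MB", "GB", "TB", "PB"]

def convert(value, from_unit, to_unit):
    """Convert a data size between MB, GB, TB, and PB using factors of 1,000."""
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * 1000 ** steps

print(convert(1, "PB", "GB"))    # one petabyte expressed in gigabytes
print(convert(500, "GB", "TB"))  # 500 gigabytes expressed in terabytes
```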
Data Science
The field of data science concerns how we extract information from Big Data. Data science describes methods for processing and visualizing data. Processing methods include the following:
- Capture. This is collecting data and transforming it into usable forms.
- Curation. This is assuring data quality by adjusting for bad or missing data.
- Storage. This is archiving and accessing data.
- Search. This is examining stored data to find needed information.
- Transfer. This is moving data from their source or a storage medium to where they are needed.
Visualization techniques include the familiar charts and graphs that display structured data. Visualizing less structured data requires other methods, such as word clouds that illustrate how frequently words appear in a sample of text, or mind maps that display logical relations among concepts.
Taking advantage of Big Data presents numerous challenges. Analysts must ensure that the data they use are of high quality, accounting for the possibilities of outliers, bad or missing data, or sampling biases. The volume of data collected must be sufficient and appropriate for its intended use.
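As a rough illustration of the curation step and the data-quality concerns above, the sketch below replaces missing or implausible observations with the median of the valid ones. The price series, the plausibility bounds, and the `curate` helper are all hypothetical, and median imputation is only one of several possible treatments.

```python
from statistics import median

def curate(records, lower, upper):
    """Replace missing (None) or out-of-range values with the median of the valid ones."""
    def is_valid(r):
        return r is not None and lower <= r <= upper

    valid = [r for r in records if is_valid(r)]
    if not valid:
        raise ValueError("no valid observations to impute from")
    fill = median(valid)
    cleaned = [r if is_valid(r) else fill for r in records]
    flagged = len(records) - len(valid)  # how many values were replaced
    return cleaned, flagged

# Hypothetical daily closing prices: one missing value and one obvious data-entry error.
prices = [101.2, 100.8, None, 102.1, 10150.0, 101.5]
cleaned, flagged = curate(prices, lower=50.0, upper=200.0)
print(cleaned, flagged)
```

In practice an analyst would also want to record which observations were adjusted, since imputed values can themselves bias later analysis.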
Artificial Intelligence
Processing and organizing data before use can be especially difficult with qualitative and unstructured data. Artificial intelligence, meaning computer systems that can be programmed to simulate human cognition, can be applied usefully to this task. Neural networks are an example of artificial intelligence in that they are programmed to process information in a way similar to the human brain.
Machine Learning
An important development in the field of artificial intelligence is machine learning. In machine learning, a computer algorithm is given inputs of source data, with no assumptions about their probability distributions, and may be given outputs of target data. The algorithm is designed to learn, without human assistance, how to model the output data based on the input data, or to learn how to detect and recognize patterns in the input data.
Machine learning typically requires vast amounts of data. A typical process begins with a training dataset in which the algorithm looks for relationships. A validation dataset is then used to refine these relationship models, which can then be applied to a test dataset to analyze their predictive ability.
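The three-dataset process can be sketched as a simple random split. The 60/20/20 proportions, the fixed seed, and the `split_dataset` name are assumptions made for illustration; the text does not prescribe any particular ratio.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and split them into training, validation, and test subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for testing
    return train, validation, test

observations = list(range(100))  # stand-in for 100 data records
train, validation, test = split_dataset(observations)
print(len(train), len(validation), len(test))
```

A random shuffle suits independent observations. For time-ordered financial data, a chronological split is often preferred so that the model is never evaluated on data that precedes its training period.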
In supervised learning, the input and output data are labeled, the machine learns to model the outputs from the inputs, and then the machine is given new data on which to use the model. In unsupervised learning, the input data are not labeled, and the machine learns to describe the structure of the data. Deep learning is a technique that uses layers of neural networks to identify patterns, beginning with simple patterns and advancing to more complex ones. Deep learning may employ supervised or unsupervised learning. Some of the applications of deep learning include image and speech recognition.
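The contrast between the two approaches can be shown on a toy one-dimensional dataset. In this sketch, the supervised part learns one centroid per label and classifies new values by the nearest centroid, while the unsupervised part uses a two-cluster k-means to group the same values without labels. Both techniques, the data, and the function names are choices made for illustration, not methods named in the text.

```python
def fit_centroids(values, labels):
    """Supervised: learn one centroid (mean) per label from labeled examples."""
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    return {lab: sum(vs) / len(vs) for lab, vs in groups.items()}

def predict(centroids, value):
    """Give a new value the label of the nearest learned centroid."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabeled values into two clusters (1-D k-means, k=2)."""
    if min(values) == max(values):
        raise ValueError("need at least two distinct values")
    centers = [min(values), max(values)]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) for c in clusters]
    return centers, clusters

# Hypothetical readings, e.g., a volatility measure on six days.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

centroids = fit_centroids(values, labels)   # uses the labels
new_label = predict(centroids, 4.0)         # model applied to new data
centers, clusters = two_means(values)       # ignores the labels entirely
print(new_label, centers, clusters)
```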
Machine learning can produce models that overfit or underfit the data. Overfitting occurs when the machine learns the input and output data too exactly, treats noise as true parameters, and identifies spurious patterns and relationships. In effect, the machine creates a model that is too complex. Underfitting occurs when the machine fails to identify actual patterns and relationships, treating true parameters as noise. This means that the model is not complex enough to describe the data. A further challenge with machine learning is that its results can be a "black box," producing outcomes based on relationships that are not readily explainable.
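Overfitting and underfitting can be made concrete with a small experiment. The sketch below uses k-nearest-neighbor regression, a technique chosen here for illustration rather than one named in the text, on hypothetical data where the true relationship is y = 2x and the training targets carry noise. A small k gives a very flexible model; a large k gives a very rigid one.

```python
def knn_predict(train, x, k):
    """Predict y at x as the average y of the k nearest training points."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mse(train, data, k):
    """Mean squared error of the k-NN predictions on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

# Hypothetical data: true relationship y = 2x; training targets carry noise of +/-3.
train = [(float(i), 2.0 * i + (3.0 if i % 2 == 0 else -3.0)) for i in range(20)]
# Held-out points between the training points, measured without noise.
test = [(i + 0.25, 2.0 * (i + 0.25)) for i in range(19)]

for k in (1, 4, 20):
    print(k, round(mse(train, train, k), 2), round(mse(train, test, k), 2))
```

With k = 1 the model reproduces the training data exactly, noise included, so its training error is zero but its error on the held-out points is higher than with k = 4 (overfitting). With k = 20 every prediction is simply the average of all training targets, which ignores the relationship altogether and gives the largest held-out error of the three (underfitting).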
LOS 11.c: Describe applications of Big Data and Data Science to investment management.
Applications of fintech that are relevant to investment management include text analytics, natural language processing, risk governance, and algorithmic trading.
Text analytics refers to the analysis of unstructured data in text or voice forms. An example of text analytics is analyzing the frequency of words and phrases. In the finance industry, text analytics has the potential to partially automate specific tasks such as evaluating company regulatory filings.
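Word-frequency analysis of this kind might look like the sketch below. The filing excerpt is invented for illustration, and the tokenization rule and stop-word list are simplifying assumptions; production text analytics would need more careful handling of punctuation, numbers, and phrases.

```python
import re
from collections import Counter

def word_frequencies(text, stop_words=frozenset()):
    """Count how often each word appears, ignoring case and the given stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

# Hypothetical excerpt standing in for a company regulatory filing.
filing = (
    "Revenue increased due to higher demand. Risk factors include "
    "currency risk, credit risk, and liquidity risk."
)
counts = word_frequencies(filing, stop_words={"to", "and", "due"})
print(counts.most_common(3))
```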
Natural language processing refers to the use of computers and artificial intelligence to interpret human language. Speech recognition and language translation are among the uses of natural language processing. Possible applications in finance could be to check for regulatory compliance in an examination of employee communications, or to evaluate large volumes of research reports to detect more subtle changes in sentiment than can be discerned from analysts' recommendations alone.
Risk governance requires an understanding of a firm's exposure to a wide variety of risks. Financial regulators require firms to perform risk assessments and stress testing. The simulations, scenario analysis, and other techniques used for risk analysis require large amounts of quantitative data along with a great deal of qualitative information. Machine learning and other techniques related to Big Data can be useful in modeling and testing risk, particularly if firms use real-time data to monitor risk exposures.
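A very simple form of scenario analysis applies an assumed shock to each position and totals the result. Everything in the sketch below is hypothetical: the portfolio, the scenario names, and the shock sizes are invented to show the mechanics and are not drawn from any regulatory stress test.

```python
# Hypothetical portfolio: market value by asset class (in millions).
portfolio = {"equities": 60.0, "bonds": 30.0, "commodities": 10.0}

# Hypothetical stress scenarios: assumed fractional change in each asset class.
scenarios = {
    "equity crash": {"equities": -0.30, "bonds": 0.05, "commodities": -0.10},
    "rate shock": {"equities": -0.10, "bonds": -0.15, "commodities": 0.00},
}

def scenario_pnl(portfolio, shocks):
    """Profit or loss if each position moves by its assumed shock (no shock = 0)."""
    return sum(value * shocks.get(asset, 0.0) for asset, value in portfolio.items())

results = {name: scenario_pnl(portfolio, shocks) for name, shocks in scenarios.items()}
worst = min(results, key=results.get)  # scenario with the largest loss
print(results, worst)
```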
Algorithmic trading refers to computerized securities trading based on a predetermined set of rules. For example, algorithms may be designed to enter the optimal execution instructions for any given trade based on real-time price and volume data. Algorithmic trading can also be useful for executing large orders by determining the best way to divide the orders across exchanges. Another application of algorithmic trading is high-frequency trading that identifies and takes advantage of intraday securities mispricings.
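One predetermined rule for dividing a large order is to allocate shares in proportion to the volume displayed at each venue. The sketch below implements only that rule, which is an assumption made for illustration and not a claim about how optimal execution is actually determined. The venue names and volumes are hypothetical.

```python
def split_order(total_shares, displayed_volume):
    """Split an order across venues in proportion to each venue's displayed volume."""
    total_volume = sum(displayed_volume.values())
    if total_volume <= 0:
        raise ValueError("no displayed volume available")
    # Integer arithmetic: whole shares per venue, plus what each venue lost to rounding down.
    allocation = {v: total_shares * vol // total_volume for v, vol in displayed_volume.items()}
    remainders = {v: total_shares * vol % total_volume for v, vol in displayed_volume.items()}
    # Hand out the leftover shares one at a time, largest remainder first.
    leftover = total_shares - sum(allocation.values())
    for venue in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        allocation[venue] += 1
    return allocation

venues = {"EXCH_A": 5000, "EXCH_B": 3000, "EXCH_C": 1000}  # hypothetical displayed volume
allocation = split_order(10_000, venues)
print(allocation)
```

The largest-remainder step guarantees that the child orders sum exactly to the parent order, which a plain rounding of each proportional share would not.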
- A. application of technology to the financial services industry.
- B. replacement of government-issued money with electronic currencies.
- C. clearing and settling of securities trades through distributed ledger technology.
- A. Machine learning.
- B. High-latency capture.
- C. The Internet of Things.
LOS 11.a
Fintech refers to developments in technology that can be applied to the financial services industry. Companies that develop technologies for the finance industry are referred to as fintech companies.
Primary fintech development areas include handling large datasets from varied sources and forms, and tools like artificial intelligence for analyzing very large datasets.
LOS 11.b
Big Data refers to the potentially useful information that is generated in the economy, including data from traditional sources (financial markets, company financial reports, government statistics) and nontraditional sources (individuals, corporate exhaust, the Internet of Things). Characteristics of Big Data include its volume, velocity, and variety.
Data science describes methods for processing (capture, curation, storage, search, transfer) and visualizing data to extract information from Big Data.
Artificial intelligence refers to computer systems that can be programmed to simulate human cognition. Neural networks are an example of artificial intelligence.
Machine learning is programming that gives a computer system the ability to improve its performance of a task over time and is often used to detect patterns in large sets of data. A typical machine learning process uses training, validation, and test datasets. Supervised learning uses labeled data; unsupervised learning uses unlabeled data. Deep learning uses layered neural networks for pattern recognition. Overfitting and underfitting are key risks in model development.
LOS 11.c
Applications of fintech to investment management include text analytics, natural language processing, risk governance, and algorithmic trading.
- Text analytics: analyzing unstructured data in text or voice forms (e.g., regulatory filings).
- Natural language processing: using computers and AI to interpret human language (e.g., compliance checks, sentiment analysis of research reports).
- Risk governance: using Big Data and machine learning to model and test risk exposures, including real-time monitoring.
- Algorithmic trading: computerized trading based on predetermined rules; includes high-frequency trading to exploit intraday securities mispricings.