Algorithm Trading Introduction
DeepNumbers, 안명호 (james@deepnumbers.com)
The goal: select the right stocks and identify when to buy and sell in order to realize a profit.
Algorithmic trading has grown steeply: in the United States it accounted for as much as 85% of trading volume in 2012.
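According to the paper "Clinical versus Mechanical Prediction: A Meta-Analysis" by Grove WM, Zald DH, et al., a review of 136 studies found that mathematical models produced results equal to or better than those of humans in 94% of cases. (Source - http://www.ncbi.nlm.nih.gov/pubmed/10752360)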
Nevsky Capital, a $1.5bn hedge fund, earlier this month decided to call it quits after a long period of success, partly blaming black box algorithmic funds for making markets harder to navigate for old-fashioned investors.
A common lament among fund managers (Martin Taylor, Nevsky Capital)
Source - http://www.ft.com/cms/s/0/5eb91614-bee5-11e5-846f-79b0e3d20eaf.html#axzz4c07wwhss
Advantages and Disadvantages
A theory-driven approach is widely applied.
Renaissance is one of the first highly successful hedge funds using quantitative trading, known as "quant hedge funds", that rely on powerful computers and sophisticated mathematics to guide investment strategies.
Hedge Fund Rank 12, AUM: $65B, Top-Performing Hedge Funds
Medallion Fund: from 1994 through mid-2014 it averaged a 71.8% annual return before fees.
Machine Learning, Distributed Computing
Hedge Fund Rank 11, AUM: $35B
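Edward Thorp: The First Generation
Edward Thorp belongs to the first generation of algorithmic trading and was the first on Wall Street to run a fund using mathematics. A former MIT mathematics professor, he founded an investment firm called Princeton-Newport Partners and, from 1970 until its liquidation in 1998, delivered a remarkable average annual return of 20%. Warren Buffett, known to anyone with even a passing interest in stocks, has an average annual return of 21.6%, and the S&P 500 returned an average of 8.84% per year over the same period, which shows how impressive Thorp's returns were.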
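Machine learning (機械學習) is a branch of artificial intelligence concerned with developing algorithms and techniques that allow computers to learn. For example, machine learning can be used to train a system to tell whether an incoming email is spam or not. The core of machine learning lies in representation and generalization. Representation is the evaluation of data, and generalization is the handling of data that has not yet been seen. This is also the domain of computational learning theory. Machine learning has many applications; character recognition is one of the best-known examples. (From Wikipedia)
Machine Learning Crash Course by MHR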
y = f(x), where y is the output, f is the prediction function, and x is the feature.
Training: given a training set of labeled examples {(x_1, y_1), ..., (x_N, y_N)}, estimate the prediction function f by minimizing the prediction error on the training set.
Testing: apply f to a never-before-seen test example x and output the predicted value y = f(x).
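The training and testing steps can be sketched in a few lines. This is a minimal illustration, not the course's code: the model family f(x) = w * x, the candidate grid, and the data values are all assumptions chosen to keep the example small.

```python
# Labeled training examples (x_i, y_i); illustrative values generated from y = 2x.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

def training_error(w, examples):
    """Mean squared prediction error of f(x) = w * x on the given examples."""
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

# Training: choose the candidate w with the smallest error on the training set.
candidates = [i / 10 for i in range(0, 51)]  # 0.0, 0.1, ..., 5.0
best_w = min(candidates, key=lambda w: training_error(w, train))

def f(x):
    """The learned prediction function."""
    return best_w * x

# Testing: apply f to an example that was not part of the training set.
x_new = 10.0
y_pred = f(x_new)
print(best_w, y_pred)
```

Real learners search a much richer family of functions, but the structure stays the same: fit on labeled data, then predict on unseen data.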
Training data set with predictor variable x: y = w_0 + w_1 x
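For the linear model y = w_0 + w_1 x, the weights that minimize the squared error on the training set have a closed form. The sketch below assumes ordinary least squares as the fitting criterion; the function name fit_line and the data points are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = w0 + w1 * x for one predictor variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    w0 = mean_y - w1 * mean_x
    return w0, w1

# Illustrative training data lying exactly on y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w0, w1 = fit_line(xs, ys)
print(w0, w1)
```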
Unsupervised Learning: Clustering
Supervised Learning: Regression, Classification
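Hardware: Intel i5 or better, 4GB of memory, 256GB HDD or more. If possible, use the fastest i7 CPU available.
OS: any environment that can run Python (Windows, OS X, Linux). Linux is strongly recommended.
Database: MySQL, used to store stock price data.
Programming language: Python. Python is a powerful language used widely, from web development and cloud computing to finance. It is concise, easy to learn, and has a wide range of libraries.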
Python Machine Learning Library (Deep Learning is not supported)
Used in many areas: widely used across academia, industry, and other fields.
Wide coverage: supports a large number of Machine Learning algorithms, along with data processing and nearly every other related function.
Developer friendly: a library that is easy for developers to use.
Consistency in usage: Machine Learning can be applied with nearly the same usage pattern regardless of the algorithm.
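The consistency point can be illustrated with scikit-learn, which is assumed here to be the library this slide describes (the later slides use its SVR and RandomForestRegressor classes). In this sketch three different regressors are trained and queried through the same fit and predict calls; the data is made up for illustration.

```python
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

# Illustrative data: one feature, target equal to the feature.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]

models = [
    LinearRegression(),
    SVR(kernel='rbf', C=1.0, epsilon=0.2),
    RandomForestRegressor(n_estimators=10, random_state=0),
]

predictions = {}
for model in models:
    model.fit(X, y)  # same training call for every algorithm
    predictions[type(model).__name__] = model.predict([[2.5]])[0]  # same prediction call

for name, value in predictions.items():
    print(name, value)
```

Swapping one algorithm for another usually means changing only the line that constructs the model.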
Python Data Analysis Library
Originally from a hedge fund: developed by Wes McKinney.
Many functions for the finance industry.
Supports DBMS, Excel, and CSV: data can be processed using SQL.
Time series analysis: moving average, ARIMA.
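As a small illustration of the time series support, a moving average over a price series can be computed with the rolling method of a pandas Series. The dates and prices below are made up, and the three-day window is an arbitrary choice.

```python
import pandas as pd

# Illustrative daily closing prices (not real market data).
dates = pd.date_range('2015-01-01', periods=6, freq='D')
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0], index=dates)

# 3-day moving average; the first two entries are NaN because the window is not yet full.
ma3 = close.rolling(window=3).mean()
print(ma3)
```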
In [6]: df['min wage']['2010-01-01':].plot()
Out[6]: <matplotlib.axes._subplots.AxesSubplot at 0x108705208>
In [7]: import matplotlib.pyplot as plt
In [8]: plt.show()
Use Anaconda to install Python and the libraries that suit your environment. https://www.continuum.io/downloads
import datetime
import pandas_datareader.data as web  # provides DataReader

def download_stock_data(file_name, company_code, year1, month1, date1, year2, month2, date2):
    # Download daily prices for a KOSPI-listed company from Yahoo and save them as a pickle file.
    start = datetime.datetime(year1, month1, date1)
    end = datetime.datetime(year2, month2, date2)
    df = web.DataReader("%s.KS" % (company_code), "yahoo", start, end)
    df.to_pickle(file_name)
    return df

download_stock_data('samsung_2010.data', '005930', 2010, 1, 1, 2015, 11, 30)
download_stock_data('hanmi.data', '128940', 2015, 1, 1, 2015, 11, 30)
def download_whole_stock_data(market_type, year1, month1, date1, year2, month2, date2):
    # Read the list of stock codes, then download every stock that belongs to the KOSPI market.
    df_code = read_stock_code_from_xls('stock_code.xls')
    for index in range(df_code.shape[0]):
        stock_code = df_code.loc[index, df_code.columns[0]]
        name = df_code.loc[index, df_code.columns[1]]
        market = df_code.loc[index, df_code.columns[2]]
        if market_type.upper() == 'KOSPI':
            print("... downloading %s of %s : code=%s, name=%s" % (index + 1, df_code.shape[0], stock_code, name))
            download_stock_data('%s.data' % (stock_code), stock_code, year1, month1, date1, year2, month2, date2)

See Data_loader.py.
Series: 1-dimensional, referenced by index.
DataFrame: 2-dimensional (index, columns), comparable to a spreadsheet or an SQL table.
Panel: 3-dimensional (items: axis 0, major_axis: axis 1, minor_axis: axis 2).
Panel4D, PanelND (experimental).
See http://pandas.pydata.org/pandas-docs/stable/dsintro.html
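The two most commonly used structures can be sketched as follows. The labels and prices are made up for illustration; Panel is left out because it was removed from later pandas releases.

```python
import pandas as pd

# Series: one-dimensional values addressed by an index label.
close = pd.Series([100, 101, 99],
                  index=['2015-11-26', '2015-11-27', '2015-11-30'])
first_close = close['2015-11-26']

# DataFrame: two-dimensional, addressed by index (rows) and columns.
df = pd.DataFrame({'Open': [98, 100, 102], 'Close': [100, 101, 99]},
                  index=close.index)
row = df.loc['2015-11-27']      # one row, returned as a Series
close_column = df['Close']      # one column, returned as a Series
print(first_close, row['Open'], list(close_column))
```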
In [1]: import pandas as pd
In [2]: file = "https://raw.githubusercontent.com/sk8erchoi/csvfiles/master/college_reg_fee_2015.csv"
In [3]: df = pd.read_csv(file)
In [4]: df[['Name', 'Avg']][:3]
Out[4]:
   Name       Avg
0  ICT폴리텍대학   2200
1  가톨릭상지대학교  5504
2  강동대학교     5725
In [5]: df[['Name', 'Avg']].sort_values(['Avg'], ascending=False).head(3)
Out[5]:
    Name       Avg
72  서울예술대학교  8101
18  계원예술대학교  7564
61  백제예술대학교  7486
import pandas as pd

def load_stock_data(file_name):
    # Load the pickled price data, drop rows with missing values, and renumber the index.
    df = pd.read_pickle(file_name).dropna(how='any')
    return df.reset_index(drop=True)
def dopreprocessing(dataset, column_name, threshold=3):
    # Keep only rows where the column is positive.
    df_revised = dataset[dataset[column_name] > 0]
    # Drop outliers: rows whose value lies threshold or more standard deviations from the mean.
    return df_revised[((df_revised[column_name] - df_revised[column_name].mean()) / df_revised[column_name].std()).abs() < threshold]
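The effect of this kind of z-score filter is easiest to see on a small data set. The sketch below mirrors the same logic under a hypothetical name, remove_outliers, and applies it to made-up volume figures that contain one zero and one extreme value.

```python
import pandas as pd

def remove_outliers(df, column, threshold=3):
    """Keep positive values whose z-score is below the threshold in absolute value."""
    positive = df[df[column] > 0]
    z = (positive[column] - positive[column].mean()) / positive[column].std()
    return positive[z.abs() < threshold]

# Illustrative data: 19 ordinary values, one extreme value, and one zero.
volumes = [100, 102, 98, 101, 99, 103, 97, 100, 102, 98,
           101, 99, 100, 103, 97, 100, 101, 99, 102, 10000, 0]
data = pd.DataFrame({'Volume': volumes})

cleaned = remove_outliers(data, 'Volume')
print(len(data), len(cleaned))
```

With very few rows a single extreme value inflates the standard deviation so much that its own z-score can stay under 3, so the filter is most useful on longer series.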
import math
import numpy as np
from sklearn.svm import SVR

def preparedataset(dataset, split_ratio=0.75):
    # Split chronologically: the first split_ratio of the rows train the model, the rest test it.
    split_index = math.trunc(dataset.shape[0] * split_ratio)
    input_column_array = ['Open', 'Low', 'Close', 'Volume']
    output_column = ['High']
    input_data = dataset[input_column_array]
    output_data = dataset[output_column]
    # Create training and test sets
    X_train = input_data[0:split_index]
    Y_train = output_data[0:split_index]
    # The test set starts right after the training rows and runs to the last row.
    X_test = input_data[split_index:]
    Y_test = output_data[split_index:]
    return X_train, np.ravel(Y_train), X_test, np.ravel(Y_test)

def createsvrmodel():
    # Support vector regression with an RBF kernel
    regr = SVR(C=1.0, epsilon=0.2, kernel='rbf')
    return regr
import matplotlib.pyplot as plt

def plottrainingresult(model, test_x, test_y):
    # Plot outputs: predicted values in blue, actual values in red
    pred_y = model.predict(test_x)
    plt.plot(pred_y, color='blue', linewidth=3)
    plt.plot(test_y, color='red', linewidth=3)
    plt.show()
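A plot shows how closely predictions track the actual values; a numeric error measure complements it. The following is a rough end-to-end sketch on synthetic data, not the course's stock data: the four random feature columns stand in for Open, Low, Close, and Volume, the target formula is invented, and mean squared error is an assumed choice of metric.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for four input columns, already scaled to [0, 1].
rng = np.random.RandomState(0)
n = 200
X = rng.uniform(0, 1, size=(n, 4))
# Invented target that depends on two of the features plus a little noise.
y = X[:, 0] + 0.5 * X[:, 2] + rng.normal(0, 0.01, size=n)

# Chronological-style split: first 75% for training, the rest for testing.
split_index = int(n * 0.75)
X_train, X_test = X[:split_index], X[split_index:]
y_train, y_test = y[:split_index], y[split_index:]

model = SVR(C=1.0, epsilon=0.2, kernel='rbf')
model.fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
print("test MSE:", mse)
```

An RBF-kernel SVR is sensitive to feature scale, so raw prices and volumes would normally be rescaled before fitting; the synthetic features here are already on a common scale.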
Develop an ML program that predicts stock prices using RandomForestRegressor.
For information on RFR, see the link below:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html
Modify the preceding example and use it.
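One possible starting point for this exercise is sketched below. It is not the course's solution: create_rfr_model is a hypothetical counterpart to createsvrmodel, the parameter values are arbitrary, and the data is synthetic so that the sketch runs without downloaded stock files.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def create_rfr_model(n_estimators=100, random_state=0):
    """Hypothetical replacement for createsvrmodel() that returns a random forest."""
    return RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)

# Synthetic stand-in for the four input columns; the target formula is invented.
rng = np.random.RandomState(1)
X = rng.uniform(0, 1, size=(120, 4))
y = 2.0 * X[:, 0] + X[:, 1]

# Same chronological-style split as the SVR example: first 75% train, rest test.
split_index = int(len(X) * 0.75)
model = create_rfr_model()
model.fit(X[:split_index], y[:split_index])
pred = model.predict(X[split_index:])

# Random forests also report how much each input column contributed to the fit.
importances = model.feature_importances_
print(pred[:3], importances)
```

Because the model exposes the same fit and predict methods as SVR, the data preparation and plotting functions from the earlier slides can be reused unchanged.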