[ Intro to AI Lab ] SEOPT ( Study on the Elements Of Python and Tensorflow ) < Session 2-3 >
Regression analysis (supervised learning) by TensorFlow: trial-and-error rather than statistical. Why TensorFlow is used; the structure of a neural network. youngdocseo@gmail.com
AI and data analysis:

"When you're fundraising, it's AI (artificial intelligence). When you're hiring, it's ML (machine learning). When you're implementing, it's linear regression (regression analysis)." - Baron Schwartz @xaprb

"AI is just an acronym for linear regression." - Sean Gies @seangies
https://towardsdatascience.com/no-machine-learning-is-not-just-glorifiedstatistics-6d39534e3
Typing it once beats reading it a hundred times (백讀이불여일打)

import random
xdata = list()
ydata = list()
for num in range( 500 ) :
    temp = random.random()
    xdata.append( temp )
    ydata.append( 0 + 3 * temp )

On your mark:

import tensorflow
(1)  myb = tensorflow.Variable( 0.5 )    * 0.5 is the starting value; cf) tensorflow.zeros( [1] ) or tensorflow.random_uniform( [1], -1.0, 1.0 ) also work, but 0 does not
(2)  myw = tensorflow.Variable( 0.5 )
(3)  myy = myb + myw * xdata
(4)  myloss = tensorflow.reduce_mean( ( myy - ydata )**2 )    cf) tensorflow.square( myy - ydata )
(5)  mytrain = tensorflow.train.GradientDescentOptimizer( 0.5 ).minimize( myloss )    * 0.5 is the learning rate
(6)  myinit = tensorflow.global_variables_initializer()    cf) this line only has to come somewhere after myw and myb

Get set:

(7)  mysess = tensorflow.Session()
(8)  mysess.run( myinit )    * executes the starting values. What happens if you also put print( mysess.run( myb ), mysess.run( myw ), mysess.run( myloss ) ) here and run it?

Go:

     for step in range( 100 ) :    * the number of training steps (epochs)
(9)      mysess.run( mytrain )
(10)     print( mysess.run( myb ), mysess.run( myw ), mysess.run( myloss ) )
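A practical note: the listings on these slides use the TensorFlow 1.x session API. If only TensorFlow 2.x is installed, the same code can still run through TensorFlow's documented compatibility layer, for example:

import tensorflow.compat.v1 as tensorflow   # keeps the slide's tensorflow.* names working
tensorflow.disable_v2_behavior()            # restores Session()/placeholder-style execution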
How the pieces map (Python variables <-> TensorFlow functions):

On your mark:
- xdata [500], ydata [500]: plain Python lists (the data)
- (1)(2) myb, myw: the values we want to find -> Variable()
- (3) myy [500] = myb + myw * xdata
- (4) myloss -> reduce_mean(): the mean of the elements of an array
- (5) the optimizer -> train.GradientDescentOptimizer(0.5).minimize(): in the end, this capability is the reason to use TensorFlow
- (6) initialize -> myinit = global_variables_initializer()

Get set:
- (7) mysess = Session(), executed with run()
- (8) run myinit * once only (executes the starting values)

Go:
- (9) run mytrain * many times (the learning process)
- (10) run and print the current myb, myw, myloss
Constants, variables, functions:

- Independent variable xdata, dependent variable ydata: the input data (the world as God made it)
- Regression model = linear (the human's assumption): myb + myw * xdata
- ydata: the actual values (target/label) vs myy: the values computed by the model
- myb: bias (intercept), myw: weight -> parameters / kernel / filter (often just called "weights"); cf) hyper-parameters: the learning rate, the number of epochs, etc.
- Loss/error/cost function: Mean Squared Error -> the loss/error/cost score; for classification problems Cross Entropy is widely used instead
- Optimizer: GradientDescentOptimizer (the machine's computation) updates the parameters (weights + bias) so as to minimize the loss/error/cost score = learning / intelligence (the parameters are the storage of the learning/intelligence) = in the end it may be less about the model itself than about finding the model's parameter values (see the sketch below)
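To make "trial-and-error rather than statistical" concrete, here is a minimal plain-Python sketch of what the optimizer does to the first listing's model: compute the two MSE gradients by hand and repeatedly nudge myb and myw downhill. The gradient formulas are the standard derivatives of the mean squared error; the step count of 100 is an arbitrary choice.

import random

xdata = [ random.random() for _ in range(500) ]
ydata = [ 0 + 3 * x for x in xdata ]    # same data as the first listing: true bias 0, true weight 3

myb, myw, lr = 0.5, 0.5, 0.5            # starting values and learning rate, as in the first listing
for step in range( 100 ) :
    err = [ myb + myw * x - y for x, y in zip(xdata, ydata) ]
    grad_b = 2 * sum( err ) / len( err )                                 # d(MSE)/d(myb)
    grad_w = 2 * sum( e * x for e, x in zip(err, xdata) ) / len( err )   # d(MSE)/d(myw)
    myb = myb - lr * grad_b    # the update step: this is the "learning"
    myw = myw - lr * grad_w
print( myb, myw )              # approaches 0 and 3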
[ Appendix 1 ]

Multiple regression is also possible: y = 0 + 3*x1 + 8*x2

Nonlinear regression is also possible: y = 0 + 3*x1 - 6*x1**2 + 8*x2
Except that, in code line (3), squaring a Python list has to be handled as below:

myy = myw1 * [ i**2 for i in x1data ] + myw2 * x1data + myw3 * x2data + myb

Moderation (interaction) regression is also possible: y = 0 + 3*x1*x2 - 6*x1 + 8*x2

Even finding an exponent may be possible? y = 0 + x**3 (that is, finding the 3) - see the sketch below.
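The last question can actually be tried: tf.pow accepts a Variable as the exponent, so gradient descent can search for the power itself. A minimal sketch in the TF 1.x style of these slides; the data range, learning rate, and step count are illustrative assumptions (x is kept positive so that x**p and its gradient stay well defined):

import random
import tensorflow as tf

xdata = [ random.random() + 0.5 for _ in range(500) ]   # keep x > 0
ydata = [ x**3 for x in xdata ]                         # the 3 is the exponent to recover

myp = tf.Variable( 1.0 )                                # initial guess for the exponent
myy = tf.pow( xdata, myp )
myloss = tf.reduce_mean( tf.square( myy - ydata ) )
mytrain = tf.train.GradientDescentOptimizer( 0.1 ).minimize( myloss )

mysess = tf.Session()
mysess.run( tf.global_variables_initializer() )
for step in range( 1000 ) :
    mysess.run( mytrain )
print( mysess.run( myp ) )                              # moves toward 3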
[ Appendix 2 ] Cautions

Because of how gradient descent, which updates the weights, works, for effective learning:

- Normalize (to 0~1) or standardize (95% of values within -2~2) the input/output values, as sketched below. * Try generating xdata with temp = random.random()*2 or *3 and see what happens (the overshooting problem).
  [Note] Advanced data preprocessing: whitening; principal component analysis (PCA) removes correlation between the independent variables.
- Keep the initial weights roughly between -1 and 1, but do not set them to 0 and do not give every weight the same value. * The more weights there are, the smaller their starting values should be, which is why initial weights are sometimes drawn from N(0, 1/number of nodes).
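A minimal numpy sketch of the two rescalings recommended above:

import numpy as np

x = np.random.random( 500 ) * 10                   # deliberately over-scaled inputs

x_norm = ( x - x.min() ) / ( x.max() - x.min() )   # normalization: squeezed into 0 ~ 1
x_std = ( x - x.mean() ) / x.std()                 # standardization: mean 0, std 1,
                                                   # so roughly 95% of values fall in -2 ~ 2
print( x_norm.min(), x_norm.max(), x_std.mean(), x_std.std() )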
Regression analysis with CSV, Test and Batch Size

# Generate / import the data
xdata = list()
ydata = list()
import csv
file = open( "data.csv" )
read = csv.reader( file )
for row in read :
    xdata.append( float(row[0]) )
    ydata.append( float(row[1]) )

# Modeling with tensorflow
import tensorflow as tf

# Split the data into training and test data; convert the python lists to numpy arrays and give them 2 axes.
import numpy as np
trainindex = 400    # usually 80% of the full data
trainxdata = np.array( xdata[:trainindex] )
trainxdata = np.reshape( trainxdata, ( trainindex, 1 ) )
trainydata = np.array( ydata[:trainindex] )
trainydata = np.reshape( trainydata, ( trainindex, 1 ) )
testxdata = np.array( xdata[trainindex:] )
testydata = np.array( ydata[trainindex:] )

myb = tf.Variable( 0.5 )
myw = tf.Variable( 0.5 )
trainxdatabatch = tf.placeholder( tf.float32, [None, 1] )
trainydatabatch = tf.placeholder( tf.float32, [None, 1] )
myy = myb + myw * trainxdatabatch
myloss = tf.reduce_mean( tf.square( myy - trainydatabatch ) )
mytrain = tf.train.GradientDescentOptimizer( 0.1 ).minimize( myloss )
myinit = tf.global_variables_initializer()
mysess = tf.Session()
mysess.run( myinit )

* tensorflow.constant(): values that stay fixed (data)
  tensorflow.Variable(): values we keep changing in search of the answer (parameters)
  tensorflow.placeholder(): values fed in on each run (a data batch)

# Training
np.random.seed(1)    # for an identical random sequence on every run
for step in range( 100 ) :    # number of epochs
    rand_index = np.random.choice( trainindex, 100 )    # batch size; * random selection, cf) slice selection
    batchxdata = trainxdata[ rand_index ]
    batchydata = trainydata[ rand_index ]
    mysess.run( mytrain, feed_dict={ trainxdatabatch: batchxdata, trainydatabatch: batchydata } )
print( mysess.run(myb), mysess.run(myw) )

trainmyy = myb + myw * trainxdata
trainloss = tf.reduce_mean( ( trainmyy - trainydata )**2 )
print( mysess.run(trainloss) )

# Test
testmyy = myb + myw * testxdata
testloss = tf.reduce_mean( ( testmyy - testydata )**2 )
print( mysess.run(testloss) )
# import matplotlib.pyplot as plt
# plt.plot( testxdata, testydata, 'ro' )
# plt.plot( testxdata, mysess.run(testmyy) )
# plt.show()

Placeholder usage structure: the placeholders trainxdatabatch and trainydatabatch feed into myy and then myloss; the actual batch values are supplied through mysess.run( mytrain, feed_dict={ ... } ).
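The listing assumes a data.csv file already exists. One minimal way to synthesize a compatible file (500 rows, one x column and one y column, as the reader above expects) so the code runs end to end; the linear relation used here is an assumption, any x -> y rule will do:

import csv
import random

with open( "data.csv", "w", newline="" ) as f :
    writer = csv.writer( f )
    for num in range( 500 ) :
        x = random.random()
        writer.writerow( [ x, 0 + 3 * x ] )    # column 0: input, column 1: target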
[ Appendix ] Learning rate, number of epochs, Batch Size

* "Epoch" sometimes means the number of training steps and sometimes the number of passes over the whole data set. When mini-batch selection is done by slicing, it usually means passes over the whole data set, i.e., one epoch contains several training steps. (When batch size = total the two meanings coincide.)
* Batch size = total does not mean learning is completed in one shot. It is still repeated trial and error: even the same data, used repeatedly, produces further learning.

- Batch size: 1 (SGD) vs mini-batch vs total/batch. The batch size is the number of data items (inputs, label) used in one training step.
- Mini-batch data selection: random vs slice (sketched below).
- Learning rate: large vs small; a trade-off between learning speed and learning stability.
- The smaller the learning rate (!) and, arguably, the smaller the batch size (?), the more epochs are needed.

[Figure: the loss (error) score as a curve over a weight, with gradient-descent steps walking downhill; the smaller the learning rate, the narrower each step.]
* Statistical regression analysis uses the total data and goes to the lowest point in one step.
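The two mini-batch selection styles compared above, sketched with numpy (trainindex = 400 and batch size 100 follow the earlier listing):

import numpy as np

trainindex, batchsize = 400, 100
trainxdata = np.arange( trainindex )    # stand-in for the real training data

# random selection: every training step draws an independent random batch
rand_index = np.random.choice( trainindex, batchsize )
batchxdata = trainxdata[ rand_index ]

# slice selection: walk through the data in order; one full pass = one epoch,
# and one epoch here contains trainindex // batchsize = 4 training steps
for step in range( trainindex // batchsize ) :
    batchxdata = trainxdata[ step * batchsize : (step + 1) * batchsize ]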
Regression analysis with multiple inputs (independent variables)

import random
# Generate the data
xdata = list()
ydata = list()
for num in range( 500 ) :
    temp1 = random.random()
    temp2 = random.random()
    temp3 = random.random()
    xdata.append( [temp1, temp2, temp3] )
    ydata.append( 0 + 3 * temp1 - 4 * temp2 + 2 * temp3 )
import numpy as np
ydata = np.reshape( ydata, [500, 1] )    # convert ydata from shape (500,) to (500, 1)

# Modeling with tensorflow
import tensorflow as tf
myb = tf.Variable( 0.0 )    # the bias is usually initialized to 0.0 (an integer 0 is not allowed)
myw = tf.Variable( tf.random.normal( [3, 1], 0, 1 ) )    # weights are usually initialized from a normal distribution (there are standard formulas for a good standard deviation)
myy = tf.matmul( xdata, myw ) + myb    # regression equation
myloss = tf.reduce_mean( tf.square(myy - ydata) )    # loss function
mytrain = tf.train.GradientDescentOptimizer( 0.05 ).minimize( myloss )    # optimizer
myinit = tf.global_variables_initializer()
mysess = tf.Session()
mysess.run( myinit )
for step in range( 500 ) :
    mysess.run( mytrain )
    #print( xdata[0], mysess.run( myw ), mysess.run( myb ), mysess.run( myy[0] ) )
print( mysess.run(myb), mysess.run(myw) )
[ Appendix 1 ] Matrix notation for the inputs and weights, and tf.matmul, for one data item

< Mathematical equation >
w1*x1 + w2*x2 + w3*x3 + b = y

< Matrix notation >
[ w1 w2 w3 ] · [ [x1], [x2], [x3] ] + b = y, i.e. matmul( W(1x3), X(3x1) ) + b = y
Why b instead of [ b ]?

< Matrix notation: preferred >
[ x1 x2 x3 ] · [ [w1], [w2], [w3] ] + b = y, i.e. matmul( X(1x3), W(3x1) ) + b = y

import numpy as np
a = np.array( [ 1, 2, 3 ] )
b = np.array( [ 4, 5, 6 ] )
c = np.array( [ [4], [5], [6] ] )
print( np.matmul(a, b) )    # 32: two 1-D arrays give the inner product
print( np.matmul(b, a) )    # 32
print( np.matmul(a, c) )    # [32]
print( np.matmul(c, a) )    # error: (3,1) cannot be matmul-ed with (3,)
[ Appendix 2 ] Matrix notation for the inputs and weights, and tf.matmul, for many inputs (a batch)

< Mathematical equations >
w1*x11 + w2*x12 + w3*x13 + b = y1
w1*x21 + w2*x22 + w3*x23 + b = y2

< Matrix notation >
[ w1 w2 w3 ] · [ [x11, x21], [x12, x22], [x13, x23] ] + b = [ y1 y2 ], i.e. matmul( W(1x3), X(3x2) ) + b = Y(1x2)
Why b instead of [ b, b ]?

< Matrix notation: preferred >
[ [x11, x12, x13], [x21, x22, x23] ] · [ [w1], [w2], [w3] ] + b = [ [y1], [y2] ], i.e. matmul( X(2x3), W(3x1) ) + b = Y(2x1)
Why b instead of [ [b], [b] ]?

import numpy as np
w = np.array( [ 1, 2, 3 ] )
x = np.array( [ [1, 4], [2, 5], [3, 6] ] )
print( np.matmul(w, x) )    # [14 32]
x = np.array( [ [1, 2, 3], [4, 5, 6] ] )
w = np.array( [ [1], [2], [3] ] )
print( np.matmul(x, w) )    # [[14] [32]]
[ Appendix 3 ] Matrix notation for the inputs and weights, and tf.matmul, for many inputs (a batch) with many outputs for the next hidden layer

< Mathematical equations: it is important to distinguish many inputs (batch) from many outputs and to understand the shapes of W and B >
w11*x11 + w21*x12 + w31*x13 + b1 = y11    (case 1, output 1)
w12*x11 + w22*x12 + w32*x13 + b2 = y12    (case 1, output 2)
w11*x21 + w21*x22 + w31*x23 + b1 = y21    (case 2, output 1)
w12*x21 + w22*x22 + w32*x23 + b2 = y22    (case 2, output 2)

In general: X(b x m) · W(m x n) + B(1 x n) = Y(b x n)
b: the number of input batches/cases/samples; m: the number of input features; n: the number of output features.

< Matrix notation: preferred (cases connect along rows, outputs along columns) >
[ [x11, x12, x13], [x21, x22, x23] ] · [ [w11, w12], [w21, w22], [w31, w32] ] + [ b1 b2 ] = [ [y11, y12], [y21, y22] ],
i.e. matmul( X(2x3), W(3x2) ) + B(1x2) = Y(2x2)
Why [ b1 b2 ] instead of [ [b1, b2], [b1, b2] ]?

import numpy as np
x = np.array( [ [1, 2, 3], [4, 5, 6] ] )
w = np.array( [ [1, 2], [3, 4], [5, 6] ] )
b1 = np.array( [0, 0] )
b2 = np.array( [ [0, 0], [0, 0] ] )
print( np.matmul(x, w) )         # [[22 28] [49 64]]
print( np.matmul(x, w) + b1 )    # b1 (shape (2,)) is broadcast across both rows
print( np.matmul(x, w) + b2 )    # the same result with the bias written out in full
Neural network structure

Input Data: x1, x2, ..., xi
  |
Kernel 1: H(b x m) = X(b x i) · W(i x m) + B(1 x m)
  |
Hidden Layer: h1, h2, ..., hm -> Activation Function (relu, sigmoid, etc.)
  |
Kernel 2: Z(b x j) = H(b x m) · W(m x j) + B(1 x j)
  |
Output Layer: z1, z2, ..., zj -> Activation Function (sigmoid, softmax, etc.)
  |
Output Data: y1, y2, ..., yj -> Loss Function (MSE, CrossEntropy, etc.) -> Optimizer

* b stands for the batch size; b = 1 is possible.
* i and j are given by the data; m is a design decision.
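A quick numpy shape check of the two kernels above; b = 5, i = 3, m = 2, j = 1 are illustrative sizes:

import numpy as np

b, i, m, j = 5, 3, 2, 1
X = np.random.random( (b, i) )
W1, B1 = np.random.random( (i, m) ), np.zeros( m )    # kernel 1
W2, B2 = np.random.random( (m, j) ), np.zeros( j )    # kernel 2

H = 1 / ( 1 + np.exp( -( np.matmul(X, W1) + B1 ) ) )  # hidden layer with a sigmoid AF
Z = np.matmul( H, W2 ) + B2                           # output layer (no AF: regression)
print( H.shape, Z.shape )                             # (5, 2) (5, 1)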
Regression analysis with a hidden layer and an activation function

< Before: no hidden layer > Only linearly separable problems are solved; linearly non-separable problems are not.
Input xdata -> [ kernel: xdata * myw + myb ] -> Output myy <-> ydata

< This time: one hidden layer (with two units/nodes) and an Activation Function (AF) > Linearly non-separable problems, a.k.a. the XOR problem, can be solved as well.
Input xdata
  -> [ kernel: xdata * myw1to2 + myb1to2 ] -> hidden node 2: myh2 -> AF
  -> [ kernel: xdata * myw1to3 + myb1to3 ] -> hidden node 3: myh3 -> AF
  -> [ kernel: myh2 * myw2to4 + myh3 * myw3to4 + myb23to4 ] -> output node 4: myy <-> ydata

The hidden layer's AF is usually relu, sigmoid, or tanh. In classification problems an AF is applied to the output layer as well (mostly sigmoid or softmax). No AF is applied to the input layer.
import random
xdata = list()
ydata = list()
for num in range( 500 ) :
    temp = random.random()
    xdata.append( temp )
    ydata.append( 3 * temp + 0 )    # what if you try -3 * (temp - 0.5)**2 + 0 instead?

import tensorflow as tf
myw1to2 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myb1to2 = tf.Variable( 0.0 )
myw1to3 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myb1to3 = tf.Variable( 0.0 )
myw2to4 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myw3to4 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myb23to4 = tf.Variable( 0.0 )

* Run it several times: the importance of the initial weights!! ML for finding good initial values is advancing too, e.g. (unsupervised) pre-training with an AutoEncoder or RBM (generative models); cf) this also helps with the vanishing gradient problem.

myh2 = xdata * myw1to2 + myb1to2    # regression equation for H2
myh2 = tf.nn.sigmoid( myh2 )    # Activation Function
myh3 = xdata * myw1to3 + myb1to3    # regression equation for H3
myh3 = tf.nn.sigmoid( myh3 )    # Activation Function
myy = myh2 * myw2to4 + myh3 * myw3to4 + myb23to4

cf) A linear combination of linear functions is still linear. Try commenting out both AFs. Try changing both AFs from sigmoid to relu.

myloss = tf.reduce_mean( (myy - ydata)**2 )    # loss function
mytrain = tf.train.GradientDescentOptimizer( 0.01 ).minimize( myloss )    # optimizer; what if the learning rate is 0.02?
myinit = tf.global_variables_initializer()
mysess = tf.Session()
mysess.run( myinit )
for step in range( 10000 ) :
    mysess.run( mytrain )
print( mysess.run(myloss) )
# import matplotlib.pyplot as plt
# plt.plot( xdata, ydata, 'ro' )
# plt.plot( xdata, mysess.run(myy), 'bo' )
# plt.show()
[ Appendix ] The earlier -3 * (temp - 0.5)**2 + 0 example, with an AutoEncoder (pre-training)
* <Caution> running code at the level of conceptual illustration only (still to be verified)

import random
xdata = list()
ydata = list()
for num in range( 500 ) :
    temp = random.random()
    xdata.append( temp )
    ydata.append( -3 * (temp - 0.5)**2 + 0 )
import tensorflow as tf

# Pre-training via an AutoEncoder
myw1to2 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myb1to2 = tf.Variable( 0.0 )
myw1to3 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myb1to3 = tf.Variable( 0.0 )
myw2to4 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myw3to4 = tf.Variable( tf.random.normal( [1], 0, 1 ) )
myb23to4 = tf.Variable( 0.0 )
myh2 = xdata * myw1to2 + myb1to2
myh2 = tf.nn.sigmoid( myh2 )
myh3 = xdata * myw1to3 + myb1to3
myh3 = tf.nn.sigmoid( myh3 )
myy = myh2 * myw2to4 + myh3 * myw3to4 + myb23to4
myloss = tf.reduce_mean( (myy - xdata)**2 )    # the target is xdata itself
mytrain = tf.train.GradientDescentOptimizer( 0.01 ).minimize( myloss )
myinit = tf.global_variables_initializer()
mysess = tf.Session()
mysess.run( myinit )
for step in range( 10000 ) :
    mysess.run( mytrain )

# Main training
# set the pre-trained knowledge (weights) as the initial values
myw1to2 = tf.Variable( mysess.run(myw1to2) )
myb1to2 = tf.Variable( mysess.run(myb1to2) )
myw1to3 = tf.Variable( mysess.run(myw1to3) )
myb1to3 = tf.Variable( mysess.run(myb1to3) )
myw2to4 = tf.Variable( mysess.run(myw2to4) )
myw3to4 = tf.Variable( mysess.run(myw3to4) )
myb23to4 = tf.Variable( mysess.run(myb23to4) )
myh2 = xdata * myw1to2 + myb1to2
myh2 = tf.nn.sigmoid( myh2 )
myh3 = xdata * myw1to3 + myb1to3
myh3 = tf.nn.sigmoid( myh3 )
myy = myh2 * myw2to4 + myh3 * myw3to4 + myb23to4
myloss = tf.reduce_mean( (myy - ydata)**2 )    # the target is ydata
mytrain = tf.train.GradientDescentOptimizer( 0.01 ).minimize( myloss )
myinit = tf.global_variables_initializer()
mysess = tf.Session()
mysess.run( myinit )
for step in range( 10000 ) :
    mysess.run( mytrain )
print( mysess.run(myloss) )
import matplotlib.pyplot as plt
plt.plot( xdata, ydata, 'ro' )
plt.plot( xdata, mysess.run(myy), 'bo' )
plt.show()
< Assignment: neural network and regression combined >

- Input (independent variable) values: 3
- A data set of 1,000 items, using a CSV file: train set 700, test set 300
- Apply a batch size of 50
- One hidden layer (with two units/nodes) and an Activation Function (ReLU)
- Set up the kernels with matrices, using tf.matmul (Hint: Appendix 3 above)
- Add an (axis) index to reduce_mean()?

Diagram: Input x1 data, x2 data, x3 data -> Kernel -> Hidden (AF, AF) -> Kernel -> Output myy <-> ydata. A shape sketch follows below.
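Not a solution, only a sketch of the graph shapes the assignment asks for (the variable names are placeholders, TF 1.x style as in the slides):

import tensorflow as tf

myxbatch = tf.placeholder( tf.float32, [None, 3] )     # batches of 50 x 3 inputs
myybatch = tf.placeholder( tf.float32, [None, 1] )

myw1 = tf.Variable( tf.random.normal( [3, 2], 0, 1 ) ) # kernel: input -> hidden (two units)
myb1 = tf.Variable( tf.zeros( [2] ) )
myw2 = tf.Variable( tf.random.normal( [2, 1], 0, 1 ) ) # kernel: hidden -> output
myb2 = tf.Variable( tf.zeros( [1] ) )

myh = tf.nn.relu( tf.matmul( myxbatch, myw1 ) + myb1 ) # hidden layer with ReLU
myy = tf.matmul( myh, myw2 ) + myb2                    # shape (batch, 1); the loss, optimizer,
                                                       # and feed_dict loop follow the earlier slides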