Siamese Neural Network
Cheoneum Park (박천음), Intelligent Software Lab., Kangwon National University (강원대학교)
Intro. S2Net
Siamese Neural Network (S2Net) is built on representing input texts as concept vectors, i.e., as weighted vectors tuned for measuring similarity.
It is a linear projection, i.e., a metric-learning approach.
Input: original term vector (one-hot). Output: projected concept vector. Model parameters: the weights (comparable to, e.g., a word embedding).
Input: a pair of raw term vectors; target: a label. The similarity of each pair of output vectors is computed, and training minimizes loss(·).
Label: similar or not similar.
Learning the concept vector
[Figure: S2Net drawn as a two-layer network. Input layer: original term vector (one-hot), e.g. ⟨0, 1, 0, 0, …, 0, 1⟩ (d × 1). Model parameters (input weights): $a_{0,0}, \dots, a_{0,k}$; $a_{1,0}, \dots, a_{1,k}$; …; $a_{d,0}, \dots, a_{d,k}$. Hidden/output layer: projected concept vectors $V_p$ and $V_q$, compared by $sim(V_p, V_q)$.]
Model design
Feedforward NN: $net_j = \sum_{i=0}^{d} w_{ij} x_i$
S2Net: $tw(c_j) = \sum_{t_i \in V} \alpha_{ij}\, tw(t_i)$, where $\alpha_{ij}$ is an entry of $A$
Activation function: sigmoid (non-linear)
Notation:
$tw(\cdot)$: term weight
$f$: raw term vector of size $|V| \times 1$, i.e., $d \times 1$
$A = [\alpha_{ij}]_{d \times k}$: the weights
$g = A^T f$: the $k \times 1$ projected concept vector
$sim_A(f_p, f_q)$: similarity of $f_p$ and $f_q$ under $A$
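The projection $g = A^T f$ is the same weighted sum as $net_j$, with one output per concept. The sketch below is minimal plain Python and assumes $A$ is stored as a list of $d$ rows of $k$ weights. The names `project` and `sigmoid` and the 3 × 2 toy matrix are illustrative, not taken from the slides.

```python
import math

def project(A, f):
    """Concept vector g = A^T f.

    A: d x k weights as a list of d rows (row i holds alpha_i0 .. alpha_i,k-1).
    f: length-d raw term vector.
    """
    d, k = len(A), len(A[0])
    if len(f) != d:
        raise ValueError("term vector length must equal d = |V|")
    # g_j = sum_i alpha_ij * f_i, the same weighted sum as net_j = sum_i w_ij * x_i
    return [sum(A[i][j] * f[i] for i in range(d)) for j in range(k)]

def sigmoid(x):
    """Logistic sigmoid, the non-linear activation named on the slide."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical toy weights: d = 3 terms, k = 2 concepts.
A = [[0.5, 0.0],
     [0.2, 0.1],
     [0.0, 0.4]]
f = [1, 0, 1]                         # terms t_0 and t_2 occur
g = project(A, f)                     # concept vector of length k = 2
g_activated = [sigmoid(v) for v in g]
```

Applying `sigmoid` is optional. The S2Net projection itself is linear, and the last line only shows where the sigmoid listed above would be applied.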
Loss function using the cosine similarity
Given a term vector pair $f_p, f_q$, the corresponding concept vectors are $g_p = A^T f_p$ and $g_q = A^T f_q$. Their cosine value is
$sim_A(f_p, f_q) = \frac{g_p^T g_q}{\|g_p\| \, \|g_q\|}$
With true label $y_{pq}$, the loss function (MSE) is
$\frac{1}{2}\left(y_{pq} - sim_A(f_p, f_q)\right)^2$
This is one possible approach (the basic one): MSE is used to select the text object closest to the query.
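The cosine similarity and MSE loss above can be sketched as follows. The sketch uses the same row-of-weights layout for $A$. The helper names, the zero-vector guard and the toy numbers are assumptions added for illustration.

```python
import math

def project(A, f):
    # g = A^T f, with A stored as d rows of k weights
    return [sum(A[i][j] * f[i] for i in range(len(f))) for j in range(len(A[0]))]

def cosine(u, v):
    # g_p^T g_q / (||g_p|| ||g_q||); returns 0.0 if either vector is all zeros
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def sim_A(A, f_p, f_q):
    # sim_A(f_p, f_q) = cos(A^T f_p, A^T f_q)
    return cosine(project(A, f_p), project(A, f_q))

def mse_loss(A, f_p, f_q, y_pq):
    # 1/2 * (y_pq - sim_A(f_p, f_q))^2
    return 0.5 * (y_pq - sim_A(A, f_p, f_q)) ** 2

# Hypothetical example: d = 3, k = 2
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
loss_orthogonal = mse_loss(A, [1, 0, 0], [0, 1, 0], y_pq=1)  # g_p = [1, 0], g_q = [0, 1]
loss_parallel = mse_loss(A, [1, 0, 0], [2, 0, 0], y_pq=1)    # g_q = 2 * g_p
```

Cosine similarity ignores vector length. A pair whose concept vectors point the same way therefore gets zero loss for label 1, and orthogonal concept vectors get the largest loss for a non-negative pair.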
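Training procedure: similarity scores
Compare the term vector pairs $(f_{p_1}, f_{q_1})$ and $(f_{p_2}, f_{q_2})$, where the first pair is the target, i.e., the pair that should get the higher similarity score. As a formula:
$\Delta = sim_A(f_{p_1}, f_{q_1}) - sim_A(f_{p_2}, f_{q_2})$
$\Delta$ is converted into a logistic loss (the output), a smooth surrogate for the 0/1 pairwise loss:
$L(\Delta; A) = \log(1 + \exp(-\gamma\Delta))$
The larger $\Delta$ is, the smaller the loss. Because cosine values lie in $[-1, 1]$, the range of $\Delta$ is $[-2, 2]$. $\gamma$ scales $\Delta$ up by a factor of $\gamma$, which penalizes prediction errors more heavily. Empirically, $\gamma$ is set large ($\gamma = 10$).
$A$ is optimized with respect to this loss, which is the loss function used in the paper.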
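Below is a sketch of the pairwise logistic loss. It assumes the caller has already computed the two cosine scores. `logistic_loss` is a hypothetical name, and the `log1p` rearrangement is only there to avoid overflow when $\gamma|\Delta|$ is large.

```python
import math

def logistic_loss(delta, gamma=10.0):
    """L(Delta; A) = log(1 + exp(-gamma * Delta)).

    delta = sim_A(f_p1, f_q1) - sim_A(f_p2, f_q2), which lies in [-2, 2].
    """
    z = -gamma * delta
    # log(1 + e^z) == z + log(1 + e^-z); use whichever form keeps exp() small
    if z > 0:
        return z + math.log1p(math.exp(-z))
    return math.log1p(math.exp(z))

sim_pos, sim_neg = 0.8, 0.3                 # hypothetical cosine scores
delta = sim_pos - sim_neg
loss_right_order = logistic_loss(delta)     # target pair ranked higher: small loss
loss_wrong_order = logistic_loss(-delta)    # ranking reversed: large loss
```

Flipping the sign of $\Delta$ turns a small loss into a large one. Raising `gamma` sharpens that contrast, which is the effect of stretching $\Delta$ beyond $[-2, 2]$.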
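Gradient derivation
$A$ is optimized using $L(\Delta; A)$, where $\Delta = sim_A(f_{p_1}, f_{q_1}) - sim_A(f_{p_2}, f_{q_2})$:
$$\frac{\partial L(\Delta; A)}{\partial A} = \frac{-\gamma}{1 + \exp(\gamma\Delta)} \cdot \frac{\partial \Delta}{\partial A}, \qquad \frac{\partial \Delta}{\partial A} = \frac{\partial}{\partial A} sim_A(f_{p_1}, f_{q_1}) - \frac{\partial}{\partial A} sim_A(f_{p_2}, f_{q_2})$$
For a single pair, with $g_p = A^T f_p$ and $g_q = A^T f_q$:
$$\frac{\partial}{\partial A} sim_A(f_p, f_q) = \frac{\partial}{\partial A}\cos(g_p, g_q) = \frac{\partial}{\partial A}\left(g_p^T g_q\right)\frac{1}{\|g_p\|\,\|g_q\|} + g_p^T g_q\,\frac{\partial}{\partial A}\left(\frac{1}{\|g_p\|}\right)\frac{1}{\|g_q\|} + g_p^T g_q\,\frac{1}{\|g_p\|}\,\frac{\partial}{\partial A}\left(\frac{1}{\|g_q\|}\right)$$
where
$$\frac{\partial}{\partial A}\left(g_p^T g_q\right) = \frac{\partial}{\partial A}\left(f_p^T A A^T f_q\right) = f_p g_q^T + f_q g_p^T$$
$$\frac{\partial}{\partial A}\left(\frac{1}{\|g_p\|}\right) = \frac{\partial}{\partial A}\left(g_p^T g_p\right)^{-1/2} = -\frac{1}{2}\left(g_p^T g_p\right)^{-3/2}\frac{\partial}{\partial A}\left(g_p^T g_p\right) = -\left(g_p^T g_p\right)^{-3/2} f_p g_p^T$$
$$\frac{\partial}{\partial A}\left(\frac{1}{\|g_q\|}\right) = -\left(g_q^T g_q\right)^{-3/2} f_q g_q^T$$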
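The derivation can be checked numerically. The sketch below implements $\partial \cos(g_p, g_q)/\partial A$ term by term as above, chains it into $\partial L/\partial A$, and takes one gradient-descent step. The data, the learning rate `lr` and all function names are hypothetical, and the sketch assumes neither concept vector is zero.

```python
import math

def project(A, f):
    # g = A^T f, with A stored as d rows of k weights
    return [sum(A[i][j] * f[i] for i in range(len(f))) for j in range(len(A[0]))]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cos_sim(A, f_p, f_q):
    g_p, g_q = project(A, f_p), project(A, f_q)
    return dot(g_p, g_q) / math.sqrt(dot(g_p, g_p) * dot(g_q, g_q))

def grad_cos(A, f_p, f_q):
    """d cos(g_p, g_q) / dA as a d x k nested list (assumes g_p, g_q nonzero)."""
    g_p, g_q = project(A, f_p), project(A, f_q)
    pq, pp, qq = dot(g_p, g_q), dot(g_p, g_p), dot(g_q, g_q)
    inv_p, inv_q = pp ** -0.5, qq ** -0.5          # 1/||g_p||, 1/||g_q||
    G = []
    for i in range(len(A)):
        row = []
        for j in range(len(A[0])):
            d_pq = f_p[i] * g_q[j] + f_q[i] * g_p[j]    # (f_p g_q^T + f_q g_p^T)_ij
            d_inv_p = -(pp ** -1.5) * f_p[i] * g_p[j]   # (-(g_p^T g_p)^(-3/2) f_p g_p^T)_ij
            d_inv_q = -(qq ** -1.5) * f_q[i] * g_q[j]   # same term for g_q
            row.append(d_pq * inv_p * inv_q + pq * d_inv_p * inv_q + pq * inv_p * d_inv_q)
        G.append(row)
    return G

def pairwise_loss(A, pos, neg, gamma=10.0):
    # L(Delta; A) = log(1 + exp(-gamma * Delta)); pos and neg are (f_p, f_q) pairs
    delta = cos_sim(A, *pos) - cos_sim(A, *neg)
    return math.log1p(math.exp(-gamma * delta))

def grad_loss(A, pos, neg, gamma=10.0):
    # dL/dA = -gamma / (1 + exp(gamma * Delta)) * (d sim_pos/dA - d sim_neg/dA)
    delta = cos_sim(A, *pos) - cos_sim(A, *neg)
    coef = -gamma / (1.0 + math.exp(gamma * delta))
    G_pos, G_neg = grad_cos(A, *pos), grad_cos(A, *neg)
    return [[coef * (a - b) for a, b in zip(rp, rn)] for rp, rn in zip(G_pos, G_neg)]

# One gradient-descent step on hypothetical data (d = 3, k = 2).
A = [[0.3, 0.1], [0.2, 0.5], [0.4, 0.2]]
pos = ([1, 0, 1], [1, 1, 0])     # pair that should score higher
neg = ([1, 0, 1], [0, 1, 0])
lr = 0.01                        # hypothetical learning rate
G = grad_loss(A, pos, neg)
A_new = [[a - lr * g for a, g in zip(ra, rg)] for ra, rg in zip(A, G)]
```

Comparing `grad_cos` against central finite differences of `cos_sim` is a quick way to catch sign or transpose errors in a hand-derived formula. With a small enough learning rate, the step lowers the loss.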
S2Net example: max()
Input sentences
Sen1: what movies johnny depp is in?
Sen2: what movies does johnny depp play in?
Sen3: who has been married to julia roberts?
Bag of words (size 16) and term vectors: [table not shown]
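The bag-of-words term vectors for the three sentences can be built as below. The slide's 16-entry table is not recoverable, so this sketch assumes a tokenization (lower-cased words plus "?" as its own token) and a first-appearance vocabulary order. Under those assumptions the vocabulary has 16 entries, but its column order may differ from the slide's.

```python
import re

sentences = [
    "what movies johnny depp is in?",
    "what movies does johnny depp play in?",
    "who has been married to julia roberts?",
]

def tokenize(s):
    # ASSUMPTION: lower-cased words, with "?" kept as its own token
    return re.findall(r"[a-z]+|\?", s.lower())

# ASSUMPTION: vocabulary in first-appearance order (dicts keep insertion order)
vocab = list(dict.fromkeys(t for s in sentences for t in tokenize(s)))

def term_vector(s):
    # Binary bag-of-words vector of length d = |V|
    toks = set(tokenize(s))
    return [1 if w in toks else 0 for w in vocab]

vectors = [term_vector(s) for s in sentences]
```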
S2Net example: max() (cont.)
Weights $W$ ($16 \times 2$), shown transposed as $W^T$ ($2 \times 16$):
row 1: 0.005, 0.1, 0.12, 0.0015, 0.1, 0.1, 0.1, 0.0005, 0.003, 0.1, 0.1, 0, 0, 0.2, 0.005, 0.065
row 2: 0.0015, 0.075, 0.005, 0.02, 0.05, 0.05, 0.05, 0.2, 0.2, 0.005, 0.08, 0.075, 0.09, 0.076, 0.0125, 0.01
(Figure labels: input $x_i$, input vector $x$, weights $w$, $W^T$.)
S2Net example: max() (cont.)
Forward pass: $y = f(W^T x)$, where $y$ is the $2 \times 1$ concept vector and $f$ is the sigmoid applied to each $net_j$.
Pairs of concept vectors $y_i$ are compared with score(·), giving $z_{doc1}$ and $z_{doc2}$.
Loss: $\max(0,\ 1 - z_{doc1} + z_{doc2})$
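Below is a sketch of the forward pass and the max() (hinge) loss. The slides do not define score(·), so this sketch assumes cosine similarity between concept vectors. The toy weights and term vectors are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(W, x):
    # y = f(W^T x): one sigmoid unit per concept; W is d x k (list of d rows)
    return [sigmoid(sum(W[i][j] * x[i] for i in range(len(x))))
            for j in range(len(W[0]))]

def score(u, v):
    # ASSUMPTION: score(.) is the cosine similarity of two concept vectors
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def hinge_loss(z_doc1, z_doc2, margin=1.0):
    # max(0, 1 - z_doc1 + z_doc2): zero only once doc1 outscores doc2 by the margin
    return max(0.0, margin - z_doc1 + z_doc2)

# Hypothetical toy data: d = 4 terms, k = 2 concepts
W = [[0.4, 0.1],
     [0.3, 0.0],
     [0.0, 0.5],
     [0.1, 0.2]]
y_query = forward(W, [1, 1, 0, 0])
y_doc1 = forward(W, [1, 1, 0, 1])    # shares terms with the query
y_doc2 = forward(W, [0, 0, 1, 1])    # shares no terms with the query
z_doc1, z_doc2 = score(y_query, y_doc1), score(y_query, y_doc2)
loss = hinge_loss(z_doc1, z_doc2)
```

The sigmoid keeps every concept-vector entry positive, so the cosine score here stays in $(0, 1]$. The margin of 1 in `hinge_loss` matches the constant in the slide's formula.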
S2Net example: max() (cont.)
Backprop.
Hinge term: for $\max(0, net)$ with $net = 1 - s + s_c$ (here $s$ and $s_c$ play the roles of $z_{doc1}$ and $z_{doc2}$), the gradient update passes
$\frac{\partial \max(0, net)}{\partial net} = 1$ if $1 - s + s_c > 0$, and $0$ otherwise.
Logistic sigmoid: with $f(net) = \frac{1}{1 + \exp(-net)}$, the gradient update uses
$f'(net) = f(net)\,(1 - f(net))$
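The two derivative rules can be written out and chained as below. In this sketch `hinge_grad` returns 0 at the kink (a subgradient choice). Treating $s$ as a single sigmoid output is a simplifying assumption for the chain-rule example, not the slides' full network.

```python
import math

def hinge_grad(s, s_c, margin=1.0):
    """Gradient of max(0, margin - s + s_c) with respect to s and s_c.

    The max passes gradient only while its argument is positive.
    """
    if margin - s + s_c > 0:
        return -1.0, 1.0        # (d/ds, d/ds_c)
    return 0.0, 0.0

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_grad(net):
    # f'(net) = f(net) * (1 - f(net))
    fx = sigmoid(net)
    return fx * (1.0 - fx)

# Chain rule for one hypothetical path, taking the score s to be a
# sigmoid output s = f(net): d loss / d net = (d loss / d s) * f'(net)
net = 0.3
s, s_c = sigmoid(net), 0.4
dloss_ds, _ = hinge_grad(s, s_c)
dloss_dnet = dloss_ds * sigmoid_grad(net)
```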
S2Net example: cosine_sim()
Input sentences
Sen1: what movies johnny depp is in?
Sen2: what movies does johnny depp play in?
Sen3: who has been married to julia roberts?
Bag of words (size 16) and term vectors: [table not shown]
S2Net example: cosine_sim() (cont.)
Weights $A$ ($d \times k = 16 \times 2$), listed row by row:
(0.005, 0.0015), (0.1, 0.075), (0.12, 0.005), (0.0015, 0.02), (0.1, 0.05), (0.1, 0.05), (0.1, 0.05), (0.0005, 0.2), (0.003, 0.2), (0.1, 0.005), (0.1, 0.08), (0, 0.075), (0, 0.09), (0.2, 0.076), (0.005, 0.0125), (0.065, 0.01)
Term vector: [table not shown]
Notation: $tw(\cdot)$: term weight; $f$: raw term vector ($d \times 1$, $d = |V|$); $A = [\alpha_{ij}]_{d \times k}$: the weights; $g = A^T f$: the $k \times 1$ projected concept vector; $sim_A(f_p, f_q)$: similarity of $f_p$ and $f_q$.
S2Net example: cosine_sim() (cont.)
Term vectors: [table not shown]
$A^T$ ($k \times d = 2 \times 16$):
row 1: 0.005, 0.1, 0.12, 0.0015, 0.1, 0.1, 0.1, 0.0005, 0.003, 0.1, 0.1, 0, 0, 0.2, 0.005, 0.065
row 2: 0.0015, 0.075, 0.005, 0.02, 0.05, 0.05, 0.05, 0.2, 0.2, 0.005, 0.08, 0.075, 0.09, 0.076, 0.0125, 0.01
S2Net example: cosine_sim() (cont.)
$g = A^T f$ (concept vector $g$, $k \times 1$)
$sim_A(f_p, f_q) = \frac{g_p^T g_q}{\|g_p\| \, \|g_q\|}$ (cosine similarity, used here in place of an activation function)
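The sketch below plugs the slide's $A^T$ into $g = A^T f$ and the cosine similarity. The term-vector table is not recoverable, so the column-to-word mapping (first-appearance order, "?" as a token) is an assumption. The resulting numbers therefore need not match the original slides.

```python
import math
import re

# A^T from the slide: k = 2 rows, d = 16 columns
A_T = [
    [0.005, 0.1, 0.12, 0.0015, 0.1, 0.1, 0.1, 0.0005,
     0.003, 0.1, 0.1, 0, 0, 0.2, 0.005, 0.065],
    [0.0015, 0.075, 0.005, 0.02, 0.05, 0.05, 0.05, 0.2,
     0.2, 0.005, 0.08, 0.075, 0.09, 0.076, 0.0125, 0.01],
]

sentences = [
    "what movies johnny depp is in?",
    "what movies does johnny depp play in?",
    "who has been married to julia roberts?",
]

def tokenize(s):
    # ASSUMPTION: lower-cased words plus "?" as a token
    return re.findall(r"[a-z]+|\?", s.lower())

# ASSUMPTION: first-appearance vocabulary order; the slide's column order is not shown
vocab = list(dict.fromkeys(t for s in sentences for t in tokenize(s)))

def term_vector(s):
    toks = set(tokenize(s))
    return [1 if w in toks else 0 for w in vocab]

def concept(f):
    # g = A^T f: one weighted sum per row of A^T
    return [sum(a * x for a, x in zip(row, f)) for row in A_T]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

g = [concept(term_vector(s)) for s in sentences]
sim_12 = cosine(g[0], g[1])   # Sen1 vs Sen2
sim_13 = cosine(g[0], g[2])   # Sen1 vs Sen3
```

These weights are untrained, so the cosine values need not rank the sentence pairs the way their labels would. Adjusting $A$ until they do is the job of the training procedure.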
S2Net example: cosine_sim() (cont.)
Y label: 1 or 0
Loss function (MSE) $= \frac{1}{2}\left(y_{pq} - sim_A(f_p, f_q)\right)^2$
S2Net example 3: cosine_sim() with activation
Input sentences
Sen1: what movies johnny depp is in?
Sen2: what movies does johnny depp play in?
Sen3: who has been married to julia roberts?
Bag of words (size 16) and term vectors: [table not shown]
S2Net example 3: cosine_sim() with activation (cont.)
Weights $A$ ($d \times k = 16 \times 2$), listed row by row:
(0.05, 0.05), (0.05, 0.075), (0.1, 0.125), (0.02, 0.075), (0.1, 0.05), (0.005, 0.05), (0.005, 0.05), (0.08, 0.2), (0.125, 0.0125), (0.1, 0.05), (0.1, 0.025), (0.125, 0.075), (0.115, 0.075), (0.005, 0.05), (0.01, 0.0155), (0.01, 0.022)
Term vector: [table not shown]
Notation: $tw(\cdot)$: term weight; $f$: raw term vector ($d \times 1$, $d = |V|$); $A = [\alpha_{ij}]_{d \times k}$: the weights; $g = A^T f$: the $k \times 1$ projected concept vector; $sim_A(f_p, f_q)$: similarity of $f_p$ and $f_q$.
S2Net example 3: cosine_sim() with activation (cont.)
Input vectors: [table not shown]
$A^T$ ($k \times d = 2 \times 16$):
row 1: 0.05, 0.05, 0.1, 0.02, 0.1, 0.005, 0.005, 0.08, 0.125, 0.1, 0.1, 0.125, 0.115, 0.005, 0.01, 0.01
row 2: 0.05, 0.075, 0.125, 0.075, 0.05, 0.05, 0.05, 0.2, 0.0125, 0.05, 0.025, 0.075, 0.075, 0.05, 0.0155, 0.022
S2Net example 3: cosine_sim() with activation (cont.)
$g = A^T f$ (concept vector $g$, $k \times 1$)
$sim_A(f_p, f_q) = \frac{g_p^T g_q}{\|g_p\| \, \|g_q\|}$ (cosine similarity)
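For example 3, the sketch below uses the slide's second $A^T$, compares the cosine similarity with and without an activation, and scores each pair with the MSE loss. The slides do not state which activation is applied. Applying a sigmoid element-wise to $g$ is an assumption, as are the vocabulary order and the pair labels (1 for Sen1/Sen2, 0 for Sen1/Sen3).

```python
import math
import re

# A^T for example 3 (k = 2 rows, d = 16 columns), as listed on the slide
A_T = [
    [0.05, 0.05, 0.1, 0.02, 0.1, 0.005, 0.005, 0.08,
     0.125, 0.1, 0.1, 0.125, 0.115, 0.005, 0.01, 0.01],
    [0.05, 0.075, 0.125, 0.075, 0.05, 0.05, 0.05, 0.2,
     0.0125, 0.05, 0.025, 0.075, 0.075, 0.05, 0.0155, 0.022],
]

sentences = [
    "what movies johnny depp is in?",
    "what movies does johnny depp play in?",
    "who has been married to julia roberts?",
]

def tokenize(s):
    # ASSUMPTION: lower-cased words plus "?" as a token
    return re.findall(r"[a-z]+|\?", s.lower())

# ASSUMPTION: first-appearance vocabulary order
vocab = list(dict.fromkeys(t for s in sentences for t in tokenize(s)))

def term_vector(s):
    toks = set(tokenize(s))
    return [1 if w in toks else 0 for w in vocab]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def concept(f, activate):
    # g = A^T f, optionally passed element-wise through a sigmoid (ASSUMPTION)
    g = [sum(a * x for a, x in zip(row, f)) for row in A_T]
    return [sigmoid(v) for v in g] if activate else g

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mse(y, sim):
    # 1/2 * (y - sim)^2
    return 0.5 * (y - sim) ** 2

# HYPOTHETICAL labels: (sentence index, sentence index, y)
pairs = [(0, 1, 1), (0, 2, 0)]
results = {}
for activate in (False, True):
    g = [concept(term_vector(s), activate) for s in sentences]
    for p, q, y in pairs:
        sim = cosine(g[p], g[q])
        results[(activate, p, q)] = (sim, mse(y, sim))
```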
S2Net example 3: cosine_sim() with activation (cont.)
Y label: [table not shown]