Big Data Collection, Analysis, and Visualization with Python
Lee Wonha (이원하)
Contents
1. Installing Python
2. Variables and Python Data Types
3. Flow Control
4. Input and Output
5. Functions
6. Modules
1. Installing Python
Why Python?
https://www.python.org
Download & Installation (1)
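Download & Installation (2)
Optional features offered by the installer:
Documentation: installs the Python documentation.
pip: makes it easy to install additional Python packages (libraries).
tcl/tk and IDLE: installs Tkinter, Python's standard GUI (Graphical User Interface) library, and IDLE (Integrated Development and Learning Environment), Python's bundled development environment.
Python test suite: installs the framework used to test Python modules.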
Download & Installation (3)
Using IDLE
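Basic Coding Rules
>>> a = 1
>>> b = 2
>>> print(a + b)
3
>>> c = 3; d = 4; print(c + 4)
7
>>>  print("blank")   # a stray leading space before print
SyntaxError: unexpected indent
>>> print("Python start")  # comment
Python start
>>> # Anything after the comment marker (#) is not interpreted by Python
>>> print("Python start")
Python start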
2. Variables and Python Data Types
What Is a Variable?
>>> X = 3
(Variable name: X, stored value: 3)
>>> X = 4
(Variable name: X, stored value: 4)
>>> print(X)
4
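Data Type: Number
>>> a = 1         # positive integer
>>> a = 0         # zero
>>> a = -1        # negative integer
>>> a = 1.1       # floating-point number
>>> a = -1.1      # negative floating-point number
>>> a = 1.2E10    # scientific notation: 1.2 * 10**10
>>> a = 1.2E-10   # scientific notation: 1.2 * 10**-10
>>> a = 0o17      # octal literal
>>> a = 0x12EF    # hexadecimal literal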
Data Type: Number
>>> 2 + 3    # addition
5
>>> 3 - 2    # subtraction
1
>>> 2 * 3    # multiplication
6
>>> 2 / 3    # division
0.6666666666666666
>>> 2 ** 3   # exponentiation: 2 multiplied by itself 3 times
8
>>> 2 % 3    # remainder: 2 divided by 3 gives quotient 0, remainder 2
2
>>> 3 % 2    # remainder: 3 divided by 2 gives quotient 1, remainder 1
1
>>> 2 // 3   # floor division: drops the fractional part of 0.66666
0
>>> 3 // 2   # drops the .5 from 1.5
1
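The `//` and `%` operators from the slide above are two halves of the same division. The sketch below uses illustrative values (17 and 5 are not from the slides) to show how the quotient and remainder recombine into the original number, and that the built-in `divmod()` returns both at once.

```python
# Floor division and remainder describe the same division.
dividend, divisor = 17, 5          # illustrative values, not from the slides

quotient = dividend // divisor     # whole part of 17 / 5
remainder = dividend % divisor     # what is left over

# quotient * divisor + remainder gives back the dividend.
reconstructed = quotient * divisor + remainder
print(quotient, remainder, reconstructed)   # 3 2 17

# divmod() returns (quotient, remainder) in one call.
print(divmod(dividend, divisor))            # (3, 2)
```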
Data Type: String
>>> a = 'HELLO'
>>> a = "HELLO"
>>> a = '''HELLO'''
>>> a = """HELLO"""
>>> a = 'she's gone'   # the single quotes are not paired correctly
SyntaxError: invalid syntax
>>> a = "she's gone"   # wrapping the string in double quotes removes the ambiguity
>>> print(a)
she's gone
>>> a = "he said that "she is gone""   # the double quotes pair up, but Python cannot tell where the string ends
SyntaxError: invalid syntax
>>> a = 'he said that "she is gone"'   # single quotes around the string, double quotes inside it
>>> print(a)
he said that "she is gone"
>>> a = 'he said that "she's gone"'   # the single quotes are not paired correctly
SyntaxError: invalid syntax
>>> # A backslash (\) in front of a quote inside a string makes Python treat it as a literal character
>>> a = 'he said that "she\'s gone"'
>>> print(a)
he said that "she's gone"
>>> a = "he said that \"she\'s gone\""
>>> print(a)
he said that "she's gone"
Data Type: String
Escape sequences:
\n    newline
\t    tab
\\    prints a backslash (\)
>>> print("HI\nHELLO")
HI
HELLO
>>> print("HI\tHELLO")
HI      HELLO
>>> print("HI\\HELLO")
HI\HELLO
Data Type: String
>>> a = "HI\nMY NAME IS MARK\nNICE TO MEET YOU"
>>> print(a)
HI
MY NAME IS MARK
NICE TO MEET YOU
>>> a = '''
HI
MY NAME IS MARK
NICE TO MEET YOU
'''
>>> print(a)

HI
MY NAME IS MARK
NICE TO MEET YOU

>>> a = """
HI
MY NAME IS MARK
NICE TO MEET YOU
"""
>>> print(a)

HI
MY NAME IS MARK
NICE TO MEET YOU

Data Type: String
>>> "HI" + "HELLO"   # concatenates the two strings
'HIHELLO'
>>> "HI" * 3         # repeats the string 3 times
'HIHIHI'
>>> "*" * 10
'**********'
Data Type: String
Char       M   Y       N   A   M   E       I   S       M   A   R   K
Index      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
Negative -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1
>>> a = "MY NAME IS MARK"
>>> a[8]
'I'
>>> a[9]
'S'
>>> a[-7]
'I'
>>> a[-6]
'S'
Data Type: String
>>> a = "MY NAME IS MARK"
>>> a[0:2]
'MY'
>>> a[8:10]
'IS'
>>> a[8:9]
'I'
>>> a[11:]
'MARK'
>>> a = "2017-06-25"
>>> year = a[0:4]
>>> month = a[5:7]
>>> day = a[8:]
>>> print(year)
2017
>>> print(month)
06
>>> print(day)
25
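The date-slicing steps on the slide above can be wrapped in a small helper. This is a minimal sketch: the function name `split_date` is hypothetical, and it assumes the input always has the fixed `YYYY-MM-DD` layout used in the slide.

```python
def split_date(text):
    """Slice a 'YYYY-MM-DD' string into (year, month, day) strings."""
    year = text[0:4]    # characters at index 0..3
    month = text[5:7]   # characters at index 5..6 (index 4 is the '-')
    day = text[8:]      # everything from index 8 to the end
    return year, month, day

parts = split_date("2017-06-25")
print(parts)   # ('2017', '06', '25')
```

Note that the slices return strings, so the month stays `'06'` with its leading zero.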
Data Type: String
>>> age = 20
>>> name = "Mark"
>>> print(name + " is " + str(age) + " years old")
Mark is 20 years old
>>> print("{0} is {1} years old".format(name, age))
Mark is 20 years old
>>> print("%s is %d years old" % (name, age))
Mark is 20 years old
Format codes:
%s    string: any value (integer, float, and so on) can be printed with %s
%d    integer
%c    single character
%f    floating-point number: in a format such as %5.7f, 5 is the minimum total field width (padded with spaces if the value is shorter) and 7 is the number of digits after the decimal point
%o    octal
%x    hexadecimal
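The width and precision parts of `%f` are easier to see with `repr()`, which makes the padding spaces visible. The value and the widths below are illustrative choices, not taken from the slides.

```python
pi = 3.14159265   # illustrative value

narrow = "%5.2f" % pi           # 2 decimals, padded to at least 5 characters
wide = "%10.3f" % pi            # 3 decimals, padded to at least 10 characters
same = "{0:10.3f}".format(pi)   # the same layout written with str.format()

print(repr(narrow))   # ' 3.14'
print(repr(wide))     # '     3.142'
print(wide == same)   # True
```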
Data Type: String
>>> a = "My Name is Mark"
>>> a.count('M')
2
>>> a.count('My')
1
>>> a = "My Name is Mark"
>>> len(a)
15
>>> a = "My Name is Mark"
>>> a.find('i')
8
>>> a.find('I')    # find() is case-sensitive
-1
>>> a.find('M')    # returns the index of the first match
0
>>> a.find('m')
5
>>> a.find('is')   # returns the index where the matched substring starts
8
Data Type: String
>>> a = "My Name is Mark"
>>> a.upper()
'MY NAME IS MARK'
>>> a = "My Name is Mark"
>>> a.lower()
'my name is mark'
>>> a = " My Name is Mark "
>>> a.lstrip()
'My Name is Mark '
>>> a.rstrip()
' My Name is Mark'
>>> a.strip()
'My Name is Mark'
Data Type: String
>>> a = "My name is Mark"
>>> a.replace('Mark', 'James')
'My name is James'
>>> a = 10
>>> print(10)
10
>>> print(str(10))
10
>>> age = 20
>>> name = "Mark"
>>> print(name + " is " + age + " years old")
Traceback (most recent call last):
  File "<pyshell#45>", line 1, in <module>
    print (name + " is " + age + " years old")
TypeError: Can't convert 'int' object to str implicitly
>>> print(name + " is " + str(age) + " years old")
Mark is 20 years old
Data Type: String
>>> a = "My Name is Mark"
>>> a.split()
['My', 'Name', 'is', 'Mark']
>>> a = "2017-06-25"
>>> a.split('-')
['2017', '06', '25']
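`split()` always returns strings, so numeric fields usually need a conversion step afterwards. The sketch below combines `split('-')` from the slide with `int()`. It also shows `str.join()`, which is not covered in the slides but is the standard inverse of `split()`.

```python
date_text = "2017-06-25"

# Split on '-' and convert each piece to an integer.
year, month, day = [int(part) for part in date_text.split("-")]
print(year, month, day)   # 2017 6 25

# join() glues a list of strings back together with a separator.
words = "My Name is Mark".split()
rejoined = "_".join(words)
print(rejoined)           # My_Name_is_Mark
```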
Data Type: List
>>> score = [100, 90, 95]
>>> score
[100, 90, 95]
>>> score[0]
100
>>> score[1]
90
>>> score[2]
95
>>> score[3]
Traceback (most recent call last):
  File "<pyshell#67>", line 1, in <module>
    score[3]
IndexError: list index out of range
>>> score = [[100, 90, 95], [70, 80, 90]]
>>> score
[[100, 90, 95], [70, 80, 90]]
>>> score[0]
[100, 90, 95]
>>> score[1]
[70, 80, 90]
>>> score[0][0]
100
>>> score[0][1]
90
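Data Type: List
>>> a = [10, 20, 30, 40]
>>> a.append(50)       # add to the end
>>> a
[10, 20, 30, 40, 50]
>>> a.insert(1, 15)    # insert 15 at index 1
>>> a
[10, 15, 20, 30, 40, 50]
>>> a.pop()            # remove and return the last item
50
>>> a
[10, 15, 20, 30, 40]
>>> a.pop(1)           # remove and return the item at index 1
15
>>> a
[10, 20, 30, 40]
>>> del a[1]           # delete the item at index 1
>>> a
[10, 30, 40]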
Data Type: List
>>> a = [10, 20, 30, 40]
>>> a[1]
20
>>> a[1] = 25
>>> a
[10, 25, 30, 40]
>>> a = ['A+', 'A0', 'A+', 'F']
>>> a.count('A+')
2
>>> a.count('F')
1
>>> a.count('B+')
0
Data Type: List
>>> a = ['A+', 'A0', 'A+', 'F']
>>> a.sort()
>>> a
['A+', 'A+', 'A0', 'F']
>>> a = [30, 20, 40, 15]
>>> a.reverse()
>>> a
[15, 40, 20, 30]
>>> a = ['A+', 'A0', 'A+', 'F']
>>> a.clear()
>>> a
[]
>>> type(a)
<class 'list'>
>>> b = list()
>>> type(b)
<class 'list'>
Data Type: List
>>> a = [10, 20, 30]
>>> b = [40, 50]
>>> a + b
[10, 20, 30, 40, 50]
>>> a * 2
[10, 20, 30, 10, 20, 30]
Data Type: Tuple
>>> a = (10, 20, 30)
>>> a
(10, 20, 30)
>>> a.append(40)
Traceback (most recent call last):
  File "<pyshell#130>", line 1, in <module>
    a.append(40)
AttributeError: 'tuple' object has no attribute 'append'
>>> a[1] = 25
Traceback (most recent call last):
  File "<pyshell#141>", line 1, in <module>
    a[1] = 25
TypeError: 'tuple' object does not support item assignment
>>> del a(1)
SyntaxError: can't delete function call
>>> a[1]
20
>>> a[1:]
(20, 30)
Data Type: Dictionary
>>> score = {'korean':100, 'english':90, 'math':95}
>>> score
{'math': 95, 'korean': 100, 'english': 90}
>>> score['english']
90
>>> score['english'] = 95
>>> score
{'math': 95, 'korean': 100, 'english': 95}
>>> score['science'] = 90
>>> score
{'math': 95, 'korean': 100, 'science': 90, 'english': 95}
>>> del score['math']
>>> score
{'korean': 100, 'science': 90, 'english': 95}
>>> del score['music']
Traceback (most recent call last):
  File "<pyshell#159>", line 1, in <module>
    del score['music']
KeyError: 'music'
Data Type: Dictionary
>>> score = {'korean':100, 'english':90, 'math':95}
>>> score.keys()
dict_keys(['math', 'korean', 'english'])
>>> list(score.keys())
['math', 'korean', 'english']
>>> score = {'korean':100, 'english':90, 'math':95}
>>> score.values()
dict_values([95, 100, 90])
>>> list(score.values())
[95, 100, 90]
>>> score = {'korean':100, 'english':90, 'math':95}
>>> score.items()
dict_items([('math', 95), ('korean', 100), ('english', 90)])
>>> list(score.items())
[('math', 95), ('korean', 100), ('english', 90)]
Data Type: Dictionary
>>> score = {'korean':100, 'english':90, 'math':95}
>>> score.clear()
>>> score
{}
>>> score = {'korean':100, 'english':90, 'math':95}
>>> score.get('korean')
100
>>> score['korean']
100
>>> score.get('science')           # missing key: returns None, so nothing is displayed
>>> score.get('science', 'NONE')   # missing key: returns the given default
'NONE'
>>> score = {'korean':100, 'english':90, 'math':95}
>>> 'korean' in score
True
>>> 'science' in score
False
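A common use of `get()` with a default value is counting occurrences without first checking whether the key exists. The sketch below reuses the grade list from the list slides. The variable names are illustrative.

```python
grades = ['A+', 'A0', 'A+', 'F']

counts = {}
for grade in grades:
    # get() returns 0 the first time a grade is seen, so no KeyError occurs.
    counts[grade] = counts.get(grade, 0) + 1

print(counts['A+'])          # 2
print(counts.get('B+', 0))   # 0, because 'B+' never appeared
```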
Data Type: Set
>>> a = set([10, 20, 30])
>>> a
{10, 20, 30}
>>> b = set("PYTHON")
>>> b
{'P', 'T', 'O', 'H', 'Y', 'N'}
Data Type: Set
>>> a = set([10, 20, 30])
>>> b = set([30, 40, 50])
>>> a & b                # intersection
{30}
>>> a.intersection(b)
{30}
>>> a = set([10, 20, 30])
>>> b = set([30, 40, 50])
>>> a | b                # union
{50, 20, 40, 10, 30}
>>> a.union(b)
{50, 20, 40, 10, 30}
>>> a = set([10, 20, 30])
>>> b = set([30, 40, 50])
>>> a - b                # difference
{10, 20}
>>> a.difference(b)
{10, 20}
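The three set operations above answer everyday questions such as "who passed both exams?". The sketch below uses made-up student names. `sorted()` is applied before printing because a set has no guaranteed display order.

```python
# Hypothetical data: students who passed each subject.
passed_korean = {"Mark", "James", "Anna"}
passed_math = {"James", "Anna", "Tom"}

both = passed_korean & passed_math          # intersection: passed both
either = passed_korean | passed_math        # union: passed at least one
only_korean = passed_korean - passed_math   # difference: Korean but not math

print(sorted(both))          # ['Anna', 'James']
print(sorted(either))        # ['Anna', 'James', 'Mark', 'Tom']
print(sorted(only_korean))   # ['Mark']
```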
Data Type: Set
>>> a = set([10, 20, 30])
>>> a
{10, 20, 30}
>>> a.add(40)
>>> a
{40, 10, 20, 30}
>>> a.update([50, 60, 70])
>>> a
{70, 40, 10, 50, 20, 60, 30}
>>> a = set([10, 20, 30])
>>> a
{10, 20, 30}
>>> a.remove(20)
>>> a
{10, 30}
Data Type: True / False
Value           Truth value
"abc"           True
""              False
[10, 20, 30]    True
[]              False
1               True
0               False
None            False
(Python has a value called None. Think of it simply as "no value".)
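The table above can be checked directly with the built-in `bool()`, which reports the truth value Python would use in an `if` or `while` condition. A minimal sketch:

```python
samples = ["abc", "", [10, 20, 30], [], 1, 0, None]

# bool() converts each value to the truth value used by if/while.
truthiness = [bool(value) for value in samples]
print(truthiness)   # [True, False, True, False, True, False, False]

# Typical use: an empty list is falsy, so "if items:" checks for content.
items = []
message = "has items" if items else "empty"
print(message)      # empty
```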
3. Flow Control
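Conditional Branching: if else
If the exam score is 60 or higher, the result is a pass; otherwise (below 60), it is a fail.
IF the score is 60 or higher, THEN pass; ELSE fail.
if [condition]:
    statement 1 for when the condition holds
    statement 2 for when the condition holds
else:
    statement 1 for when the condition does not hold
    statement 2 for when the condition does not hold
(Indent each block with 4 spaces or a Tab.)
>>> score = 90
>>> if score >= 60:
        print("Pass")
    else:
        print("Fail")
Pass
>>> score = 59
>>> if score >= 60:
        print("Pass")
    else:
        print("Fail")
Fail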
Conditional Branching: if else
X > Y     True when X is greater than Y
X >= Y    True when X is greater than or equal to Y
X < Y     True when X is less than Y
X <= Y    True when X is less than or equal to Y
X == Y    True when X equals Y. In Python a single = is the assignment operator (it stores the value on the right in the name on the left), so equality is written with two = signs
X != Y    True when X is not equal to Y
>>> x = 20
>>> y = 10
>>> x > y
True
>>> x >= y
True
>>> x < y
False
>>> x <= y
False
>>> x == y
False
>>> x != y
True
>>> x = 10
>>> y = 10
>>> x > y
False
>>> x >= y
True
>>> x == y
True
>>> x != y
False
Conditional Branching: if else
X and Y    True when both X and Y are true
X or Y     True when at least one of X and Y is true
not X      False when X is true, True when X is false
>>> korean = 90
>>> math = 85
>>> (korean >= 90) and (math >= 90)
False
>>> (korean >= 90) or (math >= 90)
True
>>> not korean   # any non-zero score counts as true, so not korean is False; if korean were 0, the result would be True
False
Conditional Branching: if else
score = 91

# Nested if/else
if score >= 90:
    print("A")
else:
    if score >= 80:
        print("B")
    else:
        if score >= 70:
            print("C")
        else:
            print("D")

# The same logic written with elif
if score >= 90:
    print("A")
elif score >= 80:
    print("B")
elif score >= 70:
    print("C")
else:
    print("D")

>>> if score >= 70:
        print("Pass")
    else:
        print("Fail")
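The `elif` chain above can be packaged as a function so the same thresholds are reused for any score. A minimal sketch: the function name `grade` is hypothetical, and the cut-offs are the ones on the slide.

```python
def grade(score):
    """Map a numeric score to a letter using the slide's thresholds."""
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    elif score >= 70:
        return "C"
    else:
        return "D"

for s in (91, 80, 79, 10):
    print(s, grade(s))   # 91 A, then 80 B, then 79 C, then 10 D
```

The conditions are tested from top to bottom, and only the first one that holds runs, which is why `score >= 80` does not need an explicit `score < 90` check.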
Repetition: while
while [condition]:
    statement 1
    statement 2
>>> a = 1
>>> while a < 5:
        print(a)
        a = a + 1
1
2
3
4
Repetition: while
# while Test
goal = 55
while True:
    in_data = int(input("Enter a number: "))
    if in_data < goal:
        print("The number is too small")
    elif in_data > goal:
        print("The number is too large")
    else:
        print("Correct")
        break
print("Exiting")

======== RESTART: c:/python_sample/test.py ========
Enter a number: 1
The number is too small
Enter a number: 50
The number is too small
Enter a number: 70
The number is too large
Enter a number: 60
The number is too large
Enter a number: 56
The number is too large
Enter a number: 55
Correct
Exiting
>>>
Repetition: while
while True:
    in_data = input("Enter a string:")
    if len(in_data) < 3:
        print("The string is too short")
        continue
    print("Exiting")
    break

======== RESTART: c:/python_sample/test.py ========
Enter a string:ab
The string is too short
Enter a string:abcd
Exiting
>>>
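To follow `continue` and `break` without typing input, the same control flow can be run over a preset list. In this sketch the list `inputs` stands in for what a user might type. All names are illustrative.

```python
inputs = ["ab", "", "abcd", "xyz"]   # stand-ins for user input
position = 0
accepted = None
rejected = 0

while position < len(inputs):
    text = inputs[position]
    position += 1
    if len(text) < 3:
        rejected += 1
        continue   # skip the rest of the body and test the condition again
    accepted = text
    break          # leave the loop at the first acceptable string

print(accepted, rejected)   # abcd 2
```

Because of the `break`, the final entry `"xyz"` is never examined.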
Repetition: for
for [variable] in list (or tuple, string):
    statement 1
    statement 2
>>> a = ['A', 'B', 'C']
>>> for i in a:
        print(i)
A
B
C
>>> a = [('science', 100), ('math', 95), ('computer', 97)]
>>> for (subject, score) in a:
        print(subject + " : " + str(score))
science : 100
math : 95
computer : 97
Repetition: for
>>> a = range(10)
>>> a
range(0, 10)
>>> a = range(2, 10)
>>> a
range(2, 10)
>>> for i in range(1, 5):
        print(i)
1
2
3
4

# Save as test.py, then run
score = [100, 55, 70, 35, 90]
for number in range(len(score)):
    if (score[number] >= 60):
        print("Student %d: pass" % (number+1))
    else:
        print("Student %d: fail" % (number+1))

======== RESTART: c:/python_sample/test.py ========
Student 1: pass
Student 2: fail
Student 3: pass
Student 4: fail
Student 5: pass
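The `range(len(score))` loop above works, but the built-in `enumerate()` is a common alternative that yields the position and the value together. The slides do not cover `enumerate()`, so treat this as an optional variation on the same example.

```python
scores = [100, 55, 70, 35, 90]
results = []

# start=1 makes the numbering begin at 1 instead of 0.
for number, score in enumerate(scores, start=1):
    status = "pass" if score >= 60 else "fail"
    results.append("Student %d: %s" % (number, status))

for line in results:
    print(line)
```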
4. Input and Output
Data Input and Output
>>> a = input()
Hello
>>> print(a)
Hello
>>> a = input()
123
>>> print(a)
123
>>> a + 10   # input() always returns a string
Traceback (most recent call last):
  File "<pyshell#45>", line 1, in <module>
    a + 10
TypeError: Can't convert 'int' object to str implicitly
>>> a = int(input())
123
>>> a + 10
133
>>> a = input("Enter your name:")
Enter your name: Hong Gildong
>>> print(a)
Hong Gildong
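Data Input and Output
>>> print("Hello," + "Hong Gildong here")
Hello,Hong Gildong here
>>> print("Hello," "Hong Gildong here")          # adjacent string literals are joined
Hello,Hong Gildong here
>>> print("Hello," + " " + "Hong Gildong here")
Hello, Hong Gildong here
>>> print("Hello,", "Hong Gildong here")         # a comma inserts a space between the values
Hello, Hong Gildong here
>>> for i in range(4):
        print(i)
0
1
2
3
>>> for i in range(4):
        print(i, end=" ")
0 1 2 3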
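The spacing behaviour above comes from two keyword arguments of `print()`: `sep` (placed between values, a space by default) and `end` (placed after the last value, a newline by default). The sketch below writes into an in-memory buffer through the `file` argument so the exact characters can be inspected. The buffer and the sample strings are illustrative additions, not part of the slides.

```python
import io

buffer = io.StringIO()   # in-memory text stream used instead of the screen

# sep replaces the default space; end replaces the default newline.
print("Hello", "Mark", sep=", ", end="!\n", file=buffer)
for i in range(4):
    print(i, end=" ", file=buffer)

text = buffer.getvalue()
print(repr(text))   # 'Hello, Mark!\n0 1 2 3 '
```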
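File Input and Output
# file object = open(file name, mode)
>>> f = open("test.txt", "w")
>>> f.close()
Modes:
r    opens the named file for reading
w    creates the named file. If the file already exists, it is deleted and created anew (all existing content is lost)
a    appends data to the end of an existing file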
File Input and Output
# Read one line
f = open("c:/python_sample/test.txt", "r")
print(f.readline())
f.close()

======== RESTART: c:/python_sample/test.py ========
Python File I/O Test

>>>

# Read line by line until the end of the file
f = open("c:/python_sample/test.txt", "r")
while True:
    line = f.readline()
    if not line:
        break
    print(line)
f.close()

======== RESTART: c:/python_sample/test.py ========
Python File I/O Test

파이썬 파일 입출력 시험
>>>
File Input and Output
# Read all lines into a list
f = open("c:/python_sample/test.txt", "r")
lines = f.readlines()
for line in lines:
    print(line)
f.close()

======== RESTART: c:/python_sample/test.py ========
Python File I/O Test

파이썬 파일 입출력 시험
>>>

# Read the whole file as one string
f = open("c:/python_sample/test.txt", "r")
lines = f.read()
print(lines)
f.close()

======== RESTART: c:/python_sample/test.py ========
Python File I/O Test
파이썬 파일 입출력 시험
>>>
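File Input and Output
# write() does not add a newline, so both strings end up on one line
f = open("c:/python_sample/test.txt", 'w')
f.write("This is the first line")
f.write("This is the second line")
f.close()

# Write "\n" explicitly to end each line
f = open("c:/python_sample/test.txt", 'w')
f.write("This is the first line")
f.write("\n")
f.write("This is the second line")
f.write("\n")
f.close()

# Mode 'a' appends to the end of the existing file
f = open("c:/python_sample/test.txt", 'a')
f.write("This is an appended line")
f.close()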
File Input and Output
# with closes the file automatically when the block ends
with open("c:/python_sample/test.txt", "r") as f:
    lines = f.read()
    print(lines)
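The write, append, and read modes can be tried end to end without touching `c:/python_sample`. This sketch uses a temporary directory instead of the slide's path. The file name and the line contents are illustrative, and `encoding="utf-8"` is an added assumption so the result does not depend on the platform default.

```python
import os
import tempfile

# A throwaway location instead of the slide's c:/python_sample folder.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w", encoding="utf-8") as f:   # "w": create or overwrite
    f.write("first line\n")
    f.write("second line\n")

with open(path, "a", encoding="utf-8") as f:   # "a": append to the end
    f.write("appended line\n")

with open(path, "r", encoding="utf-8") as f:   # "r": read
    # Iterating over a file yields one line at a time, newline included.
    lines = [line.rstrip("\n") for line in f]

print(lines)   # ['first line', 'second line', 'appended line']
```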
5. Functions
What Is a Function?
def function_name(arg1, arg2, ..., argN):
    statement 1
    statement 2
    return value   (may be omitted when there is nothing to return)
>>> def sum(a, b):
        result = a + b
        return result
>>> a = 1
>>> b = 2
>>> ret = sum(a, b)
>>> print(ret)
3
>>> print(sum(4, 10))
14
What Is a Function?
# Default argument value: times is 1 unless the caller passes another value
def display_msg(msg, times=1):
    print(msg * times)

display_msg("TEST")
display_msg("TEST", 5)

======== RESTART: c:/python_sample/test.py ========
TEST
TESTTESTTESTTESTTEST
>>>

# Variable number of arguments: args receives all positional arguments as a tuple
def sum(*args):
    result = 0
    for i in args:
        result = result + i
    return result

print(sum(1, 2))
print(sum(1, 2, 3, 4))

======== RESTART: c:/python_sample/test.py ========
3
10
>>>
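Default values and `*args` can be combined in one function. The sketch below is a hypothetical `average` helper, not something from the slides. A parameter placed after `*args` can only be passed by name. It also calls Python's built-in `sum()`, which the slide's own `sum` function would replace if both were defined in the same file.

```python
def average(*scores, digits=1):
    """Return the mean of any number of scores, rounded to `digits` places."""
    if not scores:               # no arguments: avoid dividing by zero
        return 0.0
    return round(sum(scores) / len(scores), digits)

print(average(100, 90, 95))       # 95.0
print(average(70, 81, digits=0))  # 76.0
print(average())                  # 0.0
```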
What Is a Function?
# Local variable: the a inside local() is separate from the a outside
def local():
    a = 3                         # step 3-1
    print("local a: " + str(a))

a = 10      # step 1
print(a)    # step 2
local()     # step 3
print(a)    # step 4

======== RESTART: c:/python_sample/test.py ========
10
local a: 3
10
>>>

# global: the a inside local() is the same variable as the a outside
a = 1       # step 1
def local():
    global a
    a = 3                         # step 4-1
    print("global a:" + str(a))

a = 10      # step 2
print(a)    # step 3
local()     # step 4
print(a)    # step 5

======== RESTART: c:/python_sample/test.py ========
10
global a:3
3
What Is a Function?
a = 1           # step 1
def local():
    a = 3                         # step 4-1
    print("global a:" + str(a))
    return a

a = 10          # step 2
print(a)        # step 3
a = local()     # step 4 (the value returned by return is assigned to a)
print(a)        # step 5
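The three variations above (local assignment, `global`, and `return`) can be compared side by side. In this sketch the names `counter`, `set_local`, and `bump_global` are illustrative. The values recorded after each call show which assignments reach the module-level variable.

```python
counter = 0

def set_local():
    counter = 100    # creates a new local variable; the outer one is untouched
    return counter

def bump_global():
    global counter   # refer to the module-level variable
    counter += 1
    return counter

local_result = set_local()
after_local = counter      # still 0
bump_global()
after_global = counter     # now 1

print(local_result, after_local, after_global)   # 100 0 1
```

Returning a value, as in the last slide, is usually easier to follow than `global`, because the caller decides where the result is stored.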
6. Modules
What Is a Module?
# A module is a file of Python code, such as this function saved in its own .py file
def sum(*args):
    result = 0
    for i in args:
        result = result + i
    return result
Importing a Module and Running It Standalone
# my_module.py: the print call runs whenever the file is executed, including on import
def sum(*args):
    result = 0
    for i in args:
        result = result + i
    return result

print(sum(1, 2, 3))

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import my_module
6
>>>

# my_module.py: the print call runs only when the file is executed directly
def sum(*args):
    result = 0
    for i in args:
        result = result + i
    return result

if __name__ == "__main__":
    print(sum(1, 2, 3))

>>> import my_module
>>>
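The guard works because Python sets the variable `__name__` to `"__main__"` in the file being run directly, and to the module's own name when the file is imported. The sketch below shows a common layout for a file meant to serve both purposes. The names `total` and `main` are illustrative, chosen so the built-in `sum` is not shadowed.

```python
def total(*args):
    """Add up any number of values (same logic as the slide's function)."""
    result = 0
    for value in args:
        result = result + value
    return result

def main():
    # Demonstration code that should run only on direct execution.
    print(total(1, 2, 3))

if __name__ == "__main__":
    # True when this file is run as a script; False when it is imported.
    main()
```

Another file can then `import` this module and call `total()` without triggering the demonstration output.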