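I saw the course-selection script an upperclassman had written, figured a small tweak would turn it into a timetable scraper, and so I tweaked it.
First come the student ID and password. Those are personal, so I'm leaving them out.
The school's information portal lives at http://my.hfut.edu.cn/login.portal, so I just hard-coded that.
I wrote two CAPTCHA recognizers: one built on Tesseract-OCR, the other using Yundama (云打码, a paid CAPTCHA-solving service). Yundama works better, of course.
The source looks roughly like this. First, the .py file that scrapes the timetable itself: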
    import os
    from time import sleep

    from PIL import Image
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    import code_pytesseract  # local Tesseract decoder, kept around as an alternative
    import code_ydm          # Yundama decoder, the one actually used below

    username = "<your student ID>"
    password = "<your portal password>"
    chromedriver = ".\\chromedriver.exe"
    # Selenium 3 style; Selenium 4 expects a Service object instead of a bare path.
    driver = webdriver.Chrome(chromedriver)
    driver.implicitly_wait(30)

    url = "http://my.hfut.edu.cn/login.portal"
    driver.get(url)

    logged_in = False
    while not logged_in:
        print("Trying to log in")
        # Fill in the form; reload the page and start over if anything goes wrong.
        while True:
            try:
                driver.find_element(By.ID, "username").clear()
                driver.find_element(By.ID, "password").clear()
                driver.find_element(By.ID, "code").clear()
                driver.find_element(By.ID, "username").send_keys(username)
                driver.find_element(By.ID, "password").send_keys(password)

                # Screenshot the whole page, then crop it down to the CAPTCHA.
                screenImg = "screenImg.png"
                if os.path.exists(screenImg):
                    os.remove(screenImg)
                driver.get_screenshot_as_file(screenImg)
                # Pixel coordinates of the CAPTCHA; they depend on the window size.
                left, top, right, bottom = 1149, 359, 1218, 384
                img = Image.open(screenImg).crop((left, top, right, bottom))
                img.save(screenImg)

                code = code_ydm._code_decode(screenImg)
                print(code)
                driver.find_element(By.ID, "code").send_keys(code)
                break
            except Exception:
                print("Page failed to load, refreshing and retrying...")
                if driver.current_url == url:
                    driver.refresh()
                else:
                    driver.get(url)

        try:
            driver.find_element(
                By.XPATH, '//*[@id="loginForm"]/table[1]/tbody/tr[3]/td/input[1]'
            ).click()
            # Note: this only raises the element lookup timeout; it does not pause.
            driver.implicitly_wait(100)
            # Still being on the login page means the login was rejected.
            if driver.current_url == url:
                raise RuntimeError("still on the login page")
            print("Login succeeded!")
            logged_in = True
        except Exception:
            print("Login failed")

    driver.implicitly_wait(120)

    # Sign in to the academic affairs system through SSO, then open the timetable.
    url = "http://jxglstu.hfut.edu.cn/eams5-student/wiscom-sso/login"
    driver.get(url)
    sleep(2)

    url = "http://jxglstu.hfut.edu.cn/eams5-student/for-std/course-table/"
    driver.get(url)

    course_table = driver.find_element(By.ID, "lessons")

    # Dump the table text; UTF-8 so Chinese course names survive on Windows.
    with open("table.txt", "w", encoding="utf-8") as table_file:
        table_file.write(course_table.text)

    driver.quit()
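The hard-coded crop coordinates in the script only hold for one particular window size. As a rough sketch (not what the script above does), the box could instead be computed from the CAPTCHA element's geometry. Selenium's `WebElement` exposes `location` and `size` as dicts; the `crop_box` helper, the `scale` parameter, and the sample numbers below are my own assumptions for illustration.

```python
def crop_box(location, size, scale=1.0):
    """Return a (left, top, right, bottom) box in screenshot pixels.

    location: dict with "x" and "y", the element's top-left corner
    size:     dict with "width" and "height"
    scale:    screenshot pixels per page pixel (assumed 1.0 on an unscaled display)
    """
    left = round(location["x"] * scale)
    top = round(location["y"] * scale)
    right = round((location["x"] + size["width"]) * scale)
    bottom = round((location["y"] + size["height"]) * scale)
    return left, top, right, bottom


# Hypothetical geometry, picked so the result matches the hard-coded box.
box = crop_box({"x": 1149, "y": 359}, {"width": 69, "height": 25})
print(box)  # (1149, 359, 1218, 384)
```

In the real script the two dicts would come from whatever element holds the CAPTCHA image, and the resulting tuple could be passed straight to `Image.crop`.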
The pytesseract one is dead simple, just a few lines:
    # code_pytesseract.py
    import pytesseract
    from PIL import Image


    def _code_decode(screenImg):
        """Run Tesseract on the cropped CAPTCHA image and return the text it reads."""
        img = Image.open(screenImg)
        code = pytesseract.image_to_string(img)
        return code.strip()
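Raw OCR output tends to contain stray spaces or punctuation, which may be part of why the local decoder does worse. Here is a minimal cleanup sketch, assuming the CAPTCHA is four ASCII letters or digits (that length and character set are my assumption, and `clean_code` is a hypothetical helper, not part of pytesseract):

```python
import string

ALLOWED = set(string.ascii_letters + string.digits)


def clean_code(raw, expected_length=4):
    """Keep only ASCII letters and digits; return "" if the length is wrong."""
    cleaned = "".join(ch for ch in raw if ch in ALLOWED)
    return cleaned if len(cleaned) == expected_length else ""


print(clean_code(" a B-3 9\n"))  # aB39
print(repr(clean_code("a1")))    # '' (too short, so treated as a failed read)
```

An empty result can then be treated as "try again" instead of submitting an obviously wrong code.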
The Yundama one calls the .py file from their official site, so it's simply:
    # code_ydm.py
    import YDM  # the client module downloaded from the Yundama site


    def _code_decode(filename):
        username = "<your account>"
        password = "<your password>"
        appid = 0  # placeholder: replace with your numeric app ID
        appkey = "<your appkey>"
        codetype = 1004
        timeout = 60

        yundama = YDM.YDMHttp(username, password, appid, appkey)
        uid = yundama.login()
        balance = yundama.balance()
        if balance <= 0:
            # No credit left on the account, so nothing can be decoded.
            print("Out of credit. Can't get stronger without paying up.")
            return ""

        cid, result = yundama.decode(filename, codetype, timeout)
        return result
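Since both decoders expose the same `_code_decode(filename)` shape, one option is to try the free one first and only fall back to the paid one when it comes up empty. This is just a sketch of that idea, not something the script does; `decode_with_fallback` and the two stand-in decoders are hypothetical.

```python
def decode_with_fallback(filename, decoders, is_valid=bool):
    """Try each decoder in order; return the first result that passes is_valid."""
    for decode in decoders:
        try:
            result = decode(filename)
        except Exception as exc:  # one broken decoder should not stop the chain
            print(f"{getattr(decode, '__name__', decode)} failed: {exc}")
            continue
        if is_valid(result):
            return result
    return ""


# Stand-ins for illustration only. In the real script these would be
# code_pytesseract._code_decode and code_ydm._code_decode.
def local_ocr(filename):
    return ""      # pretend the local OCR read nothing


def paid_service(filename):
    return "x7Qp"  # pretend the paid service answered


print(decode_with_fallback("screenImg.png", [local_ocr, paid_service]))  # x7Qp
```

The `is_valid` hook defaults to "non-empty", but any stricter check on the decoded string could be passed in instead.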
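Finally, don't forget __init__.py; an empty one is fine.
After that you can scrape the timetable.
PS: The writing is pretty rough and I'm not sure how to describe it all, so that's that.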