Using Python to crawl historical trading data of selected stocks from Sina Finance

Release time: 2021-05-13 | Topic: How to obtain stock trading data

This article is for learning and exchange only. If there are errors or omissions, please bear with me. Everyone is welcome to study and discuss together!

Contents: Reference materials (thanks!) | Crawling preparation | Crawling ideas | Module 1: Crawling the web table data | Module 2: Appending data to the output | Source code (may be revised in the near future...) | Crawling historical trading data for the past month | Crawling historical trading data for the past year

Reference materials (thanks!)

"How to grab the table in a web page", by the Zhihu author Supporting Role Seven Three:

https://zhuanlan.zhihu.com/p/33986020
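The key idea from that reference is that pandas can turn an HTML table into a DataFrame directly. Below is a minimal sketch of that idea against the Sina quarterly-history page used throughout this article; the stock code and quarter are just example values, and requests, pandas, and the lxml parser are assumed to be installed.

import requests
import pandas as pd

# Quarterly history page for stock 601006, 2019 Q2 (same URL pattern used later in this article)
url = ('http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/'
       'stockid/601006.phtml?year=2019&jidu=2')
res = requests.get(url)
res.encoding = 'gbk'                      # the page is served as GBK
# read_html returns a list of DataFrames, one per matching <table>
tables = pd.read_html(res.text, attrs={'id': 'FundHoldSharesTable'})
print(tables[0].head())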

Crawling preparation

import requests
from bs4 import BeautifulSoup
import pandas as pd
import os
import time
import random
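Besides these imports, the scripts below rely on the lxml parser (used by BeautifulSoup and pd.read_html) and on an Excel writer such as openpyxl for the .xlsx output, so install any missing packages with pip before running.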

Crawling ideas

Find the web page where the data lives, use the browser's developer tools to inspect the page URL, request status, and source code, and locate the element that holds the data. Then write the program: use the relevant functions to simulate access to the page, collect the data, process it, and save it locally. (The details are not fully covered here; please bear with me, the blogger will find time to summarize them separately.)
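As a minimal sketch of that workflow (the stock code, year, and quarter are example values; the table id is what developer tools show for the quotes table on this page):

import requests
from bs4 import BeautifulSoup

# Example: stock 601006, year 2019, quarter 2
url = ('http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/'
       'stockid/601006.phtml?year=2019&jidu=2')
res = requests.get(url)
res.encoding = 'gbk'                                        # page is GBK-encoded
soup = BeautifulSoup(res.text, 'lxml')
table = soup.find('table', {'id': 'FundHoldSharesTable'})   # the data element located via developer tools
print(table.find_all('tr')[0].get_text(strip=True))         # header row of the quotes table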

Module 1: Crawling the web table data

def get_stock_table(stockcode, i):
    # Quarterly history page for the given stock (year 2019, quarter i)
    url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' \
          + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)
    print(url)
    res = requests.get(url)
    res.encoding = 'gbk'                      # the page is GBK-encoded
    soup = BeautifulSoup(res.text, 'lxml')
    # The historical-quotes table has id "FundHoldSharesTable"
    tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})
    df_list = []
    for table in tables:
        df_list.append(pd.concat(pd.read_html(table.prettify())))
    df = pd.concat(df_list)
    headers = df.iloc[0]                      # first row holds the column names
    df = pd.DataFrame(df.values[1:], columns=headers)
    # print(len(df) - 1)                      # number of data rows in df
    if len(df) - 1 < 22:
        # Not enough rows for a full month: top up from the previous quarter
        c = len(df) - 1
        df = add_stock_table(stockcode, i, c, df)
    else:
        df = pd.DataFrame(df.values[1:22], columns=headers)   # keep ~21 rows (about one month)
    df = df.reset_index(drop=True)
    df.to_excel('...\\' + str(stockcode) + '.xlsx')            # adjust the output path
    sleeptime = random.randint(1, 10)
    # print(sleeptime)
    time.sleep(sleeptime)                     # random pause between requests
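A quick usage sketch (the stock code and quarter are just example values, and the output path inside the function still has to point at a real directory):

# Fetch roughly one month of quotes for stock 601006, starting from 2019 Q2
get_stock_table('601006', 2)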

However, the function above sometimes cannot collect a full month of data from a single quarter, so a second function is needed to top up the data. To cover a whole year, you also need to add a loop over quarters.

Module 2: Appending data to the output

def add_stock_table(stockcode, i, c, df):
    # Fetch the previous quarter and append just enough rows to fill out the month
    i = i - 1
    url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' \
          + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)
    # print(url)
    res = requests.get(url)
    res.encoding = 'gbk'
    soup = BeautifulSoup(res.text, 'lxml')
    tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})
    df_addlist = []
    for table in tables:
        df_addlist.append(pd.concat(pd.read_html(table.prettify())))
    df_add = pd.concat(df_addlist)
    headers = df_add.iloc[0]
    # Take only as many rows from the previous quarter as are still missing
    df_add = pd.DataFrame(df_add.values[1:random.randint(20, 22) - c], columns=headers)
    # print(df_add)
    df_sum = pd.concat([df, df_add])          # DataFrame.append was removed in pandas 2.0
    # print(df_sum)
    # print(len(df_sum) - 1)
    return df_sum
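To make the row arithmetic concrete: if the current quarter yields only a handful of rows, add_stock_table pulls the previous quarter and slices it so that the combined table ends up with about 20-22 rows, i.e. roughly one month of trading days (the random upper bound just jitters the exact count), assuming the previous quarter has enough rows to spare.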

Remember: this article is for learning and exchange only. If there are any mistakes or omissions, please forgive me, and suggestions are welcome! The blogger is rather laid-back (lazy), so feel free to modify the code as you wish.

Source code (may be revised in the near future...)

Note:

The source code cannot be used as-is!

The path of the .xlsx output file must be modified first!!

Make any other changes according to your own needs.
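One simple way to handle the path note above is to keep the output directory in a single variable and build file names with os.path.join. This is only a sketch: OUTPUT_DIR and output_path are hypothetical names, and the directory shown is just the one that appears later in the year script.

import os

# Hypothetical helper: keep the output directory in one place (example path only)
OUTPUT_DIR = r'D:\Workplace\PyCharm\MySpider'
os.makedirs(OUTPUT_DIR, exist_ok=True)          # create the directory if it is missing

def output_path(stockcode):
    # e.g. output_path('601006') -> 'D:\\Workplace\\PyCharm\\MySpider\\601006.xlsx'
    return os.path.join(OUTPUT_DIR, str(stockcode) + '.xlsx')

# inside get_stock_table, the save line would then read:
# df.to_excel(output_path(stockcode))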

Crawling historical trading data for the past month

from bs4 import BeautifulSoup
import requests
import pandas as pd
import os
import time
import random


def get_stock_table(stockcode, i):
    # Quarterly history page for the given stock (year 2019, quarter i)
    url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' \
          + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)
    print(url)
    res = requests.get(url)
    res.encoding = 'gbk'
    soup = BeautifulSoup(res.text, 'lxml')
    tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})
    df_list = []
    for table in tables:
        df_list.append(pd.concat(pd.read_html(table.prettify())))
    df = pd.concat(df_list)
    headers = df.iloc[0]
    df = pd.DataFrame(df.values[1:], columns=headers)
    # print(len(df) - 1)  # number of data rows
    if len(df) - 1 < 22:
        # Fewer than a month of rows: top up from the previous quarter
        c = len(df) - 1
        df = add_stock_table(stockcode, i, c, df)
    else:
        df = pd.DataFrame(df.values[1:22], columns=headers)   # keep ~21 rows (about one month)
    df = df.reset_index(drop=True)
    df.to_excel('...\\' + str(stockcode) + '.xlsx')            # change the output path
    sleeptime = random.randint(1, 10)
    # print(sleeptime)
    time.sleep(sleeptime)


def add_stock_table(stockcode, i, c, df):
    i = i - 1   # previous quarter
    url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' \
          + str(stockcode) + '.phtml?year=2019&jidu=' + str(i)
    # print(url)
    res = requests.get(url)
    res.encoding = 'gbk'
    soup = BeautifulSoup(res.text, 'lxml')
    tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})
    df_addlist = []
    for table in tables:
        df_addlist.append(pd.concat(pd.read_html(table.prettify())))
    df_add = pd.concat(df_addlist)
    headers = df_add.iloc[0]
    df_add = pd.DataFrame(df_add.values[1:random.randint(20, 22) - c], columns=headers)
    # print(df_add)
    df_sum = pd.concat([df, df_add])   # DataFrame.append was removed in pandas 2.0
    # print(df_sum)
    # print(len(df_sum) - 1)
    return df_sum


if __name__ == "__main__":
    if os.path.exists("...\\601006.xlsx"):
        os.remove("...\\601006.xlsx")
    stockcode = ['601006', '000046', '601398', '000069', '601939', '000402',
                 '000001', '000089', '000027', '399001', '000002', '000800',
                 '601111', '600050', '601600', '600028', '601857', '601988',
                 '000951', '601919']
    i = 2
    index = 1
    print("Crawling month_stock information...\n")
    print("---------------\n")
    print("Please wait patiently...\n")
    for x in stockcode:
        print(index)
        get_stock_table(x, i)
        index += 1

Crawling historical trading data for the past year

from bs4 import BeautifulSoup
import requests
import pandas as pd
import os
import time
import random


def get_stock_yeartable(stockcode, s, y):
    # Quarterly history page (index variant, /type/S) for year y, quarter s
    url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' \
          + str(stockcode) + '/type/S.phtml?year=' + str(y) + '&jidu=' + str(s)
    print(url)
    res = requests.get(url)
    res.encoding = 'gbk'
    soup = BeautifulSoup(res.text, 'lxml')
    tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})
    df_list = []
    for table in tables:
        df_list.append(pd.concat(pd.read_html(table.prettify())))
    df = pd.concat(df_list)
    headers = df.iloc[0]
    df = pd.DataFrame(df.values[1:], columns=headers)
    # print(len(df) - 1)  # number of data rows
    # Keep pulling earlier quarters until roughly a year (~250 trading days) of rows
    # has been collected. The loop condition and the quarter wrap-around below are
    # reconstructed from a garbled original line.
    while len(df) < 250:
        s -= 1
        if s == 0:          # wrap to the last quarter of the previous year
            s = 4
            y -= 1
        df = add_stock_table(stockcode, s, y, df)
    df = df.reset_index(drop=True)
    df = pd.DataFrame(df.values[1:250], columns=headers)
    df.to_excel('D:\\Workplace\\PyCharm\\MySpider\\sh' + str(stockcode) + '.xlsx')
    sleeptime = random.randint(1, 10)
    # print(sleeptime)
    time.sleep(sleeptime)


def add_stock_table(stockcode, s, y, df):
    print(y, "-", s)
    url = 'http://vip.stock.finance.sina.com.cn/corp/go.php/vMS_MarketHistory/stockid/' \
          + str(stockcode) + '/type/S.phtml?year=' + str(y) + '&jidu=' + str(s)
    print(url)
    res = requests.get(url)
    res.encoding = 'gbk'
    soup = BeautifulSoup(res.text, 'lxml')
    tables = soup.find_all('table', {'id': 'FundHoldSharesTable'})
    df_addlist = []
    for table in tables:
        df_addlist.append(pd.concat(pd.read_html(table.prettify())))
    df_add = pd.concat(df_addlist)
    headers = df_add.iloc[0]
    df_add = pd.DataFrame(df_add.values[1:], columns=headers)
    # print(df_add)
    df_sum = pd.concat([df, df_add])   # DataFrame.append was removed in pandas 2.0
    # print(df_sum)
    # print(len(df_sum) - 1)
    return df_sum


if __name__ == "__main__":
    if os.path.exists("D:\\Workplace\\PyCharm\\MySpider\\sh000001.xlsx"):
        os.remove("D:\\Workplace\\PyCharm\\MySpider\\sh000001.xlsx")
    stockcode = ['000001']
    s = 2
    y = 2019
    index = 1
    print("Crawling year_sh_stock information...\n")
    print("---------------\n")
    print("Please wait patiently...\n")
    for x in stockcode:
        print(index)
        get_stock_yeartable(x, s, y)

Article Url:https://www.liaochihuo.com/info/555151.html
