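Web access logs record the IP address of every visitor. By looking up where each IP is located, you can work out the geographic distribution of your traffic, down to province and city level.
Taobao API: http://ip.taobao.com/service/getIpInfo.php?ip=14.215.177.38 (change the IP at the end as needed)
For example, a lookup for 14.215.177.38 returns the following: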
{"code":0,"data":
{"country":"\u4e2d\u56fd",
"country_id":"CN",
"area":"\u534e\u5357",
"area_id":"800000",
"region":"\u5e7f\u4e1c\u7701",
"region_id":"440000",
"city":"\u5e7f\u5dde\u5e02",
"city_id":"440100",
"county":"",
"county_id":"-1",
"isp":"\u7535\u4fe1",
"isp_id":"100017",
"ip":"14.215.177.38"}
}
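The response is a dictionary (a JSON object). code is the query status (0 for success, 1 for failure), and data holds the details: country, area, province (region), city and ISP. Because the text is Unicode-escaped, Chinese characters appear as \u4e2d and so on; a Unicode-to-Chinese converter will show the actual content.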
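As a minimal sketch of the decoding step, assuming Python 3 and an abbreviated copy of the sample response above, the standard json module turns the \uXXXX escapes into readable characters without a separate converter. The variable names are illustrative only.

```python
import json

# Abbreviated copy of the sample response shown above.
# The raw string keeps the \u escapes literal, as they arrive over the wire.
raw = r'''{"code":0,"data":{"country":"\u4e2d\u56fd","country_id":"CN",
"region":"\u5e7f\u4e1c\u7701","city":"\u5e7f\u5dde\u5e02",
"isp":"\u7535\u4fe1","ip":"14.215.177.38"}}'''

result = json.loads(raw)   # json decodes the \uXXXX escapes on its own
data = result["data"]

if result["code"] == 0:    # 0 means the lookup succeeded
    print(data["country"], data["region"], data["city"], data["isp"])
```

Parsing with json.loads also avoids calling eval() on text received from a remote server.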
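The task: determine the province of each visiting IP (all foreign IPs are grouped together) and work out each province's share of the traffic. The IPs from the log are first preprocessed and saved one per line, as a hit count followed by the IP, separated by whitespace.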
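The preprocessing step itself is not shown in the article. A minimal Python 3 sketch, assuming the client IP is the first whitespace-separated field of each log line (as in the usual Nginx/Apache access-log layout), might look like this. The helper names count_ips and format_counts and the sample log lines are made up for illustration.

```python
from collections import Counter

def count_ips(log_lines):
    """Count how often each client IP appears.

    Assumption: the IP is the first whitespace-separated field of a line.
    """
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if fields:                  # skip blank lines
            counts[fields[0]] += 1
    return counts

def format_counts(counts):
    """Render the counts as 'count IP' lines, most frequent first."""
    return ["%d %s" % (n, ip) for ip, n in counts.most_common()]

# Made-up sample log lines
sample_log = [
    '14.215.177.38 - - [10/Oct/2017:13:55:36 +0800] "GET / HTTP/1.1" 200 612',
    '8.8.8.8 - - [10/Oct/2017:13:55:40 +0800] "GET /a HTTP/1.1" 200 100',
    '14.215.177.38 - - [10/Oct/2017:13:56:01 +0800] "GET /b HTTP/1.1" 404 0',
]
for row in format_counts(count_ips(sample_log)):
    print(row)
```

Writing these rows to ips.txt would give a file in the "count IP" layout that the script reads.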
The code is as follows:
#!/usr/bin/env python
# coding: utf-8
# Python 2 script (urllib2, print statement)
from __future__ import division
import json
import urllib2

# The URL was cut off in the source; it is presumably the Taobao endpoint
# quoted above, up to and including "ip=".
bs_url = ""

# Global dictionary for the final statistics,
# in the form {'province': {'IP': count, ...}, ...}
region_dic = {}

# Look up one IP and record it in the dictionary above
def get_data(IP, WIGHT=1):
    request = urllib2.Request(bs_url + IP)
    reponse = urllib2.urlopen(request)
    # json.loads replaces the original eval() and decodes the \uXXXX escapes itself
    result = json.loads(reponse.read())
    if result['code'] != 0:
        print "request error"
        return
    data = result['data']   # city, area, country and isp are available here too
    if data['country_id'] == 'CN':
        region = data['region']
    else:
        region = u"Overseas"   # all foreign IPs share one bucket
    # setdefault creates the inner dictionary the first time a region appears
    region_dic.setdefault(region, {})[IP] = int(WIGHT)

if __name__ == '__main__':
    count = 0   # the original started at -1, which skewed every ratio
    with open('ips.txt', 'r') as fo:   # the IPs to analyse, one "count IP" pair per line
        for line in fo:
            wight, ip = line.strip().split()
            get_data(ip, wight)
            count += int(wight)
    print 'Total:'
    for regions, stats in region_dic.items():
        times = sum(stats.values())
        # multiply by 100 so the value matches the printed percent sign
        print "%s:%.2f %%" % (regions.encode('utf-8'), times * 100 / count)
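The final aggregation can also be checked offline, without calling the API. The sketch below is Python 3 and separate from the script above; region_shares is a hypothetical helper, and the dictionary holds made-up sample values in the same {'province': {'IP': count}} shape as region_dic.

```python
def region_shares(region_dic):
    """Return {region: percentage of all visits}, given {region: {ip: count}}."""
    totals = {region: sum(stats.values()) for region, stats in region_dic.items()}
    grand_total = sum(totals.values())
    if grand_total == 0:            # nothing was counted, avoid dividing by zero
        return {}
    return {region: 100.0 * n / grand_total for region, n in totals.items()}

# Made-up sample data
sample = {
    "Guangdong": {"14.215.177.38": 6, "14.215.177.39": 2},
    "Overseas": {"8.8.8.8": 2},
}
for region, pct in sorted(region_shares(sample).items(), key=lambda kv: -kv[1]):
    print("%s: %.2f %%" % (region, pct))
```

One design choice here: the denominator is the sum of the counts that were actually recorded, not the total read from the input file, so the percentages still add up to 100 when some lookups fail.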
Note: other IP database APIs are also available:
Sina API: http://int.dpool.sina.com.cn/iplookup/iplookup.php?format=js&ip=14.215.177.38