Input data source: web access logs, about 2 TB
Output data: attack logs
Development environment: *inux + HDFS + MapReduce + Java




valuelength > dExpectation + dDeviation*number_of_devi
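valuelength: length of the parameter value being checked
dExpectation: mean length of all values observed for the request parameter
dDeviation: standard deviation of those lengths
number_of_devi: tuning parameter, the number of standard deviations above the mean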
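The rule above can be sketched as a single predicate. This is only an illustration: the class name LengthThreshold and the method name isAnomalous are made up here, while the four inputs mirror the symbols defined above.

```java
/** Length-based anomaly test: flag values longer than mean + k * stddev. */
final class LengthThreshold {
    private LengthThreshold() {}

    /**
     * @param valueLength  length of the parameter value being checked
     * @param expectation  mean length observed for this parameter
     * @param deviation    standard deviation of those lengths
     * @param numberOfDevi tuning parameter: allowed number of standard deviations
     */
    static boolean isAnomalous(int valueLength, double expectation,
                               double deviation, double numberOfDevi) {
        return valueLength > expectation + deviation * numberOfDevi;
    }
}
```

For example, with a mean of 10, a standard deviation of 2 and number_of_devi = 3, the threshold is 16, so a value of length 17 is flagged and a value of length 16 is not.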

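The model consists of four MapReduce jobs (as shown in the figure above):
1. Log parsing job
Mapper task: extract data as described below, emitting (host+path+filename?paraName=paraValue, {"length":"xx","uri":"xxx"})
Reducer task: deduplicate the data
The following fields are extracted from each access log entry:
host: domain name
uri: request URI (further parsed into request path, query string, and parameter name/value pairs)
http_response_code: response code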
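As a rough sketch of how a URI might be split into the host+path+filename?paraName=paraValue keys used by the Mapper: the class UriParams and the method keys are hypothetical names, "+" is assumed to mean plain string concatenation, and no URL decoding is done.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits a request URI into one key per query parameter (illustrative only). */
final class UriParams {
    private UriParams() {}

    /** Returns one "host + path?name=value" key per non-empty query parameter. */
    static List<String> keys(String host, String uri) {
        List<String> out = new ArrayList<>();
        int q = uri.indexOf('?');
        if (q < 0 || q == uri.length() - 1) {
            return out; // no query string at all
        }
        String path = uri.substring(0, q);
        for (String pair : uri.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq <= 0 || eq == pair.length() - 1) {
                continue; // missing parameter name or empty value
            }
            String name = pair.substring(0, eq);
            String value = pair.substring(eq + 1);
            out.add(host + path + "?" + name + "=" + value);
        }
        return out;
    }
}
```

Calling UriParams.keys("example.com", "/a/b.php?id=1&name=bob&empty=") yields two keys, one for id and one for name; the parameter with an empty value is dropped.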
Log entries are filtered by the following conditions:
(1) Only process access logs with a 2xx or 3xx response code
http_response_code.startsWith("2") || http_response_code.startsWith("3")
(2) Skip requests for static resources
String [] path_bl = { "gif","jpe","png","bmp","ico","js","cs","avi","wma","mkv","doc","pdf","ppt","txt","csv","xls","lnk" };
(3) Skip records with abnormal parameter names, i.e. names that do not match:
paramname.matches("[-\\.\\w\\d\\[\\]\\x22\\x27\\s]+")
(4) Skip requests with an empty query string
(5) Skip requests with an empty parameter value
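The five conditions above could be combined into one predicate roughly as follows. The class LogFilter and its method names are invented for this sketch. The text does not show how the extension blacklist is applied, so an exact, case-insensitive match on the file extension is assumed here; an empty query string is represented by an empty parameter name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

/** Combined log filter for conditions (1) to (5); illustrative only. */
final class LogFilter {
    private LogFilter() {}

    // Extension blacklist copied from the text.
    private static final Set<String> PATH_BL = new HashSet<>(Arrays.asList(
        "gif", "jpe", "png", "bmp", "ico", "js", "cs", "avi", "wma",
        "mkv", "doc", "pdf", "ppt", "txt", "csv", "xls", "lnk"));

    // Parameter-name pattern copied from the text.
    private static final Pattern PARAM_NAME =
        Pattern.compile("[-\\.\\w\\d\\[\\]\\x22\\x27\\s]+");

    /** ASSUMPTION: a path is static when its extension equals a blacklist entry. */
    static boolean isStatic(String path) {
        int dot = path.lastIndexOf('.');
        if (dot < 0) {
            return false;
        }
        return PATH_BL.contains(path.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    /** Returns true when the record should be kept for further processing. */
    static boolean accept(String responseCode, String path,
                          String paramName, String paramValue) {
        boolean okCode = responseCode.startsWith("2") || responseCode.startsWith("3");
        if (!okCode) return false;                                   // (1)
        if (isStatic(path)) return false;                            // (2)
        if (paramName == null
                || !PARAM_NAME.matcher(paramName).matches()) return false; // (3), (4)
        if (paramValue == null || paramValue.isEmpty()) return false;      // (5)
        return true;
    }
}
```

With this sketch, a 200 response for /search.php?q=abc is kept, while a 404 response, a request for /logo.png, or a parameter named q<script> is dropped.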
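2. E(X) and D(X) calculation job
Mapper task: compute the length of each parameter value, emitting (host+path+filename?paraName, length)
Reducer task: compute the mean and standard deviation of the value lengths for each parameter, emitting
(host+path+filename?paraName,{"expectation":xxx,"deviation":xxx})
and store the result in a Redis database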
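Mean, where x_i is the length of the i-th value and n is the number of values:
$$E(X) = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Standard deviation:
$$\sqrt{D(X)} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - E(X)^2}$$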
// Assume the value lengths follow a normal distribution
double sum = 0.0;
double sum_square = 0.0;
int count = 0;
JSONObject result_json = new JSONObject();
// Accumulate the sum and the sum of squares of all lengths for this key
while (v.hasNext()) {
    String value_str = v.next().toString();
    double iLength = Double.parseDouble(value_str);
    sum += iLength;
    sum_square += iLength * iLength;
    count++;
}
double expectation = sum / count;
// Population variance E(X^2) - E(X)^2; clamp at 0 so rounding error cannot produce NaN
double variance = Math.max(0.0, (sum_square - sum * sum / count) / count);
double deviation = Math.sqrt(variance);
result_json.put("expectation", expectation);
result_json.put("deviation", deviation);
// Store the statistics in Redis and also emit them as the job output
jedis.set(k.toString(), result_json.toJSONString());
output.collect(k, new Text(result_json.toJSONString()));
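The same calculation can be tried outside Hadoop and Redis. The sketch below uses the same sum and sum-of-squares approach as the Reducer; the class name LengthStats and its fields are hypothetical, chosen to match the JSON keys.

```java
/** Mean and population standard deviation of value lengths (illustrative only). */
final class LengthStats {
    final double expectation;
    final double deviation;

    private LengthStats(double expectation, double deviation) {
        this.expectation = expectation;
        this.deviation = deviation;
    }

    static LengthStats of(int[] lengths) {
        if (lengths.length == 0) {
            throw new IllegalArgumentException("at least one length is required");
        }
        double sum = 0.0;
        double sumSquare = 0.0;
        for (int len : lengths) {
            sum += len;
            sumSquare += (double) len * len;
        }
        int n = lengths.length;
        double mean = sum / n;
        // Same formula as the Reducer; clamp at 0 to absorb rounding error.
        double variance = Math.max(0.0, (sumSquare - sum * sum / n) / n);
        return new LengthStats(mean, Math.sqrt(variance));
    }
}
```

For the lengths 2, 4, 4, 4, 5, 5, 7, 9 the sum is 40 and the sum of squares is 232, giving a mean of 5 and a standard deviation of 2.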
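3. Anomaly extraction job
Mapper task: read the mean and standard deviation for each parameter from the Redis database and extract anomalous records by the conditions below, emitting
(host+path+filename?paraName=paraValue,
{"originURL":"xxx","attackType":"none","expectation":xxx,"deviation":xxx,"length":xxx})
(1) The length falls in the anomalous range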
valuelength > dExpectation + dDeviation*number_of_devi
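(2) Exclude requests whose parameter name is on the whitelist
if (Toolkit.isParaNameWL(k, paraName))
    return;
The parameter whitelist ParaNameWL is read from the configuration file.
(3) Drop records whose parameter value type is on the whitelist
Allowed types: URI, Path, Email, SafeText
if (Toolkit.isPathText(paraValue) || Toolkit.isSafeText(paraValue) || Toolkit.isURIText(paraValue) || Toolkit.isEmailText(paraValue))
    return;
Reducer task: deduplicate the data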
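The Toolkit type checks are not shown in the text. Purely as an illustration of what a value-type whitelist might look like, here is a sketch with four regular expressions; every pattern, the class name ValueTypeWhitelist and the method isBenign are guesses, not the rules used in the article.

```java
import java.util.regex.Pattern;

/** Hypothetical value-type whitelist: SafeText, Email, Path, URI. */
final class ValueTypeWhitelist {
    private ValueTypeWhitelist() {}

    // ASSUMPTION: all four patterns are illustrative, not the article's Toolkit rules.
    private static final Pattern SAFE_TEXT =
        Pattern.compile("[\\p{L}\\p{N}_\\- ]+");
    private static final Pattern EMAIL =
        Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");
    // Path segments must start with a word character, so ".." is not accepted.
    private static final Pattern PATH =
        Pattern.compile("/?[\\w-]+(\\.[\\w-]+)*(/[\\w-]+(\\.[\\w-]+)*)*/?");
    private static final Pattern URI =
        Pattern.compile("https?://[\\w.-]+(:\\d+)?(/[\\w./%-]*)?");

    /** Returns true when the value fully matches one of the whitelisted types. */
    static boolean isBenign(String value) {
        return SAFE_TEXT.matcher(value).matches()
            || EMAIL.matcher(value).matches()
            || PATH.matcher(value).matches()
            || URI.matcher(value).matches();
    }
}
```

A long but harmless value such as user@example.com or /images/2024/logo.png is dropped from the anomaly set, while 1' or '1'='1 or ../../etc/passwd stays in it for the next job.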
4. Attack extraction job
An anomaly is not necessarily an attack, so next we use weak attack signatures to extract the attacks:
if (Toolkit.isSQLI(paraValue)) {
    attackType = "SQLI";
} else if (Toolkit.isFI(paraValue)) {
    attackType = "FI";
} else if (Toolkit.isXSS(paraValue)) {
    attackType = "XSS";
} else if (Toolkit.isCE(paraValue)) {
    attackType = "CE";
}
Taking the SQL signature as an example:
public static boolean isSQLI(String str){
    // Weak signature: keywords and functions commonly seen in SQL injection payloads
    Set<String> pattern = new HashSet<String>();
    pattern.add("select");
    pattern.add("union");
    pattern.add("benchmark");
    pattern.add("sleep");
    pattern.add("pg_sleep");
    pattern.add("updatexml");
    pattern.add("extractvalue");
    pattern.add("dbms_pipe");
    pattern.add("hex");
    pattern.add("ascii");
    pattern.add("cast");
    pattern.add("concat");
    pattern.add("convert");
    pattern.add("chr");
    pattern.add("char");
    pattern.add("upper");
    pattern.add("substr");
    pattern.add("substring");
    pattern.add("floor");
    pattern.add("bitand");
    pattern.add("exec");
    pattern.add("xp_cmdshell");
    pattern.add("waitfor");
    pattern.add("iif");
    pattern.add("outfile");
    pattern.add("dumpfile");
    pattern.add("*/");
    pattern.add("insert");
    pattern.add("update");
    pattern.add("delete");
    pattern.add("drop");
    // A match on any signature word marks the value as SQL injection
    return containsWord(pattern, str);
}
[Attack data example]
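The containsWord helper called by isSQLI is not shown in the text. One possible reading, sketched below, is a case-insensitive search that only counts a keyword when it is not embedded in a longer word, so that "selection" does not trigger "select". The class name Signatures and this matching rule are assumptions; the patterns are assumed to be lowercase, and the input is not URL-decoded.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical containsWord: whole-word, case-insensitive signature match. */
final class Signatures {
    private Signatures() {}

    static boolean containsWord(Set<String> patterns, String str) {
        String lower = str.toLowerCase(Locale.ROOT);
        for (String p : patterns) {
            int idx = lower.indexOf(p);
            while (idx >= 0) {
                int end = idx + p.length();
                // A boundary is required only on sides where the pattern itself
                // starts or ends with a word character (so "*/" matches anywhere).
                boolean leftOk = idx == 0
                    || !isWordChar(lower.charAt(idx - 1))
                    || !isWordChar(p.charAt(0));
                boolean rightOk = end == lower.length()
                    || !isWordChar(lower.charAt(end))
                    || !isWordChar(p.charAt(p.length() - 1));
                if (leftOk && rightOk) {
                    return true;
                }
                idx = lower.indexOf(p, idx + 1);
            }
        }
        return false;
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }
}
```

With the patterns select, union and */, the value "1 UNION SELECT password" and the value "1/**/or" both match, while "selection=blue" does not.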
