mirror of
https://github.com/openappsec/openappsec.git
synced 2025-06-28 16:41:02 +03:00
I analyzed the WAAP codebase to locate and understand the calculation of: - User Reputation Score (in BehaviorAnalysis.cc/h) - Payload Score (in ConfidenceCalculator.cc/h) - URL Score (in ConfidenceCalculator.cc/h) - Parameter Score (in ConfidenceCalculator.cc/h) A summary of these findings has been added in Scoring_Mechanisms_Summary.md. No code changes were made as you requested analysis and identification.
Scoring Mechanisms Summary
This document summarizes the calculation methods for User Reputation scores and Payload/URL/Parameter confidence scores within the WAAP system.
User Reputation Score
- Relevant Files:
  - `components/security_apps/waap/waap_clib/BehaviorAnalysis.cc`
  - `components/security_apps/waap/waap_clib/BehaviorAnalysis.h`
- Key Function for Final Score Calculation: `BehaviorAnalyzer::getRelativeReputation(double absoluteReputation)`
- Key Data Structure Holding the Score: `ReputationData`. The final score is stored in its `relativeReputation` field (a `double`).
- Brief Overview of Calculation Flow:
  - Traffic Logging & Initial Scoring:
    - Incoming requests are processed by `BehaviorAnalyzer::analyze_behavior()`.
    - If an attack is detected (based on keyword matches and scores from other components), `TopBucket::putAttack()` is called. This updates `attacksScoreSum` and `missed_urls` in `Counters` for the source IP, User-Agent (UA), and IP+UA combination within the respective `Source` objects.
    - For legitimate traffic, `TopBucket::addKeys()` is called, incrementing `countLegit` in the `Counters` for the IP, UA, and IP+UA.
  - Individual Source Reputation (`Source::getInfo()`):
    - For each source type (IP, UA, IP+UA), this function calculates a `reputation` score.
    - This calculation involves:
      - `missed_urls_score`: Derived from `Counters::missed_urls`.
      - `legit_vs_attacks`: A score comparing `Counters::countLegit` to `Counters::attacksScoreSum`.
      - `coverage`: A metric based on `missed_urls_score`.
    - The final `reputation` for the source is a normalized product of these components.
  - Absolute Reputation (`TopBucket::getInfo()`):
    - This function calls `Source::getInfo()` for IP, UA, and IP+UA.
    - The `absoluteReputation` is calculated as the simple average of these three individual reputation scores.
  - Global Statistics Update (`BehaviorAnalyzer::updateAvrageAndVariance()`):
    - The newly calculated `absoluteReputation` is used to incrementally update the global mean (`m_reputation_mean`) and variance (`m_variance`) of all absolute reputation scores observed by the system.
  - Relative Reputation - Final Score (`BehaviorAnalyzer::getRelativeReputation()`):
    - This function takes the `absoluteReputation`.
    - It normalizes this score by comparing it to the global `m_reputation_mean`, adjusted by the global `m_variance` (which is itself modified by a `viscosity` factor to slow down rapid changes).
    - The normalized deviation is then passed through `BehaviorAnalyzer::errorProbabilityScore()`, which uses the mathematical error function (`erf`) to produce a probabilistic score (0.0 to 1.0).
    - This probabilistic score is then scaled by 10 to yield the final `relativeReputation`, which typically ranges from 0.0 to 10.0. This `relativeReputation` is the User Reputation Score.
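The last three steps of the flow (averaging, global statistics update, and the `erf`-based mapping) can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the code in `BehaviorAnalysis.cc`. The names `RunningStats`, `absoluteReputation`, `errorProbability`, and `relativeReputation` are hypothetical. Welford's algorithm stands in for the incremental mean and variance update, the standard normal CDF form `0.5 * (1 + erf(z / sqrt(2)))` is assumed for the `erf` mapping, and the `viscosity` adjustment is left out because its exact form is not described here.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical holder for the global statistics; the real members are
// m_reputation_mean and m_variance inside BehaviorAnalyzer.
struct RunningStats {
    unsigned long count = 0;
    double mean = 0.0;
    double m2 = 0.0;  // running sum of squared deviations from the mean

    // Welford's incremental update (assumed here, not confirmed by the source).
    void add(double x) {
        ++count;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Sample variance; zero until at least two samples were observed.
    double variance() const { return count > 1 ? m2 / (count - 1) : 0.0; }
};

// Simple average of the IP, UA, and IP+UA reputations.
double absoluteReputation(double ip, double ua, double ipUa) {
    return (ip + ua + ipUa) / 3.0;
}

// Maps a normalized deviation z to a probability in (0, 1) using erf.
// The normal-CDF form is an assumption of this sketch.
double errorProbability(double z) {
    return 0.5 * (1.0 + std::erf(z / std::sqrt(2.0)));
}

// Normalizes against the global mean and spread, then scales to 0..10.
double relativeReputation(double absRep, const RunningStats& stats) {
    assert(stats.count > 0 && "no reputation samples observed yet");
    double sd = std::sqrt(stats.variance());
    if (sd == 0.0) {
        return 5.0;  // assumption: with no spread, return the midpoint
    }
    return 10.0 * errorProbability((absRep - stats.mean) / sd);
}
```

With samples 2, 4, and 6 the running mean is 4 and the sample standard deviation is 2, so an absolute reputation of 4 maps to 5.0 and a value of 8 maps to roughly 9.77 in this sketch.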
Payload, URL, and Parameter Confidence Scores
- Relevant Files:
  - `components/security_apps/waap/waap_clib/ConfidenceCalculator.cc`
  - `components/security_apps/waap/waap_clib/ConfidenceCalculator.h`
- Key Function for Score Updates: `ConfidenceCalculator::calculateInterval()`
- Key Data Structure for Accumulating Scores: `m_confidence_level` (`UMap<Key, UMap<Val, double>>`). A map where `Key` identifies the item being scored (e.g., "param#username") and `Val` is the specific observed value (e.g., "admin"). The `double` is the accumulated confidence score for that (Key, Value) pair, building towards a threshold.
- Key Data Structure for "Confident" Items: `m_confident_sets` (`UMap<Key, ValueSetWithTime>`). A map where `Key` is the item identifier. `ValueSetWithTime` contains a set of `Val`s that have reached the confidence threshold for that `Key`, along with a timestamp of the last update. These are considered the learned baseline of normal/expected values.
- Distinguishing Score Types:
  - Different types of scores (Payload, URL, Parameter) are distinguished by the string format of the `Key`. This is a convention established by the callers of the `ConfidenceCalculator`.
  - Examples:
    - Parameter: `"param#<parameter_name>"` (e.g., `"param#country_code"`)
    - URL: `"url#<url_pattern_or_exact_url>"` (e.g., `"url#/api/v1/users"`)
    - Payload-related aspects (e.g., data types, specific field values) would also use a structured `Key` string, like `"payload#dataType#fieldName"`.
  - The `ConfidenceCalculator` uses this `Key` to group observations and their corresponding scores.
- Brief Overview of How a Value Becomes "Confident":
  - Observation Logging (`ConfidenceCalculator::log()`):
    - When a specific `Value` (e.g., "US") is observed for a `Key` (e.g., "param#country_code") from a particular `Source` (e.g., an IP address), this observation is logged in `m_time_window_logger`.
  - Interval Calculation (`ConfidenceCalculator::calculateInterval()`):
    - Periodically, this function processes the logged data from the past interval.
    - For each `(Key, Value)` pair, it calculates how many unique sources observed it, and the ratio of these sources to all unique sources that interacted with the `Key`.
    - The confidence score `m_confidence_level[Key][Value]` is incremented. The increment amount depends on:
      - A base value related to `SCORE_THRESHOLD` (target score, e.g., 100.0) and `minIntervals` (number of intervals to reach confidence).
      - The calculated ratio of sources (higher ratio = bigger increment).
      - A logarithmic scaling of the number of sources observing the value (more sources increase confidence, but with diminishing returns).
      - Tuning factors (e.g., if a parameter is marked as benign, its values gain confidence faster).
    - If a previously known `(Key, Value)` is not seen in the current interval, its score in `m_confidence_level` decays, reducing its confidence over time unless reinforced.
  - Reaching Confidence Threshold (`ConfidenceCalculator::calcConfidentValues()`):
    - After scores in `m_confidence_level` are updated, this function checks them.
    - If `m_confidence_level[Key][Value]` reaches the predefined `SCORE_THRESHOLD` (e.g., 100.0), that `Value` is considered "confident" for that `Key`.
    - The confident `Value` is then added to the `m_confident_sets[Key]`. This set represents the learned baseline of expected values for that specific parameter, URL, or payload characteristic.
  - Usage:
    - Other parts of the system can then use `ConfidenceCalculator::is_confident(Key, Value)` to check if a newly observed value is part of this learned baseline. Values not found in the confident set might be considered anomalous or suspicious.
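The log, interval, threshold, and lookup steps above can be sketched as a small self-contained C++ class. This is a simplified illustration under stated assumptions, not the `ConfidenceCalculator` implementation. `ConfidenceSketch`, `makeKey`, `isConfident`, and `score` are hypothetical names. The increment formula (base step times source ratio times `log2(1 + sources)`) and the multiplicative decay are assumed stand-ins for the real arithmetic. Tuning factors and timestamps are omitted, and whether the real code removes decayed values from the confident set is not covered.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <unordered_map>
#include <unordered_set>

using Key = std::string;
using Val = std::string;
using Source = std::string;

// Hypothetical helper following the "<type>#<name>" key convention.
Key makeKey(const std::string& type, const std::string& name) {
    return type + "#" + name;
}

// Hypothetical, simplified stand-in for ConfidenceCalculator.
class ConfidenceSketch {
public:
    ConfidenceSketch(double threshold, int minIntervals, double decay)
        : m_threshold(threshold), m_step(0.0), m_decay(decay) {
        assert(minIntervals > 0);
        m_step = threshold / minIntervals;  // base increment per interval
    }

    // Record that `source` observed `val` for `key` in the current interval.
    void log(const Key& key, const Val& val, const Source& source) {
        m_window[key][val].insert(source);
    }

    // Process the current interval, then start a new one.
    void calculateInterval() {
        // Decay scores of known values that were not seen this interval.
        for (auto& keyEntry : m_level) {
            auto seenKey = m_window.find(keyEntry.first);
            for (auto& valEntry : keyEntry.second) {
                bool seen = seenKey != m_window.end() &&
                            seenKey->second.count(valEntry.first) > 0;
                if (!seen) valEntry.second *= m_decay;
            }
        }
        // Increment scores of observed values.
        for (const auto& keyEntry : m_window) {
            std::unordered_set<Source> allSources;
            for (const auto& valEntry : keyEntry.second) {
                allSources.insert(valEntry.second.begin(), valEntry.second.end());
            }
            for (const auto& valEntry : keyEntry.second) {
                double n = static_cast<double>(valEntry.second.size());
                double ratio = n / static_cast<double>(allSources.size());
                double weight = std::log2(1.0 + n);  // diminishing returns
                double& score = m_level[keyEntry.first][valEntry.first];
                score += m_step * ratio * weight;
                if (score >= m_threshold) {
                    m_confident[keyEntry.first].insert(valEntry.first);
                }
            }
        }
        m_window.clear();
    }

    bool isConfident(const Key& key, const Val& val) const {
        auto it = m_confident.find(key);
        return it != m_confident.end() && it->second.count(val) > 0;
    }

    // Accumulated score, or 0.0 if the pair was never observed.
    double score(const Key& key, const Val& val) const {
        auto k = m_level.find(key);
        if (k == m_level.end()) return 0.0;
        auto v = k->second.find(val);
        return v == k->second.end() ? 0.0 : v->second;
    }

private:
    double m_threshold;
    double m_step;
    double m_decay;
    std::unordered_map<Key, std::unordered_map<Val, std::unordered_set<Source>>> m_window;
    std::unordered_map<Key, std::unordered_map<Val, double>> m_level;
    std::unordered_map<Key, std::unordered_set<Val>> m_confident;
};
```

In this sketch, a value reported by most of the sources that touch a key crosses the threshold within a few intervals, while a value reported by a single source accumulates slowly and loses score in any interval where it is absent.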