Introducing Stable Behavioral Signatures: Clustering JavaScript by What It Does

ScanMalware Research Team

[Image: JavaScript API calls being clustered into behavioral categories]

The Problem: Exact Hashes Don't Cluster

ScanMalware's JavaScript analysis engine monitors runtime behavior during scans -- tracking API calls like eval(), setTimeout(), fetch(), and DOM manipulation. Until now, we hashed this behavioral data with SHA-256 to create a composite fingerprint. The idea was simple: same behavior, same hash.

In practice, it didn't work. SHA-256 is an exact-match algorithm: a single bit difference produces a completely different hash. Runtime profiling data includes non-deterministic values like CPU sampling counts that vary between executions. The result: 95% of behavioral hashes were completely unique, matching only a single scan. For critical and high-risk scans, the ratio was essentially 1:1 -- every malicious scan had its own unique fingerprint.

The "Find Similar" feature on scan results was returning zero matches for nearly every scan.

Two New Algorithms

We've added two new fingerprinting algorithms that complement the existing composite hash, giving us three layers of behavioral matching with different trade-offs.

Stable Behavioral Signatures

Instead of hashing exact API call counts, we categorize behavior across 21 dimensions, grouped into six categories:

API usage buckets (7 dimensions) -- each monitored API is bucketed by call count:

Count     Bucket
0         none
1-3       low
4-10      med
11-50     high
51-200    vhigh
201+      extreme

This applies to eval, Function(), setTimeout, setInterval, fetch, document.write, and DOM manipulation (innerHTML/outerHTML).
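The bucketing can be sketched as a simple threshold function. The boundaries come directly from the table above; the function name and structure are illustrative, not ScanMalware's actual code.

```python
def count_bucket(count: int) -> str:
    """Map a raw API call count to its categorical bucket."""
    if count == 0:
        return "none"
    if count <= 3:
        return "low"
    if count <= 10:
        return "med"
    if count <= 50:
        return "high"
    if count <= 200:
        return "vhigh"
    return "extreme"
```

Scripts calling setTimeout 150 times and 200 times both land in the vhigh bucket, which is exactly what makes the signature stable against sampling noise.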

Risk flags (4 dimensions) -- boolean yes/no for whether eval, Function constructor, DOM manipulation, or document.write are used at all.

Complexity and volume (2 dimensions) -- function count bucketed into six tiers (tiny through huge), plus total API call volume.

Timing (2 dimensions) -- the execution pattern (periodic/burst/random) and timing speed (rapid/fast/moderate/slow) based on average intervals between API calls.

Call pattern shape (3 dimensions) -- how the frequency distribution looks across the top functions:

  • Shape: flat (many functions called equally), gradual, steep, or spike (one dominant function)
  • Depth: how many functions are active (shallow/mid/deep)
  • Spread: how much the tail matters (wide/tapered/narrow/point)
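One way to derive these three dimensions from a normalized top-function frequency distribution is sketched below. The specific thresholds are assumptions for demonstration; the article does not publish ScanMalware's actual cutoffs.

```python
def pattern_shape(freqs: list[float]) -> tuple[str, str, str]:
    """Classify a descending top-function frequency distribution.

    Returns (shape, depth, spread) with illustrative thresholds.
    """
    total = sum(freqs) or 1.0
    norm = [f / total for f in freqs]
    top = norm[0] if norm else 0.0

    # Shape: how dominant is the single most-called function?
    if top >= 0.8:
        shape = "spike"
    elif top >= 0.5:
        shape = "steep"
    elif top >= 0.25:
        shape = "gradual"
    else:
        shape = "flat"

    # Depth: how many functions are meaningfully active?
    active = sum(1 for f in norm if f >= 0.01)
    depth = "shallow" if active <= 3 else "mid" if active <= 7 else "deep"

    # Spread: how much weight sits in the tail beyond the top function?
    tail = 1.0 - top
    if tail >= 0.6:
        spread = "wide"
    elif tail >= 0.3:
        spread = "tapered"
    elif tail > 0.05:
        spread = "narrow"
    else:
        spread = "point"

    return shape, depth, spread
```

Ten functions called equally would classify as flat/deep/wide; one function doing nearly all the work classifies as a spike.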

Sequence topology (3 dimensions) -- the API call graph structure: graph complexity (simple/moderate/complex), whether self-loops exist (e.g. setTimeout calling itself repeatedly), and the most common API transition (e.g. setTimeout->setTimeout).
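Given an ordered trace of API calls, the three topology dimensions can be derived from transition bigrams, as in this sketch. The graph-complexity cutoffs are illustrative assumptions.

```python
from collections import Counter

def sequence_topology(trace: list[str]) -> dict:
    """Derive graph complexity, self-loop flag, and top transition
    from an ordered API call trace. Cutoffs are assumptions."""
    # Count each adjacent pair of calls as a directed edge.
    transitions = Counter(zip(trace, trace[1:]))
    unique_edges = len(transitions)
    complexity = ("simple" if unique_edges <= 3
                  else "moderate" if unique_edges <= 8
                  else "complex")
    # A self-loop is any API that directly follows itself.
    self_loop = any(a == b for a, b in transitions)
    top = max(transitions, key=transitions.get) if transitions else None
    return {
        "sq": complexity,
        "sl": "y" if self_loop else "n",
        "tt": f"{top[0]}->{top[1]}" if top else "",
    }
```

A trace of repeated setTimeout calls followed by one fetch would yield a simple graph with a self-loop and tt:setTimeout->setTimeout.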

The result is a 21-dimension signature like:

ev:vhigh|fc:none|st:extreme|si:high|ft:high|dw:none|dm:high|
E:y|F:n|D:y|W:n|cx:huge|vol:vhigh|tp:burst|ts:fast|
ps:gradual|pd:deep|pw:tapered|sq:complex|sl:y|tt:setTimeout->setTimeout

This signature is fully deterministic -- the same behavioral profile always produces the same signature, regardless of exact call counts or profiler sampling variation. A script calling setTimeout 150 times and one calling it 200 times both map to st:vhigh.

With 21 dimensions, the theoretical signature space is approximately 43 trillion unique values -- specific enough to be meaningful even across millions of scans, while still clustering truly similar behavior together. The SHA-256 hash of this string enables fast exact-match lookups across our database.
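Assembling the signature and its lookup hash can be sketched as follows. The field keys and ordering mirror the example signature above; the implementation details are illustrative.

```python
import hashlib

def stable_signature(dims: dict[str, str]) -> tuple[str, str]:
    """Join the 21 categorical dimensions in a fixed order and
    return (signature string, SHA-256 hex digest for lookups)."""
    order = ["ev", "fc", "st", "si", "ft", "dw", "dm",
             "E", "F", "D", "W", "cx", "vol", "tp", "ts",
             "ps", "pd", "pw", "sq", "sl", "tt"]
    sig = "|".join(f"{k}:{dims[k]}" for k in order)
    return sig, hashlib.sha256(sig.encode()).hexdigest()
```

Because the inputs are categorical, the same behavioral profile always yields the same string and therefore the same hash, which is what makes exact-match database lookups possible.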

Behavior Vectors

For continuous similarity scoring rather than categorical matching, we generate a 32-byte feature vector encoding:

  • API usage intensity (9 dimensions): Log-scaled counts for each monitored API (eval, Function, setTimeout, setInterval, document.write, document.writeln, fetch, innerHTML, outerHTML)
  • Code complexity (1 dimension): Log-scaled function count
  • Risk flags (3 dimensions): Binary indicators for eval, Function constructor, and DOM manipulation
  • Call pattern (10 dimensions): Normalized frequency distribution of the top 10 most-called functions
  • Timing classification (1 dimension): Periodic, burst, or random execution pattern
  • Sequence topology (5 dimensions): Normalized frequencies of the most common API call transitions
  • Reserved (3 dimensions): For future use
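The log-scaled encoding of a single byte might look like the sketch below. The article only states that counts are log-scaled; the scale factor here is an assumption.

```python
import math

def log_byte(count: int, scale: float = 16.0) -> int:
    """Log-scale a raw call count into a 0-255 byte.
    The scale factor is an illustrative assumption."""
    return min(255, int(math.log2(count + 1) * scale))
```

With this scaling, counts of 45 and 55 encode to 88 and 92 -- a difference of 4 out of 255 -- so nearby counts stay nearby in the vector even when they fall into different categorical buckets.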

The vector is stored as a 64-character hex string. Similarity between two scans is computed using L1 (Manhattan) distance -- the sum of absolute differences across all 32 bytes. Identical behavior scores 1.0; maximally different behavior scores 0.0.

This enables questions like "which scans behave most similarly to this one?" without requiring exact category matches.
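The similarity computation itself is straightforward: decode both hex strings into 32 bytes, sum absolute differences, and normalize. Dividing by the maximum possible distance (32 * 255) is an assumption consistent with the 1.0/0.0 endpoints stated above.

```python
def vector_similarity(hex_a: str, hex_b: str) -> float:
    """L1 (Manhattan) similarity between two 32-byte hex vectors,
    normalized so identical -> 1.0, maximally different -> 0.0."""
    a, b = bytes.fromhex(hex_a), bytes.fromhex(hex_b)
    assert len(a) == len(b) == 32
    l1 = sum(abs(x - y) for x, y in zip(a, b))
    return 1.0 - l1 / (len(a) * 255)
```

Ranking all stored vectors by this score against a query vector gives the nearest-neighbor behavior described in the next sections.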

Early Results

We now have three complementary algorithms running side by side. After processing ~600 scans with the 21-dimension signatures:

Method                            Unique %   What it tells you
Composite Hash (SHA-256)          ~98%       Exact behavioral match -- almost every scan is unique
Behavior Vector (32-byte L1)      ~56%       Continuous similarity -- fine-grained grouping
Stable Signature (21 dimensions)  ~47%       Categorical match -- meaningful behavioral clusters

The composite hash clusters almost nothing. The stable signature groups over half of all scans into meaningful clusters, while the behavior vector provides finer continuous similarity scoring between them.

Precise Clustering Without False Collisions

The expanded 21-dimension signature solves the key problem with the earlier 9-dimension version: outside the one large "minimal pages" cluster (sites with no significant JavaScript activity), every active site gets its own unique signature. The additional dimensions -- call pattern shape, sequence topology, timing speed, and top API transition -- prevent unrelated complex sites from colliding.

Some notable signature patterns from production scans:

  • tt:setTimeout->setTimeout -- timer self-loops, the most common pattern in active JavaScript. Seen across ad frameworks, SPAs, and analytics code.
  • tt:Function->Function -- Function constructor self-loops. Characteristic of bot-detection systems (e.g. validate.perfdrive.com).
  • tt:eval->setTimeout -- eval chaining into timers. A suspicious pattern worth investigating, seen on load.kisskh.co.
  • ps:flat|sq:complex|sl:y -- many functions called with equal frequency in a complex self-looping call graph. Typical of large web applications like weathernews.jp.

Deterministic Across Repeat Scans

To test consistency, we scanned the same sites multiple times:

  • wikipedia.org (2 scans): Composite hash differed. Stable signature and behavior vector were both identical -- perfect 1.0 similarity.
  • gronnjobb.no (2 scans, JS-heavy SPA): Composite hash differed. Stable signature was identical. Behavior vector similarity: 0.999 (only 4 of 32 bytes differed, by 2-5 each).
  • validate.perfdrive.com (4 scans, bot-detection page): Composite hash always different. Stable signature was identical in 3/4 scans (one flipped at the timing classification boundary). Behavior vector similarity ranged 0.96-0.99.

Vector Similarity Bridges Across Categories

The behavior vector finds relationships that categorical signatures miss. When we queried the vector similarity endpoint for a confirmed phishing site (socialy.pro), it returned two other user-reported malicious sites -- vtg.lol and grabber.cy -- at 0.999 similarity, despite being completely different domains with different attack purposes (phishing, malware, credential grabber). Their composite hashes were all different; only the behavior vector linked them.

The vector also bridges across signature boundaries. A scan with st:high (count 45) and one with st:vhigh (count 55) get different stable signatures, but their behavior vectors may score 0.95+ similarity -- the L1 distance captures this continuity where categorical bucketing cannot.

What We Don't Know Yet

These algorithms cluster by behavioral similarity, but behavioral similarity does not equal malicious intent. A site with eval:yes|timing:burst|complexity:huge could be a sophisticated phishing kit or just a heavily-minified legitimate application. The real value will emerge over time as we cross-reference stable signatures against confirmed indicators -- YARA malware matches, community reports, and Safe Browsing flags. When a confirmed malicious site shares a stable signature with other unanalyzed sites, that's a lead worth investigating.

How to Use It

Every new scan automatically generates both fingerprints. On any scan result page, expand the Scripts tab, then Page-Level JavaScript Analysis to see:

  • The Stable Signature displayed in purple
  • The Behavior Vector hex string
  • A Find Similar link that searches for scans with the same behavioral pattern

For example, here is a stable signature search showing scans that share the st:high|cx:huge|tp:burst|ps:flat|pw:wide|sl:y|tt:setTimeout->setTimeout pattern -- heavy timer self-loops with burst timing across complex sites.

Both search methods are available via the API:

Stable signature search (exact categorical match):

GET /api/v1/js-fingerprinter2/search/stable-signature/{hash}

Behavior vector similarity (ranked by L1 distance):

GET /api/v1/js-fingerprinter2/search/behavior-vector/similar/{scan_id}?min_similarity=0.7

The vector similarity endpoint returns scans ranked by a continuous similarity score (0-1), enabling nearest-neighbor search even when stable signatures don't match exactly.

First Application: Microsoft 365 Phishing Detection

Within hours of deploying these signatures, we put them to work on a real problem: detecting Microsoft 365 credential phishing across seven different attack tools, including Evilginx, Muraena, Modlishka, EvilnoVNC, static clones, and obfuscated kits.

We discovered that the real login.microsoftonline.com has a highly distinctive behavioral signature: fc:vhigh|ps:spike|sl:y|tt:Function->Function -- heavy Function constructor usage with self-loops, characteristic of Microsoft's MSAL authentication library. No other legitimate site produces this pattern.

This led to a multi-signal phishing scoring system that combines behavioral signatures with other scan data:

  • Title + domain mismatch: Page says "Sign in to your Microsoft account" but the domain isn't Microsoft
  • Form credential harvesting: Login form submits to a non-Microsoft endpoint
  • AiTM bootstrap detection: document.write injection pattern used by transparent proxy tools
  • MSAL proxy detection: Non-Microsoft domain running Microsoft's real authentication library
  • VNC/canvas phishing: Microsoft login title but no HTML form elements (browser-in-the-middle)
  • Fingerprint exfiltration: Hidden form fields collecting browser fingerprints

The scoring system detected all seven phishing tools as malicious or high-risk, with zero false positives against legitimate Azure AD SSO pages. The behavioral signatures provide signals that no other detection method captures -- particularly the AiTM bootstrap pattern (dw:low|W:y) that identifies transparent proxy tools like Evilginx and Modlishka.
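The multi-signal idea can be sketched as a weighted sum over detected signals. The signal names follow the list above; the weights and verdict thresholds are illustrative assumptions, not ScanMalware's production values.

```python
# Hypothetical weights per detected signal; names follow the
# signal list above, values are assumptions for illustration.
SIGNAL_WEIGHTS = {
    "title_domain_mismatch": 3,
    "credential_form_offsite": 4,
    "aitm_bootstrap": 5,       # dw:low|W:y document.write injection
    "msal_proxy": 5,           # MSAL signature on a non-Microsoft domain
    "vnc_canvas_phishing": 4,  # Microsoft login title but no form elements
    "fingerprint_exfil": 2,
}

def phishing_verdict(signals: set[str]) -> str:
    """Combine detected signals into a verdict. Thresholds are
    illustrative assumptions."""
    score = sum(SIGNAL_WEIGHTS.get(s, 0) for s in signals)
    if score >= 8:
        return "malicious"
    if score >= 4:
        return "high-risk"
    return "clean"
```

The point of combining signals is that no single indicator is conclusive on its own: a title/domain mismatch plus an MSAL proxy signature together are far stronger evidence than either alone.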

What's Next

  • More protected brands: Expanding the baseline library beyond Microsoft to Google, Apple, banking portals, and other commonly phished services
  • Temporal analysis: Tracking how a domain's behavioral signature changes over time to detect compromise
  • Cross-domain clustering: Identifying coordinated campaigns that share behavioral patterns across different domains
  • Automated baseline collection: Continuously scanning legitimate login pages to keep behavioral baselines current

The existing SHA-256 composite hash remains available for exact behavioral matching. The new algorithms complement rather than replace it -- just as our TLSH fuzzy hashing complements exact content hashes for code similarity.


Stable behavioral signatures, behavior vectors, and multi-signal phishing detection are now live on all new scans. Try scanning a URL to see them in action, or explore the API documentation for programmatic access.