Introducing SMQL: Search 343K+ Scans with 120+ Filters

ScanMalware Research Team

Why a Query Language?

ScanMalware processes thousands of URL scans daily, collecting data across dozens of dimensions: TLS certificates, JavaScript behavior, WHOIS registration, technology stacks, IDS alerts, malware patterns, and more. Basic text search gets you started, but real threat hunting requires combining signals.

SMQL (ScanMalware Query Language) lets you write precise queries that cross-reference any combination of our 120+ filters and 22 existence checks. Think of it as SQL for security scan data, but with a syntax designed for quick, interactive use.

Quick Start

SMQL queries are built from filters, boolean operators, and modifiers:

field:value                    # Exact match
field:>value                   # Comparison (>, <, >=, <=)
field:value1..value2           # Range
field:*wildcard*               # Wildcard with * and ?
has:feature                    # Existence check
-field:value                   # Negation
filter1 AND filter2            # Boolean AND (implicit between terms)
filter1 OR filter2             # Boolean OR
(filter1 OR filter2) AND ...   # Grouping with parentheses
sort:newest                    # Sort order
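
To make the shape of a single term concrete, here is a minimal Python sketch that splits one term into its negation flag, field, and value. The names `Term` and `parse_term` are hypothetical and are not part of SMQL or the ScanMalware API; the sketch ignores boolean operators, grouping, and any quoting rules the real parser may have.

```python
from typing import NamedTuple


class Term(NamedTuple):
    negated: bool
    field: str
    value: str


def parse_term(text: str) -> Term:
    """Split one SMQL-style term such as `-domain:paypal.com` into its parts.

    Illustrative only: this is an assumption about the term layout shown in
    the syntax listing, not the parser ScanMalware uses.
    """
    negated = text.startswith("-")
    body = text[1:] if negated else text
    # Split on the first colon only, so a value may itself contain colons.
    field, _, value = body.partition(":")
    return Term(negated, field, value)


print(parse_term("-domain:paypal.com"))
print(parse_term("has:ids"))
print(parse_term("js_risk_score:>70"))
```

Existence checks fall out of the same layout: `has:ids` is simply a term whose field is `has`.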

Try it now at scanmalware.com/search-advanced.

Real-World Examples

Finding Phishing Campaigns

Search for pages with "paypal" in the title that are not served from paypal.com and that our AI confirmed as scams:

title:*paypal* -domain:paypal.com verdict:CONFIRMED_SCAM

Newly Registered Domains with Suspicious Verdicts

Combine WHOIS domain age (in days) with security verdicts:

domain_age:<30 verdict:CONFIRMED_SCAM

Expired Certificates in the Last Month

Find sites with expired TLS certificates from recent scans:

cert_expired:true submitted:>last30d

Self-Signed Certificates with IDS Alerts

Cross-reference TLS anomalies with network intrusion signatures:

cert_self_signed:true has:ids

Obfuscated JavaScript with High Risk Scores

Hunt for heavily obfuscated scripts that triggered behavioral risk scoring:

js_obfuscated:true js_risk_score:>70

WordPress Sites Behind Cloudflare with Bot Protection

technology:WordPress bot_detection:cloudflare

Filter Categories

SMQL covers 120+ filters organized into 14 categories:

Category          Filters  Examples
Core              12       url, domain, title, status, submitted, http_status
Network           7        ip, asn, asn_org, country, city, ip_count
WHOIS/RDAP        8        registrar, domain_age, nameserver, rir
Security          11       verdict, ai_risk_score, clamav, rpki, ioc
TLS/Certificates  21       cert_issuer, cert_expired, key_algorithm, key_size, cert_risk, ct_logged
JARM              2        jarm, jarm_known
Technologies      4        technology, tech_category, cpe
JavaScript        15       js_risk, js_eval, malware_family, obfuscation_score
Hashes            9        phash, favicon_hash, js_hash, ssdeep, tlsh
Tracking          4        tracker, tracking_id, tracker_category
Content           16       ocr, bot_detection, ids_category, clearfake_type, bundler
CT                5        ct_domain, ct_san, ct_hash
DNS               2        ns, zone_domain
rDNS              2        rdns, ptr

Plus 22 existence checks with the has: prefix: has:malware, has:certificate, has:ids, has:phishing, has:pastejacking, has:clearfake, has:clone, has:webpack, has:pcap, has:ct, and more.

Comparison and Range Operators

Numeric and date filters support comparison operators:

load_time:>5                   # Pages slower than 5 seconds
key_algorithm:RSA key_size:<2048  # Weak RSA keys
ai_risk_score:>7               # High AI risk scores (scale 0-10)
domain_age:<30                 # Domains younger than 30 days
cert_days:<7                   # Certificates expiring within a week
submitted:>last24h             # Scans from the last 24 hours

Range queries use the .. syntax:

http_status:400..499           # All 4xx client errors
js_risk_score:60..100          # High to critical JS risk
submitted:2025-06..2025-12     # June through December 2025
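
One way to read the comparison and range forms is as small test functions over a number. The Python sketch below shows that reading for numeric values only. `numeric_predicate` is a hypothetical helper, not ScanMalware code, and it assumes ranges are inclusive at both ends, which is what the `400..499` example suggests. Date values such as `last24h` are out of scope here.

```python
import operator
from typing import Callable

# Two-character operators come first so ">=" is not mistaken for ">".
_COMPARATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def numeric_predicate(value: str) -> Callable[[float], bool]:
    """Turn the value part of a numeric filter into a test function.

    Illustrative only: inclusive range bounds are an assumption.
    """
    if ".." in value:
        low, high = (float(part) for part in value.split("..", 1))
        return lambda x: low <= x <= high
    for prefix, compare in _COMPARATORS.items():
        if value.startswith(prefix):
            bound = float(value[len(prefix):])
            return lambda x: compare(x, bound)
    exact = float(value)
    return lambda x: x == exact


is_client_error = numeric_predicate("400..499")
print(is_client_error(404), is_client_error(500))
```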

Boolean Logic

SMQL supports AND, OR, NOT, and parentheses for grouping. Adjacent terms are implicitly AND-ed:

# These are equivalent:
technology:WordPress country:RU
technology:WordPress AND country:RU

# OR requires explicit operator:
technology:WordPress OR technology:Joomla

# Grouping:
(technology:WordPress OR technology:Joomla) AND country:CN

# Negation (two forms):
-domain:google.com
NOT domain:google.com
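
If you assemble queries in a script, a few string helpers keep the parentheses and operators consistent. The following is a minimal Python sketch; `term`, `negate`, `any_of`, and `all_of` are hypothetical names introduced for illustration, and the helpers only build strings in the syntax shown above without validating field names.

```python
def term(field: str, value: str) -> str:
    """Return a single `field:value` filter term."""
    return f"{field}:{value}"


def negate(single_term: str) -> str:
    """Prefix one term with `-` to exclude matches."""
    return f"-{single_term}"


def any_of(*exprs: str) -> str:
    """Join expressions with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(exprs) + ")"


def all_of(*exprs: str) -> str:
    """Join expressions with an explicit AND."""
    return " AND ".join(exprs)


# Rebuilds the grouping example from this section.
query = all_of(
    any_of(term("technology", "WordPress"), term("technology", "Joomla")),
    term("country", "CN"),
)
print(query)
print(negate(term("domain", "google.com")))
```

Using an explicit AND in `all_of` is a design choice for readability; the implicit form with a space between terms is equivalent according to the rules above.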

Performance

SMQL queries are optimized for speed and safety:

  • Count capping limits result counting to 10,000 rows for fast pagination on broad queries
  • Caching with stampede protection reduces load on repeated queries
  • 15-second timeout with clear error messages if a query is too complex
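
The idea behind count capping can be sketched in a few lines of Python: stop counting once the cap is passed and report whether the number is exact. This is a generic illustration, not ScanMalware's implementation; `capped_count` is a hypothetical function, and only the 10,000 figure comes from this post.

```python
from itertools import islice
from typing import Iterable, Tuple

COUNT_CAP = 10_000  # the cap quoted in this post


def capped_count(rows: Iterable[object], cap: int = COUNT_CAP) -> Tuple[int, bool]:
    """Count rows but stop shortly after `cap`; return (count, is_exact).

    Illustrative only: a real system would apply the cap inside the
    database query rather than in application code.
    """
    # Look at one row past the cap to learn whether more rows exist.
    seen = sum(1 for _ in islice(rows, cap + 1))
    if seen > cap:
        return cap, False
    return seen, True


print(capped_count(range(463)))
print(capped_count(range(50_000)))
```

Because at most `cap + 1` rows are ever consumed, the cost of counting stays bounded no matter how broad the query is.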

Every query result includes an API link that opens the raw JSON response, making it easy to integrate SMQL into scripts and automation -- including the ScanMalware CLI.

API Access

SMQL is available as a REST API endpoint:

GET /api/v1/search/smql?q=<query>&page=1&limit=20&sort=newest
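
Queries contain characters such as `:`, `>`, `*`, and spaces, so they need percent-encoding before they go into the `q` parameter. Here is a minimal Python sketch using only the standard library. The `smql_url` helper is hypothetical, and the `https://scanmalware.com` base URL is an assumption based on the site address; check the API documentation for the actual host and any authentication requirements.

```python
from urllib.parse import urlencode

# Assumption: the API is served over HTTPS from the main site host.
BASE_URL = "https://scanmalware.com"


def smql_url(query: str, page: int = 1, limit: int = 20, sort: str = "newest") -> str:
    """Build the search URL, percent-encoding the query string."""
    params = {"q": query, "page": page, "limit": limit, "sort": sort}
    return f"{BASE_URL}/api/v1/search/smql?{urlencode(params)}"


print(smql_url("cert_self_signed:true has:ids"))
```

The resulting URL can be passed to any HTTP client.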

The response includes paginated results, query timing, and filter metadata:

{
  "query": "technology:WordPress AND country:RU",
  "results": [...],
  "pagination": {
    "total_items": 463,
    "total_pages": 24,
    "exact_count": true
  },
  "query_time_ms": 42.3
}
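
A script consuming this response might summarize it as in the Python sketch below. The sample reuses the fields shown above with the elided results list left empty. `summarize` is a hypothetical helper, and treating `exact_count: false` as "at least this many results" is an assumption about how the field relates to count capping.

```python
import json

# Sample response from this post; the elided results list is left empty.
SAMPLE = """
{
  "query": "technology:WordPress AND country:RU",
  "results": [],
  "pagination": {"total_items": 463, "total_pages": 24, "exact_count": true},
  "query_time_ms": 42.3
}
"""


def summarize(response: dict) -> str:
    """Describe the result count, marking it as a lower bound when not exact."""
    pagination = response["pagination"]
    total = pagination["total_items"]
    # Assumption: exact_count is false when the count was capped.
    count = str(total) if pagination["exact_count"] else f"{total}+"
    return (
        f"{count} results across {pagination['total_pages']} pages "
        f"in {response['query_time_ms']} ms"
    )


response = json.loads(SAMPLE)
print(summarize(response))
```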

The filter reference endpoint at /api/v1/search/smql/filters returns all available filters with descriptions, examples, and supported operators -- useful for building autocomplete interfaces. Full API documentation is available at /api/docs.

Try It

Visit scanmalware.com/search-advanced to start querying. The search page includes example queries, an inline filter reference, and a sort dropdown. Click the API link in any result to see the raw JSON.

We're actively expanding SMQL -- upcoming features include filters for Certificate Transparency data, DNS zone lookups, and reverse DNS searches. To see SMQL in action for real-world threat hunting, check out our analysis of the ShinyHunters phishing kit.