Introducing SMQL: Search 343K+ Scans with 120+ Filters

ScanMalware Research Team
4 min read

Figure: the SMQL (ScanMalware Query Language) search interface with filter categories

Why a Query Language?

ScanMalware processes thousands of URL scans daily, collecting data across dozens of dimensions: TLS certificates, JavaScript behavior, WHOIS registration, technology stacks, IDS alerts, malware patterns, and more. Basic text search gets you started, but real threat hunting requires combining signals.

SMQL (ScanMalware Query Language) lets you write precise queries that cross-reference any combination of our 108 filters and 20 existence checks. Think of it as SQL for security scan data, but with a syntax designed for quick, interactive use.

Quick Start

SMQL queries are built from filters, boolean operators, and modifiers:

field:value                    # Exact match
field:>value                   # Comparison (>, <, >=, <=)
field:value1..value2           # Range
field:*wildcard*               # Wildcard with * and ?
has:feature                    # Existence check
-field:value                   # Negation
filter1 AND filter2            # Boolean AND (implicit between terms)
filter1 OR filter2             # Boolean OR
(filter1 OR filter2) AND ...   # Grouping with parentheses
sort:newest                    # Sort order

Try it now at scanmalware.com/search-advanced.
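When queries are generated from a script rather than typed by hand, a few small helpers keep the syntax consistent. The sketch below is illustrative only: `term`, `has`, and `join_terms` are hypothetical names, not part of any ScanMalware client library, and the helpers do no validation of field names.

```python
def term(field: str, value: str, negate: bool = False) -> str:
    """Render one SMQL term such as `country:RU` or `-domain:paypal.com`."""
    prefix = "-" if negate else ""
    return f"{prefix}{field}:{value}"


def has(feature: str) -> str:
    """Render an existence check such as `has:malware`."""
    return f"has:{feature}"


def join_terms(*terms: str) -> str:
    """Join terms with spaces; adjacent terms are implicitly AND-ed."""
    return " ".join(terms)


# Rebuild the PayPal phishing query from the examples section.
query = join_terms(
    term("title", "*paypal*"),
    term("domain", "paypal.com", negate=True),
    term("verdict", "CONFIRMED_SCAM"),
)
print(query)  # title:*paypal* -domain:paypal.com verdict:CONFIRMED_SCAM
```

Comparison operators fit the same helper because they are part of the value, for example `term("domain_age", "<7")`.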

Real-World Examples

Finding Phishing Campaigns

Search for pages mimicking PayPal that our AI confirmed as scams:

title:*paypal* -domain:paypal.com verdict:CONFIRMED_SCAM

Newly Registered Domains Serving Malware

Combine WHOIS domain age (under 7 days) with ClamAV detection, narrowed to country code RU:

domain_age:<7 has:malware country:RU

Expired Certificates in the Last Month

Find sites with expired TLS certificates among scans submitted in the last 30 days:

cert_expired:true submitted:>last30d

Self-Signed Certificates with IDS Alerts

Cross-reference TLS anomalies with network intrusion signatures:

cert_self_signed:true has:ids

Obfuscated JavaScript with High Risk Scores

Hunt for obfuscated scripts with a JavaScript risk score above 70:

js_obfuscated:true js_risk_score:>70

WordPress Sites Behind Cloudflare with Bot Protection

Match WordPress sites where Cloudflare bot detection was observed, limited to country code CN:

technology:WordPress bot_detection:cloudflare country:CN

Filter Categories

SMQL covers 108 filters organized into 11 categories:

Category            Filters   Examples
Core                11        url, domain, title, status, submitted, http_status
Network             7         ip, asn, asn_org, country, city, ip_count
WHOIS/RDAP          8         registrar, domain_age, nameserver, rir
Security            11        verdict, ai_risk_score, clamav, rpki, ioc
TLS/Certificates    21        cert_issuer, cert_expired, key_size, cert_risk, ct_logged
JARM                2         jarm, jarm_known
Technologies        4         technology, tech_category, cpe
JavaScript          13        js_risk, js_eval, malware_family, obfuscation_score
Hashes              11        tlsh, ssdeep, phash, favicon_hash, js_hash
Tracking            4         tracker, tracking_id, tracker_category
Content             16        ocr, bot_detection, ids_category, clearfake_type, bundler

Plus 20 existence checks with the has: prefix: has:malware, has:certificate, has:ids, has:pastejacking, has:clearfake, has:clone, has:webpack, and more.

Comparison and Range Operators

Numeric and date filters support comparison operators:

load_time:>5                   # Pages slower than 5 seconds
key_size:<2048                 # Weak RSA keys
ai_risk_score:>7               # High AI risk scores (scale 0-10)
domain_age:<30                 # Domains younger than 30 days
cert_days:<7                   # Certificates expiring within a week
submitted:>last24h             # Scans from the last 24 hours

Range queries use the .. syntax:

http_status:400..499           # All 4xx client errors
js_risk_score:60..100          # High to critical JS risk
submitted:2025-01..2025-06     # First half of 2025
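To make the operator syntax concrete, here is a minimal sketch of how a single term could be split into a field, an operator, and one or two values. It is not SMQL's actual parser: the `Term` type, the `parse_term` function, and the regular expression are assumptions made for illustration, and values are kept as raw strings with no type or date handling.

```python
import re
from typing import NamedTuple, Optional


class Term(NamedTuple):
    field: str
    op: str                      # "=", ">", "<", ">=", "<=", or "range"
    value: str
    upper: Optional[str] = None  # only set for range terms


# Field name, a colon, an optional comparison operator, then the value.
# The two-character operators are listed first so ">=" is not read as ">".
_TERM_RE = re.compile(r"^(?P<field>[a-z_]+):(?P<op>>=|<=|>|<)?(?P<value>.+)$")


def parse_term(text: str) -> Term:
    match = _TERM_RE.match(text)
    if match is None:
        raise ValueError(f"not a field:value term: {text!r}")
    field, op, value = match.group("field", "op", "value")
    if op is None and ".." in value:
        # Range syntax: split once on ".." into lower and upper bounds.
        low, high = value.split("..", 1)
        return Term(field, "range", low, high)
    # No operator means an exact (or wildcard) match on the raw value.
    return Term(field, op or "=", value)


print(parse_term("key_size:<2048"))
print(parse_term("http_status:400..499"))
print(parse_term("submitted:>last24h"))
```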

Boolean Logic

SMQL supports AND, OR, NOT, and parentheses for grouping. Adjacent terms are implicitly AND-ed:

# These are equivalent:
technology:WordPress country:RU
technology:WordPress AND country:RU

# OR requires explicit operator:
technology:WordPress OR technology:Joomla

# Grouping:
(technology:WordPress OR technology:Joomla) AND country:CN

# Negation (two forms):
-domain:google.com
NOT domain:google.com
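When composing boolean queries programmatically, the main pitfall is forgetting parentheses around an OR group. A small sketch, using hypothetical helper names (`all_of`, `any_of`, `negate`) that are not part of any official client:

```python
def all_of(*parts: str) -> str:
    """AND the parts together using the explicit operator."""
    return " AND ".join(parts)


def any_of(*parts: str) -> str:
    """OR the parts together, wrapped in parentheses so the group stays intact."""
    return "(" + " OR ".join(parts) + ")"


def negate(part: str) -> str:
    """Negate a single term using the NOT form."""
    return f"NOT {part}"


cms = any_of("technology:WordPress", "technology:Joomla")
query = all_of(cms, "country:CN")
print(query)  # (technology:WordPress OR technology:Joomla) AND country:CN
print(negate("domain:google.com"))  # NOT domain:google.com
```

Always parenthesizing in `any_of` is a deliberate choice here: it costs two characters and means the group cannot change meaning when it is later combined with other terms.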

Performance

SMQL queries execute against PostgreSQL with several optimizations:

  • Parameterized queries prevent SQL injection -- all user input is passed as $N parameters
  • EXISTS subquery merging combines multiple filters on the same table into a single subquery
  • Count capping limits the count query to 10,000 rows for fast pagination on broad queries
  • Redis caching with 10-second TTL and stampede protection reduces database load
  • 15-second timeout with clear error messages if a query is too complex
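The first and third points can be illustrated with a short sketch. This is not ScanMalware's implementation: the table name `scans`, the column mapping, and the function names are all hypothetical. It only shows the general technique of emitting `$N` placeholders instead of interpolating user input, and of wrapping the count in a `LIMIT`-ed subquery.

```python
from typing import List, Tuple

# Hypothetical mapping from SMQL fields to SQL columns (the real schema is not public).
COLUMNS = {"country": "country_code", "http_status": "http_status"}
# Only whitelisted operators ever reach the SQL text.
OPERATORS = {"=": "=", ">": ">", "<": "<", ">=": ">=", "<=": "<="}
COUNT_CAP = 10_000


def build_where(filters: List[Tuple[str, str, object]]) -> Tuple[str, list]:
    """Turn (field, op, value) triples into a WHERE clause with $N placeholders."""
    clauses, params = [], []
    for field, op, value in filters:
        column = COLUMNS[field]   # unknown fields raise KeyError
        sql_op = OPERATORS[op]    # unknown operators raise KeyError
        params.append(value)      # the value travels separately from the SQL text
        clauses.append(f"{column} {sql_op} ${len(params)}")
    return " AND ".join(clauses), params


def capped_count_sql(where: str) -> str:
    """Count matching rows, but stop scanning once COUNT_CAP rows are found."""
    return (
        "SELECT count(*) FROM "
        f"(SELECT 1 FROM scans WHERE {where} LIMIT {COUNT_CAP}) AS capped"
    )


where, params = build_where([("country", "=", "RU"), ("http_status", ">=", 400)])
print(where)   # country_code = $1 AND http_status >= $2
print(params)  # ['RU', 400]
print(capped_count_sql(where))
```

Because only column names and operators from fixed dictionaries are ever formatted into the SQL string, user-supplied values cannot alter the query structure.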

Every query result includes an API link that opens the raw JSON response, making it easy to integrate SMQL into scripts and automation.

API Access

SMQL is available as a REST API endpoint:

GET /api/v1/search/smql?q=<query>&page=1&limit=20&sort=newest
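SMQL queries contain characters such as `:`, `<`, `>`, and spaces, so the `q` parameter needs URL encoding. A minimal sketch using only the Python standard library follows. The base URL is an assumption (that the API is served from the same host as the search page), and `smql_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: the API lives on the same host as the search page.
BASE_URL = "https://scanmalware.com"


def smql_url(query: str, page: int = 1, limit: int = 20, sort: str = "newest") -> str:
    """Build the search URL, percent-encoding the query and other parameters."""
    params = {"q": query, "page": page, "limit": limit, "sort": sort}
    return f"{BASE_URL}/api/v1/search/smql?{urlencode(params)}"


print(smql_url("technology:WordPress AND country:RU"))
print(smql_url("domain_age:<7 has:malware", limit=50))
```

The resulting URL can be passed to any HTTP client; authentication and rate limits, if any, are not covered by this sketch.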

The response includes paginated results, query timing, and filter metadata:

{
  "query": "technology:WordPress AND country:RU",
  "results": [...],
  "pagination": {
    "total_items": 463,
    "total_pages": 24,
    "exact_count": true
  },
  "query_time_ms": 42.3
}
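A short sketch of reading this response in a script. The sample body mirrors the fields shown above, with `results` left empty since its contents are elided. The interpretation of `exact_count` as "false when the count was capped" is an assumption based on the count capping described under Performance.

```python
import json
import math

# Sample body shaped like the response above; "results" is left empty here.
body = """
{
  "query": "technology:WordPress AND country:RU",
  "results": [],
  "pagination": {"total_items": 463, "total_pages": 24, "exact_count": true},
  "query_time_ms": 42.3
}
"""

data = json.loads(body)
page_info = data["pagination"]

# 463 items at 20 per page (the limit from the request) gives 24 pages.
limit = 20
assert math.ceil(page_info["total_items"] / limit) == page_info["total_pages"]

# Assumption: exact_count is false when the count hit the 10,000-row cap.
prefix = "" if page_info["exact_count"] else "at least "
summary = f"{prefix}{page_info['total_items']} results in {data['query_time_ms']} ms"
print(summary)  # 463 results in 42.3 ms
```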

The filter reference endpoint at /api/v1/search/smql/filters returns all available filters with descriptions, examples, and supported operators -- useful for building autocomplete interfaces.

Try It

Visit scanmalware.com/search-advanced to start querying. The search page includes example queries, an inline filter reference, and a sort dropdown. Click the API link in any result to see the raw JSON.

We're actively expanding SMQL -- upcoming features include Elasticsearch-backed filters for Certificate Transparency data, DNS zone lookups, and reverse DNS searches.