Introducing SMQL: Search 343K+ Scans with 120+ Filters

Why a Query Language?
ScanMalware processes thousands of URL scans daily, collecting data across dozens of dimensions: TLS certificates, JavaScript behavior, WHOIS registration, technology stacks, IDS alerts, malware patterns, and more. Basic text search gets you started, but real threat hunting requires combining signals.
SMQL (ScanMalware Query Language) lets you write precise queries that cross-reference any combination of our 108 filters and 20 existence checks. Think of it as SQL for security scan data, but with a syntax designed for quick, interactive use.
Quick Start
SMQL queries are built from filters, boolean operators, and modifiers:
field:value # Exact match
field:>value # Comparison (>, <, >=, <=)
field:value1..value2 # Range
field:*wildcard* # Wildcard with * and ?
has:feature # Existence check
-field:value # Negation
filter1 AND filter2 # Boolean AND (implicit between terms)
filter1 OR filter2 # Boolean OR
(filter1 OR filter2) AND ... # Grouping with parentheses
sort:newest # Sort order
Try it now at scanmalware.com/search-advanced.
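To make the term syntax above concrete, here is a minimal Python sketch that classifies a single SMQL term. It is not ScanMalware's parser: the `Term` type, the operator names, and the field-name regex are illustrative assumptions, and it ignores quoting and values that legitimately contain `..`, `*`, or `?`.

```python
import re
from typing import NamedTuple


class Term(NamedTuple):
    """Hypothetical representation of one parsed SMQL term."""
    negated: bool
    field: str
    op: str        # eq, gt, lt, gte, lte, range, wildcard, exists
    value: tuple   # one element, or (low, high) for ranges


# Assumption: field names are lowercase letters and underscores.
_TERM = re.compile(r"^(-?)([a-z_]+):(.+)$")
_COMPARE = {">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}


def parse_term(text: str) -> Term:
    match = _TERM.match(text)
    if match is None:
        raise ValueError(f"not a field:value term: {text!r}")
    negated = match.group(1) == "-"
    field, raw = match.group(2), match.group(3)

    if field == "has":                      # has:feature
        return Term(negated, field, "exists", (raw,))
    for symbol in (">=", "<=", ">", "<"):   # two-character operators first
        if raw.startswith(symbol):
            return Term(negated, field, _COMPARE[symbol], (raw[len(symbol):],))
    if ".." in raw:                         # value1..value2
        low, high = raw.split("..", 1)
        return Term(negated, field, "range", (low, high))
    if "*" in raw or "?" in raw:            # wildcard pattern
        return Term(negated, field, "wildcard", (raw,))
    return Term(negated, field, "eq", (raw,))


print(parse_term("-domain:paypal.com"))
print(parse_term("http_status:400..499"))
```

In this sketch a modifier such as `sort:newest` simply comes back as an `eq` term on the `sort` field; a real parser would presumably treat it separately.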
Real-World Examples
Finding Phishing Campaigns
Search for pages mimicking PayPal that our AI confirmed as scams:
title:*paypal* -domain:paypal.com verdict:CONFIRMED_SCAM
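One way to read that query is as three conditions that must all hold. The Python sketch below applies them to a few made-up in-memory records. The record layout, the `CLEAN` verdict value, and the case-insensitive wildcard match are assumptions for illustration; the real engine runs against PostgreSQL and may treat case and subdomains differently.

```python
from fnmatch import fnmatchcase


def matches_paypal_phish(scan: dict) -> bool:
    # title:*paypal*  (lowercased first, assuming case-insensitive matching)
    title_hit = fnmatchcase(scan["title"].lower(), "*paypal*")
    # -domain:paypal.com  (assumed to be an exact comparison)
    not_real_site = scan["domain"] != "paypal.com"
    # verdict:CONFIRMED_SCAM
    confirmed = scan["verdict"] == "CONFIRMED_SCAM"
    return title_hit and not_real_site and confirmed


# Made-up sample records, not real scan data.
scans = [
    {"domain": "paypal.com", "title": "PayPal: Log In", "verdict": "CLEAN"},
    {"domain": "paypa1-login.example", "title": "Log in to your PayPal account",
     "verdict": "CONFIRMED_SCAM"},
    {"domain": "shop.example", "title": "Pay with card", "verdict": "CONFIRMED_SCAM"},
]

hits = [s["domain"] for s in scans if matches_paypal_phish(s)]
print(hits)
```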
Newly Registered Domains Serving Malware
Combine WHOIS domain age (under 7 days) with ClamAV detection, limited to Russia:
domain_age:<7 has:malware country:RU
Expired Certificates in the Last Month
Find sites with expired TLS certificates from recent scans:
cert_expired:true submitted:>last30d
Self-Signed Certificates with IDS Alerts
Cross-reference TLS anomalies with network intrusion signatures:
cert_self_signed:true has:ids
Obfuscated JavaScript with High Risk Scores
Hunt for heavily obfuscated scripts that triggered risk scoring:
js_obfuscated:true js_risk_score:>70
WordPress Sites Behind Cloudflare with Bot Protection
Find WordPress sites where Cloudflare bot protection was detected, limited to China:
technology:WordPress bot_detection:cloudflare country:CN
Filter Categories
SMQL covers 108 filters organized into 11 categories:
| Category | Filters | Examples |
|---|---|---|
| Core | 11 | url, domain, title, status, submitted, http_status |
| Network | 7 | ip, asn, asn_org, country, city, ip_count |
| WHOIS/RDAP | 8 | registrar, domain_age, nameserver, rir |
| Security | 11 | verdict, ai_risk_score, clamav, rpki, ioc |
| TLS/Certificates | 21 | cert_issuer, cert_expired, key_size, cert_risk, ct_logged |
| JARM | 2 | jarm, jarm_known |
| Technologies | 4 | technology, tech_category, cpe |
| JavaScript | 13 | js_risk, js_eval, malware_family, obfuscation_score |
| Hashes | 11 | tlsh, ssdeep, phash, favicon_hash, js_hash |
| Tracking | 4 | tracker, tracking_id, tracker_category |
| Content | 16 | ocr, bot_detection, ids_category, clearfake_type, bundler |
Plus 20 existence checks with the has: prefix: has:malware, has:certificate, has:ids, has:pastejacking, has:clearfake, has:clone, has:webpack, and more.
Comparison and Range Operators
Numeric and date filters support comparison operators:
load_time:>5 # Pages slower than 5 seconds
key_size:<2048 # Weak RSA keys
ai_risk_score:>7 # High AI risk scores (scale 0-10)
domain_age:<30 # Domains younger than 30 days
cert_days:<7 # Certificates expiring within a week
submitted:>last24h # Scans from the last 24 hours
Range queries use the .. syntax:
http_status:400..499 # All 4xx client errors
js_risk_score:60..100 # High to critical JS risk
submitted:2025-01..2025-06 # First half of 2025
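As a rough illustration of what these operators imply, the Python sketch below resolves relative tokens such as `last24h` and `last30d` to a cutoff timestamp and checks a numeric value against a `..` range. The function names are hypothetical, only the `h` and `d` units seen in the examples are handled, and treating both range bounds as inclusive is inferred from `400..499` covering all 4xx codes.

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: relative tokens look like "last<N>h" or "last<N>d".
_RELATIVE = re.compile(r"^last(\d+)([hd])$")
_UNITS = {"h": "hours", "d": "days"}


def resolve_cutoff(token: str, now: datetime) -> datetime:
    """Turn 'last24h' into the timestamp 24 hours before `now`."""
    match = _RELATIVE.match(token)
    if match is None:
        raise ValueError(f"unsupported relative time: {token!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return now - timedelta(**{_UNITS[unit]: amount})


def in_range(value: float, expr: str) -> bool:
    """Check a number against 'low..high', both ends inclusive."""
    low, high = expr.split("..", 1)
    return float(low) <= value <= float(high)


now = datetime(2025, 6, 15, 12, 0, tzinfo=timezone.utc)
print(resolve_cutoff("last24h", now))   # submitted:>last24h compares against this
print(in_range(404, "400..499"))
```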
Boolean Logic
SMQL supports AND, OR, NOT, and parentheses for grouping. Adjacent terms are implicitly AND-ed:
# These are equivalent:
technology:WordPress country:RU
technology:WordPress AND country:RU
# OR requires explicit operator:
technology:WordPress OR technology:Joomla
# Grouping:
(technology:WordPress OR technology:Joomla) AND country:CN
# Negation (two forms):
-domain:google.com
NOT domain:google.com
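The rules above fit a small recursive-descent parser. The Python sketch below is one possible reading, not the production grammar: it assumes AND binds tighter than OR (the usual convention, which the rules above do not state), represents the tree as nested tuples, and does not handle quoted values or comments.

```python
import re

_TOKEN = re.compile(r"\(|\)|[^\s()]+")


def parse(query: str):
    """Parse an SMQL-like query into nested tuples such as ('AND', a, b)."""
    tokens = _TOKEN.findall(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            take()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        # Anything other than OR, ")" or end-of-input continues the AND chain,
        # which is what makes adjacent terms implicitly AND-ed.
        while peek() not in (None, "OR", ")"):
            if peek() == "AND":
                take()
            node = ("AND", node, parse_not())
        return node

    def parse_not():
        if peek() == "NOT":
            take()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None or tok in ("AND", "OR", ")"):
            raise ValueError(f"expected a term, got {tok!r}")
        take()
        if tok == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            take()
            return node
        if tok.startswith("-"):          # -field:value is shorthand for NOT
            return ("NOT", tok[1:])
        return tok

    tree = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree


print(parse("(technology:WordPress OR technology:Joomla) AND country:CN"))
```

With this grammar, `-domain:google.com` and `NOT domain:google.com` produce the same tree, as do the implicit and explicit AND forms.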
Performance
SMQL queries execute against PostgreSQL with several optimizations:
- Parameterized queries prevent SQL injection -- all user input is passed as $N parameters
- EXISTS subquery merging combines multiple filters on the same table into a single subquery
- Count capping limits the count query to 10,000 rows for fast pagination on broad queries
- Redis caching with 10-second TTL and stampede protection reduces database load
- 15-second timeout with clear error messages if a query is too complex
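Two of these ideas, `$N` parameters and count capping, can be sketched in a few lines of Python. Everything schema-related here is hypothetical: the `scans` table, the column names, and the helper functions are stand-ins, since the real schema is not described. The point is only that user values travel in a separate parameter list and never get spliced into the SQL text.

```python
from typing import Any

# Hypothetical mapping from SMQL fields to column names (acts as a whitelist).
_COLUMNS = {
    "country": "country_code",
    "http_status": "http_status",
    "key_size": "cert_key_size",
}
_OPS = {"eq": "=", "gt": ">", "lt": "<", "gte": ">=", "lte": "<="}


def build_where(conditions: list[tuple[str, str, Any]]) -> tuple[str, list[Any]]:
    """Return a WHERE clause with $1, $2, ... placeholders plus its parameters."""
    clauses: list[str] = []
    params: list[Any] = []
    for field, op, value in conditions:
        column = _COLUMNS[field]          # unknown fields raise KeyError
        params.append(value)              # the value never enters the SQL string
        clauses.append(f"{column} {_OPS[op]} ${len(params)}")
    return " AND ".join(clauses), params


def capped_count_sql(where: str, cap: int = 10_000) -> str:
    """Count at most `cap` matching rows instead of scanning every match."""
    return (
        f"SELECT count(*) FROM "
        f"(SELECT 1 FROM scans WHERE {where} LIMIT {cap}) AS capped"
    )


where, params = build_where([("country", "eq", "RU"), ("key_size", "lt", 2048)])
print(where, params)
print(capped_count_sql(where))
```

The inner `LIMIT` is what keeps the count cheap on broad queries: PostgreSQL can stop after the cap is reached rather than counting every matching row.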
Every query result includes an API link that opens the raw JSON response, making it easy to integrate SMQL into scripts and automation.
API Access
SMQL is available as a REST API endpoint:
GET /api/v1/search/smql?q=<query>&page=1&limit=20&sort=newest
The response includes paginated results, query timing, and filter metadata:
{
  "query": "technology:WordPress AND country:RU",
  "results": [...],
  "pagination": {
    "total_items": 463,
    "total_pages": 24,
    "exact_count": true
  },
  "query_time_ms": 42.3
}
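For scripting, a small Python helper along these lines can build the request URL and summarize a response. It makes no network call. The base URL is an assumption (the API host is not stated beyond the path), the helper names are hypothetical, and showing a `+` when `exact_count` is false is a guess at what that flag means.

```python
import json
from urllib.parse import urlencode

# Assumption: the API is served from the same host as the search page.
BASE_URL = "https://scanmalware.com"


def build_search_url(query: str, page: int = 1, limit: int = 20,
                     sort: str = "newest") -> str:
    """URL-encode the SMQL query and paging options into a request URL."""
    params = urlencode({"q": query, "page": page, "limit": limit, "sort": sort})
    return f"{BASE_URL}/api/v1/search/smql?{params}"


def summarize(body: str) -> str:
    """Produce a one-line summary from a JSON response body."""
    data = json.loads(body)
    pages = data["pagination"]
    total = str(pages["total_items"])
    if not pages["exact_count"]:
        total += "+"                      # guessed meaning: count was capped
    return (f"{total} results across {pages['total_pages']} pages "
            f"in {data['query_time_ms']} ms")


print(build_search_url("technology:WordPress AND country:RU"))
```

Encoding matters here because SMQL queries contain colons, spaces, and characters such as `>` and `*` that are not safe in a raw query string.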
The filter reference endpoint at /api/v1/search/smql/filters returns all available filters with descriptions, examples, and supported operators -- useful for building autocomplete interfaces.
Try It
Visit scanmalware.com/search-advanced to start querying. The search page includes example queries, an inline filter reference, and a sort dropdown. Click the API link in any result to see the raw JSON.
We're actively expanding SMQL -- upcoming features include Elasticsearch-backed filters for Certificate Transparency data, DNS zone lookups, and reverse DNS searches.