Semantic Malware Intelligence Platform

From raw binary to threat intelligence in minutes.

SemSearch eliminates manual reverse engineering. Critical function identification pinpoints the exact malicious code, like a function 99% similar to WannaCry's encryption logic. Semantic search surfaces code reuse across obfuscated variants, tracks attacker campaigns, generates YARA rules, and extracts every IOC automatically. Binary to full intelligence report in under 7 minutes.

Static Analysis: Live · Dynamic Analysis: Coming Soon

PE x86 / x64 automated pipeline · Binary to Report < 7 min · YARA auto-generation · Campaign tracking
15 Pipeline Stages
2 AI Vector Models (Structural + Semantic)
14 IOC Types Extracted
12 Families Classified
<7 min Binary to Report (Avg. Pipeline Time)
5 Threat Intel Feeds

How SemSearch Works

Every sample runs through a deterministic multi-stage automated pipeline. No manual steps. No guesswork.

01 · UPLOAD

Submit Your Sample

Upload any PE binary via the REST API or dashboard. SemSearch validates the format, assigns analysis priority, and immediately begins the automated pipeline, with results ready in minutes.

PE · Instant deduplication · Priority triage · 5 intel feeds
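As a rough illustration of the upload step, the sketch below validates the PE format and deduplicates by hash before queueing. It is not SemSearch's implementation: the function names, the use of SHA-256 as the dedup key, and the three outcome labels are assumptions made for the example.

```python
import hashlib
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if data has an MZ header whose e_lfanew points at a PE signature."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew is a 4-byte little-endian offset stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def dedup_key(data: bytes) -> str:
    """SHA-256 hex digest, used here as the deduplication key (an assumption)."""
    return hashlib.sha256(data).hexdigest()


def triage(data: bytes, seen: set) -> str:
    """Classify an upload as 'rejected', 'duplicate', or 'queued' (hypothetical labels)."""
    if not looks_like_pe(data):
        return "rejected"
    key = dedup_key(data)
    if key in seen:
        return "duplicate"
    seen.add(key)
    return "queued"
```

A real intake path would also parse the optional header and section table; this check only confirms the two magic values.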
02 · ANALYZE

Multi-Stage Automated Pipeline

Fifteen automated stages examine every aspect of the binary: critical function identification, semantic code matching, behavioral patterns, evasion techniques, and ML family classification. Across x86, x64, and ARM. No manual steps at any stage.

Critical function ID · Semantic similarity · AI enrichment · ML classification · Evasion detection
03 · REPORT

Analyst-Ready Intelligence

Get a full report: composite threat score, malware family attribution, AI-generated function summaries, 14-type IOC extraction, auto-generated YARA rules, function-level similar samples, and campaign attribution. HTML and PDF export included.

Threat scoring · IOC extraction · YARA generation · Campaign detection

Built to automate the hard parts

Every capability is implemented and live. Binary ingestion, disassembly, multi-model AI, family classification, IOC extraction: fully automated. No roadmap items, no vaporware.

Function-Level Similarity Search

Decomposes every binary into individual functions. Finds code reuse across samples even when files are repackaged or obfuscated. Three search strategies: structural, semantic, hybrid.

multi-model search · structural + semantic re-ranking · ANN index
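The three search strategies can be pictured with a small sketch. It scores candidates by brute-force cosine similarity rather than an ANN index, and the vector layout, the `alpha` blend weight, and the function names are all assumptions made for illustration, not SemSearch internals.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_score(query: dict, candidate: dict, alpha: float = 0.5) -> float:
    """Blend structural and semantic similarity; alpha is a hypothetical weight."""
    structural = cosine(query["structural"], candidate["structural"])
    semantic = cosine(query["semantic"], candidate["semantic"])
    return alpha * structural + (1 - alpha) * semantic


def search(query: dict, corpus: list, strategy: str = "hybrid", top_k: int = 3) -> list:
    """Rank corpus functions against the query using one of the three strategies."""
    scorers = {
        "structural": lambda c: cosine(query["structural"], c["structural"]),
        "semantic": lambda c: cosine(query["semantic"], c["semantic"]),
        "hybrid": lambda c: hybrid_score(query, c),
    }
    score = scorers[strategy]
    ranked = sorted(corpus, key=score, reverse=True)
    return [(c["name"], round(score(c), 3)) for c in ranked[:top_k]]
```

Because both vectors are kept per function, a candidate that matches on only one model still surfaces under that model's strategy, while the hybrid strategy rewards agreement between the two.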

Dual AI Analysis

Structural similarity model + deep semantic code model. Together they catch what either alone misses: obfuscated code, code reuse across compilers, and variant families. Both vectors are stored and searched independently.

Structural model · semantic code model · quantized vector index

AI-Powered Function Enrichment

Every critical function gets a natural-language summary, capability tags, and an individual risk score. This identifies the exact function responsible for malicious behavior, reducing analyst review time.

AI inference optimization · semantic search · critical function ID

Automated Family Classification

Proprietary ML classifiers trained on hundreds of structural and behavioral features identify malware families: Emotet, LockBit, Cobalt Strike, and 9 more. Live A/B model comparison keeps accuracy improving continuously.

Ensemble ML classifiers · automated HPO · experiment tracking

IOC Extraction (14 Types)

Automatically pulls IPs, domains, URLs, registry keys, mutex names, file paths, hashes, Bitcoin and Ethereum addresses from every sample. Smart exclusions filter common benign noise.

14 IOC types · multi-source extraction · deduplication

YARA Rule Generation

Auto-generates YARA rules from samples and function clusters. Full CRUD with enable/disable toggles. Export all rules as a single downloadable file with match-count tracking.

Sample + cluster sources · bulk generation · export all

Real-Time Alerts

WebSocket push the moment a critical threat lands. Three rule types: threat score threshold, malware family, and novel malware detection (<5% function match). Distributed broadcast for multi-instance deployments.

High-concurrency WebSocket · heartbeat keepalive · cooldown protection

SIEM Integration (Coming Soon)

Connectors to push threat intelligence directly into your existing SIEM and SOAR stack are in active development.

SIEM: In Development
SOAR: In Development
Security Platforms: In Development

What an analysis looks like

Every sample produces a structured report. The panels below show what each section contains.

semsearch.io/analysis/a3f4b2c1d8e9f047ac62b381de5...
Threat Score: 0.94 (CRITICAL)
Evasion boost applied: +0.25
Signal Breakdown
AI Risk Score: 0.96
ML Confidence: 0.91
Similarity Score: 0.88
Family Confidence: 0.87
Evasion Detection: EXTREME
Cobalt Strike
ML confidence: 91.4% · Rule-based: matched 8/12 capabilities
Cobalt Strike: 91.4%
Beacon Loader: 6.1%
Unknown: 2.5%
process-injection · shellcode-loader · anti-debug · VMProtect packed · reflective-loading · C2-beacon
FUN_00401a40 · ReflectiveLoader_Stage2
CRITICAL · anti-debug · shellcode-loader
AI Analysis · confidence 0.94

This function implements a reflective PE loader. It resolves kernel32.dll exports by walking the PEB loader list, allocates executable memory via VirtualAllocEx, writes a shellcode payload using WriteProcessMemory, and invokes it through CreateRemoteThread. An IsDebuggerPresent check with a conditional exit path is present, consistent with observed Cobalt Strike beacon staging behavior. The normalization-resistant XOR key rotation before decryption is a known fingerprint of CS 4.x payload loaders.

Threat Score: 0.96 · Instructions: 312 · Basic Blocks: 28 · API Calls: VirtualAllocEx, WriteProcessMemory, CreateRemoteThread
IPv4 Addresses
185.220.101.42 C2
194.165.16.77 C2
Domains
update.microsoft-cdn.net SUSPICIOUS
telemetry.windowsupdate.org SUSPICIOUS
Registry Keys
HKCU\Software\Microsoft\Windows\CurrentVersion\Run\svchost32
HKLM\SYSTEM\CurrentControlSet\Services\WinDefend
SHA-256 Hashes
a3f4b2c1d8e9f047ac62b381de5c6f0912b3...
SHA-256            Family          Match   First Seen
78de45f9a2c03b...  Cobalt Strike   94.2%   2026-03-15
2c91f30bd507e1...  Cobalt Strike   87.6%   2026-02-28
9a1d72fc4e83b0...  Cobalt Strike   81.3%   2026-01-19
3e8a12bc7f04d9...  Beacon Loader   67.8%   2025-12-04
Auto-generated YARA rule
// Generated by SemSearch | 2026-03-19 | threat_score=0.94

rule SemSearch_CobaltStrike_Loader_FUN_00401a40 {
    meta:
        generated_by  = "SemSearch v1.0"
        sha256        = "a3f4b2c1d8e9f047ac62b381de5c6f09..."
        family        = "CobaltStrike"
        threat_score  = "0.94"
        severity      = "CRITICAL"
        date          = "2026-03-19"

    strings:
        $fn_0  = { 55 8B EC 83 EC 28 56 57 8B 7D 08 6A 00 }
        $fn_1  = { 68 00 30 00 00 6A 40 FF 15 ?? ?? ?? ?? }
        $api_0 = "VirtualAllocEx" ascii nocase
        $api_1 = "WriteProcessMemory" ascii nocase
        $api_2 = "CreateRemoteThread" ascii nocase
        $api_3 = "IsDebuggerPresent" ascii

    condition:
        uint16(0) == 0x5A4D and
        filesize < 2MB and
        2 of ($fn_*) and
        3 of ($api_*)
}


Built by analysts. For analysts.

In-depth tooling for every role in the security team, from reverse engineering functions to SOC triage at scale.

Malware Analysts

Eliminate manual reverse engineering. Identify critical malicious functions, spot exact code reuse, and get AI-powered summaries for every dangerous function, with a full call graph.

Full decompilation per function
Function call graph and dependency tree
Evasion detection across 10 categories
Cross-sample code reuse detection

SOC Teams

Real-time alerts, threat scoring, and SIEM-ready integration — purpose-built for high-volume triage where response time matters.

WebSocket push on critical detections
SIEM / SOAR integration (in development)
Configurable alert rules + cooldowns
0.0–1.0 composite threat score

Threat Hunters

13 query operators for ad-hoc hunting across all ingested samples and functions. Save, tag, and re-run queries on demand.

13 query operators across all fields
6 built-in hunt templates
Saved hunts with run-count tracking
Campaign attribution + shared IOCs

Security Researchers

Code reuse detection across the entire corpus. Campaign grouping, YARA generation from clusters, and benign baseline filtering to isolate novel behavior.

Function-level similarity across all samples
Automated function clustering
Auto-YARA from samples and clusters
Benign baseline filtering
In Active Development

Dynamic Analysis: Coming Next

Static analysis is live. Dynamic sandbox analysis is in active development and will add runtime behavioral correlation, connecting what the binary contains with what it does.

Runtime behavioral correlation
Sandbox execution traces
Network capture analysis
Memory dump correlation
API call sequence analysis

Intelligence & Classification

Three classification layers, AI-powered function analysis, composite threat scoring, and evasion detection across 10 categories — all running automatically on every sample.

Three-Layer Family Classification

Multi-tier classifiers map behavioral and structural signals to 12 known family profiles. An A/B testing framework and model registry allow live version comparison without downtime. Analyst feedback drives continuous retraining.

12 families supported · Rule-based fallback · A/B model testing · Auto-retraining
AI Function Enrichment

Every critical function gets a natural-language summary, capability tags, risk assessment, and individual threat score. Local-first inference with cloud escalation for high-confidence analysis. Results cached 30 days. Cluster-first optimization reuses existing analysis for functionally identical code.

Local-first AI · Cloud escalation · 30-day cache · Cluster reuse
Composite Threat Scoring

Five signals combined into a single 0.0–1.0 score per sample. Evasion detection adds calibrated score boosts for packed or obfuscated samples. Output: CRITICAL / HIGH / MEDIUM / LOW / MINIMAL with color coding.

5-signal composite · CRITICAL ≥ 0.9 · Evasion boosts
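The relationship between base score, evasion boost, and severity label can be sketched as follows. Only the CRITICAL bound (≥ 0.9) is stated on this page. The HIGH bound is inferred from the default High Threat alert rule (≥ 0.7), and the MEDIUM and LOW bounds are placeholders invented for the example. How the five signals are weighted into the base score is not documented here, so the sketch starts from a given base.

```python
# (lower bound, label) pairs, checked from highest to lowest.
SEVERITY_BANDS = [
    (0.9, "CRITICAL"),  # stated on this page
    (0.7, "HIGH"),      # inferred from the default "High Threat" alert rule
    (0.4, "MEDIUM"),    # assumed for illustration
    (0.2, "LOW"),       # assumed for illustration
    (0.0, "MINIMAL"),
]


def final_score(base: float, evasion_boost: float = 0.0) -> float:
    """Add the evasion boost to the base score and clamp the result to 0.0-1.0."""
    return round(min(1.0, max(0.0, base + evasion_boost)), 2)


def severity(score: float) -> str:
    """Map a composite score to its severity label."""
    for bound, label in SEVERITY_BANDS:
        if score >= bound:
            return label
    return "MINIMAL"
```

With the sample report's numbers, `final_score(0.69, 0.25)` gives 0.94, which falls in the CRITICAL band.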
Evasion Detection (10 Categories)

Detects packing, anti-debug, anti-VM, obfuscation techniques, and 6 more categories. Each finding mapped to MITRE ATT&CK. Packer detection covers 9 named packers via section-name signatures with entropy-based heuristic fallback.

MITRE ATT&CK · 9 packer signatures · Entropy heuristics
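A minimal sketch of section-name packer detection with an entropy fallback is shown below. The section names are a few widely known ones, not SemSearch's actual list of 9 packers, and the 7.2 bits-per-byte threshold is an assumption chosen for the example.

```python
import math
from collections import Counter
from typing import Optional

# A few well-known packer section names (illustrative, not the product's signature set).
PACKER_SECTIONS = {
    "UPX0": "UPX",
    "UPX1": "UPX",
    ".vmp0": "VMProtect",
    ".vmp1": "VMProtect",
    ".aspack": "ASPack",
    ".themida": "Themida",
}


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def detect_packer(sections: dict, threshold: float = 7.2) -> Optional[str]:
    """Try section-name signatures first, then fall back to an entropy heuristic."""
    for name in sections:
        if name in PACKER_SECTIONS:
            return PACKER_SECTIONS[name]
    if any(shannon_entropy(body) >= threshold for body in sections.values()):
        return "unknown (high entropy)"
    return None
```

Here `sections` maps section names to raw section bytes. The sample report's entropy of 7.82 out of 8.0 would trip the fallback even if the VMProtect section names had been renamed.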

Evasion Categories Detected

01  Packing / Compression    T1027.002
02  Anti-Debug Techniques    T1622
03  Anti-VM / Anti-Sandbox   T1497
04  Code Obfuscation         T1027
05  Process Injection        T1055
06  Reflective Loading       T1620
07  String Encryption        T1027.013
08  Import Table Hiding      T1027.007
09  Timing-Based Evasion     T1497.003
10  Shellcode Staging        T1055.012

Malware Families Classified

Emotet, TrickBot, Ryuk, LockBit, Conti, Qakbot, Cobalt Strike, Dridex, AgentTesla, FormBook, Remcos, AsyncRAT

Outputs & Reports

Every completed analysis produces a structured, exportable report. IOCs, YARA rules, family attribution, function summaries, campaign grouping — all generated automatically, all accessible via API.

IOC Extraction — 14 Types

Automatically extracts indicators across 14 categories from every sample. Sources include function code, API calls, referenced strings, AI summaries, and raw string dumps. Smart exclusions remove known-benign IPs and domains.

IPv4 / IPv6 · Domains / URLs · Registry keys · Mutex names · Crypto wallets · File hashes · Email addresses
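To make the "extract, then exclude" idea concrete, here is a small sketch covering two of the 14 types. The regular expressions, the benign-domain set, and the file-suffix filter are all assumptions for illustration; the real exclusion lists are not documented on this page.

```python
import ipaddress
import re

IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
DOMAIN_RE = re.compile(r"\b(?:[a-z0-9-]+\.)+[a-z]{2,}\b", re.IGNORECASE)

BENIGN_DOMAINS = {"microsoft.com", "windowsupdate.com"}  # hypothetical exclusion list
FILE_SUFFIXES = {"dll", "exe", "sys", "dat"}             # file names that look like domains


def extract_iocs(text: str) -> dict:
    """Extract public IPv4 addresses and non-excluded domains from raw strings."""
    ips = set()
    for candidate in IPV4_RE.findall(text):
        try:
            addr = ipaddress.ip_address(candidate)
        except ValueError:
            continue  # not a valid address, e.g. 999.1.1.1
        if addr.is_global:  # drops private, loopback, and reserved ranges
            ips.add(candidate)
    domains = set()
    for match in DOMAIN_RE.findall(text):
        domain = match.lower()
        if domain.rsplit(".", 1)[-1] in FILE_SUFFIXES or domain in BENIGN_DOMAINS:
            continue
        domains.add(domain)
    return {"ipv4": sorted(ips), "domain": sorted(domains)}
```

Returning sorted lists built from sets gives deduplication for free, which matters when the same indicator appears in function code, strings, and AI summaries.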
YARA Rule Generation

Auto-generates YARA rules from individual samples and function clusters. Full CRUD — create, update, delete, toggle enable/disable. Bulk generation and single-file export for all rules. Match counts tracked per rule.

Sample-based rules · Cluster-based rules · Bulk export · Match tracking
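Rule generation ultimately means rendering selected byte patterns and strings into YARA syntax. The sketch below shows one way to do that rendering in plain Python. The `render_yara_rule` helper and its argument shapes are hypothetical, and choosing which bytes and strings to include is the hard part that this example does not attempt.

```python
def render_yara_rule(name: str, meta: dict, strings: dict, condition: list) -> str:
    """Render a YARA rule as text.

    strings maps an identifier to a (kind, value) pair, where kind is "hex" or "text".
    condition is a list of clauses that are joined with "and".
    """
    lines = [f"rule {name} {{", "    meta:"]
    lines += [f'        {key} = "{value}"' for key, value in meta.items()]
    lines += ["", "    strings:"]
    for ident, (kind, value) in strings.items():
        if kind == "hex":
            lines.append(f"        ${ident} = {{ {value} }}")
        else:
            # Backslashes and quotes must be escaped inside YARA text strings.
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'        ${ident} = "{escaped}" ascii')
    lines += ["", "    condition:"]
    lines.append("        " + " and\n        ".join(condition))
    lines.append("}")
    return "\n".join(lines)
```

For example, passing `{"fn_0": ("hex", "55 48 89 E5")}` as `strings` and `["uint16(0) == 0x5A4D", "all of them"]` as `condition` yields a rule in the same layout as the generated rules shown in the sample report.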
Three Report Templates

Default (full technical — all 11 sections), Executive (high-level briefing for leadership), Technical (deep analyst detail with extended function analysis). Generated as HTML or PDF. Stored in object storage with tracking.

Full technical · Executive · Deep analyst · HTML + PDF export
Campaign Detection

Samples sharing significant function-level code overlap are automatically grouped into campaigns. Each campaign tracks: member samples, shared IOCs, confidence score, shared function count, and first/last seen timestamps.

Automated grouping · Shared IOCs · Confidence scoring · Timeline tracking
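One simple way to picture campaign grouping is to link samples whose function sets overlap and take the connected components. The sketch below uses Jaccard overlap on exact function fingerprints and a union-find structure. The 0.5 threshold and the fingerprint representation are assumptions (the platform describes vector similarity), while the 3-sample minimum mirrors the threshold mentioned in the sample campaign timeline.

```python
from itertools import combinations


def overlap(a: set, b: set) -> float:
    """Jaccard overlap between two sets of function fingerprints."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_campaigns(samples: dict, threshold: float = 0.5, min_size: int = 3) -> list:
    """Group samples into campaigns: connected components of the overlap graph."""
    parent = {name: name for name in samples}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups cheap
            x = parent[x]
        return x

    for a, b in combinations(samples, 2):
        if overlap(samples[a], samples[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in samples:
        groups.setdefault(find(name), set()).add(name)
    return [sorted(members) for members in groups.values() if len(members) >= min_size]
```

The pairwise loop is quadratic in the number of samples, so a production system would need an index to find candidate pairs. The sketch only shows the grouping logic.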

Integrations & Alerts

Real-time WebSocket alerting with configurable rules and distributed cooldown protection. SIEM / SOAR connectors are in active development.

SIEM
In Development

Structured alert events with threat score, family, severity, and full context pushed directly into your SIEM environment.

Log Analytics
In Development

Standardized event schema to keep correlation consistent across your security analytics stack.

SOAR
In Development

Automatic incident creation when alert rules fire, with full context from the analysis report.

Real-Time Alert Rules

Three Rule Types

Threshold rules fire when threat score exceeds a configurable value. Family rules fire when a specific malware family is detected. Novelty rules fire when a sample has fewer than 5% function matches with existing samples — flagging truly new malware.

Score threshold · Family match · Novel malware (<5%)
Four Default Rules Pre-Configured

Critical Threat (≥0.9, 5 min cooldown), High Threat (≥0.7, 15 min cooldown), Ransomware Family (Ryuk/LockBit/Conti, 5 min cooldown), Novel Malware (<5% match, 30 min cooldown). All fully editable.

Pre-configured · Cooldown protection · WebSocket push
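The three rule types and their cooldowns can be modeled in a few lines. The sketch below encodes the four default rules listed above. The class layout, the result field names, and the choice to track cooldown per rule rather than per sample are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    name: str
    kind: str          # "threshold", "family", or "novelty"
    value: object      # score bound, set of families, or match-ratio bound
    cooldown_s: int
    last_fired: float = float("-inf")

    def matches(self, result: dict) -> bool:
        if self.kind == "threshold":
            return result["threat_score"] >= self.value
        if self.kind == "family":
            return result["family"] in self.value
        if self.kind == "novelty":
            return result["function_match_ratio"] < self.value
        raise ValueError(f"unknown rule kind: {self.kind}")


def default_rules() -> list:
    """The four pre-configured rules, with cooldowns converted to seconds."""
    return [
        AlertRule("Critical Threat", "threshold", 0.9, 5 * 60),
        AlertRule("High Threat", "threshold", 0.7, 15 * 60),
        AlertRule("Ransomware Family", "family", {"Ryuk", "LockBit", "Conti"}, 5 * 60),
        AlertRule("Novel Malware", "novelty", 0.05, 30 * 60),
    ]


def evaluate(rules: list, result: dict, now: float) -> list:
    """Return names of rules that match and are outside their cooldown window."""
    fired = []
    for rule in rules:
        if rule.matches(result) and now - rule.last_fired >= rule.cooldown_s:
            rule.last_fired = now
            fired.append(rule.name)
    return fired
```

Passing `now` in explicitly keeps the logic testable. A multi-instance deployment would need the `last_fired` state in shared storage, which is presumably what the distributed cooldown protection refers to.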

Infrastructure

Self-hosted, fully observable, production-hardened. Docker Compose for development and demo. Kubernetes for production with autoscaling, mTLS, sandboxing, and 4 automated backup systems.

Deployment: Docker & Kubernetes
Docker Compose with dev, prod, and demo profiles
Kubernetes manifests with full resource management
Event-driven autoscaling on pipeline queue depth
100% self-hosted, no external data sharing

Security: Defense in Depth
Container sandboxing for all analysis workloads
mTLS between all services
Kubernetes network policies with zero default egress
Non-root containers, seccomp profiles
Row-level security per organization

Observability: Full Stack Monitoring
Metrics collection with visualization dashboards
Centralized log aggregation with 30-day retention
OpenTelemetry distributed tracing
Custom per-stage pipeline tracer
Audit log for all analyst actions

Reliability: Automated Backups
4 independent daily backup jobs
Database, vector store, cache, and object storage
Read replica for database failover
High-multiplexing connection pooler

API: 100+ Endpoints
Modular API routing
JWT + API key dual authentication
4 roles: admin, analyst, viewer, api_consumer
Per-key rate limits and scoped permissions
Multi-tenant org isolation

Data Pipeline: Event-Driven Architecture
High-throughput message broker
Multi-topic, high-partition message broker
Async stage-to-stage processing
Dead-letter queues for failure isolation

Filename: malware_loader.exe
SHA-256: a3f4b2c1d8e9f047ac62b381de5c6f09...
File Type: PE32+ / x64
File Size: 403 KB
Submitted: 2026-03-19, 14:32 UTC
Status: CRITICAL
Threat Score: 0.94 / 1.00

Sample Overview

File metadata, submission context, and pipeline execution timeline.

SHA-256: a3f4b2c1d8e9f047ac62b381de5c6f0912b3e4d7c8a91f2b56e7d80a3c4f19e2
File Type: PE32+ · x86-64
File Size: 412,672 bytes (403 KB)
Compiler: MSVC · Rich Header present
Packer / Protector: VMProtect 3.x detected
First Seen: 2026-03-15 · MalwareBazaar
Import Count: 147 imports · 8 DLLs
Section Count: 6 sections · .text / .data / .rsrc
Entropy: 7.82 / 8.0 · High (packed)
Pipeline Execution Timeline
14:32:01Z  Sample submitted: hash computed, dedup check passed
14:32:04Z  Threat intel feed lookup: 5 feeds queried, 2 hits
14:32:08Z  Disassembly complete: 47 functions extracted
14:32:19Z  Dual AI encoding complete: structural + semantic representations generated
14:32:22Z  Similarity search: 6 matches above threshold (47 functions queried)
14:32:25Z  AI enrichment: 12 critical functions summarized
14:32:27Z  Family classification: Cobalt Strike, 91.4% confidence
14:32:28Z  IOC extraction: 18 indicators extracted across 7 types
14:32:29Z  YARA rules generated: 3 rules, alert dispatched (CRITICAL)
Threat Score

Composite score (0.0–1.0) combining multiple independent signals. Evasion boost applied.

Composite Score: 0.94 (CRITICAL)
Evasion boost applied: +0.25
Base score: 0.69 → Final: 0.94
Signal Breakdown
AI Risk Score: 0.96
ML Confidence: 0.91
Similarity Score: 0.88
Family Confidence: 0.87
Intel Feed Hits: 2 / 5
Evasion Detection: EXTREME
Evasion Techniques Detected
Anti-debug: DETECTED
Anti-VM: DETECTED
Packer / Protector: VMProtect 3.x
Process injection: DETECTED
Reflective loading: DETECTED
API hashing: DETECTED
String encryption: DETECTED
Timing checks: DETECTED
Code signing: NOT FOUND
Family Attribution

ML classification result with confidence breakdown and matched behavioral evidence.

Cobalt Strike
ML confidence: 91.4% · Rule-based: matched 8/12 capabilities · Extended Classifier
Cobalt Strike: 91.4%
Beacon Loader: 6.1%
Brute Ratel: 1.5%
Unknown: 1.0%
process-injection · shellcode-loader · anti-debug · VMProtect packed · reflective-loading · C2-beacon · PEB-walk · API-hashing
Classification Evidence
PEB loader list traversal to resolve kernel32.dll exports without calling LoadLibrary, the canonical Cobalt Strike beacon staging method.
XOR-based key rotation before shellcode decryption matches documented Cobalt Strike 4.x payload staging patterns.
Reflective PE loading sequence: VirtualAllocEx → WriteProcessMemory → CreateRemoteThread, observed in a 91.4% confidence match against known CS beacon samples.
Rule-based check: 8 of 12 Cobalt Strike capability signatures matched, including C2 beacon structure, sleep masking, and HTTP/S malleable profile indicators.
Structural classification score: 0.914 against the Cobalt Strike cluster centroid, the highest-confidence match across all 12 family profiles.
Function Analysis

47 functions extracted and analyzed. Showing 3 highest-severity functions with AI-generated summaries.

Showing 3 of 47 functions · 12 flagged CRITICAL · 9 flagged HIGH
FUN_00401a40 · ReflectiveLoader_Stage2
CRITICAL · 0.96 · anti-debug · shellcode-loader
AI Analysis · confidence 0.96

This function implements a reflective PE loader. It resolves kernel32.dll exports by walking the PEB loader list, allocates executable memory, writes a shellcode payload, and invokes it through a remote thread. An IsDebuggerPresent check with a conditional exit path is present, consistent with observed Cobalt Strike beacon staging behavior. The normalization-resistant XOR key rotation before decryption is a known fingerprint of CS 4.x payload loaders.

    00401a40  push  rbp
    00401a41  mov   rbp, rsp
    00401a44  sub   rsp, 0x28
    00401a48  call  IsDebuggerPresent    ; anti-debug check
    00401a4d  test  eax, eax
    00401a4f  jnz   00401b80             ; exit if debugger
    00401a55  call  FUN_0040f200         ; PEB walk
    00401a5a  call  VirtualAllocEx
    00401a5f  call  WriteProcessMemory
    00401a64  call  CreateRemoteThread

Threat Score: 0.96 · Instructions: 312 · API Calls: VirtualAllocEx, WriteProcessMemory, CreateRemoteThread · Similarity: 94.2% match, CS sample 78de45f9
FUN_00407b10 · DecryptPayload_XOR
CRITICAL · 0.91 · crypto
AI Analysis · confidence 0.91

XOR decryption loop with a 4-byte rotating key applied to a statically embedded payload blob. The key rotation algorithm is structurally identical to known Cobalt Strike 4.x staging implementations. Entropy of the encrypted blob is 7.72, consistent with compressed-then-encrypted shellcode.

Threat Score: 0.91 · Instructions: 88 · Payload Size: 48,640 bytes · Key Length: 4 bytes, rotating
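For readers unfamiliar with the pattern, a repeating-key XOR loop looks like the sketch below. It assumes "rotating" means the 4 key bytes are cycled across the payload, which is one common reading. The actual sample may rotate or mutate the key differently, and the key value used in the usage note is made up.

```python
from itertools import cycle


def xor_repeating(blob: bytes, key: bytes) -> bytes:
    """XOR every byte of blob with the key bytes, cycling through the key."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(blob, cycle(key)))
```

XOR is its own inverse, so applying `xor_repeating` twice with the same key (for example `bytes.fromhex("deadbeef")`) returns the original bytes. That is why a single routine serves as both encryptor and decryptor in loaders like this one.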
FUN_00403c20 · C2_Beacon_HTTP
CRITICAL · 0.88 · networking · c2
AI Analysis · confidence 0.88

HTTP C2 beacon implementation. Constructs HTTP GET requests with a hardcoded User-Agent matching Cobalt Strike malleable C2 profile defaults. IP addresses 185.220.101.42 and 194.165.16.77 are statically embedded and used as primary/fallback C2 servers. Sleep interval extracted: 60,000 ms with 20% jitter.

Threat Score: 0.88 · Instructions: 241 · API Calls: WinHttpOpen, WinHttpConnect, WinHttpSendRequest · Sleep: 60s, 20% jitter
IOC Extraction

18 indicators extracted across 7 types. Malicious/suspicious classification applied automatically.

2 IP Addresses
2 Domains
4 Registry Keys
3 Mutexes
4 File Paths
2 URLs
1 Hash
IPv4 Addresses 2
185.220.101.42 C2 SERVER
194.165.16.77 C2 SERVER
Domains 2
update.microsoft-cdn.net SUSPICIOUS
telemetry.windowsupdate.org SUSPICIOUS
URLs 2
hxxp://185.220.101.42/content/uploads/jquery.min.js DEFANGED
hxxp://194.165.16.77/api/v2/telemetry DEFANGED
Registry Keys 4
HKCU\Software\Microsoft\Windows\CurrentVersion\Run\svchost32 PERSISTENCE
HKLM\SYSTEM\CurrentControlSet\Services\WinDefend TAMPER
HKCU\Software\Classes\ms-settings\shell\open\command
HKLM\SOFTWARE\Microsoft\Windows Defender\Exclusions\Paths
Mutexes 3
Global\CobaltStrike-{3a9f17bc} KNOWN CS
Local\MSIMutex_7f3b2e
Global\WindowsInstaller_Updater
File Paths 4
%APPDATA%\Microsoft\Windows\svchost32.exe DROP
%TEMP%\~tmp4af7.dat
C:\Windows\System32\wbem\wmic.exe
C:\Windows\SysWOW64\cmd.exe
SHA-256 Hashes 1
a3f4b2c1d8e9f047ac62b381de5c6f0912b3e4d7c8a91f2b56e7d80a3c4f19e2 SELF
Similar Samples

6 samples with significant function-level overlap found in the corpus. Match method shown per result.

Top Match: 94.2%
Functions Matched: 34 / 47
Avg Match (top 6): 77.9%
Campaign Match: Yes
SHA-256              Family          Match   Method       First Seen
78de45f9a2c03b91...  Cobalt Strike   94.2%   Hybrid       2026-03-15
2c91f30bd507e1a4...  Cobalt Strike   87.6%   Hybrid       2026-02-28
9a1d72fc4e83b021...  Cobalt Strike   81.3%   Semantic     2026-01-19
3e8a12bc7f04d9e6...  Beacon Loader   74.2%   Structural   2025-12-04
b7f23e10c8a49d30...  Cobalt Strike   68.8%   Semantic     2025-11-12
47ca910e2db63f51...  Brute Ratel     61.4%   Structural   2025-10-30
YARA Rules

3 rules auto-generated from this sample: function-level byte signatures, API call patterns, and family string indicators.

3 Rules Generated
2 From Functions
1 From Sample
SemSearch_CobaltStrike_Loader_FUN_00401a40
Function source · threat_score: 0.96
// Generated by SemSearch | 2026-03-19 | function: FUN_00401a40

rule SemSearch_CobaltStrike_Loader_FUN_00401a40 {
    meta:
        generated_by  = "SemSearch v1.0"
        sha256        = "a3f4b2c1d8e9f047ac62b381de5c6f09..."
        family        = "CobaltStrike"
        threat_score  = "0.96"
        severity      = "CRITICAL"
        date          = "2026-03-19"

    strings:
        $fn_0  = { 55 48 89 E5 48 83 EC 28 48 89 7D F8 }
        $fn_1  = { FF 15 ?? ?? ?? ?? 85 C0 75 ?? 48 8D }
        $api_0 = "VirtualAllocEx" ascii nocase
        $api_1 = "WriteProcessMemory" ascii nocase
        $api_2 = "CreateRemoteThread" ascii nocase
        $api_3 = "IsDebuggerPresent" ascii

    condition:
        uint16(0) == 0x5A4D and
        filesize < 2MB and
        2 of ($fn_*) and
        3 of ($api_*)
}
SemSearch_CobaltStrike_C2Beacon_FUN_00403c20
Function source · threat_score: 0.88
// Generated by SemSearch | 2026-03-19 | function: FUN_00403c20

rule SemSearch_CobaltStrike_C2Beacon_FUN_00403c20 {
    meta:
        generated_by  = "SemSearch v1.0"
        family        = "CobaltStrike"
        threat_score  = "0.88"
        date          = "2026-03-19"

    strings:
        $c2_0  = "185.220.101.42" ascii
        $c2_1  = "194.165.16.77"  ascii
        $ua    = "Mozilla/5.0 (compatible; MSIE 9.0" ascii
        $api_0 = "WinHttpOpen" ascii
        $api_1 = "WinHttpConnect" ascii

    condition:
        uint16(0) == 0x5A4D and
        1 of ($c2_*) and
        all of ($api_*)
}
SemSearch_CobaltStrike_Sample_a3f4b2c1
Sample source · threat_score: 0.94
// Generated by SemSearch | 2026-03-19 | sample-level rule

rule SemSearch_CobaltStrike_Sample_a3f4b2c1 {
    meta:
        generated_by  = "SemSearch v1.0"
        sha256        = "a3f4b2c1d8e9f047ac62b381de5c6f09..."
        family        = "CobaltStrike"
        threat_score  = "0.94"
        severity      = "CRITICAL"

    strings:
        $mutex   = "Global\\CobaltStrike-{" ascii
        $path    = "%APPDATA%\\Microsoft\\Windows\\svchost32" ascii wide
        $reg     = "CurrentVersion\\Run\\svchost32" ascii wide
        $fn_load = { 55 48 89 E5 48 83 EC 28 }

    condition:
        uint16(0) == 0x5A4D and
        filesize < 2MB and
        2 of ($mutex, $path, $reg) and
        $fn_load
}
Campaign Attribution

This sample has been grouped into an active campaign based on shared code, infrastructure, and behavioral overlap.

Campaign Identification
Operation SilverBeacon
Campaign ID: CAMP-2026-0041 · Cobalt Strike cluster
14 Samples
3 C2 IPs
78% Code Overlap
91d Active Span
Campaign Timeline
2025-12-18
First sample: initial loader (47ca910e) detected via MalwareBazaar submission. Low similarity, no family match at time of detection.
2026-01-19
Cluster formed: 9a1d72fc identified with 81.3% function-level overlap to 47ca910e. Campaign attribution triggered at 3-sample threshold.
2026-02-03
Active expansion: 8 additional samples added. Shared C2 infrastructure confirmed. Cobalt Strike family attribution stabilized at >90%.
2026-03-19
This sample: a3f4b2c1 added to campaign. 94.2% match to cluster centroid. 3 shared IOCs confirmed.
Shared Indicators Across Campaign
185.220.101.42: C2 (13/14 samples)
194.165.16.77: C2 (9/14 samples)
Global\CobaltStrike-{* mutex pattern
svchost32.exe drop path pattern
XOR key rotation algorithm (structural)
PEB-walk export resolver (semantic)

Malware analysis, end to end, without the manual work

SemSearch is a self-hosted malware intelligence platform that takes raw binaries — PE files — and runs them through a fully automated multi-stage analysis pipeline. The output is a structured, searchable intelligence report: threat score, malware family attribution, function-level AI summaries, extracted IOCs, auto-generated YARA rules, similar sample matches, and campaign attribution.

The core idea is function-level analysis. Rather than treating a binary as a single opaque blob, SemSearch decomposes it into its individual functions, analyzes each one independently, and builds intelligence bottom-up. A function that implements a reflective PE loader is the same function regardless of which binary it appears in — even if that binary has been recompiled, repacked, or had its imports obfuscated. That invariance is the foundation SemSearch is built on.

Two independent AI models run on every function: a structural model that captures the shape of the code, and a semantic model that captures what the code means. The results are combined to power similarity search, family classification, and campaign grouping — using methods that hold up against obfuscation and evasion in ways that signature-based approaches don't.

"The code that makes malware work doesn't change when the hash does. We analyze the code."

Seven things that break traditional malware analysis

The security industry has built most of its detection infrastructure around artifacts that are trivially changed. Threat actors know this, and they exploit it constantly. SemSearch was designed around seven specific failure modes in traditional analysis:

01
Hash-only detection fails after one recompile

A file hash identifies an exact byte sequence. Recompile the same source with a different compiler flag and the hash changes completely — even though the logic, behavior, and threat are identical.

02
Signatures don't survive obfuscation

Static signatures match against byte patterns. Packers, encryption, and code mutation break these patterns trivially. Threat actors routinely cycle packing configurations specifically to burn signature hits.

03
Manual reverse engineering doesn't scale

A skilled analyst can triage tens of samples per day. Modern threat operations produce thousands. The gap between analyst capacity and sample volume is structural — it cannot be closed by hiring more analysts.

04
No systematic code reuse detection

Malware authors reuse proven code heavily — loaders, C2 implementations, crypters, anti-debug stubs. Without function-level comparison across the corpus, this reuse is invisible to most analysis platforms.

05
Family attribution is inconsistent

Different analysts and different tools disagree on malware family labels. SemSearch uses ML classifiers trained on behavioral and structural features to produce reproducible, confidence-scored classifications.

06
Campaign connection requires manual correlation

Connecting related samples across a campaign — same actor, same tooling, different hashes and file names — requires hours of manual correlation that most teams can't afford on every incident.

07
Intelligence lives in analysts' heads

Hard-won knowledge from a reverse engineering session — what a function does, what family it belongs to, what campaign it connects to — rarely persists in a queryable form. SemSearch stores and indexes everything.

Built for the work, not for the pitch deck

SemSearch is not enterprise security software. There is no enterprise sales team, no SOC2-compliance-first roadmap, no feature list designed to check procurement checkboxes. It is a tool built by people who do malware analysis, for the people who do malware analysis.

That means a few specific things. Self-hosted first — your samples don't leave your infrastructure. API-first — every capability is accessible programmatically, making it integrable into any existing pipeline. Depth over breadth — we would rather do function-level analysis extremely well than offer a shallow version of everything. And honest documentation — no features listed that don't exist, no "coming soon" treated as shipped.

Your data stays yours

Fully self-hosted — Docker Compose or Kubernetes. Your samples, results, and intelligence never leave your environment. No cloud telemetry, no sample sharing, no vendor lock-in to a hosted service.

API-first, automation-ready

100+ REST API endpoints. Every submission, result, search, alert, and export operation is available programmatically. Built to drop into existing pipelines — no GUI required for any workflow.

Function-level depth, always

Every binary is disassembled down to individual functions. Every function is AI-analyzed, AI-enriched, cross-searched, and threat-scored. Analysis at this granularity is the non-negotiable core of what SemSearch does.

No vaporware

Every feature in the documentation is implemented, tested, and ships with the platform. Features in active development are clearly listed as coming soon — nothing else is implied to exist. The report you see on this site is what the platform actually produces.

What's coming next

Static analysis — the full automated pipeline from binary ingestion through family classification, IOC extraction, YARA generation, and campaign attribution — is live. The next phase is dynamic analysis: correlating what the binary contains statically with what it actually does at runtime.

In Active Development
Dynamic Analysis Sandbox

Runtime behavioral correlation synchronized with existing static analysis results. Every behavioral event — API call, network connection, file operation, registry modification — linked back to the function and code region that produced it.

Runtime behavioral event capture correlated to static function-level analysis
API call sequence analysis and behavioral pattern matching
Network capture with automatic IOC extraction from live traffic
Memory dump ingestion and correlation with disassembly results
Static + dynamic threat score fusion for higher-confidence attribution