Banshee Documentation

About Banshee

AI-assisted OSINT and dorking for continuous discovery, analysis, and learning.

What it does

Generates focused dorks from prompts, research, random modes, and SMART follow-ups.
Queries Google CSE and Brave Search in parallel with dedupe and pacing.
Enriches results with tech detection, Wayback intelligence, and monitoring cycles.
Optionally analyzes documents, HTTP responses, and inline code via gemini-cli.

How it works (in code)

Run() loads config, initializes caches, and wires a Config pipeline.
Targets flow from stdin into the dork pipeline; queries are built from flags and AI modules.
Search helpers fetch results, then dedupe, filter, and enrich them in worker pools.
Analysis modules summarize findings and persist intelligence in ~/.config/banshee.

Findings it can surface

Exposed admin panels, dashboards, debug endpoints, and misconfigured services.
Credentials, tokens, API keys, and secrets in responses or public documents.
Leaked configs, backups, and sensitive files (PDF, DOCX, XLSX, etc.).
Legacy or forgotten endpoints surfaced through Wayback history, plus CVE-mapped exposures.
Paste-site leaks tied to your keywords or organization.

Operational boundaries

Banshee is a discovery and analysis tool, not an exploitation framework.
Always validate findings manually and respect legal scope.
Use oos.txt and exclusion flags to stay in-scope.
Need support? Email gorkem@cyberpars.com.

How Banshee Works

From input to intelligence: each run follows a predictable pipeline.

1. Ingest Targets

Domains arrive from stdin (single domain or list). Optional inputs include dork files, dictionaries, extensions, and prompts. Use --find-apex to expand a hostname into tenant apex domains before scanning.

2. Generate Dorks

Use manual queries, AI prompts, random mode, research mode, or SMART follow-ups. Multi-lang and date operators can expand coverage.

3. Search Engines

Google CSE and Brave Search run in parallel with adaptive pacing, key rotation, and retry logic. Results are de-duplicated immediately.

4. Filter and Enrich

Out-of-scope patterns, exclusion rules, and dedupe logic reduce noise. Tech detection and intelligence layers enrich high-value URLs.

5. Analyze

Optional AI analysis: document review, response analysis, inline JS inspection, and vulnerability reasoning. Findings can be scored.

6. Persist Intelligence

Learn mode writes intelligence JSON and success patterns. Wayback cache and AI cache shorten future runs.

Use Cases

Practical scenarios where Banshee delivers high-signal results.

Exposure discovery

Find admin panels, dashboards, and exposed controls across a target's web surface.

Start with a focused dork and expand with SMART follow-ups.
Combine with tech detection to prioritize vulnerable stacks.

echo example.com | banshee -q "inurl:admin intitle:login" -a --tech-detect --smart

Sensitive document hunting

Identify public PDFs and office documents, then analyze for secrets or PII.

Use extensions to focus results and reduce noise.
Filter to only sensitive documents after analysis.

echo example.com | banshee -e pdf,docx,xlsx --analyze-docs --filter-docs --learn

API and endpoint recon

Find exposed APIs, test endpoints, and config leaks with response analysis.

Include --dedupe, which --analyze-responses requires; it also avoids repeated analysis.
Use targeted dorks like inurl:api or filetype:json.

echo example.com | banshee -q "inurl:api intext:token" --dedupe --analyze-responses

Asset inventory and subdomain expansion

Enumerate subdomains and build an asset inventory.

Enable deep mode for recursive discovery.
Use learn mode to persist and reuse discovered assets.
Works with TLD input for broad mapping (for example, .gov).

echo example.com | banshee -s --deep --learn

Tenant apex expansion

Resolve Microsoft tenant domains and pivot into apex targets tied to a single hostname.

Uses the OpenID configuration to extract the tenant ID.
Expands stdin targets before AI and targeted dorking.

echo login.example.com | banshee --find-apex -ai "find admin portals" --smart

Tech and CVE mapping

Update the CVE database and generate specialized dorks per CVE.

Scope updates with --cve-year and --severity.
Add --nvd-api-key to speed up CVE updates.

banshee --update-cve-db --cve-year 2025 --severity critical --ai-dork-generation

Cloud asset enumeration

Hunt for exposed cloud storage buckets and asset endpoints tied to your target.

Random cloud focus generates provider-specific dorks.
Useful for S3, Azure Blob, GCP Storage, and similar services.

echo example.com | banshee -random cloud -quantity 15 -v

Research mode deep dive

Use AI to research target context and generate higher-signal dorks.

Increase depth when you need broader coverage.
Combine with --multi-lang for international targets.

echo example.com | banshee --research --research-depth 3 --multi-lang --learn

Historical attack surface

Mine Wayback history for hidden paths, old assets, and forgotten endpoints.

Filter by status codes with --wmc.
Use --smart to follow discovered patterns.

echo example.com | banshee --foresee --wmc 200,301,302 --smart

Continuous monitoring

Run repeated scans with diffing, filtering, and optional analysis per cycle.

Use --monitor-time to set the cycle interval (minutes).
Enable --filter-mon to avoid repeating URLs across cycles.

cat domains.txt | banshee --monitor "exposed credentials" --monitor-time 60 --filter-mon --analyze-mon

Leak detection

Scan paste sites for exposed credentials and sensitive strings.

Provide keywords to focus on specific brands or products.
Use AI modes for deeper classification and summarization.

echo example.com | banshee --check-leaks --keywords "company api key"

Effective Usage

Practical guidance for high-signal, low-noise runs.

First pass workflow

Start narrow with a low page count to validate signal.
Expand with AI or SMART once the baseline looks good.
Turn on learn mode to persist findings for the next run.

echo example.com | banshee -q "inurl:admin" --pages 5 --adaptive
echo example.com | banshee -ai "admin panels, dashboards, internal tools" --smart --learn

Scope control

Define exclusions in ~/.config/banshee/oos.txt (wildcards supported).
Use -x or --exclusions for quick skip lists or files.
Add --include-dates when you only want recent results.

Performance and quotas

Limit engines with -engine google or -engine brave when keys are scarce.
Scale with --workers and control pacing with --delay.
Keep delay=0 when using --adaptive so the scheduler can adjust.

AI cost control

Use --dedupe before response analysis to avoid repeated AI calls.
Use --filter-docs to keep only sensitive documents after analysis.
Keep prompts stable to improve cache hits in ~/.config/banshee/.ai_cache.json (24-hour TTL).

Random mode hygiene

Random mode writes used dorks to ~/.config/banshee/.ignore.file.
Override with --ignore-file or bypass with --flush.
Use --quantity to bound the number of generated dorks.

Monitor mode runbook

--monitor builds its own dorks, so do not combine with -q, -ai, or -random.
--filter-mon dedupes across cycles; --analyze-mon enables per-cycle analysis.
--monitor is not compatible with --dedupe (use --filter-mon instead).

cat domains.txt | banshee --monitor "api tokens" --monitor-time 45 --filter-mon --analyze-mon

Analysis constraints

--analyze-responses requires --dedupe.
--inline-code-analysis cannot be combined with doc or response analysis.
--analyze-response-only expects URLs from stdin; --analyze-code-only expects code from stdin.
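
The constraints listed here and in the monitor mode runbook can be checked before a run starts. The following is a minimal sketch of a preflight helper in POSIX shell. The function name check_banshee_flags is hypothetical and not part of Banshee; it encodes only the rules stated above and matches flags by their exact spelling.

```shell
# Hypothetical preflight helper; not part of Banshee.
# Usage: check_banshee_flags <flags you intend to pass to banshee>
# Returns 0 if the documented constraints hold, 1 (reason on stderr) otherwise.
check_banshee_flags() {
    dedupe=0; responses=0; response_only=0; docs=0; inline=0; monitor=0; dorks=0
    for arg in "$@"; do
        case "$arg" in
            --dedupe)                dedupe=1 ;;
            --analyze-responses)     responses=1 ;;
            --analyze-response-only) response_only=1 ;;
            --analyze-docs)          docs=1 ;;
            --inline-code-analysis)  inline=1 ;;
            --monitor)               monitor=1 ;;
            -q|--query|-ai|-random)  dorks=1 ;;
        esac
    done

    if [ "$responses" -eq 1 ] && [ "$dedupe" -eq 0 ]; then
        echo "--analyze-responses requires --dedupe" >&2; return 1
    fi
    if [ "$monitor" -eq 1 ] && [ "$dedupe" -eq 1 ]; then
        echo "--monitor is not compatible with --dedupe (use --filter-mon)" >&2; return 1
    fi
    if [ "$monitor" -eq 1 ] && [ "$dorks" -eq 1 ]; then
        echo "--monitor builds its own dorks; drop -q, -ai, or -random" >&2; return 1
    fi
    if [ "$inline" -eq 1 ] && [ $((docs + responses + response_only)) -gt 0 ]; then
        echo "--inline-code-analysis cannot be combined with doc or response analysis" >&2; return 1
    fi
    return 0
}
```

A wrapper script could call check_banshee_flags "$@" && banshee "$@" so that an invalid combination fails fast, before any search quota is spent.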

Requirements

What you need for full functionality and optional AI acceleration.

Runtime

A binary built for your OS (this repo builds with Go 1.24.5).
Network access to Google CSE, Brave Search, and optional NVD API.
stdin-based input for targets and URL lists.

AI Features

Node.js + npm installed (required for gemini-cli).
gemini-cli installed (npm package: @google/generative-ai-cli).
Gemini API key (or OAuth auth) for AI generation and analysis.
Optional model override with --model.

API Keys and Setup

Step-by-step instructions for every required service.

Google Custom Search API (required for Google engine)

Create a Google Cloud project.
Enable the Custom Search API for the project.
Create one or more API keys.
Create a Custom Search Engine (CSE) and enable "Search the entire web".
Save one key per line in ~/.config/banshee/keys.txt. Do not add comments.
Add multiple keys to increase quota and let Banshee rotate automatically.
Validate with -engine google and -v for verbose output.

AIzaSyExampleKeyOne
AIzaSyExampleKeyTwo

Banshee uses a built-in CSE ID. If you need a custom CX, update the source constant and rebuild.
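
Because keys.txt must not contain comments, a quick sanity check before the first run can save a confusing failure. This is a sketch only: check_key_file is a hypothetical helper, and it is deliberately stricter than the rule above (it also rejects blank lines and any whitespace, which is an assumption about what is safe rather than documented behavior).

```shell
# Hypothetical helper; not part of Banshee.
# Succeeds only if the file is non-empty and every line is a bare key:
# no '#' comments, no blank lines, no whitespace anywhere on a line.
check_key_file() {
    [ -s "$1" ] || return 1
    ! grep -Eq '^[[:space:]]*(#.*)?$|[[:space:]]' "$1"
}
```

For example, check_key_file ~/.config/banshee/keys.txt || echo "keys.txt needs cleanup" flags a file that has picked up a comment or a stray blank line.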

Brave Search API (required for Brave engine)

Create a Brave Search API account.
Generate a subscription token.
Save one token per line in ~/.config/banshee/brave-keys.txt. No comments.
Add multiple tokens to increase throughput and let Banshee rotate keys.
Validate with -engine brave and -v for verbose output.

BRAVE_TOKEN_ONE
BRAVE_TOKEN_TWO

The Brave free tier is limited to one page per query; Banshee limits page counts automatically to match.

NVD API Key (optional, for CVE updates)

Request a free API key: https://nvd.nist.gov/developers/request-an-api-key
Save it to ~/.config/banshee/nvd-api-key.txt or pass --nvd-api-key.
Run CVE updates with --update-cve-db (optionally --cve-year and --severity).

YOUR_NVD_API_KEY

Without a key, NVD rate limits will slow CVE updates significantly.

Gemini CLI (required for AI modes)

Install: npm install -g @google/generative-ai-cli
Authenticate with OAuth: gemini-cli auth (or add an API key file).
For API keys, put one key per line in ~/.config/banshee/gemini-api-key.txt.
Lines starting with # are ignored; the first valid key is used.
Leave the file empty if you only use OAuth.

# one key per line, comments allowed
AIzaSyExampleKeyOne
AIzaSyExampleKeyTwo

Banshee passes the key as GOOGLE_AI_API_KEY when invoking gemini-cli. Use --model to override the default model.
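
To see which key would be picked from gemini-api-key.txt under the rules above, the selection can be approximated in shell. This is a sketch, not Banshee's implementation: first_gemini_key is a hypothetical name, and "valid" is approximated here as "not blank and not starting with #".

```shell
# Hypothetical helper; not part of Banshee.
# Prints the first line that is neither blank nor a '#' comment.
# Prints nothing if the file holds only comments (the OAuth-only case).
first_gemini_key() {
    file="${1:-$HOME/.config/banshee/gemini-api-key.txt}"
    [ -r "$file" ] || return 1
    grep -v -e '^#' -e '^[[:space:]]*$' "$file" | head -n 1
}
```

Applied to the example file above, the helper prints AIzaSyExampleKeyOne.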

Quickstart

Ready-to-run commands that cover common workflows.

Single target dork

Target from stdin + custom query.

echo example.com | banshee -q "inurl:admin" -a --tech-detect

AI prompt

Generate dorks with AI and run them immediately.

echo example.com | banshee -ai "sensitive dashboards and exposed APIs" --smart --suggestions

Random mode

Generate focused dorks without writing a prompt.

echo example.com | banshee -random sqli --quantity 12 --learn

Monitor mode

Continuous scanning with analysis.

cat domains.txt | banshee --monitor "sensitive pdf" --monitor-time 60 --filter-mon --analyze-mon

Document analysis

Analyze docs and keep only sensitive hits.

echo example.com | banshee -e pdf,docx --analyze-docs --filter-docs

Response analysis only

Analyze raw responses from stdin.

echo https://example.com/api | banshee --analyze-response-only

Practical Examples

Command + output pairs to make the flags feel concrete.

Response Analysis

Analyze a single response

Use --analyze-response-only when you already have a URL.

Command

echo https://gateway.example.nz:9443/portal/login | banshee --analyze-response-only -v

Output (sanitized)

[stdin] Processing single domain: https://gateway.example.nz:9443/portal/login
[RESPONSE-ANALYSIS] Analyzing batch 1/1 (1 URLs)
https://gateway.example.nz:9443/portal/login | [RA] - AWS credentials exposed; Slack webhook exposed | Sensitive: AWS_KEY:AKIA...; AWS_SECRET:3x4mple...; SLACK:https://hooks.slack.com/services/T00/EXAMPLE/ABC123
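
Result lines like the one above can be post-processed with standard tools. The sketch below pulls out only the URLs that response analysis flagged. It assumes the line layout seen in this sample (fields separated by " | ", with the second field starting with [RA]); flagged_urls is a hypothetical helper, and the layout may differ between versions.

```shell
# Hypothetical helper; not part of Banshee.
# Reads Banshee output on stdin and prints the URL of every line whose
# second " | "-separated field starts with the [RA] marker.
flagged_urls() {
    awk -F ' [|] ' 'index($2, "[RA]") == 1 { print $1 }'
}
```

Log lines such as [RESPONSE-ANALYSIS] ... have no second field and are skipped.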

Learning

SMART pattern learning

Combine --smart and --learn for adaptive follow-ups.

Command

echo acme.example | banshee --smart --learn -v

Output (abridged)

[SMART] Analyzing 12 successful URLs from past scans...
[SMART] Generated focused dorks from successful URL patterns
[LEARN] Loaded intelligence: 5 previous scans, 18 successful dorks

Monitoring

Monitor a fleet

Schedule repeated scans with --monitor and --filter-mon.

Command

cat domains.txt | banshee --monitor "exposed invoices" --monitor-time 30 --filter-mon

Output (abridged)

[MONITOR] Starting monitor mode (interval 30 minutes)
[MONITOR] Cycle 1/inf: running 3 targets
[MONITOR] New URLs this cycle: 27 (filtered 198)

Document Analysis

Filter sensitive documents

Pair --analyze-docs with --filter-docs to keep only hits.

Command

echo example.com | banshee -e pdf,docx --analyze-docs --filter-docs

Output

Document contains sensitive information: https://docs.example.com/finance/q2/board-report.pdf | Found bank account numbers and internal cost centers

Interactive

Interactive mode

Launch the TUI for guided workflows and quick commands.

Command

banshee --interactive

Output (abridged)

Good evening, operator | IP: 203.0.113.24 | Time: 19:44:12 | Session: SID-1234 | Status: READY
Quick commands: /execute, /monitor, /exit

Outputs

Append results to file

Use -o to append only new unique URLs to a file.

Command

echo example.com | banshee -q "inurl:admin" -a -o results.txt

Output file (new lines only)

https://example.com/admin/login
https://portal.example.com/admin/users

Feature Map

All major features with the exact flags to enable them.

SMART chaining and optimization

Generates follow-up dorks based on discovered assets and patterns.

Flags

--smart --suggestions --no-followup --max-followup --correlation

Learn mode intelligence

Saves successful patterns and boosts future scans.

Flags

--learn --view-intel --export-intel

Research mode (OSINT)

Pre-scan research to generate high-signal dorks.

Flags

--research --research-depth 1-4

Multi-language dorking

Detects target language and generates localized dorks.

Flags

--multi-lang --multi-lang-multiplier

Wayback foresee mode

Uses Wayback intelligence to map architecture and find hidden patterns.

Flags

--foresee --wmc --no-wayback-cache --clear-wayback-cache

Tech detection and CVE mapping

Detects stack, maps CVEs, and can generate CVE-specific dorks.

Flags

--tech-detect --update-cve-db --ai-dork-generation

Document analysis

Downloads documents, extracts text, and uses AI to identify sensitive data.

Flags

--analyze-docs --filter-docs

Response analysis

Captures HTTP responses and analyzes them with AI.

Flags

--dedupe --analyze-responses --analyze-response-only

Inline code analysis

Extracts inline JS from HTML and checks for risky patterns.

Flags

--inline-code-analysis --analyze-code-only

Monitor cycles

Schedules repeated dorking cycles and diffs the results between them.

Flags

--monitor --monitor-time --filter-mon --analyze-mon

Leak detection

Scans paste sites for leaked credentials and secrets.

Flags

--check-leaks --keywords

Rate control and budgets

Adaptive throttling, scoring, and budget optimization for quotas.

Flags

--adaptive --scoring --budget

Analysis Pipeline

Deep analysis options for documents, responses, and inline code.

Document analysis

Supports PDF, DOCX, XLSX, PPTX, and text formats.
Uses AI to classify sensitive findings and stores results in sensitive-pdfs.txt.
Use --filter-docs to remove non-sensitive docs from output.

Response analysis

Collects HTTP responses for discovered URLs.
Requires --dedupe to avoid duplicates.
AI summarizes secrets, admin endpoints, and risky patterns.
Use --analyze-response-only for direct URLs from stdin.

Inline code analysis

Extracts inline JS and DOM handlers for XSS risk signals.
Requires AI mode (-ai, -random, --smart, or --learn).
Cannot be combined with --analyze-docs or --analyze-response-only.

Code-only analysis

Pipe code directly to Banshee for AI review.
Use --analyze-code-only to bypass dorking.

cat script.js | banshee --analyze-code-only --model gemini-2.0-flash-exp

Monitoring and Automation

Continuous recon with dedupe, diffing, and AI analysis.

Monitor mode

Provide an intent string and Banshee will continuously run dorks on a schedule.

--monitor "intent" starts the scheduler.
--monitor-time sets cycle interval (minutes).
--filter-mon dedupes URLs across cycles.
--analyze-mon turns on doc + response analysis per cycle.

Monitor examples

cat domains.txt | banshee --monitor "sensitive pdf" --monitor-time 45
cat domains.txt | banshee --monitor "api tokens" --filter-mon --analyze-mon

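Monitor mode is compatible with AI-backed features such as SMART, learn, research, and multi-lang flows. Because it builds its own dorks, do not combine it with -q, -ai, or -random.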

Intelligence and Learning

Intelligence persisted by earlier runs feeds SMART follow-ups and dork generation in later ones.

Learn mode

Stores successful URLs, detected tech, paths, and patterns.
Improves SMART follow-ups and AI dork generation.
Use --view-intel to inspect intelligence per target.
Use --export-intel to save JSON snapshots.

Wayback foresee

Queries the Wayback Machine for historical URLs.
Extracts subdomains, parameters, file patterns, and admin paths.
Cache use is controlled with --no-wayback-cache or --clear-wayback-cache.
Optional status code filters via --wmc.

Internal Files

What Banshee writes to disk, why it matters, and how each file is used.

INTEL

Learning + intelligence

Persistent signals used by SMART, learn mode, and Wayback.

~/.config/banshee/.intel/<target-hash>.json

Per-target intelligence for learn mode, view, and export.

~/.config/banshee/successful.txt

Seeds SMART pattern learning with confirmed high-value URLs.

~/.config/banshee/sensitive-pdfs.txt

Sensitive document log with timestamps, summaries, and details.

~/.config/banshee/.intel/<domain>_wayback.json

Processed Wayback intelligence for historical patterns.

CACHE

Caches

Performance boosters that reduce repeated requests and AI calls.

~/.config/banshee/.ai_cache.json

AI response cache (24 hour TTL) for prompt reuse.

~/.config/banshee/research-cache.json

OSINT research cache used by --research.

~/.config/banshee/.wayback_cache/<domain_hash>.json

Raw Wayback cache to reduce API load.

~/.config/banshee/.response_cache/<url_hash>.json

Response analysis cache controlled by config TTL.

CTRL

Operational control files

Settings and scope controls to keep runs precise.

~/.config/banshee/.config

Default settings; CLI flags override this file.

~/.config/banshee/oos.txt

Out-of-scope patterns and wildcards.

~/.config/banshee/.ignore.file

Random dork history to prevent duplicates.

Tip

Delete caches to force refresh; recreate key files if removed.

KEYS

API key files

Key stores for search engines and AI tooling.

~/.config/banshee/keys.txt

Google CSE API keys (one per line, no comments).

~/.config/banshee/brave-keys.txt

Brave tokens (one per line, no comments).

~/.config/banshee/gemini-api-key.txt

Gemini keys (one per line, # comments allowed); the first valid key is used.

~/.config/banshee/nvd-api-key.txt

NVD API key for CVE updates.

Configuration

Defaults live in ~/.config/banshee/.config. CLI flags override config values.

Example config snippet

Only key lines shown here. The file is fully commented by default.

engine=both
verbose=false
recursive=false
insecure=false
pages=10
delay=0
workers=5
quantity=10
monitor-time=60
oos-file=~/.config/banshee/oos.txt
model=
simplify=false
multi-lang-multiplier=25
research=false
research-depth=1
learn=false
smart=false
smart-timeout=150
suggestions=false
no-followup=false
max-followup=5
correlation=false
max-correlation=10
waf-bypass=false
save=false
include-dates=false
tech-detect=false
adaptive=true
deep=false
scoring=false
budget=false
flush=false
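
When scripting around Banshee it can help to read a default back out of the config file, for example to log the effective worker count. The sketch below assumes the key=value layout shown in the snippet and that commented lines do not start with the key name. cfg_get is a hypothetical helper, and returning the last matching line is an arbitrary choice, not documented parser behavior.

```shell
# Hypothetical helper; not part of Banshee.
# Usage: cfg_get KEY [FILE]
# Prints the value of the last line that starts with "KEY=".
# Exits non-zero if the key is absent. An empty value prints an empty line.
cfg_get() {
    key="$1"
    file="${2:-$HOME/.config/banshee/.config}"
    awk -v k="$key" '
        index($0, k "=") == 1 { v = substr($0, length(k) + 2); found = 1 }
        END { if (found) print v; else exit 1 }
    ' "$file"
}
```

Matching on the full "KEY=" prefix keeps research from matching research-depth. Remember that a flag passed on the command line still overrides whatever this returns.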

Outputs and Caches

Where Banshee writes intelligence, analysis, and history.

Result output

Without -o, unique URLs are printed to stdout (sorted).
With -o, Banshee appends only new unique URLs (anew style).
The output file is created if missing; use -v to log preloaded URLs.
Ideal for long-running or repeated scans where you want one growing, de-duplicated file.
--no-colors produces clean output for piping into other tools.
--export-intel writes a JSON snapshot; --view-intel prints to stdout.
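
The "append only new unique URLs" behavior of -o can be pictured with a small shell emulation. This is an illustration of the semantics described above, not Banshee's own code; append_new is a hypothetical name.

```shell
# Hypothetical emulation of anew-style appending; not part of Banshee.
# Usage: <producer> | append_new FILE
# Appends to FILE only those stdin lines not already present in it
# (and not repeated earlier on stdin), and echoes the newly added lines.
append_new() {
    out="$1"
    awk -v existing="$out" '
        BEGIN {
            # Load lines already in the file; a missing file is treated as empty.
            while ((getline line < existing) > 0) seen[line] = 1
            close(existing)
        }
        !seen[$0]++
    ' | tee -a "$out"
}
```

Running the same scan twice through such a filter leaves the file unchanged on the second pass, which is why a single results file can keep growing across repeated runs without duplicates.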

Analysis artifacts

Document analysis logs sensitive hits to sensitive-pdfs.txt.
Learn mode stores per-target intelligence and uses it to improve future scans.
SMART uses successful.txt patterns to generate focused dorks.

See Internal Files for full paths and formats.

Cache behavior

AI responses are cached to reduce repeated prompts (default 24 hour TTL).
Wayback caches reduce API load; TTL is configurable in ~/.config/banshee/.config.
Clear caches when you need a full refresh of results.

Flag Reference

Core

-h, --help                Show help
-p, --pages               Pages per query
-d, --delay               Delay between requests
--workers                 Parallel worker count
-v, --verbose             Verbose logging
-o, --output              Write unique URLs to file
-engine                   both, google, brave
--model                   AI model override

Search and Scope

-q, --query               Custom dork or file of dorks
-e, --extensions          Extensions or file
-w, --word                Dictionary paths or file
-c, --contents            Search file contents
-x, --exclusions          Exclude targets
--oos-file                Out of scope patterns
-a, --recursive           Include subdomains
-s, --subdomains          Subdomain enumeration
--find-apex               Resolve tenant apex domains
-deep                     Recursive subdomain discovery
-save                     Stop pagination on low yield
-include-dates            Add date operators

AI and Random

-ai                       AI prompt for dork generation
-random                   Generate random dorks (focus)
-quantity                 Dorks to generate
-ignore-file              Ignore dorks from file
-flush                    Ignore ignore-file
-simplify                 Simplify AI prompts

SMART and Learn

-smart                    Context-aware dork chaining
--suggestions             Show optimization ideas
--no-followup             Disable follow-up dorks
--max-followup            Limit follow-up dorks
-correlation              Layered correlation analysis
-max-correlation          Limit correlation dorks
-learn                    Persist intelligence
-waf-bypass               Obfuscated dorks

Research and Language

-research                 OSINT research mode
-research-depth           Depth 1-4
--multi-lang              Target language dorks
--multi-lang-multiplier   Language mix percent

Analysis

--dedupe                  Intelligent deduplication
--analyze-responses       AI response analysis
--analyze-response-only   Analyze stdin URLs
--analyze-docs            Document analysis
--filter-docs             Filter safe docs
--inline-code-analysis    Inline JS analysis
--analyze-code-only       Analyze stdin code

Monitor and Automation

--monitor                 Monitor intent string
--monitor-time            Minutes between cycles
--filter-mon              Deduplicate across cycles
--analyze-mon             Analyze in monitor mode

Wayback Foresee

--foresee                 Wayback intelligence
--wmc                     Status code filter
--no-wayback-cache        Skip cache
--auto-cleanup-cache      Auto-remove old cache
--clear-wayback-cache     Clear cache

CVE Database

--update-cve-db           Update CVE database
--cve-year                Filter by year
--severity                Filter by severity
--cve-results-per-page    NVD page size
--ai-dork-generation      AI CVE dorks
--nvd-api-key             NVD API key
--view-intel              View saved intelligence
--export-intel            Export intelligence JSON

Network and Proxy

-r, --proxy               HTTP or SOCKS proxy
-insecure                 Skip TLS validation
-adaptive                 Adaptive delay control

Misc

-tech-detect              Wappalyzer fingerprinting
-scoring                  AI scoring
-budget                   Predict dork success
-check-leaks              Paste site leak scan
-keywords                 Leak search keywords
--interactive             Interactive TUI

FAQ

Answers to common operational questions.

Why am I seeing no results?

Check API keys, ensure targets are provided via stdin, and verify out-of-scope filters are not too strict. Use -v to inspect query behavior.

Do I need both Google and Brave keys?

No. Either engine works on its own; select it with -engine google or -engine brave. If keys for both are present, Banshee uses both.

Why does --analyze-responses require --dedupe?

Response analysis is expensive. Deduplication prevents repeated analysis of similar endpoints and reduces AI token usage.

How do I speed up scanning?

Increase --workers, reduce --delay (or keep delay=0 with --adaptive), and use --save to stop low-yield pagination.

Where are the caches and intelligence files stored?

Everything lives under ~/.config/banshee. See Internal Files for the full list; caches can be deleted to force a refresh.

How should I use successful.txt?

Add one URL per line (start with http), optionally with notes after the URL. Banshee ignores lines starting with # and uses the file for SMART pattern learning. Clear the file or skip SMART mode if you do not want it applied.
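
To preview which entries in successful.txt would count as seeds under these rules, the file can be filtered in shell. This is a sketch: seed_urls is a hypothetical helper, and it assumes notes are separated from the URL by whitespace.

```shell
# Hypothetical helper; not part of Banshee.
# Usage: seed_urls [FILE]
# Prints the URL from every line that starts with "http", dropping trailing
# notes. Comment lines start with "#", so they never match.
seed_urls() {
    file="${1:-$HOME/.config/banshee/successful.txt}"
    awk '/^http/ { print $1 }' "$file"
}
```

A line such as "https://example.com/admin/login exposed login page" yields just https://example.com/admin/login.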

How do I run a TLD mass scan?

Provide a TLD via stdin (for example echo .gov | banshee --tech-detect). Banshee will discover domains and run tech detection.

End User License Agreement (EULA)

Read this carefully before using Banshee. By using the tool, you agree to these terms.

License

Banshee is provided for personal, educational, and authorized security testing. You may install and use it for lawful assessments where you have explicit permission.

Restrictions

You must not use Banshee to target systems without authorization, violate laws, or bypass access controls. Automated scanning must respect rate limits and program scope rules.

Data Responsibility

You are responsible for securing output data, API keys, and any intelligence gathered. Do not store sensitive findings in public or shared locations.

Warranty Disclaimer

Banshee is provided "as is" without warranties of any kind. Results may be incomplete, noisy, or inaccurate.

Limitation of Liability

The authors and organization are not liable for any direct or indirect damages arising from use or misuse of this tool.

Termination

Violation of these terms immediately terminates your license to use Banshee. Continued use after termination is prohibited.

This tool is intended for educational purposes and authorized red teaming only. If you do not agree to these terms, do not use Banshee.