Introduction
During my final year at Reykjavik University, I tackled a problem that many security teams face: how to efficiently map and monitor the attack surface of organizations. The result was AASM (Automated Attack Surface Mapping) - a scalable system that discovers subdomains, endpoints, vulnerabilities, and more through automated scanning.
In this post, I'll walk through the architecture decisions, technical challenges, and lessons learned from building a production-ready security scanning platform.
Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Backend API | FastAPI | RESTful API with async support |
| Task Queue | Celery | Distributed task processing |
| Message Broker | Redis | Task queue and caching |
| Database | PostgreSQL (Supabase) | Relational data storage |
| Frontend | React/Next.js | User interface |
| Security Tools | Subfinder, Httpx, Nuclei, Masscan, Nmap, Gowitness | Attack surface scanning |
| Containerization | Docker | Deployment and dependency management |
The Problem
Organizations often don't have full visibility into their external attack surface. New services get deployed, subdomains are created, and infrastructure changes - all without a centralized view of what's exposed to the internet. Manual discovery is time-consuming and quickly becomes outdated.
I needed to build a system that could:
- Automatically discover and map attack surfaces
- Scale to handle multiple concurrent scans
- Process long-running security scans efficiently
- Provide real-time visibility into results
- Store historical data for trend analysis
Architecture Overview
The system follows a distributed architecture with several key components working together to provide scalable, asynchronous attack surface mapping:
FastAPI Backend
I chose FastAPI for the REST API because of its:
- Performance: Built on Starlette and Pydantic, it's one of the fastest Python frameworks
- Type Safety: Automatic validation and serialization with Pydantic models
- Documentation: Auto-generated OpenAPI docs
- Async Support: Native async/await for handling concurrent requests
```python
from fastapi import FastAPI, BackgroundTasks
from pydantic import BaseModel

app = FastAPI(title="AASM API")

class ScanRequest(BaseModel):
    target: str
    scan_types: list[str]

@app.post("/api/scans")
async def create_scan(request: ScanRequest, background_tasks: BackgroundTasks):
    # Queue scan task
    task = scan_task.delay(request.target, request.scan_types)
    return {"task_id": task.id, "status": "queued"}
```
Redis + Celery Task Queue
For handling long-running scans, I implemented a task queue using Redis as the message broker and Celery as the distributed task processor. This architecture allows:
- Asynchronous Processing: Scans run in background workers
- Scalability: Add more workers to handle increased load
- Reliability: Task retries and error handling
- Monitoring: Real-time task status tracking
```python
from celery import Celery
from celery.signals import task_prerun, task_postrun

celery_app = Celery(
    'aasm',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1'  # Store task results
)

# Configure task settings
celery_app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    result_expires=3600,
    task_track_started=True,
    task_time_limit=3600,       # 1 hour hard limit
    task_soft_time_limit=3300   # 55 min soft limit
)

@celery_app.task(bind=True, max_retries=3)
def scan_task(self, target, scan_types):
    """Main scanning task that orchestrates all scan types"""
    try:
        # Update task state to show progress
        self.update_state(state='PROGRESS', meta={'stage': 'initializing'})
        results = perform_scan(target, scan_types)
        store_results(results)
        return {"status": "completed", "results": results}
    except ScanTimeout as exc:
        # Don't retry on timeout
        return {"status": "failed", "error": "Scan timed out"}
    except Exception as exc:
        # Exponential backoff: 60s, 120s, 240s
        self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))
```
PostgreSQL Database
I used PostgreSQL (via Supabase) for storing scan results because:
- Relational Data: Complex relationships between targets, scans, and findings
- JSONB Support: Flexible storage for varying scan result formats
- Full-Text Search: Quick searching through findings
- Performance: Efficient indexing for large datasets
Database Schema Design
The schema is designed to efficiently handle complex relationships while maintaining query performance:
```sql
-- Organizations/Targets
CREATE TABLE organizations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    domain VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Scans
CREATE TABLE scans (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID REFERENCES organizations(id) ON DELETE CASCADE,
    status VARCHAR(50) NOT NULL,  -- queued, running, completed, failed
    scan_types TEXT[] NOT NULL,
    started_at TIMESTAMP DEFAULT NOW(),
    completed_at TIMESTAMP,
    error_message TEXT,
    task_id VARCHAR(255) UNIQUE   -- Celery task ID
);

-- Subdomains discovered
CREATE TABLE subdomains (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    scan_id UUID REFERENCES scans(id) ON DELETE CASCADE,
    organization_id UUID REFERENCES organizations(id) ON DELETE CASCADE,
    subdomain VARCHAR(255) NOT NULL,
    ip_addresses INET[],
    http_status INTEGER,
    title TEXT,
    technologies TEXT[],
    discovered_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(organization_id, subdomain)
);

-- Vulnerabilities found
CREATE TABLE findings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    scan_id UUID REFERENCES scans(id) ON DELETE CASCADE,
    subdomain_id UUID REFERENCES subdomains(id) ON DELETE CASCADE,
    severity VARCHAR(20) NOT NULL,  -- critical, high, medium, low, info
    title VARCHAR(500) NOT NULL,
    description TEXT,
    tool VARCHAR(100),              -- nuclei, nmap, custom
    evidence JSONB,                 -- Flexible storage for tool-specific data
    cvss_score DECIMAL(3,1),
    cve_id VARCHAR(50),
    discovered_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for performance
CREATE INDEX idx_scans_org ON scans(organization_id);
CREATE INDEX idx_scans_status ON scans(status);
CREATE INDEX idx_subdomains_org ON subdomains(organization_id);
CREATE INDEX idx_findings_severity ON findings(severity);
CREATE INDEX idx_findings_scan ON findings(scan_id);
CREATE INDEX idx_findings_evidence ON findings USING GIN (evidence);
```
Key design decisions:
- UUIDs over integers: Better for distributed systems and prevents enumeration attacks
- JSONB for evidence: Each security tool returns different data structures; JSONB allows flexible storage while still being queryable
- Cascading deletes: When a scan is deleted, all associated findings are automatically removed
- GIN index on JSONB: Enables fast queries on vulnerability evidence fields
- Array types: PostgreSQL native arrays for storing multiple IPs, technologies, and scan types efficiently
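To make the JSONB decision concrete: each tool emits differently shaped output, so the worker can wrap the raw, tool-shaped payload in a uniform finding record and leave the variability inside the `evidence` field. This is a minimal sketch of that idea (the `build_finding` helper and the sample payloads are illustrative, not the thesis code):

```python
import json
from typing import Any

def build_finding(tool: str, severity: str, title: str,
                  raw_evidence: dict[str, Any]) -> dict[str, Any]:
    """Wrap tool-specific output in a uniform finding record.

    The shared fields map to relational columns; the raw payload goes
    into `evidence` untouched, which is what the JSONB column stores.
    """
    return {
        "tool": tool,
        "severity": severity,
        "title": title,
        "evidence": raw_evidence,  # serialized to JSONB on insert
    }

# Nuclei and nmap shape their output very differently, but both fit:
nuclei_finding = build_finding(
    "nuclei", "high", "Exposed .git directory",
    {"template-id": "git-config", "matched-at": "https://app.example.com/.git/config"},
)
nmap_finding = build_finding(
    "nmap", "info", "Open port 22",
    {"port": 22, "service": "ssh", "state": "open"},
)

# Both serialize cleanly for the JSONB column:
print(json.dumps(nuclei_finding["evidence"]))
```

The relational columns stay queryable with ordinary indexes, while the GIN index covers ad-hoc queries into whatever shape `evidence` happens to have.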
Key Technical Decisions
Building a production-ready security scanning platform required careful consideration of architecture, tooling, and reliability. Here are the most important technical decisions and their implementations:
1. Tool Integration
The system integrates multiple industry-standard security tools to provide comprehensive attack surface mapping:
- Subfinder: Subdomain discovery
- Httpx: HTTP probing and metadata collection
- Nuclei: Vulnerability scanning
- Masscan/Nmap: Port scanning
- Gowitness: Screenshot capture
Each tool runs as a subprocess, and I parse its output to extract structured data:
```python
import subprocess
import json

def run_subfinder(domain: str) -> list[str]:
    result = subprocess.run(
        ['subfinder', '-d', domain, '-json'],
        capture_output=True, text=True
    )
    subdomains = []
    for line in result.stdout.splitlines():
        if not line.strip():
            continue  # Skip blank lines in the JSON-lines output
        data = json.loads(line)
        subdomains.append(data['host'])
    return subdomains
```
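Httpx follows the same JSON-lines pattern. A sketch of the parsing side is below; note that the field names (`url`, `status_code`, `title`, `tech`) reflect one httpx version's output and should be treated as assumptions, which is exactly why version-locking the tools matters:

```python
import json

def parse_httpx_output(stdout: str) -> list[dict]:
    """Parse httpx's JSON-lines stdout into structured records.

    Field names are assumptions tied to a specific httpx version --
    validate them against the version you pin.
    """
    records = []
    for line in stdout.splitlines():
        if not line.strip():
            continue
        try:
            data = json.loads(line)
        except json.JSONDecodeError:
            continue  # Skip non-JSON noise (banners, warnings)
        records.append({
            "url": data.get("url"),
            "http_status": data.get("status_code"),
            "title": data.get("title"),
            "technologies": data.get("tech", []),
        })
    return records

sample = '{"url": "https://a.example.com", "status_code": 200, "title": "Login", "tech": ["nginx"]}\n'
print(parse_httpx_output(sample))
```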
2. Error Handling and Retries
Security tools can fail for various reasons (timeouts, rate limits, network issues). I implemented:
- Exponential backoff for retries
- Partial result storage (don't lose data if one tool fails)
- Comprehensive logging for debugging
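The partial-result idea can be as simple as running each tool independently and collecting successes and failures separately, so one crashed scanner never discards the others' data. A hypothetical sketch (the runner names are stand-ins, not project code):

```python
import logging

logger = logging.getLogger("aasm.scan")

def run_all_tools(target: str, runners: dict) -> dict:
    """Run each tool independently; one failure doesn't discard the rest."""
    results, errors = {}, {}
    for name, runner in runners.items():
        try:
            results[name] = runner(target)
        except Exception as exc:
            # Record the failure but keep results from other tools
            errors[name] = str(exc)
            logger.warning("tool %s failed on %s: %s", name, target, exc)
    return {"results": results, "errors": errors}

# Stand-in runners to show the behavior:
def fake_subfinder(t): return ["a." + t]
def fake_nuclei(t): raise TimeoutError("rate limited")

out = run_all_tools("example.com",
                    {"subfinder": fake_subfinder, "nuclei": fake_nuclei})
```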
3. Real-Time Updates
The frontend uses polling to check task status:
```typescript
async function pollTaskStatus(taskId: string) {
  const interval = setInterval(async () => {
    const response = await fetch(`/api/tasks/${taskId}`);
    const data = await response.json();

    if (data.status === 'completed' || data.status === 'failed') {
      clearInterval(interval);
      updateUI(data);
    }
  }, 2000);
}
```
Challenges & Solutions
Every complex system comes with its own set of challenges. Here's how I tackled the major obstacles during development:
Challenge 1: Managing Tool Dependencies
Problem: Each security tool has different installation requirements and versions.
Solution: I containerized the entire application with Docker, ensuring consistent environments across development and production. Each worker container includes all necessary tools.
Challenge 2: Scan Performance
Problem: Running all scans sequentially was too slow for large targets.
Solution: Implemented parallel execution using Celery groups and chords:
```python
from celery import group, chord

def full_scan(target):
    # Run scans in parallel
    scan_tasks = group(
        subdomain_scan.s(target),
        port_scan.s(target),
        vulnerability_scan.s(target)
    )
    # Aggregate results when all complete
    callback = aggregate_results.s()
    chord(scan_tasks)(callback)
```
Challenge 3: Rate Limiting
Problem: External services and targets may rate limit our scans.
Solution: Implemented configurable delays between requests and respect for robots.txt:
```python
import time
from urllib.robotparser import RobotFileParser

def respect_rate_limits(domain: str, delay: int = 1):
    rp = RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    rp.read()

    crawl_delay = rp.crawl_delay("*")
    if crawl_delay:
        time.sleep(crawl_delay)
    else:
        time.sleep(delay)
```
Performance Metrics & Results
The system was tested against various targets to validate performance and scalability:
Scan Performance Benchmarks
| Target Size | Subdomains Found | Scan Duration | Worker Count | Memory Usage |
|---|---|---|---|---|
| Small (1-10 subdomains) | 8 | 2m 34s | 2 | 512 MB |
| Medium (10-50 subdomains) | 43 | 8m 17s | 4 | 1.2 GB |
| Large (50-200 subdomains) | 187 | 24m 52s | 8 | 3.1 GB |
| Extra Large (200+ subdomains) | 2,143 | 47m 18s | 12 | 5.8 GB |
Tool Execution Times (Average)
```
Subfinder (subdomain discovery):  ~45s for 100 subdomains
Httpx (HTTP probing):             ~2s per subdomain
Nuclei (vulnerability scanning):  ~15s per subdomain
Masscan (port scanning):          ~3m for /24 network
Gowitness (screenshots):          ~5s per URL
```
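These averages also give a useful sanity check on the parallelism claims. A back-of-the-envelope helper (added here for illustration, not project code; Masscan is excluded because it scales with network size rather than subdomain count) computes the sequential worst case:

```python
def estimate_scan_seconds(n_subdomains: int) -> float:
    """Sequential worst-case estimate from the measured averages above."""
    subfinder = 45.0 * (n_subdomains / 100)  # ~45s per 100 subdomains
    httpx     = 2.0  * n_subdomains          # ~2s per subdomain
    nuclei    = 15.0 * n_subdomains          # ~15s per subdomain
    gowitness = 5.0  * n_subdomains          # ~5s per URL
    return subfinder + httpx + nuclei + gowitness

# For the "medium" tier (43 subdomains) the sequential bound is roughly
# 16 minutes, well above the measured 8m17s -- evidence that the
# parallel Celery workers are paying off.
print(estimate_scan_seconds(43) / 60)
```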
Key Achievements
- Discovered 2,000+ subdomains across test targets in controlled environments
- Identified 150+ vulnerabilities (CVEs and misconfigurations) during testing
- Processed concurrent scans with up to 12 parallel workers without performance degradation
- Database query performance:
  - Subdomain lookup: <10ms average
  - Finding aggregation: <50ms for 1000+ records
  - Full-text search: <100ms across 10,000+ findings
- API response times:
  - Create scan endpoint: <200ms
  - Task status check: <50ms
  - Results retrieval: <300ms (with pagination)
Scalability Testing
The system was tested under load to verify horizontal scalability:
- Concurrent scans: Successfully handled 25+ simultaneous scans with 12 Celery workers
- Task throughput: Processed 100+ tasks/minute during peak load
- Redis queue latency: Maintained <10ms even under heavy load
- Database connection pool: Efficiently managed with 20 connections using PgBouncer
Real-World Impact
The project received recognition during the thesis defense for its practical approach to solving attack surface management challenges. The system demonstrated how combining modern async frameworks, distributed task queues, and robust database design can create production-ready security automation tools.
Lessons Learned
1. Start Simple, Scale Later
I initially tried to build a complex microservices architecture. This was overkill for the requirements. Starting with a monolithic FastAPI app and adding Celery workers as needed was much more pragmatic.
2. Tool Output Parsing is Fragile
Security tools change their output formats. I learned to:
- Version-lock tool dependencies
- Add extensive validation for parsed data
- Have fallback parsing strategies
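The fallback idea can be sketched as a parser chain (hypothetical helpers, illustrating the pattern rather than the project's exact parsers): try the current JSON format first, fall back to older plain-text formats, and fail loudly only when nothing matches:

```python
import json

def parse_json_host(line: str) -> str:
    # Current format: one JSON object per line with a "host" key
    return json.loads(line)["host"]

def parse_plain_host(line: str) -> str:
    # Older format: one bare hostname per line
    host = line.strip()
    if not host or " " in host:
        raise ValueError(f"not a bare hostname: {line!r}")
    return host

PARSERS = [parse_json_host, parse_plain_host]  # newest format first

def parse_subdomain_line(line: str) -> str:
    for parser in PARSERS:
        try:
            return parser(line)
        except (ValueError, KeyError, TypeError):
            # json.JSONDecodeError is a ValueError; KeyError/TypeError
            # cover valid JSON that lacks the expected shape
            continue
    raise ValueError(f"unparseable tool output: {line!r}")
```

The win is that a tool upgrade degrades gracefully to the older parser instead of silently producing empty results.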
3. Async is Powerful but Tricky
FastAPI's async capabilities are great, but mixing sync and async code requires care. I had to ensure all I/O operations (database, Redis, external APIs) used async clients.
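For the blocking calls that can't be swapped for async clients, such as the subprocess-based tool runners, one pattern that fits is off-loading them to a worker thread so they don't stall the event loop. A minimal sketch using `asyncio.to_thread` (Python 3.9+; the `blocking_tool` stand-in is illustrative):

```python
import asyncio
import time

def blocking_tool(target: str) -> str:
    # Stand-in for a synchronous subprocess call that would
    # otherwise block every coroutine on the event loop
    time.sleep(0.1)
    return f"scanned {target}"

async def scan_endpoint(target: str) -> str:
    # Off-load the blocking call to a thread; the event loop
    # stays free to serve other requests in the meantime
    return await asyncio.to_thread(blocking_tool, target)

async def main():
    results = await asyncio.gather(
        scan_endpoint("a.example.com"),
        scan_endpoint("b.example.com"),
    )
    print(results)

asyncio.run(main())
```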
4. Monitoring is Essential
For a distributed system, observability is crucial. I added:
- Structured logging with request IDs
- Celery Flower for task monitoring
- Database query logging for performance tuning
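The request-ID logging can be done with the standard library alone: a `contextvars.ContextVar` carries the ID across async boundaries, and a logging filter stamps it onto every record. A sketch of that wiring (set once per request, e.g. in FastAPI middleware; the logger name is an assumption):

```python
import contextvars
import logging
import uuid

request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Inject the current request ID into every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))
handler.addFilter(RequestIdFilter())
logger = logging.getLogger("aasm")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Set once per incoming request (e.g. in middleware), then every
# log line emitted while handling that request carries the same ID:
request_id_var.set(uuid.uuid4().hex)
logger.info("scan queued")
```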
Future Enhancements
While the current system is functional and performant, there are several enhancements that would make it production-ready for enterprise use:
Short-term Improvements
1. WebSocket Support for Real-time Updates
   - Replace polling with WebSocket connections using FastAPI's native WebSocket support
   - Stream scan progress updates directly to the frontend
   - Reduce server load and improve user experience with instant notifications
2. Advanced Scheduling System
   - Implement cron-style recurring scans using Celery Beat
   - Allow users to configure daily/weekly/monthly scan schedules
   - Track changes over time to identify new attack surface exposure
3. Alerting & Notifications
   - Email/Slack/Discord webhooks for critical severity findings
   - Configurable alert rules based on severity, CVE scores, or custom criteria
   - Digest reports summarizing scan results
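The recurring-scan idea above maps directly onto Celery Beat's standard `beat_schedule` configuration. A hedged sketch (the schedule name, task path, target, and cron times are all illustrative assumptions, not the project's config):

```python
from celery import Celery
from celery.schedules import crontab

celery_app = Celery("aasm", broker="redis://localhost:6379/0")

# Hypothetical weekly scan: every Monday at 03:00
celery_app.conf.beat_schedule = {
    "weekly-full-scan-example-com": {
        "task": "aasm.tasks.scan_task",
        "schedule": crontab(minute=0, hour=3, day_of_week="mon"),
        "args": ("example.com", ["subdomain", "port", "vulnerability"]),
    },
}
```

A separate `celery beat` process would then enqueue these tasks on schedule, and the existing workers pick them up like any other scan.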
Long-term Enhancements
1. Multi-tenancy & Authentication
   - User authentication with JWT tokens
   - Role-based access control (admin, analyst, viewer)
   - Organization-level isolation for scan data
   - API key management for programmatic access
2. Enhanced Reporting
   - PDF/HTML export with executive summaries
   - Trend analysis showing attack surface changes over time
   - Customizable report templates
   - Integration with ticketing systems (Jira, Linear)
3. Machine Learning Integration
   - False positive reduction using ML models
   - Anomaly detection for unusual patterns
   - Risk scoring based on historical data
   - Predictive analytics for vulnerability trends
Conclusion
Building AASM was an intensive journey into distributed systems, async programming, and security automation. What started as a thesis project evolved into a functional platform that demonstrated how modern Python frameworks can be combined to solve real-world security challenges.
Key Takeaways
- FastAPI + Celery is a powerful combination for building async, task-based systems
- Proper database design (UUIDs, JSONB, indexes) is crucial for scalability
- Containerization simplifies complex tool dependency management
- Starting simple and scaling incrementally beats over-engineering from the start
- Monitoring and observability are essential for distributed systems
The system successfully processed thousands of scans, discovered substantial attack surface data, and proved that automated security scanning can be both efficient and scalable when built with the right architecture.
Resources
- Full Technical Report: Available on Skemman
- Technologies Used: FastAPI, Celery, Redis, PostgreSQL, Docker, Subfinder, Nuclei, Httpx
Questions or feedback? I'm always happy to discuss distributed systems architecture, security automation, or lessons learned from this project. Feel free to reach out!