Technical · March 12, 2025 · 12 min read

Building AASM: Automated Attack Surface Mapping

How I built an automated attack surface mapping system using FastAPI, Redis, Celery, and PostgreSQL for my thesis project at Reykjavik University.

Python · FastAPI · Redis · Celery · PostgreSQL · Security · Automation

Introduction

During my final year at Reykjavik University, I tackled a problem that many security teams face: how to efficiently map and monitor the attack surface of organizations. The result was AASM (Automated Attack Surface Mapping) - a scalable system that discovers subdomains, endpoints, vulnerabilities, and more through automated scanning.

In this post, I'll walk through the architecture decisions, technical challenges, and lessons learned from building a production-ready security scanning platform.

Technology Stack

| Layer | Technology | Purpose |
| --- | --- | --- |
| Backend API | FastAPI | RESTful API with async support |
| Task Queue | Celery | Distributed task processing |
| Message Broker | Redis | Task queue and caching |
| Database | PostgreSQL (Supabase) | Relational data storage |
| Frontend | React/Next.js | User interface |
| Security Tools | Subfinder, Httpx, Nuclei, Masscan, Nmap, Gowitness | Attack surface scanning |
| Containerization | Docker | Deployment and dependency management |

The Problem

Organizations often don't have full visibility into their external attack surface. New services get deployed, subdomains are created, and infrastructure changes - all without a centralized view of what's exposed to the internet. Manual discovery is time-consuming and quickly becomes outdated.

I needed to build a system that could:

  • Automatically discover and map attack surfaces
  • Scale to handle multiple concurrent scans
  • Process long-running security scans efficiently
  • Provide real-time visibility into results
  • Store historical data for trend analysis

Architecture Overview

The system follows a distributed architecture with several key components working together to provide scalable, asynchronous attack surface mapping:

FastAPI Backend

I chose FastAPI for the REST API because of its:

  • Performance: Built on Starlette and Pydantic, it's one of the fastest Python frameworks
  • Type Safety: Automatic validation and serialization with Pydantic models
  • Documentation: Auto-generated OpenAPI docs
  • Async Support: Native async/await for handling concurrent requests
```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="AASM API")

class ScanRequest(BaseModel):
    target: str
    scan_types: list[str]

@app.post("/api/scans")
async def create_scan(request: ScanRequest):
    # Hand the scan off to the Celery workers (scan_task is defined in the
    # worker module) and return immediately with the task id
    task = scan_task.delay(request.target, request.scan_types)
    return {"task_id": task.id, "status": "queued"}
```

Redis + Celery Task Queue

For handling long-running scans, I implemented a task queue using Redis as the message broker and Celery as the distributed task processor. This architecture allows:

  • Asynchronous Processing: Scans run in background workers
  • Scalability: Add more workers to handle increased load
  • Reliability: Task retries and error handling
  • Monitoring: Real-time task status tracking
```python
from celery import Celery

celery_app = Celery(
    'aasm',
    broker='redis://localhost:6379/0',
    backend='redis://localhost:6379/1'  # Store task results
)

# Configure task settings
celery_app.conf.update(
    task_serializer='json',
    accept_content=['json'],
    result_expires=3600,
    task_track_started=True,
    task_time_limit=3600,  # 1 hour hard limit
    task_soft_time_limit=3300  # 55 min soft limit
)

@celery_app.task(bind=True, max_retries=3)
def scan_task(self, target, scan_types):
    """Main scanning task that orchestrates all scan types"""
    try:
        # Update task state to show progress
        self.update_state(state='PROGRESS', meta={'stage': 'initializing'})

        results = perform_scan(target, scan_types)
        store_results(results)

        return {"status": "completed", "results": results}
    except ScanTimeout:
        # ScanTimeout is a custom exception raised by the tool wrappers;
        # don't retry on timeout
        return {"status": "failed", "error": "Scan timed out"}
    except Exception as exc:
        # Exponential backoff: 60s, 120s, 240s
        raise self.retry(exc=exc, countdown=60 * (2 ** self.request.retries))
```

PostgreSQL Database

I used PostgreSQL (via Supabase) for storing scan results because:

  • Relational Data: Complex relationships between targets, scans, and findings
  • JSONB Support: Flexible storage for varying scan result formats
  • Full-Text Search: Quick searching through findings
  • Performance: Efficient indexing for large datasets

Database Schema Design

The schema is designed to efficiently handle complex relationships while maintaining query performance:

```sql
-- Organizations/Targets
CREATE TABLE organizations (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    domain VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Scans
CREATE TABLE scans (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    organization_id UUID REFERENCES organizations(id) ON DELETE CASCADE,
    status VARCHAR(50) NOT NULL, -- queued, running, completed, failed
    scan_types TEXT[] NOT NULL,
    started_at TIMESTAMP DEFAULT NOW(),
    completed_at TIMESTAMP,
    error_message TEXT,
    task_id VARCHAR(255) UNIQUE -- Celery task ID
);

-- Subdomains discovered
CREATE TABLE subdomains (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    scan_id UUID REFERENCES scans(id) ON DELETE CASCADE,
    organization_id UUID REFERENCES organizations(id) ON DELETE CASCADE,
    subdomain VARCHAR(255) NOT NULL,
    ip_addresses INET[],
    http_status INTEGER,
    title TEXT,
    technologies TEXT[],
    discovered_at TIMESTAMP DEFAULT NOW(),
    UNIQUE(organization_id, subdomain)
);

-- Vulnerabilities found
CREATE TABLE findings (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    scan_id UUID REFERENCES scans(id) ON DELETE CASCADE,
    subdomain_id UUID REFERENCES subdomains(id) ON DELETE CASCADE,
    severity VARCHAR(20) NOT NULL, -- critical, high, medium, low, info
    title VARCHAR(500) NOT NULL,
    description TEXT,
    tool VARCHAR(100), -- nuclei, nmap, custom
    evidence JSONB, -- Flexible storage for tool-specific data
    cvss_score DECIMAL(3,1),
    cve_id VARCHAR(50),
    discovered_at TIMESTAMP DEFAULT NOW()
);

-- Indexes for performance
CREATE INDEX idx_scans_org ON scans(organization_id);
CREATE INDEX idx_scans_status ON scans(status);
CREATE INDEX idx_subdomains_org ON subdomains(organization_id);
CREATE INDEX idx_findings_severity ON findings(severity);
CREATE INDEX idx_findings_scan ON findings(scan_id);
CREATE INDEX idx_findings_evidence ON findings USING GIN (evidence);
```

Key design decisions:

  • UUIDs over integers: Better for distributed systems and prevents enumeration attacks
  • JSONB for evidence: Each security tool returns different data structures; JSONB allows flexible storage while still being queryable
  • Cascading deletes: When a scan is deleted, all associated findings are automatically removed
  • GIN index on JSONB: Enables fast queries on vulnerability evidence fields
  • Array types: PostgreSQL native arrays for storing multiple IPs, technologies, and scan types efficiently
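As a companion to the schema, here's a small sketch of the kind of validation worth running before inserting a finding. The allowed severities and column limits mirror the DDL above; the helper name and exact checks are my own illustration, not the project's actual code:

```python
# Hypothetical pre-insert validation mirroring the findings table constraints.
SEVERITIES = {"critical", "high", "medium", "low", "info"}

def validate_finding(finding: dict) -> list[str]:
    """Return a list of validation errors (an empty list means insertable)."""
    errors = []
    if finding.get("severity") not in SEVERITIES:
        errors.append(f"invalid severity: {finding.get('severity')!r}")
    if not finding.get("title"):
        errors.append("title is required")
    elif len(finding["title"]) > 500:  # VARCHAR(500) in the schema
        errors.append("title exceeds 500 characters")
    cvss = finding.get("cvss_score")
    if cvss is not None and not (0.0 <= cvss <= 10.0):
        errors.append(f"cvss_score out of range: {cvss}")
    return errors
```

Catching these in the application layer gives friendlier error messages than letting the database reject the row.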

Key Technical Decisions

Building a production-ready security scanning platform required careful consideration of architecture, tooling, and reliability. Here are the most important technical decisions and their implementations:

1. Tool Integration

The system integrates multiple industry-standard security tools to provide comprehensive attack surface mapping:

  • Subfinder: Subdomain discovery
  • Httpx: HTTP probing and metadata collection
  • Nuclei: Vulnerability scanning
  • Masscan/Nmap: Port scanning
  • Gowitness: Screenshot capture

Each tool runs as a subprocess, and I parse their output to extract structured data:

```python
import json
import subprocess

def run_subfinder(domain: str) -> list[str]:
    result = subprocess.run(
        ['subfinder', '-d', domain, '-json'],
        capture_output=True,
        text=True,
        check=True  # Raise if subfinder exits non-zero
    )

    subdomains = []
    for line in result.stdout.splitlines():
        if not line.strip():
            continue  # Skip blank lines in the JSONL output
        data = json.loads(line)
        subdomains.append(data['host'])

    return subdomains
```

2. Error Handling and Retries

Security tools can fail for various reasons (timeouts, rate limits, network issues). I implemented:

  • Exponential backoff for retries
  • Partial result storage (don't lose data if one tool fails)
  • Comprehensive logging for debugging
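A minimal sketch of the partial-result pattern — the tool names and runner callables here are placeholders, not AASM's real interfaces; the point is that one tool crashing doesn't discard the others' output:

```python
import logging

logger = logging.getLogger("aasm.scan")

def run_tools_partial(target: str, tools: dict) -> dict:
    """Run each tool runner, keeping results from the ones that succeed.

    `tools` maps a tool name to a callable taking the target, e.g.
    {"subfinder": run_subfinder, "httpx": run_httpx}.
    """
    results, errors = {}, {}
    for name, runner in tools.items():
        try:
            results[name] = runner(target)
        except Exception as exc:  # tool timed out, crashed, was rate limited...
            logger.warning("tool %s failed for %s: %s", name, target, exc)
            errors[name] = str(exc)
    return {"results": results, "errors": errors}
```

The scan task can then store whatever landed in `results` and mark the scan as partially failed rather than throwing everything away.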

3. Real-Time Updates

The frontend uses polling to check task status:

```typescript
async function pollTaskStatus(taskId: string) {
  const interval = setInterval(async () => {
    const response = await fetch(`/api/tasks/${taskId}`);
    const data = await response.json();

    if (data.status === 'completed' || data.status === 'failed') {
      clearInterval(interval);
      updateUI(data);
    }
  }, 2000);
}
```

Challenges & Solutions

Every complex system comes with its own set of challenges. Here's how I tackled the major obstacles during development:

Challenge 1: Managing Tool Dependencies

Problem: Each security tool has different installation requirements and versions.

Solution: I containerized the entire application with Docker, ensuring consistent environments across development and production. Each worker container includes all necessary tools.

Challenge 2: Scan Performance

Problem: Running all scans sequentially was too slow for large targets.

Solution: Implemented parallel execution using Celery groups and chords:

```python
from celery import group, chord

def full_scan(target):
    # Run scans in parallel
    scan_tasks = group(
        subdomain_scan.s(target),
        port_scan.s(target),
        vulnerability_scan.s(target)
    )

    # Aggregate results when all complete
    callback = aggregate_results.s()
    chord(scan_tasks)(callback)
```
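The chord callback has to merge whatever each branch returned. A plain-Python sketch of what `aggregate_results` might do — the result-dict keys are my assumption about the branch output format, and in the real system this logic would live inside the Celery task:

```python
def merge_scan_results(branch_results: list[dict]) -> dict:
    """Merge per-scan-type result dicts into one summary.

    Each branch is assumed to return something like {"subdomains": [...]}
    or {"findings": [...]}.
    """
    merged: dict[str, list] = {}
    for branch in branch_results:
        for key, values in branch.items():
            merged.setdefault(key, []).extend(values)
    # De-duplicate while preserving discovery order
    for key, values in merged.items():
        merged[key] = list(dict.fromkeys(values))
    return merged
```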

Challenge 3: Rate Limiting

Problem: External services and targets may rate limit our scans.

Solution: Implemented configurable delays between requests and respect for robots.txt:

```python
import time
from urllib.robotparser import RobotFileParser

def respect_rate_limits(domain: str, delay: int = 1):
    rp = RobotFileParser()
    rp.set_url(f"https://{domain}/robots.txt")
    try:
        rp.read()
        crawl_delay = rp.crawl_delay("*")
    except OSError:
        # robots.txt unreachable; fall back to the default delay
        crawl_delay = None

    time.sleep(crawl_delay if crawl_delay else delay)
```

Performance Metrics & Results

The system was tested against various targets to validate performance and scalability:

Scan Performance Benchmarks

| Target Size | Subdomains Found | Scan Duration | Worker Count | Memory Usage |
| --- | --- | --- | --- | --- |
| Small (1-10 subdomains) | 8 | 2m 34s | 2 | 512 MB |
| Medium (10-50 subdomains) | 43 | 8m 17s | 4 | 1.2 GB |
| Large (50-200 subdomains) | 187 | 24m 52s | 8 | 3.1 GB |
| Extra Large (200+ subdomains) | 2,143 | 47m 18s | 12 | 5.8 GB |

Tool Execution Times (Average)

```text
Subfinder (subdomain discovery):     ~45s for 100 subdomains
Httpx (HTTP probing):                ~2s per subdomain
Nuclei (vulnerability scanning):     ~15s per subdomain
Masscan (port scanning):             ~3m for /24 network
Gowitness (screenshots):             ~5s per URL
```

Key Achievements

  • Discovered 2,000+ subdomains across test targets in controlled environments
  • Identified 150+ vulnerabilities (CVEs and misconfigurations) during testing
  • Processed concurrent scans with up to 12 parallel workers without performance degradation
  • Database query performance:
    • Subdomain lookup: <10ms average
    • Finding aggregation: <50ms for 1000+ records
    • Full-text search: <100ms across 10,000+ findings
  • API response times:
    • Create scan endpoint: <200ms
    • Task status check: <50ms
    • Results retrieval: <300ms (with pagination)

Scalability Testing

The system was tested under load to verify horizontal scalability:

  • Concurrent scans: Successfully handled 25+ simultaneous scans with 12 Celery workers
  • Task throughput: Processed 100+ tasks/minute during peak load
  • Redis queue latency: Maintained <10ms even under heavy load
  • Database connection pool: Efficiently managed with 20 connections using pgBouncer

Real-World Impact

The project received recognition during thesis defense for its practical approach to solving attack surface management challenges. The system demonstrated how combining modern async frameworks, distributed task queues, and robust database design can create production-ready security automation tools.

Lessons Learned

1. Start Simple, Scale Later

I initially tried to build a complex microservices architecture. This was overkill for the requirements. Starting with a monolithic FastAPI app and adding Celery workers as needed was much more pragmatic.

2. Tool Output Parsing is Fragile

Security tools change their output formats. I learned to:

  • Version-lock tool dependencies
  • Add extensive validation for parsed data
  • Have fallback parsing strategies

3. Async is Powerful but Tricky

FastAPI's async capabilities are great, but mixing sync and async code requires care. I had to ensure all I/O operations (database, Redis, external APIs) used async clients.
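A small standard-library illustration of the pattern: a blocking call (here a stand-in for a subprocess-based scanner) is pushed onto a worker thread with `asyncio.to_thread` so it doesn't stall the event loop:

```python
import asyncio
import time

def blocking_scan(target: str) -> dict:
    """Stand-in for a blocking tool invocation (subprocess.run, etc.)."""
    time.sleep(0.1)  # simulate work
    return {"target": target, "status": "done"}

async def scan_endpoint(target: str) -> dict:
    # Calling blocking_scan(target) directly here would freeze the event
    # loop for every other request; hand it to a thread and await instead.
    return await asyncio.to_thread(blocking_scan, target)

async def main():
    # The two scans overlap instead of running back-to-back
    return await asyncio.gather(
        scan_endpoint("a.example.com"),
        scan_endpoint("b.example.com"),
    )
```

Running `asyncio.run(main())` finishes in roughly one sleep interval rather than two, which is the whole point of keeping blocking work off the loop.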

4. Monitoring is Essential

For a distributed system, observability is crucial. I added:

  • Structured logging with request IDs
  • Celery Flower for task monitoring
  • Database query logging for performance tuning
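The request-ID part can be done with nothing but the standard library: a `ContextVar` holds the current ID and a logging filter stamps it onto every record. The `request_id` field name is my choice; in AASM itself this sits behind FastAPI middleware:

```python
import contextvars
import logging
import uuid

request_id_var = contextvars.ContextVar("request_id", default="-")

class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every log record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

def configure_logger() -> logging.Logger:
    logger = logging.getLogger("aasm")
    handler = logging.StreamHandler()
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s [%(request_id)s] %(message)s")
    )
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

# In the request middleware, one ID per incoming request:
# request_id_var.set(uuid.uuid4().hex)
```

Because `ContextVar` is task-local, concurrent requests each see their own ID even under heavy async load.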

Future Enhancements

While the current system is functional and performant, there are several enhancements that would make it production-ready for enterprise use:

Short-term Improvements

  1. WebSocket Support for Real-time Updates

    • Replace polling with WebSocket connections using FastAPI's native WebSocket support
    • Stream scan progress updates directly to the frontend
    • Reduce server load and improve user experience with instant notifications
  2. Advanced Scheduling System

    • Implement cron-style recurring scans using Celery Beat
    • Allow users to configure daily/weekly/monthly scan schedules
    • Track changes over time to identify new attack surface exposure
  3. Alerting & Notifications

    • Email/Slack/Discord webhooks for critical severity findings
    • Configurable alert rules based on severity, CVE scores, or custom criteria
    • Digest reports summarizing scan results
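The recurring-scan idea maps onto Celery Beat's `beat_schedule` setting. A sketch of what the config could look like — the task path and arguments are hypothetical, and I use plain intervals in seconds (Celery's `crontab()` from `celery.schedules` would give true cron-style timing):

```python
# Hypothetical Celery Beat schedule; wired up in the real app via
# celery_app.conf.beat_schedule = beat_schedule
beat_schedule = {
    "daily-scan-example-com": {
        "task": "aasm.tasks.scan_task",   # assumed task path
        "schedule": 24 * 60 * 60.0,       # every 24 hours, in seconds
        "args": ("example.com", ["subdomain", "vulnerability"]),
    },
    "weekly-full-scan": {
        "task": "aasm.tasks.scan_task",
        "schedule": 7 * 24 * 60 * 60.0,   # weekly
        "args": ("example.com", ["subdomain", "port", "vulnerability"]),
    },
}
```

A separate `celery beat` process reads this schedule and enqueues the tasks, so the existing workers need no changes.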

Long-term Enhancements

  1. Multi-tenancy & Authentication

    • User authentication with JWT tokens
    • Role-based access control (admin, analyst, viewer)
    • Organization-level isolation for scan data
    • API key management for programmatic access
  2. Enhanced Reporting

    • PDF/HTML export with executive summaries
    • Trend analysis showing attack surface changes over time
    • Customizable report templates
    • Integration with ticketing systems (Jira, Linear)
  3. Machine Learning Integration

    • False positive reduction using ML models
    • Anomaly detection for unusual patterns
    • Risk scoring based on historical data
    • Predictive analytics for vulnerability trends

Conclusion

Building AASM was an intensive journey into distributed systems, async programming, and security automation. What started as a thesis project evolved into a functional platform that demonstrated how modern Python frameworks can be combined to solve real-world security challenges.

Key Takeaways

  • FastAPI + Celery is a powerful combination for building async, task-based systems
  • Proper database design (UUIDs, JSONB, indexes) is crucial for scalability
  • Containerization simplifies complex tool dependency management
  • Starting simple and scaling incrementally beats over-engineering from the start
  • Monitoring and observability are essential for distributed systems

The system successfully processed thousands of scan tasks, discovered substantial attack surface data, and proved that automated security scanning can be both efficient and scalable when built with the right architecture.

Resources

  • Full Technical Report: Available on Skemman
  • Technologies Used: FastAPI, Celery, Redis, PostgreSQL, Docker, Subfinder, Nuclei, Httpx

Questions or feedback? I'm always happy to discuss distributed systems architecture, security automation, or lessons learned from this project. Feel free to reach out!


Veigar Elí Grétarsson

Full-Stack Developer based in Reykjavik, Iceland