DEV Community

Applying a SAST Tool to a Real Application: Finding and Fixing 10 Vulnerabilities with Bandit

Why static analysis catches what code review often doesn't

A SAST tool doesn't understand what your application does - it doesn't know what a "discount" or an "order" means. What it knows is a large catalog of dangerous patterns: eval() on user input, subprocess calls with shell=True, MD5 used where a password hash is expected, pickle.loads() on data that could come from outside the process.

A human reviewer skimming a pull request can miss these because they're often one line buried in otherwise-correct logic. A SAST tool reads every line, every time, without getting tired.
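To make the "catalog of dangerous patterns" idea concrete, here is a toy checker built on Python's standard ast module. Bandit also works on the parsed syntax tree, but its checks are far more thorough. The function name find_dangerous_calls and its two rules (direct eval()/exec() calls, and any call passing shell=True) are illustrative assumptions, not Bandit's actual rule set.

```python
import ast

# Toy rule set (an assumption for illustration, not Bandit's actual rules).
DANGEROUS_BUILTINS = {"eval", "exec"}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, description) pairs for risky call patterns."""
    findings = []
    # ast.walk visits every node in the tree, so no line is skipped.
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        # Rule 1: a direct call to eval() or exec(), whatever the argument is.
        if isinstance(node.func, ast.Name) and node.func.id in DANGEROUS_BUILTINS:
            findings.append((node.lineno, f"call to {node.func.id}()"))
        # Rule 2: any call that passes the literal keyword argument shell=True.
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                findings.append((node.lineno, "call with shell=True"))
    return sorted(findings)


# The sample is only parsed, never executed.
sample = """
import subprocess
result = eval(user_input)
subprocess.call("cp orders.db /tmp/" + name, shell=True)
"""
print(find_dangerous_calls(sample))
# [(3, 'call to eval()'), (4, 'call with shell=True')]
```

Because the checker only matches syntax, it reports eval(...) whether or not the argument is actually user-controlled. That is also why real SAST findings come with a confidence level.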

The example: an order service with 7 intentional mistakes

The starting point is a small Flask service with the kind of issues that show up in real codebases - not contrived examples, but the exact shape of mistakes that get written under deadline pressure:

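# order_service.py (excerpt: get_connection(), the API key and the Flask app setup are not shown)
import hashlib
import pickle
import subprocess

DB_PASSWORD = "Sup3rSecret!2024"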

def find_order_by_customer(customer_name):
    conn = get_connection()
    cursor = conn.cursor()
    query = "SELECT * FROM orders WHERE customer_name = '%s'" % customer_name
    cursor.execute(query)
    return cursor.fetchall()

def calculate_discount_expression(expression):
    return eval(expression)

def export_orders_to_csv(filename):
    subprocess.call("cp orders.db /tmp/" + filename, shell=True)

def hash_password(password):
    return hashlib.md5(password.encode()).hexdigest()

def load_cached_cart(serialized_cart):
    return pickle.loads(serialized_cart)
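The first of these mistakes is exploitable exactly as written. The sketch below demonstrates this against an in-memory sqlite3 database. The table layout, the sample rows and the two helper names are assumptions for illustration. The vulnerable query string mirrors find_order_by_customer, and the second helper uses a bound parameter instead.

```python
import sqlite3

# Illustrative schema and rows; these are assumptions, not the article's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_name TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_name, total) VALUES (?, ?)",
    [("alice", 19.99), ("bob", 5.00), ("carol", 42.50)],
)


def find_vulnerable(customer_name):
    # Same construction as find_order_by_customer: input is spliced into the SQL text.
    query = "SELECT * FROM orders WHERE customer_name = '%s'" % customer_name
    return conn.execute(query).fetchall()


def find_parameterized(customer_name):
    # The value is bound by the driver and is never parsed as SQL.
    return conn.execute(
        "SELECT * FROM orders WHERE customer_name = ?", (customer_name,)
    ).fetchall()


payload = "x' OR '1'='1"
print(len(find_vulnerable(payload)))     # 3: every customer's orders come back
print(len(find_parameterized(payload)))  # 0: nobody is literally named that
```

The payload closes the quoted string and appends a condition that is always true. The vulnerable query therefore becomes `... WHERE customer_name = 'x' OR '1'='1'`.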

Running Bandit against it

pip install bandit
bandit order_service.py

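Bandit's output, abridged to six of the ten findings (the three Low findings and one of the four Medium findings are omitted here):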

Run metrics:
Total issues (by severity):
  Low: 3    Medium: 4    High: 3

>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low   CWE: CWE-89
   Location: ./order_service.py:31

>> Issue: [B307:blacklist] Use of possibly insecure function - consider using safer ast.literal_eval.
   Severity: Medium   Confidence: High   CWE: CWE-78
   Location: ./order_service.py:39

>> Issue: [B602:subprocess_popen_with_shell_equals_true] subprocess call with shell=True identified, security issue.
   Severity: High   Confidence: High   CWE: CWE-78
   Location: ./order_service.py:44

>> Issue: [B324:hashlib] Use of weak MD5 hash for security. Consider usedforsecurity=False
   Severity: High   Confidence: High   CWE: CWE-327
   Location: ./order_service.py:49

>> Issue: [B301:blacklist] Pickle and modules that wrap it can be unsafe when used to deserialize untrusted data.
   Severity: Medium   Confidence: High   CWE: CWE-502
   Location: ./order_service.py:54

>> Issue: [B201:flask_debug_true] A Flask app appears to be run with debug=True, which exposes the Werkzeug debugger and allows the execution of arbitrary code.
   Severity: High   Confidence: Medium   CWE: CWE-94
   Location: ./order_service.py:71

Ten findings total - three Low, four Medium, three High. Note that Bandit doesn't just flag the call site; it attaches a CWE (Common Weakness Enumeration) ID to each one, which is what lets a finding map directly to a recognized vulnerability category instead of being just an opinion.
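Because each finding carries a CWE ID, the report can be post-processed. Bandit can write machine-readable output with `-f json` (for example `bandit -f json -o report.json order_service.py`). The sketch below assumes that report has a top-level "results" list whose entries include test_id, filename, line_number and an issue_cwe object with an id. That matches recent Bandit versions, but it is worth checking against your own report. The helper name findings_by_cwe is hypothetical.

```python
import json
from collections import defaultdict


def findings_by_cwe(report_json: str) -> dict:
    """Group Bandit JSON results by CWE id -> list of "TEST_ID@file:line" strings."""
    report = json.loads(report_json)
    grouped = defaultdict(list)
    for result in report.get("results", []):
        # Assumed layout: each result carries an "issue_cwe" object with an "id".
        cwe_id = result.get("issue_cwe", {}).get("id")
        location = f'{result["test_id"]}@{result["filename"]}:{result["line_number"]}'
        grouped[cwe_id].append(location)
    return dict(grouped)


# Usage, with report.json produced by the bandit command above:
# with open("report.json") as fh:
#     for cwe_id, hits in findings_by_cwe(fh.read()).items():
#         print(f"CWE-{cwe_id}: {', '.join(hits)}")
```

With the findings above, B307 (eval) and B602 (shell=True) would land in the same CWE-78 bucket. Two very different-looking call sites can belong to the same class of bug.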

Fixing each finding

| Finding | Fix |
| --- | --- |
| Hardcoded password & API key | Read from environment variables (os.environ.get(...)) |
| SQL injection via % formatting | Parameterized query: cursor.execute(query, (customer_name,)) |
| eval() on user input | ast.literal_eval(), which only evaluates literals, never arbitrary code |
| subprocess with shell=True | shutil.copy() with no shell, plus os.path.basename() to strip path traversal |
| MD5 for password hashing | bcrypt.hashpw() |
| pickle.loads() on untrusted data | json.loads() instead - JSON can't execute code during deserialization |
| Flask debug=True + bind to 0.0.0.0 | debug driven by an explicit environment flag, bind to 127.0.0.1 by default |

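# order_service_fixed.py (excerpt)
import ast

import bcrypt  # third-party: pip install bcrypt

def find_order_by_customer(customer_name):
    conn = get_connection()
    cursor = conn.cursor()
    # "?" is the placeholder style sqlite3 uses; the driver binds customer_name as data, never as SQL
    query = "SELECT * FROM orders WHERE customer_name = ?"
    cursor.execute(query, (customer_name,))
    return cursor.fetchall()

def calculate_discount_expression(expression):
    # literal_eval accepts only literals such as "0.15"; anything else,
    # including arithmetic like "100 * 0.9", raises ValueError
    try:
        return ast.literal_eval(expression)
    except (ValueError, SyntaxError):
        raise ValueError("Invalid expression") from None

def hash_password(password):
    # gensalt() makes a fresh random salt; the returned bytes embed the salt and cost factor
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt())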
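The fixed excerpt covers three of the table's rows. The secrets, cached-cart and Flask run-setting rows could be sketched as follows. The environment variable names (ORDER_DB_PASSWORD, ORDER_API_KEY, ORDER_SERVICE_HOST, ORDER_SERVICE_DEBUG) and the helper names load_config and run_settings are hypothetical. Failing fast on a missing password is a design choice added here, not something the fix table specifies.

```python
import json
import os
from typing import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read secrets from the environment instead of the source file."""
    password = env.get("ORDER_DB_PASSWORD")
    if not password:
        # Fail fast rather than connecting with a missing password.
        raise RuntimeError("ORDER_DB_PASSWORD is not set")
    return {"db_password": password, "api_key": env.get("ORDER_API_KEY")}


def load_cached_cart(serialized_cart: str) -> dict:
    """Deserialize with json instead of pickle: the result is plain data only."""
    cart = json.loads(serialized_cart)
    if not isinstance(cart, dict):
        raise ValueError("cached cart must be a JSON object")
    return cart


def run_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Keyword arguments for app.run(): debug is opt-in, host defaults to localhost."""
    return {
        "host": env.get("ORDER_SERVICE_HOST", "127.0.0.1"),
        "debug": env.get("ORDER_SERVICE_DEBUG") == "1",
    }


# In the Flask entry point (assuming the app object is named `app`):
# app.run(**run_settings())
```

Switching the cart cache to JSON also means whatever writes the cache has to serialize with json.dumps() instead of pickle.dumps().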

The re-scan caught something the manual fix missed

Running Bandit again after the fixes:

$ bandit order_service_fixed.py

>> Issue: [B108:hardcoded_tmp_directory] Probable insecure usage of temp file/directory.
   Severity: Medium   Confidence: Medium   CWE: CWE-377
   Location: ./order_service_fixed.py:47

The shell-injection fix had replaced subprocess with shutil.copy(), but it still wrote to a hardcoded /tmp path. In a shared, world-writable directory, a predictable file name lets another local user pre-create the file or plant a symlink before the copy happens (a classic temp-file race). This is the actual value of a SAST tool in a pipeline: it doesn't just catch the obvious first pass, it also catches what a human working through a list of fixes reasonably overlooks.

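Swapping the /tmp literal for tempfile.gettempdir() cleared the finding. Strictly speaking, on most Linux systems gettempdir() still returns /tmp. The swap removes the hardcoded path that Bandit pattern-matches on, not the shared-directory race itself. Creating the file with tempfile.mkstemp() or inside a private directory from tempfile.mkdtemp() is what avoids a predictable name: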

$ bandit order_service_fixed.py
Test results: No issues identified.
Run metrics:
Total issues (by severity):
  Low: 0    Medium: 0    High: 0
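One way to write the export step so the destination isn't predictable is sketched below. It keeps the table's shutil.copy() and os.path.basename() approach but copies into a private directory created by tempfile.mkdtemp(). The source parameter, the directory prefix and the rejection of "." and ".." are assumptions added for illustration.

```python
import os
import shutil
import tempfile


def export_orders_to_csv(filename: str, source: str = "orders.db") -> str:
    """Copy the orders database into a fresh private directory; return the new path."""
    # basename() drops directory components, so "../../etc/x.csv" becomes "x.csv".
    safe_name = os.path.basename(filename)
    # "", "." and ".." would resolve to a directory rather than a file name.
    if safe_name in ("", ".", ".."):
        raise ValueError(f"invalid export filename: {filename!r}")
    # mkdtemp() creates a new directory with an unpredictable name that only the
    # current user can read, write, or list, so nobody else can pre-create the target.
    export_dir = tempfile.mkdtemp(prefix="order-export-")
    destination = os.path.join(export_dir, safe_name)
    # shutil.copy() runs in-process: no shell is involved, so no shell injection.
    shutil.copy(source, destination)
    return destination
```

The caller gets the full path back. It is then responsible for removing the directory (for example with shutil.rmtree()) once the export has been delivered.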

Wiring it into CI

Run recursively over the whole repository, the same check becomes a merge gate:

# .github/workflows/sast.yml
name: SAST scan (Bandit)
on:
  push:
    branches: ["main"]
  pull_request:

jobs:
  bandit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install bandit
      - run: bandit -r . -x ./tests -ll

The -ll flag limits the report to Medium severity and above. Bandit exits with a non-zero status whenever any reported issue remains, so the job fails only on Medium and High findings. That's a reasonable starting threshold: failing on every Low finding tends to train a team to ignore the tool's output entirely.

What a SAST tool is and isn't good for

Bandit found real, valid issues here - but it's worth being precise about its limits. It works on syntax and known-dangerous-call patterns, so it won't catch a logic flaw like an authorization check that's present but wrong, and it can produce false positives (notice the SQL-injection finding above was flagged at "Confidence: Low" - the tool is also telling you how sure it is). SAST is one layer: it belongs alongside code review, dependency scanning, and - for anything handling real user data - a second pair of human eyes on anything it flags as High severity.
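A reviewed finding sometimes turns out to be a false positive. Bandit lets you suppress it on that one line with a `# nosec` comment. Naming the test ID, as in `# nosec B608`, keeps every other check active on that line. The sketch below shows the pattern. The count_rows helper and the table allowlist are hypothetical, and whether Bandit flags this particular f-string depends on your version.

```python
import sqlite3

# Hypothetical allowlist. Table names can't be bound as query parameters,
# so they're validated against a fixed set before going into the SQL text.
ALLOWED_TABLES = {"orders", "archived_orders"}


def count_rows(conn: sqlite3.Connection, table: str) -> int:
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    # A pattern-based check may still report B608 on this string-built query.
    # After review, the allowlist above is what makes it safe, so the finding is
    # suppressed on this line only, naming the specific test ID.
    query = f"SELECT COUNT(*) FROM {table}"  # nosec B608
    return conn.execute(query).fetchone()[0]
```

Keeping the justification next to the suppression makes the decision visible in code review. The alternative, quietly lowering the CI threshold for the whole repository, hides it.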

Conclusion

Every one of the 10 findings here came from code that runs and would pass a casual review - exactly the kind of mistake static analysis exists to catch. The more interesting result wasn't the first scan, it was the second one: fixing all seven mistakes by hand still left a new, smaller one in the remediation itself. That's the actual argument for running a SAST tool in CI rather than as a one-time audit - it checks the fix the same way it checked the original bug.
