Applying a SAST Tool to a Real Application: Finding and Fixing 10 Vulnerabilities with Bandit
Why static analysis catches what code review often doesn't
A SAST tool doesn't understand what your application does - it doesn't know what a "discount" or an "order" means. What it knows is a large catalog of dangerous patterns: eval() on user input, subprocess calls with shell=True, MD5 used where a password hash is expected, pickle.loads() on data that could come from outside the process.
A human reviewer skimming a pull request can miss these because they're often one line buried in otherwise-correct logic. A SAST tool reads every line, every time, without getting tired.
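To make "a catalog of dangerous patterns" concrete, here is a minimal sketch of the idea using Python's standard ast module. It is a toy written for this article, not Bandit's implementation; the class name DangerousCallVisitor and the sample source are made up for illustration. It walks the syntax tree and records two patterns: a bare eval() call and any call that passes shell=True.

```python
import ast

# Hypothetical sample input; any Python source string works here.
SOURCE = '''
import subprocess

def run(cmd, expr):
    subprocess.call(cmd, shell=True)
    return eval(expr)
'''


class DangerousCallVisitor(ast.NodeVisitor):
    """Toy checker: flags eval() calls and any call passing shell=True."""

    def __init__(self):
        self.findings = []

    def visit_Call(self, node):
        # Pattern 1: a direct call to the built-in name eval
        if isinstance(node.func, ast.Name) and node.func.id == "eval":
            self.findings.append((node.lineno, "eval() call"))
        # Pattern 2: the keyword argument shell=True on any call
        for kw in node.keywords:
            if (kw.arg == "shell"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is True):
                self.findings.append((node.lineno, "call with shell=True"))
        # Keep walking so nested calls are inspected too
        self.generic_visit(node)


def scan(source):
    """Return a sorted list of (line number, message) findings."""
    visitor = DangerousCallVisitor()
    visitor.visit(ast.parse(source))
    return sorted(visitor.findings)


for lineno, message in scan(SOURCE):
    print(f"line {lineno}: {message}")
```

The checker never runs the code and has no idea what cmd or expr contain, which is why this style of analysis is cheap to run on every line but blind to business logic.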
The real-world example: an order service with 7 intentional mistakes
The starting point is a small Flask service with the kind of issues that show up in real codebases - not contrived examples, but the exact shape of mistakes that get written under deadline pressure:
```python
# order_service.py (excerpt)
import hashlib
import pickle
import subprocess

DB_PASSWORD = "Sup3rSecret!2024"  # hardcoded credential

def find_order_by_customer(customer_name):
    conn = get_connection()
    cursor = conn.cursor()
    # Query assembled with string formatting: injectable
    query = "SELECT * FROM orders WHERE customer_name = '%s'" % customer_name
    cursor.execute(query)
    return cursor.fetchall()

def calculate_discount_expression(expression):
    return eval(expression)  # runs whatever code the caller sends

def export_orders_to_csv(filename):
    subprocess.call("cp orders.db /tmp/" + filename, shell=True)  # shell injection

def hash_password(password):
    return hashlib.md5(password.encode()).hexdigest()  # MD5 is not a password hash

def load_cached_cart(serialized_cart):
    return pickle.loads(serialized_cart)  # unsafe on untrusted bytes
```
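To see why the string-formatted query in find_order_by_customer matters, here is a small self-contained sketch against an in-memory sqlite3 database. The table contents, the function names find_unsafe and find_safe, and the input string are all made up for illustration; the real service's schema and driver may differ.

```python
import sqlite3

# Throwaway in-memory database with two hypothetical orders
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "alice"), (2, "bob")])


def find_unsafe(name):
    # Same construction as the vulnerable service: the input becomes SQL text
    query = "SELECT * FROM orders WHERE customer_name = '%s'" % name
    return conn.execute(query).fetchall()


def find_safe(name):
    # Bound parameter: the input is only ever treated as a value
    query = "SELECT * FROM orders WHERE customer_name = ?"
    return conn.execute(query, (name,)).fetchall()


# A "customer name" that closes the quote and appends an always-true clause
crafted_name = "nobody' OR '1'='1"

unsafe_rows = find_unsafe(crafted_name)
safe_rows = find_safe(crafted_name)

print("string formatting returned", len(unsafe_rows), "rows")
print("bound parameter returned", len(safe_rows), "rows")
```

With string formatting, the crafted name rewrites the WHERE clause and every order comes back; with a bound parameter, it is compared as a literal name and matches nothing.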
Running Bandit against it
```shell
pip install bandit
bandit order_service.py
```
Real output, showing six of the ten findings:
```
Run metrics:
    Total issues (by severity):
        Low: 3   Medium: 4   High: 3

>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
   Severity: Medium   Confidence: Low   CWE: CWE-89
   Location: ./order_service.py:31

>> Issue: [B307:blacklist] Use of possibly insecure function - consider using safer ast.literal_eval.
   Severity: Medium   Confidence: High   CWE: CWE-78
   Location: ./order_service.py:39

>> Issue: [B602:subprocess_popen_with_shell_equals_true] subprocess call with shell=True identified, security issue.
   Severity: High   Confidence: High   CWE: CWE-78
   Location: ./order_service.py:44

>> Issue: [B324:hashlib] Use of weak MD5 hash for security. Consider usedforsecurity=False
   Severity: High   Confidence: High   CWE: CWE-327
   Location: ./order_service.py:49

>> Issue: [B301:blacklist] Pickle and modules that wrap it can be unsafe when used to deserialize untrusted data.
   Severity: Medium   Confidence: High   CWE: CWE-502
   Location: ./order_service.py:54

>> Issue: [B201:flask_debug_true] A Flask app appears to be run with debug=True, which exposes the Werkzeug debugger and allows the execution of arbitrary code.
   Severity: High   Confidence: Medium   CWE: CWE-94
   Location: ./order_service.py:71
```
Ten findings total - three Low, four Medium, three High. Note that Bandit doesn't just flag the call site; it attaches a CWE (Common Weakness Enumeration) ID to each one, which is what lets a finding map directly to a recognized vulnerability category instead of being just an opinion.
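Because each finding carries a CWE ID, the report can be grouped by weakness category instead of read line by line. The sketch below assumes Bandit's JSON formatter (bandit -f json) and the key names results, test_id, issue_cwe and line_number as I understand them; check them against the Bandit version you run. SAMPLE_REPORT is a hand-written stand-in that mirrors four of the findings above, not captured tool output.

```python
import json
from collections import defaultdict

# Hand-written stand-in for `bandit -f json` output (assumed key names)
SAMPLE_REPORT = json.dumps({
    "results": [
        {"test_id": "B608", "issue_severity": "MEDIUM",
         "issue_cwe": {"id": 89}, "line_number": 31},
        {"test_id": "B307", "issue_severity": "MEDIUM",
         "issue_cwe": {"id": 78}, "line_number": 39},
        {"test_id": "B602", "issue_severity": "HIGH",
         "issue_cwe": {"id": 78}, "line_number": 44},
        {"test_id": "B324", "issue_severity": "HIGH",
         "issue_cwe": {"id": 327}, "line_number": 49},
    ]
})


def group_by_cwe(report_text):
    """Map each CWE label to the (test_id, line_number) pairs reported under it."""
    grouped = defaultdict(list)
    for result in json.loads(report_text)["results"]:
        label = f"CWE-{result['issue_cwe']['id']}"
        grouped[label].append((result["test_id"], result["line_number"]))
    return dict(grouped)


for cwe, hits in sorted(group_by_cwe(SAMPLE_REPORT).items()):
    print(cwe, hits)
```

Grouping this way makes it easy to see that two different checks (B307 and B602) point at the same weakness category in this report.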
Fixing each finding
| Finding | Fix |
|---|---|
| Hardcoded password & API key | Read from environment variables (os.environ.get(...)) |
| SQL injection via % formatting | Parameterized query: cursor.execute(query, (customer_name,)) |
| eval() on user input | ast.literal_eval(), which only evaluates literals, never arbitrary code |
| subprocess with shell=True | shutil.copy() with no shell, plus os.path.basename() to strip path traversal |
| MD5 for password hashing | bcrypt.hashpw() |
| pickle.loads() on untrusted data | json.loads() instead - JSON can't execute code during deserialization |
| Flask debug=True + bind to 0.0.0.0 | debug driven by an explicit environment flag, bind to 127.0.0.1 by default |
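```python
# order_service_fixed.py (excerpt)
import ast

import bcrypt

def find_order_by_customer(customer_name):
    conn = get_connection()
    cursor = conn.cursor()
    # "?" is the sqlite3 placeholder; some drivers (e.g. psycopg2) use "%s"
    query = "SELECT * FROM orders WHERE customer_name = ?"
    cursor.execute(query, (customer_name,))
    return cursor.fetchall()

def calculate_discount_expression(expression):
    # literal_eval accepts only Python literals, never calls or names
    try:
        return ast.literal_eval(expression)
    except (ValueError, SyntaxError):
        raise ValueError("Invalid expression") from None

def hash_password(password):
    # gensalt() creates a fresh salt, which is embedded in the returned hash
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt())
```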
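The excerpt above leaves out three of the fixes from the table: secrets from the environment, JSON instead of pickle, and debug/bind settings driven by explicit flags. A minimal sketch of what those could look like follows. The environment variable names (ORDER_DB_PASSWORD, ORDER_SERVICE_DEBUG, ORDER_SERVICE_HOST) and the load_settings helper are hypothetical, chosen for this example rather than taken from the original service.

```python
import json
import os


def load_settings(env=None):
    """Read configuration from the environment instead of source code.

    The variable names below are hypothetical placeholders.
    """
    env = os.environ if env is None else env
    return {
        # No default: a missing secret should be noticed, not silently replaced
        "db_password": env.get("ORDER_DB_PASSWORD"),
        # Debug is off unless someone sets the flag to exactly "1"
        "debug": env.get("ORDER_SERVICE_DEBUG", "0") == "1",
        # Loopback by default; binding to all interfaces must be a deliberate choice
        "host": env.get("ORDER_SERVICE_HOST", "127.0.0.1"),
    }


def load_cached_cart(serialized_cart):
    """Parse a cart from JSON text; parsing JSON only builds plain data."""
    cart = json.loads(serialized_cart)
    if not isinstance(cart, dict):
        raise ValueError("cart must be a JSON object")
    return cart
```

The resulting dictionary could then feed something like app.run(host=settings["host"], debug=settings["debug"]). Note that switching from pickle to JSON also means the cache writer has to emit JSON, so both sides of the cache need to change together.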
The re-scan caught something the manual fix missed
Running Bandit again after the fixes:
```
$ bandit order_service_fixed.py
>> Issue: [B108:hardcoded_tmp_directory] Probable insecure usage of temp file/directory.
   Severity: Medium   Confidence: Medium   CWE: CWE-377
   Location: ./order_service_fixed.py:47
```
The shell-injection fix had replaced subprocess with shutil.copy(), but it still wrote to a hardcoded /tmp path - a pattern that's vulnerable to race conditions on shared systems. This is the actual value of a SAST tool in a pipeline: it doesn't just catch the obvious first pass, it catches what a human fixing five things in a row reasonably overlooks on the sixth.
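Swapping the hardcoded path for tempfile.gettempdir() resolved it. (That clears the finding; where the file name itself must not be guessable, tempfile.mkstemp() is the sturdier choice, because it creates a uniquely named file atomically.) The next scan came back clean: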
```
$ bandit order_service_fixed.py
Test results: No issues identified.
Run metrics:
    Total issues (by severity):
        Low: 0   Medium: 0   High: 0
```
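The fixed export function is not shown in the article, so the following is only a guess at its shape, combining the three pieces the text names: shutil.copy() instead of a shell, os.path.basename() to drop directory components, and tempfile.gettempdir() instead of a literal /tmp. The function name export_orders and its export_dir parameter are assumptions made for this sketch.

```python
import os
import shutil
import tempfile


def export_orders(source_path, filename, export_dir=None):
    """Copy source_path into export_dir under a sanitized file name.

    Hypothetical reconstruction; the original fixed code may differ.
    """
    # Resolve the temp directory at runtime rather than hardcoding "/tmp"
    export_dir = export_dir or tempfile.gettempdir()

    # Drop any directory components the caller supplied ("../../x" -> "x")
    safe_name = os.path.basename(filename)

    # basename() leaves "." and ".." untouched, so reject them explicitly
    if safe_name in ("", ".", ".."):
        raise ValueError(f"invalid export file name: {filename!r}")

    destination = os.path.join(export_dir, safe_name)
    # No shell is involved, so the name is never interpreted as a command
    shutil.copy(source_path, destination)
    return destination
```

A call such as export_orders("orders.db", "../../report.csv") would write report.csv inside the temp directory rather than two levels above it.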
Wiring it into CI
The same command that ran locally becomes a merge gate:
```yaml
# .github/workflows/sast.yml
name: SAST scan (Bandit)
on:
  push:
    branches: ["main"]
  pull_request:
jobs:
  bandit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install bandit
      - run: bandit -r . -x ./tests -ll
```
The -ll flag tells Bandit to report only findings of Medium severity and above, so only those can fail the build. That is a reasonable starting threshold: failing on every Low finding tends to train a team to ignore the tool's output entirely.
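The threshold behaves like a simple filter over severity ranks: -l keeps Low and above, -ll keeps Medium and above, -lll keeps only High. The sketch below models that rule in a few lines; it is an illustration of the semantics, not Bandit's code, and the findings list is a made-up sample (B404 is included only as an example of a Low-severity check).

```python
# Ordering used to compare severities
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def blocking_findings(findings, minimum="MEDIUM"):
    """Return the findings at or above the given severity floor."""
    floor = SEVERITY_RANK[minimum]
    return [f for f in findings if SEVERITY_RANK[f["severity"]] >= floor]


# Made-up sample of findings for illustration
findings = [
    {"id": "B608", "severity": "MEDIUM"},
    {"id": "B602", "severity": "HIGH"},
    {"id": "B404", "severity": "LOW"},
]

blocking = blocking_findings(findings, minimum="MEDIUM")
print("would fail the build:", [f["id"] for f in blocking])
```

Raising the floor to "HIGH" later, or lowering it to "LOW" once the backlog is cleared, is a one-flag change in the workflow.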
What a SAST tool is and isn't good for
Bandit found real, valid issues here - but it's worth being precise about its limits. It works on syntax and known-dangerous-call patterns, so it won't catch a logic flaw like an authorization check that's present but wrong, and it can produce false positives (notice the SQL-injection finding above was flagged at "Confidence: Low" - the tool is also telling you how sure it is). SAST is one layer: it belongs alongside code review, dependency scanning, and - for anything handling real user data - a second pair of human eyes on anything it flags as High severity.
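When a low-confidence finding turns out to be a false positive, Bandit supports an inline # nosec comment, optionally scoped to a test ID, to suppress it on that line. The sketch below shows one plausible case: a column name cannot be passed as a bound parameter, so it is checked against a fixed allowlist before being interpolated. The function list_orders and the allowlist are hypothetical, and whether a given Bandit version flags this exact line is an assumption.

```python
import sqlite3

# Only these identifiers may ever reach the query text
ALLOWED_SORT_COLUMNS = {"id", "customer_name"}


def list_orders(conn, sort_by="id"):
    """Return all orders sorted by an allowlisted column."""
    if sort_by not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_by!r}")
    # Identifiers cannot be bound as parameters, so the value is validated
    # above; the suppression is scoped to B608 and justified here.
    query = f"SELECT id, customer_name FROM orders ORDER BY {sort_by}"  # nosec B608
    return conn.execute(query).fetchall()


# Small in-memory demonstration with made-up rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer_name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "zoe"), (2, "adam")])
print(list_orders(conn, sort_by="customer_name"))
```

Scoping the comment to one test ID and writing the justification next to it keeps the suppression reviewable; a bare # nosec would also hide any unrelated finding that later appears on the same line.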
Conclusion
Every one of the 10 findings here came from code that compiles, runs, and would pass a casual review - that's exactly the kind of mistake static analysis exists to catch. The more interesting result wasn't the first scan, it was the second one: fixing seven mistakes by hand still left an eighth, smaller one in the remediation itself. That's the actual argument for running a SAST tool in CI rather than as a one-time audit - it checks the fix the same way it checked the original bug.