I built a phishing detector into Chrome using Claude AI. Here's exactly how.
My mother called me last week. Someone had sent her an SMS claiming to be from DHL, asking her to pay a £2.99 customs fee via a link. She almost clicked it.

That was enough. I spent a weekend building a Chrome extension that lets you paste any suspicious message and get an instant verdict. Here's how it works.

The architecture (and why Cloudflare Workers)

The obvious approach is to call the Claude API directly from the extension. Don't do this. Your API key would live in the extension code, which anyone can download from the Chrome Web Store and unpack in about 30 seconds.

The right pattern: extension → Cloudflare Worker → Claude API. The Worker runs server-side, holds the API key as an environment variable, and acts as a proxy. Cloudflare's free tier handles 100,000 requests/day, which is more than enough.

The Worker

    export default {
      async fetch(request, env) {
        // Only POST carries a JSON body; reject everything else early.
        if (request.method !== 'POST') {
          return new Response('Method not allowed', { status: 405 });
        }

        const { prompt } = await request.json();

        // Forward the prompt to Claude. The key never leaves the Worker.
        const response = await fetch('https://api.anthropic.com/v1/messages', {
          method: 'POST',
          headers: {
            'x-api-key': env.ANTHROPIC_API_KEY,
            'anthropic-version': '2023-06-01',
            'content-type': 'application/json'
          },
          body: JSON.stringify({
            model: 'claude-haiku-4-5-20251001',
            max_tokens: 350,
            messages: [{ role: 'user', content: prompt }]
          })
        });

        // Pass Claude's response straight back to the extension.
        return response;
      }
    }

I'm using Haiku, not Opus. For a classification task like this - is this phishing or not - Haiku is faster, about 10x cheaper, and gets the same result. Opus is overkill.

The prompt

After a dozen iterations, this is what actually works:

    You are an expert cybersecurity analyst specializing in phishing detection.
    Analyze the following message and determine if it is PHISHING, SUSPICIOUS, or LEGITIMATE.
    Pay special attention to impersonation of financial institutions (PayPal, Chase, Barclays),
    government agencies (IRS, HMRC, DVLA), delivery services (UPS, FedEx, Royal Mail)
    and major tech companies (Amazon, Apple, Microsoft, Netflix).

    Respond ONLY in this format:
    VERDICT: [PHISHING / SUSPICIOUS / LEGITIMATE]
    CONFIDENCE: [High / Medium / Low]
    SIGNALS: [comma-separated list, max 4]
    ADVICE: [one clear action sentence]

One thing worth knowing: parse only the VERDICT line, not the whole response. Otherwise txt.includes("PHISHING") matches whenever the word appears anywhere in the text, including the template itself, so it reports phishing no matter what the verdict is.

    // Look at the VERDICT line only, never the full text.
    const verdictLine = txt.split('\n')
      .find(l => l.startsWith('VERDICT:')) || '';
    const isPhishing = verdictLine.includes('PHISHING');

Obvious in hindsight. Took me longer than I'd like to admit.

Results

I tested it against 50 real phishing attempts. Claude got 48 right. The two it missed were unusually well-crafted - legitimate-looking domains with no obvious red flags. For anything with a suspicious link or an urgency pattern, it was essentially perfect in this test.

If you want the full source code - extension, Worker, and deploy instructions - I packaged it here: https://carlosdevlop.gumroad.com/l/ai-phishing-detector-bundle
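The VERDICT-line check above can be extended to read all four fields of the response format. What follows is a minimal sketch, not the code from the bundle: `parseAnalysis`, the `UNKNOWN` fallback, and the sample reply are hypothetical names and data chosen for illustration. It matches the verdict exactly against the three allowed values, so an echoed template line or a stray mention of PHISHING in the ADVICE field cannot flip the result.

```javascript
// Hypothetical helper: turn the model's four-line reply into an object.
const VERDICTS = ['PHISHING', 'SUSPICIOUS', 'LEGITIMATE'];
const LABELS = ['VERDICT', 'CONFIDENCE', 'SIGNALS', 'ADVICE'];

function parseAnalysis(txt) {
  const fields = {};
  for (const rawLine of txt.split('\n')) {
    const line = rawLine.trim();
    const idx = line.indexOf(':');
    if (idx === -1) continue; // not a "LABEL: value" line
    const key = line.slice(0, idx).trim().toUpperCase();
    const value = line.slice(idx + 1).trim();
    // Keep only the first occurrence of each expected label.
    if (LABELS.includes(key) && !(key in fields)) {
      fields[key] = value;
    }
  }

  // Tolerate "[PHISHING]" but require an exact match otherwise.
  const verdictText = (fields.VERDICT || '')
    .replace(/[\[\]]/g, '')
    .trim()
    .toUpperCase();
  const verdict = VERDICTS.includes(verdictText) ? verdictText : 'UNKNOWN';

  return {
    verdict,
    confidence: fields.CONFIDENCE || '',
    signals: (fields.SIGNALS || '')
      .split(',')
      .map(s => s.trim())
      .filter(Boolean),
    advice: fields.ADVICE || ''
  };
}

// Usage with a made-up reply in the format the prompt asks for.
const sample = [
  'VERDICT: PHISHING',
  'CONFIDENCE: High',
  'SIGNALS: fake DHL domain, urgency, small fee',
  'ADVICE: Do not click the link.'
].join('\n');

const result = parseAnalysis(sample);
console.log(result.verdict);        // PHISHING
console.log(result.signals.length); // 3
```

Treating anything unexpected as `UNKNOWN` is a design choice: the extension can then show a "could not analyze" state instead of silently reporting a message as safe.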
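One caveat about the Worker shown earlier: it forwards whatever `prompt` the caller sends, so anyone who finds the Worker URL could use it as a general-purpose Claude proxy on your key. A possible tightening, sketched here under assumptions, is to have the extension send only the raw message and assemble the prompt inside the Worker. The names `buildPrompt`, `INSTRUCTIONS`, and `MAX_MESSAGE_CHARS`, the 4,000-character cap, and the triple-quote delimiters are all hypothetical and not part of the original extension.

```javascript
// Hypothetical cap on pasted message length; tune to taste.
const MAX_MESSAGE_CHARS = 4000;

// The fixed instructions from the article, kept server-side.
const INSTRUCTIONS = [
  'You are an expert cybersecurity analyst specializing in phishing detection.',
  'Analyze the following message and determine if it is PHISHING, SUSPICIOUS, or LEGITIMATE.',
  'Respond ONLY in this format:',
  'VERDICT: [PHISHING / SUSPICIOUS / LEGITIMATE]',
  'CONFIDENCE: [High / Medium / Low]',
  'SIGNALS: [comma-separated list, max 4]',
  'ADVICE: [one clear action sentence]'
].join('\n');

// Validate the user's message and wrap it in the fixed instructions.
function buildPrompt(message) {
  if (typeof message !== 'string') {
    throw new TypeError('message must be a string');
  }
  const trimmed = message.trim();
  if (trimmed.length === 0) {
    throw new RangeError('message is empty');
  }
  if (trimmed.length > MAX_MESSAGE_CHARS) {
    throw new RangeError('message is too long');
  }
  // Delimiters make it clear where the untrusted text starts and ends.
  return `${INSTRUCTIONS}\n\nMessage to analyze:\n"""\n${trimmed}\n"""`;
}

// Inside the Worker's fetch handler this would replace the raw prompt:
//   const { message } = await request.json();
//   const prompt = buildPrompt(message); // respond with 400 if it throws
const example = buildPrompt('DHL: pay your 2.99 customs fee at the link below');
console.log(example.startsWith('You are an expert')); // true
```

This does not stop someone from calling the Worker, but it limits what they can make it do, and the length cap bounds the cost of each request.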