Your Trending Repos Script Broke. Again.
DEV Community Grade 10 2h ago


You set up a GitHub Action to scrape trending TypeScript repos. It ran at 9 AM. You got an empty JSON array. No errors. Just `[]`. Sound familiar?

Here's what happened: GitHub's trending page changed its HTML structure. Your cheerio selector stopped matching. The cron job ran happily, found zero repos, and overwrote your daily digest with nothing. Nobody noticed until someone asked "why wasn't X trending yesterday?"

This sucks. I know. I've debugged this exact thing three times in the last year.

## Why It Breaks

GitHub's trending page doesn't have an API. You're scraping HTML. The issue isn't your code; it's that you're parsing a document designed for humans, not machines.

```javascript
// Your current approach: fragile as hell
const $ = cheerio.load(html);
const repos = $('.Box article.Box-row').map((i, el) => {
  const name = $(el).find('h2 a').text().trim();
  return { name };
}).get();
```

That selector worked in January. In February, GitHub added a wrapper div. In March, they changed `h2` to `h3`. Your script silently failed every time.

The real problem: you have no feedback loop. The script runs, produces nothing, and you only find out when someone complains. No alert. No trace. Just silence.

## Manual Fix: Make It Resilient

Let's fix this properly. First, add validation. Fail loudly.

```javascript
import * as cheerio from 'cheerio';

async function fetchTrending() {
  const res = await fetch('https://github.com/trending/typescript?since=daily');
  // Fail on HTTP errors instead of parsing an error page
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  const html = await res.text();

  // Validate we got real content
  if (!html.includes('trending')) {
    throw new Error('HTML structure changed: trending section missing');
  }

  const $ = cheerio.load(html);
  const repos = [];

  // Try multiple selectors; be defensive
  const articles = $('article.Box-row').length
    ? $('article.Box-row')
    : $('div[class*="Box-row"]');

  if (articles.length === 0) {
    throw new Error('No repos found: selector likely broken');
  }

  articles.each((i, el) => {
    repos.push({
      name: $(el).find('h2 a, h3 a').first().text().trim(),
      stars: $(el).find('.octicon-star').parent().text().trim(),
      description: $(el).find('p').text().trim()
    });
  });

  return repos;
}
```

Now you get errors instead of silence. But you still don't know why it broke. You'll SSH into the box, run it manually, stare at the HTML, and curse.

## The Better Way: TracePilot

Add one import. Wrap your parsing step. Now every run is recorded, including the raw HTML that broke your parser.

```javascript
import * as cheerio from 'cheerio';
import { TracePilot } from 'tracepilot-sdk';

const tp = new TracePilot(process.env.TRACEPILOT_API_KEY);

async function fetchTrending() {
  await tp.startTrace('trending-repos-scraper');

  const res = await fetch('https://github.com/trending/typescript?since=daily');
  const html = await res.text();

  const { result, spanId } = await tp.wrapToolCall(
    'parse-trending-page',
    () => {
      const $ = cheerio.load(html);
      const repos = [];
      const articles = $('article.Box-row');
      if (articles.length === 0) {
        throw new Error('No repos found');
      }
      articles.each((i, el) => {
        repos.push({ name: $(el).find('h2 a').text().trim() });
      });
      return repos;
    },
    null,
    1
  );

  return result;
}
```

When it fails, open your dashboard. You'll see the exact HTML that was returned. Fork the trace. Edit the selector. Replay. No redeployment. No guessing.

You can even set up alerts: "If zero repos returned, notify the team." Real feedback. Real debugging.

The fix isn't better selectors. It's knowing exactly what happened the moment it broke.

Debugging AI agents shouldn't feel like reading The Matrix. Join other engineers who are building reliable autonomous workflows in our community: TracePilot Discord
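One gap worth noting in the manual fix: it checks that at least one row matched, but a change like `h2` becoming `h3` can leave the outer selector working while every `name` comes back as an empty string. The sketch below is one possible way to catch that case before the digest is overwritten. The names `ScrapeValidationError` and `assertPlausibleRepos`, and the `minRepos` and `maxEmptyRatio` thresholds, are hypothetical and not part of the original script or any library.

```javascript
// Hypothetical helper: all names and default thresholds here are illustrative assumptions.
class ScrapeValidationError extends Error {
  constructor(message, details) {
    super(message);
    this.name = 'ScrapeValidationError';
    this.details = details; // counts that help explain the failure in logs
  }
}

// Returns the input unchanged when it looks plausible, throws otherwise.
// repos: array of objects with a string `name`, as produced by the parser.
function assertPlausibleRepos(repos, { minRepos = 1, maxEmptyRatio = 0.2 } = {}) {
  const isArray = Array.isArray(repos);

  // Case 1: the outer selector matched nothing (or the parser returned garbage).
  if (!isArray || repos.length < minRepos) {
    const got = isArray ? String(repos.length) : 'a non-array';
    throw new ScrapeValidationError(
      `Expected at least ${minRepos} repos, got ${got}`,
      { count: isArray ? repos.length : 0 }
    );
  }

  // Case 2: rows matched, but the inner selector no longer finds the name.
  const emptyNames = repos.filter(
    (r) => !r || typeof r.name !== 'string' || r.name.trim() === ''
  ).length;

  if (emptyNames / repos.length > maxEmptyRatio) {
    throw new ScrapeValidationError(
      `${emptyNames} of ${repos.length} repos have no name; inner selector likely broken`,
      { count: repos.length, emptyNames }
    );
  }

  return repos;
}
```

Calling `assertPlausibleRepos(await fetchTrending())` before writing the digest means a broken selector stops the run instead of replacing yesterday's file. Letting the error propagate, or setting `process.exitCode = 1` in a top-level `catch`, should make the scheduled workflow run show up as failed.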

