fix: parse latest tournament from recent-events list on player page (#24) #25

Merged
shcizo merged 2 commits from fix/parse-recent-events-tournament-24 into main 2026-06-01 09:04:15 +02:00

Summary

  • Parse PDGA's <li class="recent-events"> element on the player page so the most recently played tournament (which now lingers outside table[id*="player-results"]) is picked up by getNewTournamentRounds.
  • Dedupe URLs across both sources via a Set keyed on normalized absolute URLs (new URL(href, location.origin).href), so an event that later migrates into the table doesn't get double-processed during the overlap window.
  • Fall back to extracting the tournament date from the event page when the recent-events element doesn't carry one (per live HTML, it doesn't). As a sanity check, the parsed date must be strictly newer than afterDate; otherwise the tournament is skipped with a warn rather than storing rounds under a wrong date (e.g. a registration deadline captured by the regex).
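
The dedupe and date-guard steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `mergeTournamentUrls`, `isUsableEventDate`, the explicit `origin` parameter (standing in for `location.origin`), and the example event paths are all hypothetical names chosen for the sketch.

```javascript
// Hypothetical helper: merge hrefs from the results table and the
// recent-events element, keeping one entry per normalized absolute URL.
// `origin` stands in for location.origin so the sketch runs outside a page.
function mergeTournamentUrls(tableHrefs, recentEventHrefs, origin) {
  const seen = new Set();
  const merged = [];
  for (const href of [...tableHrefs, ...recentEventHrefs]) {
    // Same normalization the summary describes: resolve against the origin
    // so relative and absolute forms of one event compare equal.
    const absolute = new URL(href, origin).href;
    if (seen.has(absolute)) continue; // already discovered via the other source
    seen.add(absolute);
    merged.push(absolute);
  }
  return merged;
}

// Hypothetical guard: accept a date recovered from the event page only if it
// is a valid Date and strictly newer than afterDate. The caller is assumed to
// log a warning and skip the tournament when this returns false.
function isUsableEventDate(parsedDate, afterDate) {
  if (!(parsedDate instanceof Date) || Number.isNaN(parsedDate.getTime())) {
    return false;
  }
  return parsedDate.getTime() > afterDate.getTime();
}

// Example: the same event appears as an absolute URL in the table and as a
// relative URL in recent-events, so it is kept only once.
const urls = mergeTournamentUrls(
  ['/tour/event/1', 'https://www.pdga.com/tour/event/2'],
  ['/tour/event/2', '/tour/event/3'],
  'https://www.pdga.com'
);
console.log(urls.length);
```

Keying the `Set` on the resolved URL rather than the raw `href` is what makes the overlap window safe: an event listed as a relative link in one source and an absolute link in the other still collapses to a single entry.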

Closes #24

Test plan

  • Identify a player whose /player/<N> page currently shows a tournament under <li class="recent-events"> that is NOT in any table[id*="player-results"]
  • Reset cooldown: UPDATE players SET last_round_update = NULL WHERE pdga_number = <N>;
  • Snapshot round count, max date, max competition, predicted_rating before refresh
  • Trigger refresh: curl -X POST http://localhost:3000/api/refresh-round-history/<N>
  • Confirm log shows new tournament URL discovery completed with recentEventsMatches >= 1
  • Confirm log shows recent-events: scraping tournament and recent-events: date recovered from event page (debug level)
  • Confirm DB: round count increased, max date advanced to new tournament, predicted_rating recomputed
  • Regression: run against a player with no .recent-events element — confirm recentEventsAnchorsSeen: 0 and no change in round count
shcizo added 2 commits 2026-06-01 09:01:41 +02:00
shcizo merged commit 2561ee12ef into main 2026-06-01 09:04:15 +02:00