refactor(core)!: Add context, config, and structured logging

Adds a `context.Context` parameter to the `FetchCourse` method and threads it through its call chain, so network requests can be cancelled or bounded by a timeout. This keeps a slow or unresponsive server from stalling a remote course fetch.
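
As a rough illustration of this kind of context plumbing, the sketch below passes a `context.Context` down to the HTTP request. `fetchCourse` and `fetchTimesOut` are hypothetical helpers written for this example, not the actual `FetchCourse` implementation, and the local test server only stands in for the remote course endpoint.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetchCourse is a hypothetical stand-in for FetchCourse: the request is
// bound to ctx, so cancelling ctx or hitting its deadline aborts the call.
func fetchCourse(ctx context.Context, client *http.Client, url string) ([]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return nil, fmt.Errorf("build request: %w", err)
	}
	resp, err := client.Do(req)
	if err != nil {
		return nil, fmt.Errorf("fetch course: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("fetch course: unexpected status %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// fetchTimesOut starts a local test server that answers after delay and
// reports whether a fetch limited to timeout fails with DeadlineExceeded.
func fetchTimesOut(delay, timeout time.Duration) bool {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case <-time.After(delay):
		case <-r.Context().Done(): // client gave up; stop waiting
		}
		fmt.Fprint(w, `{"course":"demo"}`)
	}))
	defer srv.Close()

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	_, err := fetchCourse(ctx, srv.Client(), srv.URL)
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	fmt.Println("slow server times out:", fetchTimesOut(2*time.Second, 50*time.Millisecond))
	fmt.Println("fast server times out:", fetchTimesOut(0, 2*time.Second))
}
```

Callers choose the deadline by deriving a context with `context.WithTimeout`, which keeps the timeout policy out of the fetch function itself.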

A new configuration package centralizes application settings, loading them from environment variables with sensible defaults for base URL, request timeout, and logging.
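
A minimal sketch of environment-based configuration with defaults might look like the following. The `Config` fields, the `APP_*` variable names, and the default values are assumptions made for this example. The commit message does not state the real ones.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// Config holds illustrative settings; the field names are assumptions.
type Config struct {
	BaseURL   string
	Timeout   time.Duration
	LogFormat string
}

// loadConfig starts from defaults and overrides them with any valid values
// returned by getenv. Passing os.Getenv reads the real environment; tests
// can pass a map-backed function instead.
func loadConfig(getenv func(string) string) Config {
	cfg := Config{
		BaseURL:   "https://example.invalid", // placeholder default
		Timeout:   30 * time.Second,
		LogFormat: "text",
	}
	if v := getenv("APP_BASE_URL"); v != "" {
		cfg.BaseURL = v
	}
	if v := getenv("APP_TIMEOUT"); v != "" {
		// Ignore malformed or non-positive durations and keep the default.
		if d, err := time.ParseDuration(v); err == nil && d > 0 {
			cfg.Timeout = d
		}
	}
	if v := getenv("APP_LOG_FORMAT"); v == "json" || v == "text" {
		cfg.LogFormat = v
	}
	return cfg
}

func main() {
	cfg := loadConfig(os.Getenv)
	fmt.Printf("%+v\n", cfg)
}
```

Injecting the lookup function instead of calling `os.Getenv` directly keeps the loader easy to test without mutating the process environment.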

Standard `log` and `fmt` calls are replaced with a structured logging system built on `slog`, supporting both JSON and human-readable text formats.

This change also includes:
- Extensive benchmarks and example tests.
- Simplified Go doc comments across several packages.

BREAKING CHANGE: The `NewArticulateParser` constructor signature has been updated to accept a logger, base URL, and timeout, which are now supplied via the new configuration system.
2025-11-06 05:14:14 +01:00
parent e6977d3374
commit 37927a36b6
20 changed files with 1409 additions and 104 deletions


@@ -24,15 +24,9 @@ func NewHTMLCleaner() *HTMLCleaner {
 }
 
 // CleanHTML removes HTML tags and converts entities, returning clean plain text.
-// The function parses the HTML into a node tree and extracts only text content,
-// which handles edge cases like script tags or attributes better than regex.
-// It handles HTML entities automatically through the parser and normalizes whitespace.
-//
-// Parameters:
-// - htmlStr: The HTML content to clean
-//
-// Returns:
-// - A plain text string with all HTML elements and entities removed/converted
+// It parses the HTML into a node tree and extracts only text content,
+// skipping script and style tags. HTML entities are automatically handled
+// by the parser, and whitespace is normalized.
 func (h *HTMLCleaner) CleanHTML(htmlStr string) string {
 	// Parse the HTML into a node tree
 	doc, err := html.Parse(strings.NewReader(htmlStr))