Mirror of https://github.com/kjanat/articulate-parser.git (synced 2026-01-16 09:02:10 +01:00)
refactor(core)!: Add context, config, and structured logging
Introduces `context.Context` to the `FetchCourse` method and its call chain, allowing for cancellable network requests and timeouts. This improves application robustness when fetching remote course data.

A new configuration package centralizes application settings, loading them from environment variables with sensible defaults for base URL, request timeout, and logging.

Standard `log` and `fmt` calls are replaced with a structured logging system built on `slog`, supporting both JSON and human-readable text formats.

This change also includes:
- Extensive benchmarks and example tests.
- Simplified Go doc comments across several packages.

BREAKING CHANGE: The `NewArticulateParser` constructor signature has been updated to accept a logger, base URL, and timeout, which are now supplied via the new configuration system.
@@ -24,15 +24,9 @@ func NewHTMLCleaner() *HTMLCleaner {
 }
 
 // CleanHTML removes HTML tags and converts entities, returning clean plain text.
-// The function parses the HTML into a node tree and extracts only text content,
-// which handles edge cases like script tags or attributes better than regex.
-// It handles HTML entities automatically through the parser and normalizes whitespace.
-//
-// Parameters:
-// - htmlStr: The HTML content to clean
-//
-// Returns:
-// - A plain text string with all HTML elements and entities removed/converted
+// It parses the HTML into a node tree and extracts only text content,
+// skipping script and style tags. HTML entities are automatically handled
+// by the parser, and whitespace is normalized.
 func (h *HTMLCleaner) CleanHTML(htmlStr string) string {
 	// Parse the HTML into a node tree
 	doc, err := html.Parse(strings.NewReader(htmlStr))