6 Commits

Author SHA1 Message Date
b56c9fa29f refactor: Align with Go conventions and improve maintainability
Renames the `OriginalUrl` field to `OriginalURL` across media models to adhere to Go's common initialisms convention. The `json` tag is unchanged to maintain API compatibility.

Introduces constants for exporter formats (e.g., `FormatMarkdown`, `FormatDocx`) to eliminate the use of magic strings, enhancing type safety and making the code easier to maintain.

Additionally, this commit includes several minor code quality improvements:
- Wraps file-writing errors in exporters to provide more context.
- Removes redundant package-level comments from test files.
- Applies various minor linting fixes throughout the codebase.
2025-11-06 16:48:00 +01:00
68c6f4e408 chore!: prepare for v1.0.0 release
Bumps the application version to 1.0.0, signaling the first stable release. This version consolidates several new features and breaking API changes.

This commit also includes various code quality improvements:
- Modernizes tests to use t.Setenv for safer environment variable handling.
- Addresses various linter warnings (gosec, errcheck).
- Updates loop syntax to use Go 1.22's range-over-integer feature.

BREAKING CHANGE: The public API has been updated for consistency and to introduce new features like context support and structured logging.
- `GetSupportedFormat()` is renamed to `SupportedFormat()`.
- `GetSupportedFormats()` is renamed to `SupportedFormats()`.
- `FetchCourse()` now requires a `context.Context` parameter.
- `NewArticulateParser()` constructor signature has been updated.
2025-11-06 05:59:52 +01:00
37927a36b6 refactor(core)!: Add context, config, and structured logging
Introduces `context.Context` to the `FetchCourse` method and its call chain, allowing for cancellable network requests and timeouts. This improves application robustness when fetching remote course data.

A new configuration package centralizes application settings, loading them from environment variables with sensible defaults for base URL, request timeout, and logging.

Standard `log` and `fmt` calls are replaced with a structured logging system built on `slog`, supporting both JSON and human-readable text formats.

This change also includes:
- Extensive benchmarks and example tests.
- Simplified Go doc comments across several packages.

BREAKING CHANGE: The `NewArticulateParser` constructor signature has been updated to accept a logger, base URL, and timeout, which are now supplied via the new configuration system.
2025-11-06 05:14:14 +01:00
e6977d3374 refactor(html_cleaner): adopt robust HTML parsing for content cleaning
Replaces the fragile regex-based HTML cleaning logic with a proper HTML parser using `golang.org/x/net/html`. The previous implementation was unreliable and could not correctly handle malformed tags, script content, or a wide range of HTML entities.

This new approach provides several key improvements:
- Skips the content of `
2025-11-06 04:26:51 +01:00
2790064ad5 refactor: Standardize method names and introduce context propagation
Removes the `Get` prefix from exporter methods (e.g., GetSupportedFormat -> SupportedFormat) to better align with Go conventions for simple accessors.

Introduces `context.Context` propagation through the application, starting from `ProcessCourseFromURI` down to the HTTP request in the parser. This makes network operations cancellable and allows for setting deadlines, improving application robustness.

Additionally, optimizes the HTML cleaner by pre-compiling regular expressions for a minor performance gain.
2025-11-06 04:26:41 +01:00
9de7222ec3 Adds DOCX and Markdown export functionality
Introduces a modular exporter pattern supporting DOCX and Markdown formats
by implementing Exporter interfaces and restructuring application logic.

Enhances CI to install UPX for binary compression, excluding recent macOS
binaries due to compatibility issues.

Enables CGO when building binaries for all platforms, addressing potential
cross-platform compatibility concerns.

Bumps version to 0.1.1.
2025-05-25 13:03:21 +02:00