Automated internal link auditing tool for websites. Detects broken links, tracking parameters, empty filters, and other URL issues.
- Broken link detection - Identifies 404 errors and unreachable pages
- Tracking parameter detection - Flags URLs with UTM, analytics, or custom tracking params
- Empty filter detection - Finds URLs with empty filter values
- Useless parameter cleanup - Detects URLs with unnecessary parameters
- Sitemap support - Parses XML sitemaps (including sitemap indexes)
- Multiple notification channels - Slack and Microsoft Teams webhooks
- Configurable - Custom selectors, excluded domains, parameter patterns
# Clone the repository
git clone https://github.com/yourusername/link-auditor.git
cd link-auditor
# Create virtual environment
python -m venv venv
source venv/bin/activate # Linux/Mac
# or
venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Copy environment file
cp .env.example .env

Edit .env to configure webhook URLs:
SLACK_WEBHOOK_URL=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
TEAMS_WEBHOOK_URL=https://outlook.office.com/webhook/YOUR/WEBHOOK/URL

Edit config/config.json to customize:
- timeout - HTTP request timeout in seconds
- content_selectors - CSS selectors for main content areas
- excluded_selectors - CSS selectors to exclude (nav, footer, etc.)
- excluded_domains - Domains to ignore (social media, etc.)
- tracking_params - Parameters to flag as tracking
- filter_params_prefixes - Prefixes for filter parameters
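
The options above could be loaded along these lines. This is a minimal sketch, assuming config.json is a flat JSON object with these six keys. The default values and the `load_config` helper are hypothetical and are not taken from the project.

```python
import json
from pathlib import Path

# Keys come from the option list above; the values are illustrative guesses.
DEFAULTS = {
    "timeout": 10,
    "content_selectors": ["main", "article"],
    "excluded_selectors": ["nav", "footer"],
    "excluded_domains": ["facebook.com", "twitter.com"],
    "tracking_params": ["utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"],
    "filter_params_prefixes": ["filter_"],
}


def load_config(path=None):
    """Return DEFAULTS overlaid with any keys found in the JSON file at `path`."""
    config = dict(DEFAULTS)
    if path is not None and Path(path).is_file():
        with open(path, encoding="utf-8") as fh:
            overrides = json.load(fh)
        # Reject misspelled keys early instead of silently ignoring them.
        unknown = set(overrides) - set(DEFAULTS)
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        config.update(overrides)
    return config
```

Calling `load_config("config/config.json")` would then return a dict in which any key missing from the file falls back to the hypothetical default.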
python main.py --url https://example.com/page
python main.py --urls https://example.com/page1 https://example.com/page2
python main.py --urls-file urls.txt
python main.py --sitemap https://example.com/sitemap.xml
python main.py --sitemap https://example.com/sitemap.xml --sitemap-filter "/blog/" --sitemap-limit 100
python main.py --url https://example.com/page --slack
python main.py --url https://example.com/page --teams
python main.py --url https://example.com/page --output audit.json

python main.py --sitemap https://example.com/sitemap.xml \
--sitemap-filter "/products/" \
--sitemap-limit 50 \
--slack --teams \
--output audit_report.json \
--log-level DEBUG \
--log-file audit.log

| Option | Description |
|---|---|
| `--url` | Single URL to analyze |
| `--urls` | List of URLs to analyze |
| `--urls-file` | Text file containing URLs (one per line) |
| `--sitemap` | XML sitemap URL to parse |
| `--sitemap-filter` | Regex pattern to filter sitemap URLs |
| `--sitemap-limit` | Maximum number of URLs from sitemap |
| `--config` | Path to JSON config file |
| `--output` | JSON output file path |
| `--slack` | Send report to Slack |
| `--teams` | Send report to Microsoft Teams |
| `--log-level` | Logging level (DEBUG, INFO, WARNING, ERROR) |
| `--log-file` | Log file path |

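
To illustrate how `--sitemap`, `--sitemap-filter`, and `--sitemap-limit` could work together, here is a minimal sketch using only the standard library. It assumes the sitemap uses the standard sitemaps.org namespace and that the filter is applied with `re.search`; the function names and the caller-supplied `fetch` callable are hypothetical and do not describe the project's actual `sitemap_parser.py`.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_text):
    """Return (kind, locs), where kind is 'index' for a sitemap index, else 'urlset'."""
    root = ET.fromstring(xml_text)
    kind = "index" if root.tag.endswith("sitemapindex") else "urlset"
    child = "sm:sitemap" if kind == "index" else "sm:url"
    locs = [
        loc.text.strip()
        for loc in root.findall(f"{child}/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return kind, locs


def collect_urls(xml_text, fetch=None, pattern=None, limit=None):
    """Collect page URLs, then apply an optional regex filter and an optional limit.

    `fetch` maps a child sitemap URL to its XML text; it is only called for
    sitemap indexes. Only one level of index nesting is followed.
    """
    kind, locs = extract_locs(xml_text)
    if kind == "index":
        urls = []
        for child_url in locs:
            urls.extend(extract_locs(fetch(child_url))[1])
    else:
        urls = locs
    if pattern:
        regex = re.compile(pattern)
        urls = [u for u in urls if regex.search(u)]
    return urls[:limit] if limit is not None else urls
```

With the `requests` dependency, `fetch` could be as simple as `lambda u: requests.get(u, timeout=10).text`. Filtering before truncating means the limit counts only URLs that match the pattern.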
The auditor detects the following issue types:
- Tracking Parameters - Links containing tracking parameters (utm_*, fbclid, gclid, etc.)
- Broken Links - Links that return 404 or are unreachable
- Empty Filters - Links with filter parameters but empty values
- Useless Parameters - Links with unnecessary or malformed parameters
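
The two query-string checks in the list above (tracking parameters and empty filters) can be sketched with `urllib.parse`. This is an illustration only: the `filter_` prefix, the short issue labels, and the `classify_query` helper are assumptions, and the broken-link and useless-parameter checks are not covered here.

```python
from urllib.parse import parse_qsl, urlsplit

TRACKING_PREFIXES = ("utm_",)          # matches utm_source, utm_medium, ...
TRACKING_NAMES = {"fbclid", "gclid"}   # exact-match tracking parameters
FILTER_PREFIXES = ("filter_",)         # hypothetical; configurable in practice


def classify_query(url):
    """Return a list of issue labels found in the URL's query string."""
    query = urlsplit(url).query
    # keep_blank_values=True so that "filter_color=" is kept with an empty value.
    pairs = parse_qsl(query, keep_blank_values=True)
    issues = []
    if any(
        name.lower().startswith(TRACKING_PREFIXES) or name.lower() in TRACKING_NAMES
        for name, _ in pairs
    ):
        issues.append("tracking_parameters")
    if any(
        name.startswith(FILTER_PREFIXES) and value == ""
        for name, value in pairs
    ):
        issues.append("empty_filter")
    return issues
```

Because `urlsplit` accepts relative references, the same function works on both `/products/item?utm_source=blog` and a fully qualified URL.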
{
  "domain_name": "example.com",
  "success": true,
  "data": {
    "problematic_links": [
      {
        "source_page": "https://example.com/blog/article",
        "internal_link": "/products/item?utm_source=blog",
        "full_url": "https://example.com/products/item?utm_source=blog",
        "anchor_text": "Check out this product",
        "context": "In this article we discuss...",
        "issue_type": "Link with tracking parameters (clean URL recommended)",
        "http_status": null,
        "scan_date": "2024-01-15"
      }
    ],
    "stats": {
      "total_links_analyzed": 150,
      "internal_links_count": 150,
      "problematic_links_count": 3,
      "pages_analyzed": 10,
      "pages_success": 10,
      "pages_failed": 0,
      "issues_by_type": {
        "Link with tracking parameters (clean URL recommended)": 2,
        "Broken link (page not found)": 1
      },
      "duration_seconds": 45.23
    }
  },
  "metadata": {
    "source": "link_auditor",
    "version": "1.0.0",
    "timestamp": "2024-01-15T14:30:00"
  }
}

link-auditor/
|-- main.py                     # CLI entry point
|-- config/
|   |-- config.json             # Default configuration
|-- core/
|   |-- __init__.py
|   |-- models.py               # Pydantic models
|   |-- scraper.py              # Link auditor logic
|   |-- sitemap_parser.py
|   |-- formatters/
|       |-- __init__.py
|       |-- json_formatter.py
|       |-- slack_formatter.py
|       |-- teams_formatter.py
|-- .env.example                # Environment template
|-- requirements.txt            # Python dependencies
|-- LICENSE                     # MIT License
|-- README.md

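
A report written with `--output` can be post-processed by other scripts. The sketch below assumes the JSON layout of the example report shown earlier; `load_report` and `summarize_report` are hypothetical helpers and are not part of the project.

```python
import json


def load_report(path):
    """Read a JSON report file produced with --output."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_report(report):
    """Build a short plain-text summary from the report's stats section."""
    stats = report["data"]["stats"]
    lines = [
        f"{report['domain_name']}: {stats['problematic_links_count']} problematic "
        f"of {stats['total_links_analyzed']} links across {stats['pages_analyzed']} pages"
    ]
    # Most frequent issue first; ties are broken alphabetically for stable output.
    ordered = sorted(stats["issues_by_type"].items(), key=lambda kv: (-kv[1], kv[0]))
    for issue, count in ordered:
        lines.append(f"  {count} x {issue}")
    return "\n".join(lines)
```

For example, `print(summarize_report(load_report("audit.json")))` would print one header line followed by one line per issue type.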
- Python 3.8+
- requests
- beautifulsoup4
- pydantic
- tenacity
- python-dotenv
MIT License - see LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request