Music Filename-Fixer & Auto-Tagger

Leggi in italiano | Security

Automatically fix metadata and rename audio files in your music library. Runs nightly as a systemd service or on-demand from the command line.

What It Does

For every audio file in your music folder (recursively):

Skip if already processed — tracks files by checksum, so re-runs are fast
AcoustID fingerprint — identifies the song from the audio waveform itself (most accurate)
MusicBrainz search — looks up artist + title extracted from the filename
Filename parser fallback — uses the filename if nothing is found online

Then it writes correct metadata (title, artist, album, year) and renames the file to Artist-Title.ext.
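The README does not say exactly how the Artist-Title.ext name is sanitized. Here is a minimal sketch, assuming characters that are illegal on common filesystems are simply dropped. `build_filename` is a hypothetical helper, not the project's `slugify` in parser.py.

```python
import re

# Characters that are invalid in filenames on Windows and/or Unix-like systems.
_INVALID = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def build_filename(artist: str, title: str, ext: str) -> str:
    """Return 'Artist-Title.ext' with unsafe characters removed (hypothetical helper)."""

    def clean(part: str) -> str:
        part = _INVALID.sub("", part)     # drop forbidden characters
        part = re.sub(r"\s+", " ", part)  # collapse runs of whitespace
        return part.strip(" .")           # Windows rejects trailing dots/spaces

    ext = ext if ext.startswith(".") else f".{ext}"
    return f"{clean(artist)}-{clean(title)}{ext}"


print(build_filename("Massive Attack", "Teardrop", ".ogg"))  # Massive Attack-Teardrop.ogg
print(build_filename("AC/DC", "Back In Black", "mp3"))       # ACDC-Back In Black.mp3
```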

Before / After

xXx_radiohead_creep_OFFICIAL_2023_[HD].mp3  -->  Radiohead-Creep.mp3
01 - unknown artist - no title.flac         -->  Radiohead-Creep.flac
some_random_hash_a8f3e2.ogg                 -->  Massive Attack-Teardrop.ogg

Architecture

%%{init: {'theme': 'default'}}%%
graph LR
  tagger["music_tagger.py<br/>Orchestrator"]:::core

  subgraph modules["Source Modules"]
    direction TB
    config["config.py<br/>Configuration"]:::data
    parser["parser.py<br/>Filename Parsing"]:::engine
    lookup["lookup.py<br/>API Lookup"]:::engine
    tags["tags.py<br/>Metadata R/W"]:::engine
    state["state.py<br/>Checksum Tracking"]:::data
  end

  subgraph external["External Services"]
    direction TB
    acoustid["AcoustID API"]:::ext
    musicbrainz["MusicBrainz API"]:::ext
    audio_files[("Audio Files")]:::ext
    state_file[("processed.json")]:::ext
  end

  tagger --> config
  tagger --> parser
  tagger --> lookup
  tagger --> tags
  tagger --> state

  lookup -->|"fingerprint"| acoustid
  lookup -->|"metadata"| musicbrainz
  tags -->|"read/write tags"| audio_files
  tags -->|"rename"| audio_files
  state -->|"load/save"| state_file

  classDef core fill:#2563eb,stroke:#1d4ed8,color:#fff
  classDef data fill:#d97706,stroke:#b45309,color:#fff
  classDef ext fill:#6b7280,stroke:#4b5563,color:#fff
  classDef engine fill:#059669,stroke:#047857,color:#fff

Supported Formats

MP3, FLAC, M4A, AAC, OGG, Opus, WMA

Requirements

Linux (tested on Ubuntu/Xubuntu 22.04+)
Python 3.11+
ffmpeg and chromaprint-tools (installed automatically by install.sh)

Note: The Python code is cross-platform. The automated install.sh is Linux-specific, but step-by-step guides for macOS and Windows are provided below.
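Because the external tools must be on PATH on every platform, a quick cross-platform check is easy to script. This is a sketch, not part of the project; `missing_tools` is a hypothetical name. `shutil.which` also resolves `.exe` suffixes on Windows.

```python
import shutil

# External tools listed under Requirements: ffmpeg decodes audio,
# fpcalc (from Chromaprint) computes AcoustID fingerprints.
REQUIRED_TOOLS = ("ffmpeg", "fpcalc")


def missing_tools(tools=REQUIRED_TOOLS) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing on PATH:", ", ".join(missing))
else:
    print("All external tools found.")
```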

Installation

Quick Setup (recommended)

git clone https://github.com/AndreaBonn/audio-filename-fixer.git
cd audio-filename-fixer

# Install everything — requires sudo for apt packages
bash install.sh /path/to/your/music

The installer handles everything:

Installs system dependencies (ffmpeg, chromaprint-tools)
Installs uv if not present, then syncs the Python environment
Creates config.env with your music directory
Sets up a systemd user service with a nightly timer (see Scheduling)
Runs a dry-run test to verify the setup

Manual Setup (Linux)

If you prefer to set things up yourself on Linux:

cd audio-filename-fixer

# Install system dependencies
sudo apt-get install -y ffmpeg chromaprint-tools

# Create Python environment
uv sync

# Create config file
cp .env.example config.env
# Edit config.env with your settings

macOS Installation

Click to expand the macOS step-by-step guide

Step 1: Install Homebrew (if you don't have it)

Homebrew is a package manager for macOS — think of it as an "app store for developer tools". Open Terminal (you can find it in Applications > Utilities, or search for it with Spotlight) and paste:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Follow the on-screen instructions. When it finishes, close and reopen Terminal.

To verify it worked, type:

brew --version

You should see something like Homebrew 4.x.x.

Step 2: Install system dependencies

Still in Terminal, run:

brew install ffmpeg chromaprint

This installs ffmpeg (audio decoder) and fpcalc (audio fingerprinting tool). It may take a few minutes.

Verify both are installed:

ffmpeg -version
fpcalc -version

Both commands should print version information (not "command not found").

Step 3: Install uv (Python package manager)

curl -LsSf https://astral.sh/uv/install.sh | sh

Close and reopen Terminal, then verify:

uv --version

Step 4: Download and set up the project

git clone https://github.com/AndreaBonn/audio-filename-fixer.git
cd audio-filename-fixer
uv sync

Step 5: Create your configuration file

cp .env.example config.env

Now open config.env with any text editor (TextEdit, VS Code, nano...) and set your music folder path:

nano config.env

Change MUSIC_DIR to point to your music folder, for example:

MUSIC_DIR=/Users/yourname/Music

Save and close (in nano: Ctrl+O, Enter, Ctrl+X).

Step 6: Test it

uv run python music_tagger.py --dry-run --music-dir ~/Music

This runs in preview mode — it shows what would change without touching any file. If you see output listing your audio files, everything works.

Step 7: Run for real

When you're satisfied with the dry-run output, export the settings from config.env and run:

set -a; source config.env; set +a
uv run python music_tagger.py

Optional: Schedule automatic runs on macOS

macOS uses launchd instead of systemd. To run the tagger every night at 3:00 AM:

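Create the file ~/Library/LaunchAgents/com.music-tagger.plist. The command inside it assumes the project was cloned to ~/audio-filename-fixer; adjust the path if you put it elsewhere: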

mkdir -p ~/Library/LaunchAgents
cat > ~/Library/LaunchAgents/com.music-tagger.plist << 'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.music-tagger</string>
    <key>ProgramArguments</key>
    <array>
        <string>/bin/bash</string>
        <string>-c</string>
        <string>cd "$HOME/audio-filename-fixer" &amp;&amp; set -a &amp;&amp; source config.env &amp;&amp; set +a &amp;&amp; .venv/bin/python music_tagger.py</string>
    </array>
    <key>StartCalendarInterval</key>
    <dict>
        <key>Hour</key>
        <integer>3</integer>
        <key>Minute</key>
        <integer>0</integer>
    </dict>
    <key>StandardOutPath</key>
    <string>/tmp/music-tagger.log</string>
    <key>StandardErrorPath</key>
    <string>/tmp/music-tagger.log</string>
</dict>
</plist>
EOF

Enable it:

launchctl load ~/Library/LaunchAgents/com.music-tagger.plist

To disable it later:

launchctl unload ~/Library/LaunchAgents/com.music-tagger.plist

Windows Installation


Step 1: Install Python 3.11+

Go to python.org/downloads and download the latest Python installer
Run the installer
Important: check the box "Add Python to PATH" at the bottom of the first screen
Click "Install Now"

To verify, open PowerShell (search for it in the Start menu) and type:

python --version

You should see Python 3.11.x or higher.

Step 2: Install Git (if you don't have it)

Go to git-scm.com/downloads/win and download the installer
Run it with default settings (click "Next" through all screens)

Verify in PowerShell:

git --version

Step 3: Install ffmpeg

Go to gyan.dev/ffmpeg/builds and download "ffmpeg-release-essentials.zip"
Extract the zip file to C:\ffmpeg (create this folder if it doesn't exist)
Inside you'll find a folder like ffmpeg-7.x-essentials_build — open it and go into the bin folder
Copy the full path to the bin folder (e.g., C:\ffmpeg\ffmpeg-7.1-essentials_build\bin)
Add it to your PATH:
- Press Win + R, type sysdm.cpl, press Enter
- Go to the "Advanced" tab, click "Environment Variables"
- Under "User variables", find "Path", select it, click "Edit"
- Click "New" and paste the path to the bin folder
- Click "OK" on all windows

Close and reopen PowerShell, then verify:

ffmpeg -version

Step 4: Install fpcalc (Chromaprint)

Go to acoustid.org/chromaprint and download the Windows package
Extract the zip file
Copy fpcalc.exe to the same bin folder where you put ffmpeg (e.g., C:\ffmpeg\ffmpeg-7.1-essentials_build\bin), so it's already in your PATH

Verify:

fpcalc -version

Step 5: Install uv (Python package manager)

In PowerShell, run:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Close and reopen PowerShell, then verify:

uv --version

Step 6: Download and set up the project

git clone https://github.com/AndreaBonn/audio-filename-fixer.git
cd audio-filename-fixer
uv sync

Step 7: Create your configuration file

copy .env.example config.env

Open config.env with Notepad:

notepad config.env

Change MUSIC_DIR to point to your music folder, for example:

MUSIC_DIR=C:\Users\YourName\Music

Save and close Notepad.

Step 8: Load the configuration and test

In PowerShell, you need to load the environment variables from config.env before running:

Get-Content config.env | ForEach-Object {
    if ($_ -match '^([^#][^=]+)=(.*)$') {
        [Environment]::SetEnvironmentVariable($Matches[1], $Matches[2], 'Process')
    }
}

uv run python music_tagger.py --dry-run

This runs in preview mode — it shows what would change without touching any file.

Step 9: Run for real

Load the config and run (same two commands, without --dry-run):

Get-Content config.env | ForEach-Object {
    if ($_ -match '^([^#][^=]+)=(.*)$') {
        [Environment]::SetEnvironmentVariable($Matches[1], $Matches[2], 'Process')
    }
}

uv run python music_tagger.py

Tip: To avoid typing the config loading command every time, you can create a shortcut file. Save this as run.ps1 in the project folder:

# run.ps1 — Run the music tagger on Windows
# Work from the project folder so uv finds the project and relative state/log paths resolve
Set-Location $PSScriptRoot
Get-Content "$PSScriptRoot\config.env" | ForEach-Object {
    if ($_ -match '^([^#][^=]+)=(.*)$') {
        [Environment]::SetEnvironmentVariable($Matches[1], $Matches[2], 'Process')
    }
}
uv run python music_tagger.py @args

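Then run it with: powershell -ExecutionPolicy Bypass -File run.ps1 or powershell -ExecutionPolicy Bypass -File run.ps1 --dry-run. The Bypass flag prevents the default Windows execution policy from blocking the local script.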

Optional: Schedule automatic runs on Windows

Open Task Scheduler (search for it in the Start menu)
Click "Create Basic Task" in the right panel
Name: Music Filename-Fixer & Auto-Tagger, click Next
Trigger: Daily, click Next
Set the time to 3:00 AM, click Next
Action: Start a program, click Next
Program: powershell
Arguments: -ExecutionPolicy Bypass -File "C:\Users\YourName\audio-filename-fixer\run.ps1"
Click Finish

To test it immediately: right-click the task and select "Run".

Configuration

Edit config.env after installation:

MUSIC_DIR=/home/user/Music
ACOUSTID_API_KEY=your-key-here

Variable	Required	Description
`MUSIC_DIR`	Yes	Path to your music folder (scanned recursively)
`ACOUSTID_API_KEY`	No	AcoustID API key for audio fingerprinting
`STATE_FILE`	No	Path to state file (default: `state/processed.json`)
`LOG_FILE`	No	Path to log file (default: `logs/tagger.log`)

AcoustID API Key (free, recommended)

AcoustID identifies songs from the audio waveform — it works even when the filename is completely wrong or meaningless. Without it, the tagger relies only on filename parsing and MusicBrainz text search.

Go to acoustid.org and create a free account
Register a new application
Copy the API key into config.env

Usage

Basic Commands

# Dry run — preview changes without modifying any file
bash run.sh --dry-run

# Run normally — fix tags and rename files
bash run.sh

# Force reprocessing of all files (ignores state)
bash run.sh --reset-state

# Process a different folder (temporary override)
bash run.sh --music-dir /other/path

What Happens During a Run

The tagger scans MUSIC_DIR recursively for audio files
Already-processed files (tracked by SHA-1 checksum) are skipped
Files with complete tags and a clean filename are marked as done
For each remaining file, it tries the 3-step lookup pipeline
On success: writes metadata tags and renames the file
On failure: logs a warning and moves to the next file
State is saved to processed.json for future runs
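The steps above can be sketched as a loop with the file-specific work injected as callables. All signatures here are hypothetical. The state is assumed to map each path to its checksum, matching State Management below. Note that the checksum is recomputed after tagging: writing tags changes the file's bytes, so storing the old checksum would trigger reprocessing on the next run.

```python
import logging
from collections.abc import Callable
from pathlib import Path
from typing import Optional

log = logging.getLogger("tagger")


def run(
    files: list[Path],
    state: dict[str, str],
    checksum: Callable[[Path], str],
    lookup: Callable[[Path], Optional[tuple[str, str]]],  # -> (artist, title) or None
    apply: Callable[[Path, str, str], Path],              # writes tags, renames, returns new path
    dry_run: bool = False,
) -> dict[str, str]:
    for path in files:
        digest = checksum(path)
        if state.get(str(path)) == digest:
            continue  # unchanged since the last run
        result = lookup(path)
        if result is None:
            log.warning("No metadata found for %s", path.name)
            continue
        artist, title = result
        if dry_run:
            log.info("[DRY RUN] Rename: %s -> %s-%s%s", path.name, artist, title, path.suffix)
            continue  # a dry run changes nothing, not even the state
        new_path = apply(path, artist, title)
        state[str(new_path)] = checksum(new_path)  # tags changed the bytes, so re-hash
    return state
```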

Dry Run

Always run with --dry-run first on a new music folder. It shows exactly what would change without touching any file:

2024-03-15 10:30:01 [INFO] → xXx_radiohead_creep_HD.mp3
2024-03-15 10:30:02 [INFO]   AcoustID match (score=0.95): Radiohead - Creep
2024-03-15 10:30:02 [INFO]   [DRY RUN] Tag: {'title': 'Creep', 'artists': ['Radiohead'], ...}
2024-03-15 10:30:02 [INFO]   [DRY RUN] Rename: xXx_radiohead_creep_HD.mp3 -> Radiohead-Creep.mp3

Scheduling

Automatic Nightly Runs (systemd)

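The install.sh script sets up a systemd user timer that runs the tagger every night at 03:00. If the machine was off at that time, the missed run starts as soon as the timer becomes active again (thanks to Persistent=true). Because this is a user timer, it only runs while your user's systemd instance is up: after you log in, or from boot if lingering is enabled (loginctl enable-linger $USER).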

# Check timer status
systemctl --user status music-tagger.timer

# View next scheduled run
systemctl --user list-timers music-tagger.timer

# Manually trigger the service
systemctl --user start music-tagger.service

# Disable automatic runs
systemctl --user disable --now music-tagger.timer

# Re-enable automatic runs
systemctl --user enable --now music-tagger.timer

Manual Scheduling (cron)

If you prefer cron over systemd:

# Edit your crontab
crontab -e

# Add this line to run every night at 3:00 AM
0 3 * * * cd /path/to/audio-filename-fixer && bash run.sh >> logs/cron.log 2>&1

Project Structure

audio-filename-fixer/
├── music_tagger.py       # Entry point — orchestrates the pipeline
├── src/
│   ├── config.py         # Centralized configuration (dataclass + env vars)
│   ├── lookup.py         # AcoustID + MusicBrainz API integration
│   ├── parser.py         # Filename parsing, slugify, artist splitting
│   ├── state.py          # Checksum tracking + atomic JSON persistence
│   └── tags.py           # Read/write audio metadata (mutagen) + rename
├── tests/                # Mirrors src/ — 65 tests
│   ├── test_config.py
│   ├── test_lookup.py
│   ├── test_parser.py
│   ├── test_state.py
│   └── test_tags.py
├── install.sh            # One-command setup (deps + venv + systemd)
├── run.sh                # Manual run wrapper
├── config.env            # Your configuration (gitignored)
├── .env.example          # Configuration template
├── pyproject.toml        # Project config (uv, ruff, pytest)
├── logs/
│   └── tagger.log        # All operations logged here
└── state/
    └── processed.json    # Tracks processed files by checksum

How It Works

File Processing

%%{init: {'theme': 'default'}}%%
graph TD
  scan(["Scan audio file"]):::core
  check_state{"Already processed?<br/>checksum match"}
  skip_done(["Skip"]):::data
  read_tags["Read existing tags"]:::engine
  check_tags{"Tags complete AND<br/>filename OK?"}
  mark_done(["Mark done, skip"]):::data
  acoustid{"AcoustID fingerprint<br/>score #gt;= 0.5?"}:::engine
  mb_search{"MusicBrainz search<br/>score #gt;= 70?"}:::engine
  fallback["Filename parser<br/>fallback"]:::engine
  write_tags["Write tags + rename"]:::core
  log_warn(["Log warning, skip"]):::ext
  save_state["Update state"]:::data

  scan --> check_state
  check_state -->|"Yes"| skip_done
  check_state -->|"No"| read_tags
  read_tags --> check_tags
  check_tags -->|"Yes"| mark_done
  check_tags -->|"No"| acoustid
  acoustid -->|"Yes"| write_tags
  acoustid -->|"No"| mb_search
  mb_search -->|"Yes"| write_tags
  mb_search -->|"No"| fallback
  fallback -->|"Found"| write_tags
  fallback -->|"Failed"| log_warn
  write_tags --> save_state

  classDef core fill:#2563eb,stroke:#1d4ed8,color:#fff
  classDef data fill:#d97706,stroke:#b45309,color:#fff
  classDef ext fill:#6b7280,stroke:#4b5563,color:#fff
  classDef engine fill:#059669,stroke:#047857,color:#fff

Lookup Pipeline

sequenceDiagram
  participant mt as music_tagger
  participant lk as lookup
  participant fp as fpcalc
  participant ac as AcoustID API
  participant mb as MusicBrainz API
  participant ps as parser

  mt->>+lk: acoustid_lookup(path)
  lk->>+fp: calculate fingerprint
  fp-->>-lk: fingerprint data
  lk->>+ac: fingerprint + api_key
  ac-->>-lk: recording_id or error

  alt score >= 0.5
    lk->>+mb: get recording details
    mb-->>-lk: title, artists, album, year
    lk-->>-mt: metadata found
  else AcoustID failed
    lk-->>mt: no result
    mt->>+lk: mb_search(artists, title)
    lk->>+mb: text search query
    alt score >= 70
      mb-->>-lk: title, artists, album, year
      lk-->>-mt: metadata found
    else MusicBrainz failed
      mb-->>lk: no match
      lk-->>mt: no result
      mt->>+ps: parse_filename(stem)
      ps-->>-mt: parsed artists + title
    end
  end
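The cascade in the diagram reduces to "first stage that clears its threshold wins". This sketch injects the three stages as callables with assumed signatures and reuses the 0.5 and 70 thresholds shown above. It parses the filename once up front, because the MusicBrainz text search needs the artist/title guesses extracted from the filename (see What It Does).

```python
from collections.abc import Callable
from typing import Optional

Meta = dict  # e.g. {"title": ..., "artists": [...], "album": ..., "year": ...}

ACOUSTID_MIN_SCORE = 0.5   # thresholds from the diagrams above
MUSICBRAINZ_MIN_SCORE = 70


def resolve(
    path: str,
    stem: str,
    fingerprint_lookup: Callable[[str], Optional[tuple[float, Meta]]],
    mb_search: Callable[[list[str], str], Optional[tuple[int, Meta]]],
    parse_filename: Callable[[str], Optional[tuple[list[str], str]]],
) -> Optional[Meta]:
    """Return metadata from the first stage that clears its threshold, or None."""
    # 1. Audio fingerprint, trusted only at or above the AcoustID score threshold.
    hit = fingerprint_lookup(path)
    if hit is not None and hit[0] >= ACOUSTID_MIN_SCORE:
        return hit[1]

    # 2. Text search needs artist/title guesses, and those come from the filename.
    parsed = parse_filename(stem)
    if parsed is None:
        return None  # nothing usable: the caller logs a warning and moves on
    artists, title = parsed
    hit = mb_search(artists, title)
    if hit is not None and hit[0] >= MUSICBRAINZ_MIN_SCORE:
        return hit[1]

    # 3. Fallback: trust the parsed filename as-is.
    return {"title": title, "artists": artists}
```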

Filename Parser

Handles common patterns from YouTube downloads, ripped CDs, and messy libraries:

Input	Parsed Artist	Parsed Title
`Radiohead - Creep`	Radiohead	Creep
`Drake feat. Rihanna - Take Care`	Drake, Rihanna	Take Care
`Simon & Garfunkel - The Sound of Silence`	Simon & Garfunkel	The Sound of Silence
`01. Radiohead - Creep [Official Video]`	Radiohead	Creep

The parser splits feat./ft. collaborations into separate artists while keeping band names that contain & intact (e.g., Simon & Garfunkel stays one artist).

State Management

Each processed file is tracked by its path and a SHA-1 checksum (first 64KB)
If a file is modified externally, the checksum changes and it gets reprocessed
State is written atomically (write to .tmp, then rename) to prevent corruption
--reset-state clears the state and reprocesses everything
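The checksum and the atomic write can be sketched with the standard library. The function names are hypothetical. `usedforsecurity=False` marks SHA-1 as a non-cryptographic use, which also keeps bandit quiet. One caveat of hashing only the first 64 KB: edits that touch only later bytes (for example, a tag block stored at the end of a file) are not detected.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path

CHUNK = 64 * 1024  # only the first 64 KB is hashed


def quick_checksum(path: Path) -> str:
    """SHA-1 of the first 64 KB: fast on large files."""
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read(CHUNK), usedforsecurity=False).hexdigest()


def save_state(state: dict[str, str], state_file: Path) -> None:
    """Write JSON to a temp file in the same directory, then swap it in with one rename."""
    state_file.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=state_file.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(state, fh, indent=2, sort_keys=True)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp, state_file)  # overwrites in one step (atomic on POSIX)
    except BaseException:
        Path(tmp).unlink(missing_ok=True)  # never leave a half-written temp file behind
        raise


def load_state(state_file: Path) -> dict[str, str]:
    if not state_file.exists():
        return {}
    return json.loads(state_file.read_text(encoding="utf-8"))
```

Creating the temp file in the same directory matters: a rename is only a single atomic step when source and target are on the same filesystem.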

Troubleshooting

Problem	Solution
`fpcalc not found`	Install chromaprint: `sudo apt-get install chromaprint-tools`
AcoustID not matching	Check your API key in `config.env`. The tool still works without it (text search fallback).
Permission errors	Ensure you own the music files: `ls -la /path/to/music`
Slow first run	Normal — MusicBrainz rate limits to ~1 request/second. Subsequent runs skip already-processed files.
Wrong metadata written	Run `--reset-state` to reprocess. Check `logs/tagger.log` for details.
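Timer not running	Check: `systemctl --user status music-tagger.timer` and `loginctl show-user $USER | grep Linger`. If lingering is off, enable it with `loginctl enable-linger $USER` so the timer runs without an active login.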
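The slow first run comes from spacing MusicBrainz requests about one second apart. If you query MusicBrainz from your own scripts, a client-side throttle like this sketch keeps you under that limit. `RateLimiter` is a hypothetical helper, not the project's code.

```python
import threading
import time


class RateLimiter:
    """Block so that successive calls are spaced at least `interval` seconds apart."""

    def __init__(self, interval: float = 1.0) -> None:
        self.interval = interval
        self._lock = threading.Lock()  # safe to share across threads
        self._next_allowed = 0.0

    def wait(self) -> None:
        with self._lock:
            now = time.monotonic()
            if now < self._next_allowed:
                time.sleep(self._next_allowed - now)
                now = time.monotonic()
            self._next_allowed = now + self.interval
```

Usage: create one `RateLimiter(1.0)` and call `limiter.wait()` immediately before every request, so back-to-back lookups never exceed about one per second.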

Development

# Run tests
uv run pytest -v

# Lint
uv run ruff check .

# Format
uv run ruff format .

# Security audit
uv run bandit -r src/
uv run pip-audit

Support

If you find this project useful, consider giving it a star on GitHub — it helps others discover it and motivates further development.

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.
