lycheeverse/lychee

By lycheeverse

Updated 6 days ago

⚡ Fast, async, resource-friendly link checker written in Rust.


lychee


⚡ A fast, async, stream-based link checker written in Rust ⚡
Finds broken hyperlinks and mail addresses in websites and Markdown, HTML, and other file formats!
Available as a command-line utility, a library, and a GitHub Action.



Development

After installing Rust, use Cargo for building and testing. For Nix, we provide a flake so you can use nix develop and nix build.

Installation

Arch Linux
pacman -S lychee
OpenSUSE Tumbleweed
zypper in lychee
Ubuntu
snap install lychee
Alpine Linux
 # available for Alpine Edge in testing repositories
apk add lychee
macOS

Via Homebrew:

brew install lychee

Via MacPorts:

sudo port install lychee
Docker
docker pull lycheeverse/lychee

Image tags (each also has an -alpine variant, e.g. latest-alpine):

| Tag | Description |
| --- | --- |
| latest | Most recent stable release (recommended) |
| X.Y.Z / X.Y | A specific release, e.g. 0.20.0 / 0.20 |
| nightly | Bleeding-edge build from the master branch |
| master | Alias of nightly |
| sha-<sha> | Build for a specific commit |
Nix
nix-shell -p lychee

Or let Nix even check a packaged site with testers.lycheeLinkCheck { site = …; }

FreeBSD
pkg install lychee
Termux
pkg install lychee
Conda
conda install lychee -c conda-forge
Windows

Via scoop:

scoop install lychee

Via WinGet:

winget install --id lycheeverse.lychee

Via Chocolatey:

choco install lychee
GitHub Actions

A GitHub Action that runs lychee is available as lycheeverse/lychee-action. See the lychee-action repository for usage instructions.

Pre-built binaries

We provide binaries for Linux, macOS, and Windows for every release. You can download them from the releases page.

You can also use cargo-binstall to install these binaries:

cargo binstall lychee
Cargo
Build dependencies

On APT/dpkg-based Linux distros (e.g. Debian, Ubuntu, Linux Mint and Kali Linux) the following commands will install all required build dependencies, including the Rust toolchain and cargo:

curl -sSf 'https://sh.rustup.rs' | sh
apt install gcc pkg-config libc6-dev libssl-dev
Compile and install lychee
cargo install lychee
Feature flags

Lychee supports the following feature flags:

  • email-check enables checking email addresses using the mailify-lib crate.
  • check_example_domains allows checking example domains such as example.com. This feature is useful for testing.

By default, email-check is enabled. Note that in the past lychee could be configured to use either OpenSSL or Rustls. It was decided to fully switch to Rustls and drop OpenSSL support. Please tell us if this negatively affects you in any way.

Features

This comparison is made on a best-effort basis. Please create a PR to fix outdated information.

See the lychee website for a guide to lychee's features. Also see the command-line flags for the options you can use to customise lychee.

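| Feature | lychee | awesome_bot | muffet | broken-link-checker | linkinator | linkchecker | markdown-link-check | fink |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Language | Rust | Ruby | Go | JS | TypeScript | Python | JS | PHP |
| Async/Parallel | yes | yes | yes | yes | yes | yes | yes | yes |
| Static binary | yes | no | yes | no | no | no | no | no |
| Check Markdown files | yes | yes | no | no | no | yes | yes | no |
| Check HTML files | yes | no | no | yes | yes | no | yes | no |
| Check text files | yes | no | no | no | no | no | no | no |
| Check a website | yes | no | yes | yes | yes | yes | no | yes |
| File globbing | yes | yes | no | no | yes | no | yes | no |
| User interface | | | | | | | | |
| Progress bar | yes | yes | no | no | no | yes | yes | yes |
| Colored output | yes | maybe | yes | maybe | yes | yes | no | yes |
| Summary | yes | yes | yes | maybe | yes | yes | no | yes |
| Quiet mode | yes | no | no | no | yes | yes | yes | yes |
| JSON output | yes | no | yes | yes | yes | maybe (supports CSV) | yes | yes |
| Config file | yes | no | no | no | yes | yes | yes | no |
| Use as library | yes | yes | no | yes | yes | no | yes | no |
| Selecting links | | | | | | | | |
| Include patterns | yes | yes | no | yes | no | no | no | no |
| Exclude patterns | yes | no | yes | yes | yes | yes | yes | yes |
| Filter by scheme | yes | no | no | yes | no | yes | no | no |
| Skip private domains | yes* | no | no | no | no | no | no | no |
| HTTP features | | | | | | | | |
| Custom user agent | yes* | no | no | yes | no | yes | no | no |
| Basic Auth | yes* | no | no | yes | no | yes | no | no |
| Filter status code | yes* | yes | no | no | no | no | yes | no |
| Custom headers | yes* | no | yes | no | no | no | yes | yes |
| Custom timeout | yes | yes | yes | no | yes | yes | no | yes |
| HEAD requests | yes* | yes | no | yes | yes | yes | no | no |
| Handle redirects | yes | yes | yes | yes | yes | yes | yes | yes |
| Per-host throttling | yes | no | yes | yes | no | yes | no | no |
| Respect rate limits | yes | no | no | no | no | no | no | no |
| Retry and backoff | yes | no | no | no | yes | no | yes | no |
| Ignore insecure SSL | yes | yes | yes | no | no | yes | no | yes |
| Chunked encodings | yes | maybe | maybe | maybe | maybe | no | yes | yes |
| GZIP compression | yes | maybe | maybe | yes | maybe | yes | maybe | no |
| Cookies | yes | no | yes | no | no | yes | no | yes |
| URL features | | | | | | | | |
| Relative URLs | yes | yes | no | yes | yes | yes | yes | yes |
| URL anchor fragments | yes* | no | no | no | no | yes | yes | no |
| URL text fragments | yes* | no | no | no | no | no | no | no |
| E-mail addresses | yes* | no | no | no | no | yes | no | no |
| Other | | | | | | | | |
| Recursion | no | no | yes | yes | yes | yes | yes | no |
| Amazing lychee logo | yes | no | no | no | no | no | no | no |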

* May need configuration.

Commandline usage

# recursively check all links in supported files inside the current directory
lychee .

# check links in specific local file(s):
lychee README.md test.html info.txt

# check links on a website:
lychee https://endler.dev

For more examples check out our usage guide.

Docker Usage

Here's how to mount a local directory into the container and check some input with lychee.

  • The --init parameter is passed so that lychee can be stopped from the terminal.
  • We also pass -it to start an interactive terminal, which is required to show the progress bar.
  • The --rm flag removes the container from the host once the run has finished (self-cleanup).
  • The -w /input flag sets /input as the working directory inside the container.
  • The -v $(pwd):/input flag mounts the current local directory into the container at /input so that lychee can read it.

By default a Debian-based Docker image is used. If you want to run an Alpine-based image, use the latest-alpine tag, for example lycheeverse/lychee:latest-alpine.

Linux/macOS shell command
docker run --init -it --rm -w /input -v $(pwd):/input lycheeverse/lychee README.md
Windows PowerShell command
docker run --init -it --rm -w /input -v ${PWD}:/input lycheeverse/lychee README.md
GitHub Token

To avoid getting rate-limited while checking GitHub links, you can optionally set an environment variable with your GitHub token, like so: GITHUB_TOKEN=xxxx. Alternatively, use the --github-token CLI option. The token can also be set in the config file. Here is an example config file.

The token can be generated on your GitHub account settings page. A personal access token with no extra permissions is enough to be able to check public repo links.
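
When lychee is driven from a script, the token can be handed over through the environment instead of the command line. The following is a minimal sketch of such a wrapper in Python. The helper names lychee_env and check_links are hypothetical (they are not part of lychee), and the sketch assumes the lychee binary is on PATH; it relies only on the GITHUB_TOKEN variable described above.

```python
import os
import shutil
import subprocess


def lychee_env(token, base_env=None):
    """Return a copy of the environment with GITHUB_TOKEN set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    if token:
        env["GITHUB_TOKEN"] = token
    return env


def check_links(inputs, token=None):
    """Run lychee on the given inputs.

    Returns lychee's exit code, or None when no lychee binary is found on PATH.
    """
    if shutil.which("lychee") is None:
        return None
    result = subprocess.run(["lychee", *inputs], env=lychee_env(token))
    return result.returncode
```

Calling check_links(["README.md"], token="xxxx") would then run the same check as lychee README.md with the token set for that one process only.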

For more scalable organization-wide scenarios you can consider a GitHub App. It has a higher rate limit than personal access tokens but requires additional configuration steps on your GitHub workflow. Please follow the GitHub App Setup example.

Commandline Parameters

Use lychee --help or man lychee to see all available command line parameters.

lychee is a fast, asynchronous link checker which detects broken URLs and mail addresses in local files and websites. It supports Markdown and HTML and works with other file formats.

lychee is powered by lychee-lib, the Rust library for link checking.

Usage: lychee [OPTIONS] [inputs]...

Arguments:
  [inputs]...
          Inputs for link checking (where to get links to check from).
          These can be: files (e.g. `README.md`), file globs (e.g. `'~/git/*/README.md'`),
          remote URLs (e.g. `https://example.com/README.md`), or standard input (`-`).
          Alternatively, use `--files-from` to read inputs from a file.

          NOTE: Use `--` to separate inputs from options that allow multiple arguments.

Options:
  -a, --accept <ACCEPT>
          A List of accepted status codes for valid links

          The following accept range syntax is supported: `[start]..[[=]end]|code`.
          Some valid examples are:

          - 200 (accepts the 200 status code only)
          - ..204 (accepts any status code < 204)
          - ..=204 (accepts any status code <= 204)
          - 200..=204 (accepts any status code from 200 to 204 inclusive)
          - 200..205 (accepts any status code from 200 to 205 excluding 205, same as 200..=204)

          Use `lychee --accept '200..=204, 429, 500' <inputs>...` to provide a comma-
          separated list of accepted status codes. This example will accept 200, 201,
          202, 203, 204, 429, and 500 as valid status codes.

          [default: 100..=103,200..=299]
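
To make the range syntax above concrete, here is a rough Python sketch that interprets a comma-separated list of codes and ranges in the same way the help text describes. It is an illustration only, not lychee's parser: the function names are hypothetical, and the bounds used for open-ended ranges (0 and 999) are assumptions of this sketch.

```python
import re

# A bare status code such as "200".
_CODE = re.compile(r"[0-9]+")
# Range forms from the help text: "..204", "..=204", "200..=204", "200..205",
# plus the open-ended "500.." form.
_RANGE = re.compile(r"([0-9]*)\.\.(=?)([0-9]*)")


def parse_item(item, lowest=0, highest=999):
    """Turn one list item into an inclusive (start, end) pair.

    `lowest` and `highest` stand in for a missing start or end; both values
    are assumptions made for this sketch.
    """
    item = item.strip()
    if _CODE.fullmatch(item):
        return int(item), int(item)
    match = _RANGE.fullmatch(item)
    if match is None:
        raise ValueError(f"invalid status code range: {item!r}")
    start, inclusive, end = match.groups()
    if inclusive and not end:
        raise ValueError(f"'..=' needs an end value: {item!r}")
    first = int(start) if start else lowest
    if end:
        # "a..b" excludes b, "a..=b" includes it.
        last = int(end) if inclusive else int(end) - 1
    else:
        last = highest
    return first, last


def accepts(spec, status):
    """Return True if `status` falls inside any item of the comma-separated spec."""
    return any(lo <= status <= hi for lo, hi in map(parse_item, spec.split(",")))
```

With this sketch, accepts("200..=204, 429, 500", 203) is true and accepts("200..=204, 429, 500", 205) is false, matching the example in the help text.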

      --accept-timeouts[=<false|true>]
          Accept timed out requests and return exit code 0 when encountering timeouts but not any other errors

      --archive <ARCHIVE>
          Web archive to use to provide suggestions for `--suggest`.

          [default: wayback]

          [possible values: wayback]

  -b, --base-url <BASE_URL>
          Base URL to use when resolving relative URLs in local files. If specified,
          relative links in local files are interpreted as being relative to the given
          base URL.

          For example, given a base URL of `https://example.com/dir/page`, the link `a`
          would resolve to `https://example.com/dir/a` and the link `/b` would resolve
          to `https://example.com/b`. This behavior is not affected by the filesystem
          path of the file containing these links.

          Note that relative URLs without a leading slash become siblings of the base
          URL. If, instead, the base URL ended in a slash, the link would become a child
          of the base URL. For example, a base URL of `https://example.com/dir/page/` and
          a link of `a` would resolve to `https://example.com/dir/page/a`.

          Basically, the base URL option resolves links as if the local files were hosted
          at the given base URL address.

          The provided base URL value must either be a URL (with scheme) or an absolute path.
          Note that certain URL schemes cannot be used as a base, e.g., `data` and `mailto`.
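
The sibling-versus-child behavior described above is standard relative URL resolution. As an illustration, Python's standard-library urljoin reproduces the three examples from the help text. This sketch is not lychee's implementation, and it does not cover the case where the base is an absolute filesystem path.

```python
from urllib.parse import urljoin

# (base URL, link) pairs taken from the help text above.
examples = [
    ("https://example.com/dir/page", "a"),
    ("https://example.com/dir/page", "/b"),
    ("https://example.com/dir/page/", "a"),
]

for base, link in examples:
    print(f"{base} + {link} -> {urljoin(base, link)}")

# https://example.com/dir/page + a -> https://example.com/dir/a
# https://example.com/dir/page + /b -> https://example.com/b
# https://example.com/dir/page/ + a -> https://example.com/dir/page/a
```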

      --basic-auth <BASIC_AUTH>
          Basic authentication support. E.g. `http://example.com username:password`

  -c, --config <FILE_PATH>
          Configuration file to use. Can be specified multiple times.

          If given multiple times, the configs are merged and later
          occurrences take precedence over previous occurrences.

          [default: lychee.toml]

      --cache[=<false|true>]
          Use request cache stored on disk at `.lycheecache`

      --cache-exclude-status <CACHE_EXCLUDE_STATUS>
          A list of status codes that will be ignored from the cache

          The following exclude range syntax is supported: `[start]..[[=]end]|code`. Some valid
          examples are:

          - 429 (excludes the 429 status code only)
          - 500.. (excludes any status code >= 500)
          - ..100 (excludes any status code < 100)
          - 500..=599 (excludes any status code from 500 to 599 inclusive)
          - 500..600 (excludes any status code from 500 to 600 excluding 600, same as 500..=599)

          Use `lychee --cache-exclude-status '429, 500..502' <inputs>...` to provide a
          comma-separated list of excluded status codes. This example will not cache results
          with a status code of 429, 500 and 501.

      --cookie-jar <COOKIE_JAR>
          Read and write cookies using the given file. Cookies will be stored in the
          cookie jar and sent with requests. New cookies will be stored in the cookie jar
          and existing cookies will be updated.

      --default-extension <EXTENSION>
          This is the default file extension that is applied to files without an extension.

          This is useful for files without extensions or with unknown extensions.
          The extension will be used to determine the file type for processing.

          Examples:
            --default-extension md
            --default-extension html

      --dump[=<false|true>]
          Don't perform any link checking. Instead, dump all the links extracted from inputs that would be checked

      --dump-inputs[=<false|true>]
          Don't perform any link extraction and checking. Instead, dump all input sources from which links would be collected

  -E, --exclude-all-private[=<false|true>]
          Exclude all private IPs from checking.
          Equivalent to `--exclude-private --exclude-link-local --exclude-loopback`

      --exclude <EXCLUDE>
          Exclude URLs and mail addresses from checking. The values are treated as regular expressions

      --exclude-link-local[=<false|true>]
          Exclude link-local IP address range from checking

      --exclude-loopback[=<false|true>]
          Exclude loopback IP address range and localhost from checking

      --exclude-path <EXCLUDE_PATH>
          Exclude paths from getting checked. The values are treated as regular expressions

      --exclude-private[=<false|true>]
          Exclude private IP address ranges from checking
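
To illustrate the three address classes that --exclude-private, --exclude-link-local and --exclude-loopback refer to, the sketch below classifies a host with Python's ipaddress module. The function name is hypothetical, and Python's definition of each range is an assumption here; lychee's own checks may differ in detail.

```python
import ipaddress


def exclusion_reason(host):
    """Return "loopback", "link-local", "private", or None for a host string.

    Hypothetical helper: uses Python's notion of each address range, which
    may not match lychee's exactly.
    """
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal; only the name "localhost" is treated specially.
        return "loopback" if host == "localhost" else None
    # Check the narrower classes first, because Python also reports
    # loopback and link-local addresses as private.
    if ip.is_loopback:
        return "loopback"
    if ip.is_link_local:
        return "link-local"
    if ip.is_private:
        return "private"
    return None
```

For example, exclusion_reason("192.168.1.10") returns "private", while exclusion_reason("8.8.8.8") returns None.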

      --extensions <EXTENSIONS>
          A list of file extensions. Files not matching the specified extensions are skipped.

          Multiple extensions can be separated by commas. Note that if you want to check filetypes,
          which have multiple extensions, e.g. HTML files with both .html and .htm extensions, you need to
          specify both extensions explicitly.
          An example is: `--extensions html,htm,php,asp,aspx,jsp,cgi`.

          This is useful when the default extensions are not enough and you don't
          want to provide a long list of inputs (e.g. file1.html, file2.md, etc.)

          [default: md,markdown,mdx,qmd,rmd,mkd,mkdn,mdwn,mdown,mkdown,html,htm,css,txt,xml]

  -f, --format <
