Aquileo | Releases · kaivid-labs/evret

18 Jun 12:41

lucifertrj

v0.0.3

0ca0069

v0.0.3 (Latest)

Added

Added LLM-assisted evaluation dataset generation.
Added a Haystack integration and Elasticsearch retriever support.
Added a Streamlit evaluation dashboard example.
Improved retriever, dataset loading, and metric logging.
Updated token-overlap judge defaults.

Evaluation Dataset Improvements

QueryExample now supports expected_doc_ids.
When expected_doc_ids are present, Evaluator compares retrieved doc_ids directly instead of relying on answer-text judging. JSON and CSV dataset loading both support this field.

Aliases supported:

expected_doc_ids
relevant_doc_ids
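
As a rough illustration of the behaviour described above, the sketch below reads a JSON record, accepts either alias, and scores a query by comparing retrieved doc_ids with the expected ones. The helper names (normalize_example, doc_id_recall), the "query" field, and the record layout are assumptions made for this example and are not Evret's API.

```python
import json

# The two field names listed above; the tuple itself is a hypothetical helper.
DOC_ID_ALIASES = ("expected_doc_ids", "relevant_doc_ids")


def normalize_example(record: dict) -> dict:
    """Copy a raw record, storing the doc-id list under one canonical key."""
    out = {"query": record["query"], "expected_doc_ids": []}
    for key in DOC_ID_ALIASES:
        if key in record:
            out["expected_doc_ids"] = [str(d) for d in record[key]]
            break
    return out


def doc_id_recall(expected: list, retrieved: list, k: int) -> float:
    """Fraction of expected doc_ids found in the top-k retrieved doc_ids."""
    expected_set = set(expected)
    if not expected_set:
        return 0.0
    found = expected_set & set(retrieved[:k])
    return len(found) / len(expected_set)


# A one-record dataset that uses the relevant_doc_ids alias.
raw = '[{"query": "what is rag?", "relevant_doc_ids": ["d1", "d7"]}]'
examples = [normalize_example(r) for r in json.loads(raw)]

# Pretend a retriever returned these doc_ids, best first.
retrieved = ["d7", "d3", "d9"]
score = doc_id_recall(examples[0]["expected_doc_ids"], retrieved, k=3)
print(score)
```

Comparing ids directly avoids any text judging: one of the two expected documents appears in the top 3, so only half of the expected set is recovered.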

Examples And UI

Added a Streamlit dashboard example for running and visualizing evaluations:

examples/evals-streamlit-dashboard/index.py
examples/evals-streamlit-dashboard/run_evals_ui.py

Dependencies

Updated pyproject.toml:

Version bumped from 0.0.2 to 0.0.3.
Added runtime dependency:
- tqdm>=4.67.0
Added optional extras:
- elasticsearch
- haystack

Full Changelog: v0.0.2...v0.0.3

10 May 22:27

lucifertrj

v0.0.2

ffc7601

Evret 0.0.2

Added

[new-metric] Added ERR@k metric for cascade-style graded relevance evaluation.
[new-metric] Added RBP@k metric with tunable persistence/user-patience weighting.
Added structured logging utilities: get_logger, configure_logging, and JSON log formatting.
Added a tracing and monitoring notebook.
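
For readers unfamiliar with the two new metrics, here is a minimal sketch based on their commonly cited textbook definitions. It is not taken from Evret's source, so the function names, argument order, and defaults (for example persistence=0.8) are assumptions.

```python
def err_at_k(grades, k, max_grade=None):
    """Expected Reciprocal Rank over the top-k graded relevance labels.

    Cascade model: the user scans down the list and stops at rank r with
    probability (2**g - 1) / 2**g_max, where g is the grade at that rank.
    """
    top = list(grades[:k])
    if not top:
        return 0.0
    g_max = max_grade if max_grade is not None else max(top)
    if g_max <= 0:
        return 0.0
    score = 0.0
    p_continue = 1.0  # probability the user has not stopped yet
    for rank, grade in enumerate(top, start=1):
        p_stop = (2 ** grade - 1) / (2 ** g_max)
        score += p_continue * p_stop / rank
        p_continue *= 1.0 - p_stop
    return score


def rbp_at_k(rels, k, persistence=0.8):
    """Rank-Biased Precision truncated at k.

    persistence is the probability that the user moves on to the next
    result; higher values model a more patient user.
    """
    top = rels[:k]
    weighted = sum(rel * persistence ** i for i, rel in enumerate(top))
    return (1.0 - persistence) * weighted


print(err_at_k([0, 1], k=2, max_grade=1))
print(rbp_at_k([1, 1, 0], k=3, persistence=0.5))
```

Both metrics discount lower ranks, but ERR also discounts a result when an earlier result was already highly relevant, whereas RBP weights each rank independently of what came before it.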

Changed

Shifted evaluation dataset semantics from relevant_doc_ids toward expected_answers (design change).
Improved TokenOverlapJudge matching logic, including negation handling and better overlap scoring.
Reworked quickstart, architecture, dataset-format, metrics, and judge docs.
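
To make the idea of overlap scoring with negation handling concrete, the sketch below shows one simple way such a judge could work. It is not the TokenOverlapJudge implementation: the function name, the negation word list, and the 0.6 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical negation cues; a real judge may use a richer list.
NEGATIONS = {"not", "no", "never", "without"}


def tokens(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_overlap_match(expected: str, retrieved: str, threshold: float = 0.6) -> bool:
    """Return True when enough expected tokens appear in the retrieved text
    and both texts agree on whether a negation cue is present."""
    exp = tokens(expected)
    ret = tokens(retrieved)
    exp_content = {t for t in exp if t not in NEGATIONS}
    if not exp_content:
        return False
    overlap = len(exp_content & set(ret)) / len(exp_content)
    exp_negated = any(t in NEGATIONS for t in exp)
    ret_negated = any(t in NEGATIONS for t in ret)
    if exp_negated != ret_negated:
        # High overlap but opposite polarity should not count as a match.
        return False
    return overlap >= threshold


print(token_overlap_match("Paris is the capital of France",
                          "The capital of France is Paris."))
print(token_overlap_match("Paris is the capital of France",
                          "Paris is not the capital of France"))
```

The second call shares every content token with the expected answer, yet it is rejected because only one side contains a negation cue. This polarity check is deliberately crude: it ignores contractions such as "isn't" and the scope of the negation.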

Full Changelog: v0.0.1...v0.0.2

05 May 11:30

lucifertrj

v0.0.1

671d137

Evret 0.0.1

Added

Added a pluggable relevance judge system for text-based evaluation.

Added TokenOverlapJudge as the default judge.
Added semantic and LLM judge support with optional extras:
- semantic
- llm-openai
- llm-anthropic
- llm-google
- judges
Added support for expected_answers in evaluation datasets, alongside classic relevant_doc_ids.
Added LangChain adapter support in both directions:
- Evret retriever as a LangChain retriever
- LangChain retriever as an Evret retriever
Added metric helper internals for ranking, DCG, set operations, and validation.
Added a full MkDocs documentation site with quickstart, architecture, API docs, metric docs, retriever docs, judge docs, and integration docs.
Added new examples:
- Qdrant demo
- LangChain integration demo
- Evaluation dataset creation example
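
The pluggable judge idea can be sketched with a small structural interface. Everything named below (RelevanceJudge, is_relevant, SubstringJudge, hit_at_k) is hypothetical and only illustrates the pattern; Evret's actual class and method names may differ.

```python
from typing import Protocol, Sequence


class RelevanceJudge(Protocol):
    """Anything with this method can be plugged in as a judge (assumed shape)."""

    def is_relevant(self, retrieved_text: str, expected_answer: str) -> bool:
        ...


class SubstringJudge:
    """Toy judge: relevant when the expected answer appears verbatim."""

    def is_relevant(self, retrieved_text: str, expected_answer: str) -> bool:
        return expected_answer.lower() in retrieved_text.lower()


def hit_at_k(
    judge: RelevanceJudge,
    retrieved_texts: Sequence[str],
    expected_answers: Sequence[str],
    k: int,
) -> bool:
    """True if any of the top-k texts is judged relevant to any expected answer."""
    return any(
        judge.is_relevant(text, answer)
        for text in retrieved_texts[:k]
        for answer in expected_answers
    )


docs = ["Qdrant is a vector database.", "Evret evaluates retrievers."]
hit = hit_at_k(SubstringJudge(), docs, ["vector database"], k=1)
print(hit)
```

Because the evaluator depends only on the interface, a token-overlap, semantic, or LLM-backed judge could be swapped in without changing the scoring loop.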

Changed

Updated evaluator logic to use judges for matching retrieved content against expected answers or relevant labels.
Improved metric behavior for empty inputs, invalid inputs, top-k handling, and score clamping.
Revamped EvaluationDataset.
Updated judge usage, integration install commands, and current API examples.
Updated package metadata to point to kaivid-labs/evret.

Full Changelog: v0.0.1b...v0.0.1

05 May 09:26

lucifertrj

v0.0.1b

ae93992

v0.0.1b (Pre-release)

Evret v0.0.1b0

Initial beta release of Evret, a lightweight framework for evaluating retrievers in RAG and search systems.

What's Changed

Core IR metrics: Hit Rate, Recall, Precision, MRR, nDCG, Average Precision
Evaluation pipeline with EvaluationDataset, Evaluator, and result exports
Retriever adapters for Qdrant, Chroma, Weaviate, and Milvus
LangChain and LlamaIndex integrations
JSON/CSV dataset loading
Basic examples and test coverage
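
As a reference point for the core metrics listed above, the following sketch computes each one for a single query with binary relevance. These are the usual textbook formulations, written independently of Evret; the function names and the choice to normalize Average Precision by the number of relevant documents are assumptions.

```python
import math


def hit_rate(retrieved, relevant, k):
    """1.0 if any top-k result is relevant, else 0.0."""
    return float(any(d in relevant for d in retrieved[:k]))


def precision(retrieved, relevant, k):
    """Relevant results in the top k, divided by k."""
    if k <= 0:
        return 0.0
    return sum(d in relevant for d in retrieved[:k]) / k


def recall(retrieved, relevant, k):
    """Relevant results in the top k, divided by the number of relevant docs."""
    if not relevant:
        return 0.0
    return sum(d in relevant for d in retrieved[:k]) / len(relevant)


def reciprocal_rank(retrieved, relevant, k):
    """1 / rank of the first relevant result (MRR is its mean over queries)."""
    for rank, d in enumerate(retrieved[:k], start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0


def ndcg(retrieved, relevant, k):
    """Binary-gain DCG divided by the DCG of an ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, d in enumerate(retrieved[:k], start=1) if d in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision(retrieved, relevant, k):
    """Mean of precision@i taken at each rank i that holds a relevant result."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(retrieved[:k], start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


retrieved = ["d3", "d1", "d9", "d7"]
relevant = {"d1", "d7"}
for fn in (hit_rate, precision, recall, reciprocal_rank, ndcg, average_precision):
    print(fn.__name__, round(fn(retrieved, relevant, 4), 4))
```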

New Contributors

@lucifertrj made the repo public.

Full Changelog: https://github.com/kaivid-labs/evret/commits/ff250dd

Contributors

lucifertrj
