Embeddings library
Unlike the overwhelming majority of alternatives, this library:
- handles long inputs without truncation, even if the underlying model has a small context window
- needs minimal configuration
- has a 100% human-written (thus sane & clean) codebase
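The first point can be illustrated with a common technique: split the input into overlapping chunks that each fit the model, embed every chunk, and combine the results with a length-weighted mean. The sketch below is only an assumption about how such handling might work, not a description of ellama's internals. `toy_embed`, `chunk_words` and `embed_long` are hypothetical names, and word counts stand in for real token counts.

```python
import hashlib
import math
from typing import Callable, List

Vector = List[float]


def normalise(vec: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def toy_embed(text: str, dim: int = 8) -> Vector:
    """Hypothetical stand-in for a real model: hashes words into a unit vector."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return normalise(vec)


def chunk_words(words: list, size: int, overlap: int) -> List[list]:
    """Split words into windows of at most `size`, sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    return [words[i:i + size]
            for i in range(0, max(len(words) - overlap, 1), step)]


def embed_long(text: str, embed: Callable[[str], Vector] = toy_embed,
               max_words: int = 4, overlap: int = 1) -> Vector:
    """Embed text of any length using a model limited to `max_words` words."""
    words = text.split()
    if len(words) <= max_words:
        return embed(text)  # fits the (assumed) context window as-is
    chunks = chunk_words(words, max_words, overlap)
    vectors = [embed(" ".join(chunk)) for chunk in chunks]
    weights = [len(chunk) for chunk in chunks]  # longer chunks count more
    total = sum(weights)
    mean = [sum(w * v[i] for w, v in zip(weights, vectors)) / total
            for i in range(len(vectors[0]))]
    return normalise(mean)
```

Nothing is discarded: every word lands in at least one chunk, so the final vector reflects the whole input rather than only its first `max_words` words.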
Tip
Please open an issue if you know of any better alternatives. I would love to archive this repo.
Database semantic search:
from ellama import EllamaDB, Document
db = EllamaDB("test")
db.add_documents([
    Document("hello world", id="salutation"),
    Document("goodbye and goodnight", id="farewell")])
docs = db.similarity_search("Greetings, Earth!", k=1)
assert len(docs) == 1
assert docs[0].id == "salutation"
Embeddings visualisation:
import matplotlib.pyplot as plt
from ellama import EllamaDB, Document
from sklearn.datasets import fetch_20newsgroups
raw = fetch_20newsgroups(data_home='.cache')
db = EllamaDB("20newsgroups")
db.add_documents([
    Document(raw.data[i], id=str(i), metadata={'name': raw.target_names[raw.target[i]]})
    for i in range(200)])
for group in ['alt.atheism', 'comp', 'misc.forsale', 'rec', 'rec.sport', 'sci',
              'soc.religion', 'talk.politics', 'talk.religion']:
    # append '.' so exact names ('alt.atheism') match as well as prefixes ('comp')
    db.plot('t-SNE', label=group,
            filter=lambda metadata: f"{metadata['name']}.".startswith(f'{group}.'))
plt.title(f"Newsgroup {db.embeddings.model} embeddings t-SNE")
plt.legend()
plt.show()
Low-rank adaptation (LoRA) re-using the newsgroups database created above:
from ellama import EllamaDB
db = EllamaDB("20newsgroups")
db.lora(['name'], "ellama/lora:news", epochs=600)
# create new database using the new model
db_lora = EllamaDB("20newsgroups_lora", "ellama/lora:news")
db_lora.add_documents(db.get_docs({}))
Install with pip, choosing the extras you need:
pip install "ellama[cpu]" # basic
pip install "ellama[cpu,plot]" # plot('PCA' or 't-SNE')
pip install "ellama[cpu,plot,umap]" # plot('UMAP')
pip install "ellama[lora]" # fine-tuning
Or as a conda environment:
name: ellama
channels: [pytorch, nvidia, conda-forge]
dependencies:
  - langchain 1.*
  - langchain-community
  - faiss-gpu
  - requests
  - tqdm
  #- matplotlib # ellama plot()
  #- scikit-learn # ellama plot()
  #- umap-learn # ellama plot('UMAP')
  #- unsloth # ellama lora()
  - pip
  - pip:
    - ellama
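The optional dependencies above map to specific features, so it can help to check which ones are available before calling `plot()` or `lora()`. The following is a minimal sketch using only the standard library. The feature-to-module mapping is inferred from the dependency comments above, and `OPTIONAL_FEATURES`, `missing_modules` and `feature_report` are hypothetical helpers, not part of ellama.

```python
from importlib.util import find_spec
from typing import Dict, List

# Import names of the optional packages (scikit-learn imports as `sklearn`,
# umap-learn as `umap`); the mapping is an assumption based on the comments above.
OPTIONAL_FEATURES: Dict[str, List[str]] = {
    "plot('PCA') / plot('t-SNE')": ["matplotlib", "sklearn"],
    "plot('UMAP')": ["matplotlib", "sklearn", "umap"],
    "lora()": ["unsloth"],
}


def missing_modules(modules: List[str]) -> List[str]:
    """Return the modules that cannot be found, without importing them."""
    return [name for name in modules if find_spec(name) is None]


def feature_report() -> Dict[str, List[str]]:
    """Map each optional feature to the modules it still lacks."""
    return {feature: missing_modules(modules)
            for feature, modules in OPTIONAL_FEATURES.items()}


if __name__ == "__main__":
    for feature, missing in feature_report().items():
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{feature}: {status}")
```

Because `find_spec` only locates a top-level module and does not import it, the check stays fast even when heavy packages are installed.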