9.1. 常用¶
Faiss: Faiss 是一个Facebook开发的开源向量索引库,它提供了一些非常快速和高效的近似最近邻搜索算法。虽然它本身不是一个数据库系统,但可以与其他数据库集成,以支持向量检索。
Weaviate: https://weaviate.io/
Annoy: Annoy 是一个小巧而快速的库,专门用于近似最近邻搜索。虽然它不是一个数据库,但你可以将其用于构建自定义向量索引。
Pinecone: https://www.pinecone.io/
Langchain AI Handbook: https://www.pinecone.io/learn/langchain/
9.1.1. Chroma¶
Chroma(Chroma is the open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs): https://docs.trychroma.com/
quickstart:
# 1. Install pip install chromadb # 2. Get the Chroma Client import chromadb chroma_client = chromadb.Client() # 3. Create a collection collection = chroma_client.create_collection(name="my_collection") # 4. Add some text documents to the collection collection.add( documents=["This is a document", "This is another document"], metadatas=[{"source": "my_source"}, {"source": "my_source"}], ids=["id1", "id2"] ) # or collection.add( embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]], documents=["This is a document", "This is another document"], metadatas=[{"source": "my_source"}, {"source": "my_source"}], ids=["id1", "id2"] ) # 5. Query the collection results = collection.query( query_texts=["This is a query document"], n_results=2 )