9.1. Commonly used

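  • Faiss: Faiss is an open-source vector indexing library developed by Facebook. It provides fast, efficient algorithms for approximate nearest neighbor search. It is not a database system itself, but it can be integrated with other databases to support vector retrieval.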

  • Weaviate: https://weaviate.io/

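  • Annoy: Annoy is a small, fast library built specifically for approximate nearest neighbor search. It is not a database, but you can use it to build a custom vector index.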

  • Pinecone: https://www.pinecone.io/
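
All of the tools above answer the same question: given a query vector, which stored vectors are closest to it? The sketch below shows that idea with an exact brute-force search in plain Python. It is not the API of Faiss, Annoy, or any other library listed here, and it is exact rather than approximate. The names `cosine_similarity`, `nearest`, and the sample vectors are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors, k=1):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), key) for key, vec in vectors.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [key for _, key in scored[:k]]

# Hypothetical 2-dimensional "embeddings" keyed by document id.
vectors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [0.7, 0.7],
}

top = nearest([0.9, 0.1], vectors, k=2)
print(top)
```

This scan compares the query against every stored vector, so its cost grows linearly with the collection size. Approximate nearest neighbor libraries trade a little accuracy for index structures that avoid the full scan.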

9.1.1. Chroma

  • Chroma: the open-source embedding database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs. Docs: https://docs.trychroma.com/

  • quickstart:

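    # 1. Install (run this in a shell, not in Python)
    #    pip install chromadb
    
    # 2. Get the Chroma client (in-memory by default)
    import chromadb
    chroma_client = chromadb.Client()
    
    # 3. Create a collection
    collection = chroma_client.create_collection(name="my_collection")
    
    # 4. Add some text documents to the collection.
    #    Chroma embeds them with the collection's embedding function.
    collection.add(
        documents=["This is a document", "This is another document"],
        metadatas=[{"source": "my_source"}, {"source": "my_source"}],
        ids=["id1", "id2"]
    )
    # Alternative to the call above: pass precomputed embeddings with the
    # documents. Use one call or the other, not both: they reuse the same ids,
    # and all embeddings in a collection must have the same dimension.
    # collection.add(
    #     embeddings=[[1.2, 2.3, 4.5], [6.7, 8.2, 9.2]],
    #     documents=["This is a document", "This is another document"],
    #     metadatas=[{"source": "my_source"}, {"source": "my_source"}],
    #     ids=["id1", "id2"]
    # )
    
    # 5. Query the collection for the 2 nearest documents
    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2
    )
    print(results)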
    
