模型推理平台
############


Ollama
======

* https://ollama.com/
* GitHub 仓库: https://github.com/ollama/ollama
* ollama(OpenAI compatibility): https://ollama.com/blog/openai-compatibility
* 支持的模型列表： https://ollama.com/library

* Ollama(开源平台，用于在本地运行和管理大型语言模型 (LLM)。它旨在使研究人员和开发人员能够轻松使用 LLM 进行各种任务)
* 定位: Ollama 专注于提供大型语言模型的运行环境，如Llama3、Phi3等。
* Ollama是一个创新的工具，旨在在本地运行像Llama 2和Mistral这样的开源LLM。这个开创性平台通过将模型权重、配置和数据集捆绑到一个由Model文件管理的统一的包中，简化了运行LLM的复杂过程。Ollama模型库提供了广泛的模型选择
* Ollama因其与各种模型的广泛兼容性而脱颖而出，包括Llama 2、Mistral和WizardCoder等著名模型。这种兼容性确保用户可以方便地接触语言建模技术前沿。Ollama包容性的方法简化了探索和使用该领域最新进展的过程，使其成为那些热衷于保持在AI研究和开发前沿的用户的理想平台。
* Ollama没有官方的WebUI，但有几个可用的WebUI选项可以使用。其中一个选项是Ollama WebUI


LLamaFactory
============

.. note:: 详见 :ref:`ui <ai-ui>`


Xinference(Xorbits Inference)
=============================

* https://github.com/xorbitsai/inference
* https://inference.readthedocs.io/
* 一个开源平台，用于简化各种 AI 模型的运行和集成。借助 Xinference，您可以使用任何开源 LLM、嵌入模型和多模态模型在云端或本地环境中运行推理，并创建强大的 AI 应用。
* 支持的模型列表: https://inference.readthedocs.io/en/latest/models/builtin/index.html


LMDeploy
========

* GitHub: https://github.com/InternLM/lmdeploy
* 官网: lmdeploy.readthedocs.io/en/latest/
* LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
* LMDeploy 由 MMDeploy 和 MMRazor 团队联合开发，是涵盖了 LLM 任务的全套轻量化、部署和服务解决方案。





localai
=======


* https://localai.io/
* https://github.com/mudler/LocalAI
* 简介: The free, Open Source OpenAI alternative. Self-hosted, community-driven and local-first. Drop-in replacement for OpenAI running on consumer-grade hardware. No GPU required. Runs gguf, transformers, diffusers and many more models architectures. It allows to generate Text, Audio, Video, Images. Also with voice cloning capabilities.
* 定位: LocalAI 是一个开源的、本地运行的OpenAI替代品，提供与OpenAI API规范兼容的REST API。
* GPU加速：它可以在没有GPU加速的情况下运行，但如果有的话可以利用它。利用GPU加速可以提高计算速度和能效。这种设置还可以适应大型LLM模型。
* 密集型模型管理：LocalAI处理大型语言模型的方法涉及一种手动、详细的方法论。用户需要直接与AutoGPTQ、RWKV、llama.cpp和vLLM等各种后端系统进行交互，这使得可以进行更多的定制和优化。这种管理风格要求细致的配置、定期更新和维护，因此需要更高的技术水平。它提供了对模型的增强控制，使用户可以精确地将其定制以满足特定需求并实现最佳性能。


特征::

    * Local, OpenAI drop-in alternative REST API. You own your data.
    * NO GPU required. NO Internet access is required either
        * Optional, GPU Acceleration is available.
    * Supports multiple models
    * 🏃 Once loaded the first time, it keep models loaded in memory for faster inference
    * ⚡ Doesn’t shell-out, but uses bindings for a faster inference and better performance.

quick-start::

    docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu
    # or, if you have an Nvidia GPU:
    # docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12


FastChat
========

* https://github.com/lm-sys/FastChat
* An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
* 支持的模型列表: https://github.com/lm-sys/FastChat/blob/main/docs/model_support.md


One API
=======

* https://github.com/songquanpeng/one-api


其他
====


Jan
---

* Jan(offline): https://jan.ai/ 
* https://github.com/janhq/jan
* 定位: Jan AI 旨在将用户的计算机转变为AI计算机，提供本地和远程API连接的能力。
* Jan支持Mac M1/ M2/ M3、Mac (Intel)、Windows、Linux系统。除了本地模型，它也支持OpenAI API。
* linux环境需要使用 pwsh 命令




LM Studio
---------

* https://lmstudio.ai/
* https://github.com/lmstudio-ai
* Any model file in one of the supportedarchitectures converted to  gguf  will work in LMStudio.
* 是桌面版应用，非开源项目



区别
----



+-----------------+-------------------------+--------------------------------+
| 特性            | Jan                     | Ollama                         |
+=================+=========================+================================+
| 支持的 LLM 引擎 | llama.cpp, TensorRT-LLM | llama.cpp,Triton, ONNX Runtime |
+-----------------+-------------------------+--------------------------------+
| 性能            | 更快                    | 在某些情况下可能更快           |
+-----------------+-------------------------+--------------------------------+
| 易用性          | 命令行和 Web 界面       | 主要命令行界面                 |
+-----------------+-------------------------+--------------------------------+
| 灵活性          | 更灵活                  | 更简单                         |
+-----------------+-------------------------+--------------------------------+
| 社区            | 较小但活跃              | 较大且拥有更多资源             |
+-----------------+-------------------------+--------------------------------+


* Jan AI 强调本地运行和隐私保护，同时提供远程API连接能力，具有跨平台特性和可定制性。
* Ollama 似乎更专注于提供大型语言模型的运行环境，并且允许用户进行自定义。
* LocalAI 是一个社区驱动的项目，旨在提供一个OpenAI的开源替代品，具有不需要GPU和互联网的独特优势，支持多种AI功能，并且对性能进行了优化。