Text Generation Inference
#########################

* From: https://huggingface.co/docs/text-generation-inference/index
* GitHub: https://github.com/huggingface/text-generation-inference/
* Hugging Face also provides Text Generation Inference (TGI), a library dedicated to deploying and serving highly optimized LLMs for inference. It includes deployment-oriented optimization features not included in Transformers, such as continuous batching for increasing throughput and tensor parallelism for multi-GPU inference.

.. note:: At its core, TGI is an application for serving inference.

Getting started
===============

Text Generation Inference
-------------------------

* Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.

.. figure:: https://img.zhaoweiguo.com/uPic/2024/10/Js2elD.png

Quick Tour
----------

Launching TGI::

    model=teknium/OpenHermes-2.5-Mistral-7B
    volume=$PWD/data  # share a volume with the Docker container to avoid downloading weights every run

    docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
        ghcr.io/huggingface/text-generation-inference:2.4.0 \
        --model-id $model

Consuming TGI::

    import requests

    headers = {
        "Content-Type": "application/json",
    }
    data = {
        'inputs': 'What is Deep Learning?',
        'parameters': {
            'max_new_tokens': 20,
        },
    }

    response = requests.post('http://127.0.0.1:8080/generate', headers=headers, json=data)
    print(response.json())
    # {'generated_text': '\n\nDeep Learning is a subset of Machine Learning that is concerned with the development of algorithms that can'}

Installation from source
------------------------

Install CLI::

    git clone https://github.com/huggingface/text-generation-inference.git && cd text-generation-inference
    make install

.. note:: protobuf and Rust must be installed first.
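After ``make install`` completes, the server can be started without Docker through the installed ``text-generation-launcher`` binary, using the same ``--model-id`` flag as the Docker example above (e.g. ``text-generation-launcher --model-id $model``).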
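As an alternative to the raw HTTP call in the Quick Tour, the ``huggingface_hub`` client library can talk to a running TGI server directly. A minimal sketch, assuming ``huggingface_hub`` is installed (``pip install huggingface_hub``) and the Quick Tour container is listening on port 8080::

    from huggingface_hub import InferenceClient

    # Point the client at the local TGI server instead of the Hugging Face Hub.
    client = InferenceClient("http://127.0.0.1:8080")

    # One-shot call, equivalent to POSTing to /generate above.
    print(client.text_generation("What is Deep Learning?", max_new_tokens=20))

    # Streaming call: tokens are printed as the server produces them.
    for token in client.text_generation("What is Deep Learning?", max_new_tokens=20, stream=True):
        print(token, end="", flush=True)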
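Recent TGI releases also document an OpenAI-compatible Messages API. A minimal sketch, assuming the ``/v1/chat/completions`` route is available in the image version used above::

    import requests

    # The "model" field is required by the OpenAI schema; TGI serves a single
    # model per container, so "tgi" acts as a placeholder name.
    data = {
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is Deep Learning?"}],
        "max_tokens": 20,
    }

    response = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=data)
    print(response.json()["choices"][0]["message"]["content"])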