Quickly Building a Local LLM App with Ollama + Streamlit

2024-12-17


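Overview

Ollama makes it easy to run large language models locally, including both models from the official library and GGUF quantized models. Streamlit makes it quick to build a chat interface on top of them.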

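Installing Ollama

The steps below use Linux as an example.

Online installation

For an online install, run the official command: curl -fsSL https://ollama.com/install.sh | sh . If the download stalls, as it often does on networks in mainland China, consider installing manually instead.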

Manual installation

Download the package: curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
Extract it: sudo tar -C /usr -xzf ollama-linux-amd64.tgz
Start the service: ollama serve
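
Before moving on, it can help to confirm from Python that the server is reachable. The following is a minimal sketch, assuming the default address http://localhost:11434 (the same one main.py uses later) and Ollama's /api/version endpoint. The helper name ollama_version is made up for illustration.

```python
import json
import urllib.error
import urllib.request


def ollama_version(base_url="http://localhost:11434", timeout=3.0):
    """Return the server's version string, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply: treat as "not running".
        return None


if __name__ == "__main__":
    version = ollama_version()
    print(f"Ollama {version} is running" if version else "Ollama is not reachable")
```

If this reports that Ollama is not reachable, check that ollama serve is still running in another terminal.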

Downloading the model files

This post downloads Qwen's GGUF quantized model from ModelScope.

Install the ModelScope download tool: pip install -U modelscope

Download the model files: modelscope download --model=Qwen/Qwen2.5-Coder-32B-Instruct-GGUF --include "qwen2.5-coder-32b-instruct-q5_k_m*.gguf" --local_dir .
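
Large downloads sometimes fail partway, so it is worth checking what actually landed on disk. The sketch below lists the files matching the same pattern passed to --include, along with their sizes. The function name find_gguf_files is hypothetical, and the default directory assumes the download ran with --local_dir . as above.

```python
from pathlib import Path


def find_gguf_files(directory=".", pattern="qwen2.5-coder-32b-instruct-q5_k_m*.gguf"):
    """Return (file name, size in bytes) for each file matching the pattern."""
    return sorted(
        (p.name, p.stat().st_size)
        for p in Path(directory).glob(pattern)
        if p.is_file()
    )


if __name__ == "__main__":
    files = find_gguf_files()
    if not files:
        print("No matching .gguf files found in the current directory")
    for name, size in files:
        print(f"{name}: {size / 1024**3:.2f} GiB")
```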

Creating the Modelfile

A Modelfile tells Ollama how to build a local model. For example:

FROM ./QwQ-32B-Preview-GGUF/qwen2.5-coder-32b-instruct-q5_k_m.gguf
# sets the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# sets the context window size to 4096, this controls how many tokens the LLM can use as context to generate the next token
PARAMETER num_ctx 4096

# sets a custom system message to specify the behavior of the chat assistant
SYSTEM You are Mario from super mario bros, acting as an assistant.

Here FROM is the path to the local GGUF model file; point it at wherever the downloaded .gguf file actually lives.
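
If you experiment with several models or settings, generating the Modelfile from a small script saves retyping. This is only a sketch: render_modelfile is a hypothetical helper, it covers just the FROM, PARAMETER and SYSTEM instructions used in the example above, and it assumes the system message fits on one line.

```python
def render_modelfile(gguf_path, temperature=1.0, num_ctx=4096, system=None):
    """Build Modelfile text with FROM, two PARAMETER lines and an optional SYSTEM line."""
    lines = [
        f"FROM {gguf_path}",
        f"PARAMETER temperature {temperature}",
        f"PARAMETER num_ctx {num_ctx}",
    ]
    if system:
        # Single-line system message only (an assumption of this sketch).
        lines.append(f"SYSTEM {system}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_modelfile(
        "./qwen2.5-coder-32b-instruct-q5_k_m.gguf",
        temperature=1,
        num_ctx=4096,
        system="You are Mario from super mario bros, acting as an assistant.",
    ))
```

Redirect the output to a file named Modelfile to use it in the next step.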

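Building the local Ollama model

Run: ollama create mymodel -f ./Modelfile . The model name (mymodel here) can be anything you like, and the -f argument is the path to the Modelfile created above.

Once the model has been created, run it with: ollama run mymodel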
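
To confirm from code that the model was registered, you can query Ollama's /api/tags endpoint, which lists local models. A rough sketch, assuming the default address and that a model created without an explicit tag is listed as name:latest; has_model is a hypothetical helper.

```python
import json
import urllib.request


def has_model(tags_payload, name):
    """Check whether a model name appears in a parsed /api/tags response."""
    # Names without an explicit tag are matched against "<name>:latest".
    wanted = name if ":" in name else f"{name}:latest"
    return any(m.get("name") == wanted for m in tags_payload.get("models", []))


if __name__ == "__main__":
    try:
        with urllib.request.urlopen("http://localhost:11434/api/tags", timeout=3) as resp:
            payload = json.load(resp)
    except (OSError, ValueError) as exc:
        print(f"Could not query Ollama: {exc}")
    else:
        print("mymodel registered:", has_model(payload, "mymodel"))
```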

Writing the Streamlit page and connecting it to Ollama

Dependencies

Install the following dependencies:

pip install transformers
pip install ctransformers
pip install streamlit
pip install torch

If the app fails to run, you may need to set up a CUDA-enabled torch environment; those installation steps are omitted here.
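
A quick way to see which packages are still missing is to ask Python whether each one can be located. The sketch below uses only the standard library; missing_packages is a hypothetical helper, and the list includes requests because main.py imports it as well.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    required = ["transformers", "ctransformers", "streamlit", "torch", "requests"]
    missing = missing_packages(required)
    print("Missing:", ", ".join(missing) if missing else "none")
```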

Writing the Python code

Create main.py with the following code:


from transformers import AutoTokenizer
from ctransformers import AutoModelForCausalLM
from transformers import TextStreamer
import logging
import torch
import json
import requests

logger = logging.getLogger(__name__)
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),  # log to the console
        logging.FileHandler('qwen-chat-gguf.log')  # log to a file
    ]
)

import streamlit as st
st.set_page_config(
    page_title="MY AI",
    page_icon="🤖"  
)
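# AI chat module with streaming output.
# (Written as a comment: a bare string literal here would be rendered on the page by Streamlit's magic.)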

class QwenChat:
    def __init__(self):
        self.api_base = "http://localhost:11434"  # default Ollama address
        self.model = "mymodel"  # model name passed to `ollama create`

    def stream_chat(self, prompt, history=None):
        # Build the API request. /api/chat accepts the conversation as a list of
        # messages; /api/generate takes a single prompt and has no "messages" field.
        url = f"{self.api_base}/api/chat"
        headers = {
            "Content-Type": "application/json"
        }
        messages = [{"role": "system", "content": "You are an AI assistant"}]
        messages.extend(history or [])
        messages.append({"role": "user", "content": prompt})
        data = {
            "model": self.model,
            "messages": messages,
            "stream": True
        }

        try:
            # Send the streaming request
            response = requests.post(url, headers=headers, json=data, stream=True)
            response.raise_for_status()
            return response

        except Exception as e:
            logger.error(f"Error while calling the Ollama API: {str(e)}")
            raise


@st.cache_resource
def get_qwen_chat_instance():
    return QwenChat()


if __name__ == "__main__":

    if torch.cuda.is_available():
        torch.cuda.empty_cache()
    # Use the shared QwenChat instance
    qwen_chat = get_qwen_chat_instance()

    st.title("AI Assistant")
    st.write("How can I help you?")

    # Initialize the conversation history
    if "messages" not in st.session_state:
        st.session_state.messages = []

    # Render the previous turns
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.markdown(message["content"])

    # User input
    if prompt := st.chat_input("Type your question"):
        # Show the user's question
        with st.chat_message("user"):
            st.markdown(prompt)
        st.session_state.messages.append({"role": "user", "content": prompt})

        # Show the AI's answer
        with st.chat_message("assistant"):
            message_placeholder = st.empty()
            full_text = ""
            # History excludes the newest user message; stream_chat appends it itself
            history = st.session_state.messages[:-1]
            response = qwen_chat.stream_chat(prompt, history)

            for line in response.iter_lines():
                if line:
                    # Each line is one JSON object; /api/chat puts the text in message.content
                    chunk = json.loads(line)
                    text_chunk = chunk.get("message", {}).get("content", "")
                    if text_chunk:
                        full_text += text_chunk
                        # Update the UI with a cursor while streaming
                        message_placeholder.markdown(full_text + "▌")

                    # Stop once generation has finished
                    if chunk.get("done", False):
                        break

            # Render the final response without the cursor
            message_placeholder.markdown(full_text)

            st.session_state.messages.append({"role": "assistant", "content": full_text})

Finally, start the app with: streamlit run main.py
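
If the page stays blank while the model is generating, it helps to test the stream parsing on its own, outside Streamlit. Ollama streams newline-delimited JSON: /api/generate chunks carry their text in a response field, while /api/chat chunks carry it in message.content. The sketch below handles both formats; iter_text is a hypothetical helper, not part of main.py.

```python
import json


def iter_text(lines):
    """Yield text fragments from an iterable of NDJSON lines (bytes or str)."""
    for line in lines:
        if not line:
            continue  # skip the empty lines that iter_lines() can yield
        chunk = json.loads(line)
        if "response" in chunk:  # /api/generate format
            yield chunk["response"]
        elif "message" in chunk:  # /api/chat format
            yield chunk["message"].get("content", "")
        if chunk.get("done", False):
            break  # the final chunk is marked done


if __name__ == "__main__":
    sample = [
        b'{"response": "Hel", "done": false}',
        b'{"response": "lo", "done": true}',
    ]
    print("".join(iter_text(sample)))
```

With a live server, the same function can be fed response.iter_lines() from a streaming requests.post call.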

Author: reiner
Link: https://reiner.host/posts/8d748f6c.html
Copyright: When reposting, please credit the source and include a link to the original.