Microsoft Open-Sources Visual ChatGPT, a Powerful Image Interaction Tool: More Than 20,000 Stars Already!

程序员泥瓦匠 2023-04-18


Produced by | OSC Open Source Community (ID: oschina2013)

GitHub repository:
https://github.com/microsoft/visual-chatgpt
Paper:
https://arxiv.org/pdf/2303.04671.pdf

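Besides investing heavily in OpenAI, Microsoft is also building AI itself. Eight days ago, Microsoft open-sourced Visual ChatGPT, software that connects ChatGPT with a series of visual models so that images can be sent and received during a ChatGPT conversation.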

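As is well known, ChatGPT is very capable and can even be used to write novels and papers, but for now it is limited to text.

The arrival of Visual ChatGPT is like adding stickers for the first time to a text-only chat app, and these are "custom stickers" generated automatically from the text the user types. That makes ChatGPT more fun and broadens where it can be applied.

On one side, ChatGPT (or an LLM) serves as a general interface that understands images and handles interaction with the user. On the other, the image foundation models act as the technical experts behind the scenes, supplying deep domain-specific knowledge.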

The repository provides a diagram of the technical architecture and how it works.

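The demo contains three kinds of dialogue: Visual ChatGPT receives an image from the user; Visual ChatGPT edits an image according to the user's text and sends it back; and Visual ChatGPT recognizes an image and answers the user's questions about it. Based on the user's input, Visual ChatGPT decides whether a VFM (Visual Foundation Model) is needed to handle the request.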
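To make the "decide whether a VFM is needed" step concrete, here is a toy sketch in Python. It is not the project's implementation: the two stand-in functions, the `choose_tool` keyword rule, and the bracketed return strings are all hypothetical. In the real system the language model makes this decision and the tools are actual neural networks. Only the tool names `ImageCaptioning` and `Text2Image` come from the repository.

```python
from typing import Callable, Dict, Optional


# Hypothetical stand-ins for Visual Foundation Models (real ones run neural networks).
def caption_image(image_path: str) -> str:
    return f"[caption for {image_path}]"


def generate_image(prompt: str) -> str:
    return f"[image generated from: {prompt}]"


# Registry mapping a tool name to the callable that implements it.
TOOLS: Dict[str, Callable[[str], str]] = {
    "ImageCaptioning": caption_image,
    "Text2Image": generate_image,
}


def choose_tool(user_input: str) -> Optional[str]:
    """Toy keyword rule standing in for the LLM's judgment (an assumption)."""
    text = user_input.lower()
    if text.endswith((".png", ".jpg")):
        return "ImageCaptioning"
    if text.startswith(("draw", "generate")):
        return "Text2Image"
    return None  # no visual model needed


def respond(user_input: str) -> str:
    """Route the input to a tool if one is needed, else answer with text only."""
    tool = choose_tool(user_input)
    if tool is None:
        return f"[text-only reply to: {user_input}]"
    return TOOLS[tool](user_input)


if __name__ == "__main__":
    for message in ("cat.png", "draw a red apple", "hello"):
        print(respond(message))
```

The point of the sketch is the shape of the flow: one general interface receives every message, and a specialist model is called only when the request needs it.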

The repository also lists the image models Visual ChatGPT uses and the GPU memory each requires.

For more detail, read the Visual ChatGPT paper on arXiv: https://arxiv.org/abs/2303.04671

Visual ChatGPT was released on March 10. As of 15:00 on March 16, the project had already collected 21.9K stars, a rocket-like climb.

Related link: https://github.com/microsoft/visual-chatgpt

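Usage

Note: running it requires a well-equipped machine with a GPU. If you have one, give it a try; otherwise you can set it up on Google Colab.

Environment setup: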

conda create -n visgpt python=3.8  # create the environment
conda activate visgpt              # activate the environment
pip install -r requirement.txt     # install the dependencies
bash download.sh                   # download the models

Quick start

# clone the repo
git clone https://github.com/microsoft/visual-chatgpt.git
# Go to directory
cd visual-chatgpt
# create a new environment
conda create -n visgpt python=3.8
# activate the new environment
conda activate visgpt
#  prepare the basic environments
pip install -r requirements.txt
# prepare your private OpenAI key (for Linux)
export OPENAI_API_KEY={Your_Private_Openai_Key}
# prepare your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}

# Start Visual ChatGPT !
# You can specify the GPU/CPU assignment by "--load", the parameter indicates which 
# Visual Foundation Model to use and where it will be loaded to
# The model and device are separated by an underscore '_', the different models are separated by a comma ','
# The available Visual Foundation Models can be found in the following table
# For example, if you want to load ImageCaptioning to cpu and Text2Image to cuda:0
# You can use: "ImageCaptioning_cpu,Text2Image_cuda:0"

# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu
# Advice for 1 Tesla T4 15GB  (Google Colab)                       
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"  
# Advice for 4 Tesla V100 32GB                            
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    Image2Seg_cpu,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
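The `--load` format described in the comments above (model and device joined by `_`, entries separated by `,`) can be illustrated with a small parser. This is a minimal sketch of the format only, not the code in `visual_chatgpt.py`; the function name `parse_load_spec` is hypothetical.

```python
from typing import Dict


def parse_load_spec(spec: str) -> Dict[str, str]:
    """Parse a --load style string into {model_name: device}."""
    assignments: Dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()  # drop spaces and line breaks around an entry
        if not entry:
            continue  # tolerate a trailing comma
        # Split at the first underscore: left is the model, right is the device.
        model, sep, device = entry.partition("_")
        if not sep or not model or not device:
            raise ValueError(f"expected Model_device, got {entry!r}")
        assignments[model.strip()] = device.strip()
    return assignments


if __name__ == "__main__":
    print(parse_load_spec("ImageCaptioning_cpu,Text2Image_cuda:0"))
```

Splitting at the first underscore works here because none of the model names or device strings shown above contain one; a device such as `cuda:0` stays intact.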

