Instructions to use microsoft/Florence-2-large-ft with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/Florence-2-large-ft with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("image-text-to-text", model="microsoft/Florence-2-large-ft", trust_remote_code=True)# Load model directly from transformers import AutoProcessor, AutoModelForImageTextToText processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True) model = AutoModelForImageTextToText.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use microsoft/Florence-2-large-ft with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "microsoft/Florence-2-large-ft" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Florence-2-large-ft", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/microsoft/Florence-2-large-ft
- SGLang
How to use microsoft/Florence-2-large-ft with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "microsoft/Florence-2-large-ft" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Florence-2-large-ft", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "microsoft/Florence-2-large-ft" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "microsoft/Florence-2-large-ft", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use microsoft/Florence-2-large-ft with Docker Model Runner:
docker model run hf.co/microsoft/Florence-2-large-ft
Failed to run on MacBook: requiring flash_attn
Seems like the same problem as: https://huggingface.co/microsoft/phi-1_5/discussions/72
I'm trying the provided workaround with modeling_florence2.py (still downloading the model it didn't crash so far):
import os
from unittest.mock import patch
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from transformers.dynamic_module_utils import get_imports
def fixed_get_imports(filename: str | os.PathLike) -> list[str]:
"""Work around for https://huggingface.co/microsoft/phi-1_5/discussions/72."""
if not str(filename).endswith("/modeling_florence2.py"):
return get_imports(filename)
imports = get_imports(filename)
imports.remove("flash_attn")
return imports
with patch("transformers.dynamic_module_utils.get_imports", fixed_get_imports):
model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True)
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)
def run_example(prompt):
inputs = processor(text=prompt, images=image, return_tensors="pt")
generated_ids = model.generate(
input_ids=inputs["input_ids"],
pixel_values=inputs["pixel_values"],
max_new_tokens=1024,
num_beams=3,
)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
parsed_answer = processor.post_process_generation(generated_text, task=prompt, image_size=(image.width, image.height))
print(parsed_answer)
prompt = "<MORE_DETAILED_CAPTION>"
run_example(prompt)
Edit: With this workaround, it works on my MacBook!
I was trying to run on Windows and wondering why the patch wasn't working, until I realized the forward slash wouldn't match a windows file path.
For a more general solution that should work on any platform, replace this line:
if not str(filename).endswith("/modeling_florence2.py"):
with
if os.path.basename(filename) != "modeling_florence2.py":
Thank you for this patch, it works well on Mac.
I've duplicated the Zero-GPU space for the large model, changed to the base-ft model, applied your patch and it works both locally on Mac as well as on a CPU based space
https://huggingface.co/spaces/Norod78/Florence-2-base-ft
Cheers!
Hi, I tried using your patch but I keep running into this error
TypeError: Object of type Florence2LanguageConfig is not JSON serializable
on line
model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large-ft", trust_remote_code=True).
Is there something I'm missing?
I'm also on mac and I have never seen this error before. Try making sure your transformers is recent, your numpy is on 1.x and etc
Here is my pip dump from the environment this worked on, Python 3.10.13 (Miniconda) this env is for general purpose and has lots of stuff in it, but perhaps you could find a version difference in one of the major frameworks
https://pastebin.com/5rgmUtwL
Note that this trick also works when choosing "MPS" (Apple Silicon) as your torch backend.
https://huggingface.co/spaces/Norod78/Florence-2-base-ft/blob/main/app.py