3,394 questions
0
votes
0
answers
56
views
How can I run Flux2 inference on 2 GPUs?
I am trying to run Flux2 inference on 2 GPUs as follows:
import torch
from diffusers import Flux2Pipeline
from accelerate import PartialState
import argparse
from pathlib import Path
def main():
parser ...
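For what it's worth, one approach that often works for splitting a diffusers pipeline across two GPUs is device_map="balanced", which asks accelerate to spread the pipeline's components over all visible devices. A minimal sketch, reusing the Flux2Pipeline import from the excerpt (the checkpoint id is an assumption):
import torch
from diffusers import Flux2Pipeline

# "balanced" lets accelerate place the pipeline's components
# (transformer, text encoders, VAE) across all visible GPUs.
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",   # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)
image = pipe("a photo of a forest at dawn").images[0]
image.save("out.png")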
0
votes
0
answers
26
views
AttributeError when using Hugging Face transformers Trainer with FSDP
I am trying to use FSDP with the Hugging Face transformers Trainer. The training script looks something like:
train_dataset = Mydataset(...)
args = TrainingArguments(...)
model = LlamaForCausalLM....
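As an aside, a minimal FSDP setup with Trainer usually goes through TrainingArguments plus a distributed launcher; a sketch under those assumptions (the layer class name matches the Llama model in the excerpt, and the fsdp_config key follows recent transformers releases):
from transformers import TrainingArguments

# Launch with a distributed launcher, e.g.:
#   torchrun --nproc_per_node=2 train.py
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    fsdp="full_shard auto_wrap",
    fsdp_config={"transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"]},
)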
0
votes
0
answers
42
views
Hugging Face /hf-inference/v1/chat/completions returns 422 in Eclipse plugin [closed]
I'm building an Eclipse plugin for code completion using the Hugging Face API. My plugin sends a prompt to the endpoint:
https://router.huggingface.co/hf-inference/v1/chat/completions
I replaced the ...
2
votes
1
answer
63
views
Transformers LlamaForCausalLM class: base_model Attribute Mystery
Question:
I'm running into an issue with the transformers library, specifically with pipeline initialization. When I access the base_model attribute of a LlamaForCausalLM model, it seems to ...
0
votes
0
answers
78
views
IndexError: index -1 is out of bounds for dimension 0 with size 0
I am currently experimenting with modifying the KV cache of the LLaVA model in order to perform controlled interventions during generation (similar to cache-steering methods in recent research). The ...
1
vote
0
answers
163
views
Transformers 'could not import module pipeline' in Jupyter notebook
I need to run a series of pre-trained, fine-tuned models from Hugging Face in a Jupyter notebook. I have updated to the latest versions of both PyTorch and Transformers, but when I run the code
from ...
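A quick sanity check (not from the question itself) is to confirm the notebook kernel actually sees the upgraded packages before importing pipeline:
import sys
import transformers

print(sys.executable)              # which Python interpreter the kernel runs
print(transformers.__version__)   # which transformers install it imports

from transformers import pipeline
clf = pipeline("sentiment-analysis")   # downloads a small default model
print(clf("It finally imports."))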
1
vote
1
answer
79
views
Xcode Can't Find swift-transformers Package
I'm trying to implement Speech-to-Text transcription in my Swift app using Hugging Face's swift-transformers package to run Whisper models locally.
I've added the package to my Xcode project, but when ...
0
votes
1
answer
74
views
Generating response with KV Cached System Prompt throws error when Input Tokens are less than Prompt Tokens
I am trying to run Mistral-7B-Instruct-v0.2.
Each run is PROMPT + details[i].
PROMPT has instructions on how to generate JSON based on details.
As the prefix part of each input is the same, kind of like a ...
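For reference, recent transformers versions support prefilling a shared prompt once and reusing its KV cache across runs; a sketch of that pattern, assuming the model id from the question (PROMPT and details are the question's own variables):
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Prefill the shared PROMPT once and keep its KV cache.
prompt_inputs = tok(PROMPT, return_tensors="pt").to(model.device)
with torch.no_grad():
    prompt_cache = model(**prompt_inputs, past_key_values=DynamicCache()).past_key_values

# Each run passes PROMPT + detail in full, plus a *copy* of the cache,
# so the cached prefix is never longer than the new input.
full = tok(PROMPT + details[0], return_tensors="pt").to(model.device)
out = model.generate(**full, past_key_values=copy.deepcopy(prompt_cache), max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))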
0
votes
0
answers
103
views
Transformers with Python 3.12.3 produce lots of errors
I have Python 3.12.3 on an Ubuntu server. I tried to install transformers, tokenizers, datasets, and accelerate to use the Seq2SeqTrainer from transformers.
I used a virtual environment for the ...
0
votes
0
answers
36
views
T5-small generates only padding tokens during validation/test in PyTorch Lightning
I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps.
The Problem:
During validation_step and test_step, model.generate() consistently ...
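One thing worth isolating (an assumption, not a diagnosis) is whether generate itself behaves outside Lightning; T5 legitimately starts every decoded sequence from the pad token, so decoding without skipping specials shows whether real tokens ever follow it. A sketch using the question's model, tokenizer, and batch:
with torch.no_grad():
    gen = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        max_new_tokens=64,
    )
# <pad> as the first token is expected (it is T5's decoder start token);
# the question is whether anything non-pad comes after it.
print(tokenizer.batch_decode(gen, skip_special_tokens=False))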
3
votes
0
answers
112
views
How does one log the operations done on a GPU during the execution of Python code?
I have encountered a particular problem while executing a function from the transformers library of huggingface on an Intel GPU wheel of torch. Since I am doing something I normally shouldn't be ...
1
vote
0
answers
68
views
How to pass P_map: dict[str, torch.Tensor] to PEFT (LoRA)?
My proxy goal is to change LoRA from h = (W + BA)x to h = (W + BAP)x. Preliminary code is attached below for reference.
My actual goal is to train a model with the following loss: Θ̃ = arg min_Δ̂ ‖f_(...
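PEFT has no built-in hook for inserting an extra fixed projection, so one plain-PyTorch way to prototype h = (W + BAP)x is a custom wrapper layer; everything below is a hypothetical sketch, not peft API:
import torch
import torch.nn as nn

class LoRAWithP(nn.Module):
    # Hypothetical adapter computing h = (W + B A P) x with P fixed.
    # P has shape (k, in_features); A is (r, k); B is (out_features, r).
    def __init__(self, base: nn.Linear, P: torch.Tensor, r: int = 8):
        super().__init__()
        self.base = base                      # frozen W (and bias)
        self.register_buffer("P", P)          # fixed projection, not trained
        self.A = nn.Parameter(torch.randn(r, P.shape[0]) * 0.02)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        # x @ P^T -> (..., k); @ A^T -> (..., r); @ B^T -> (..., out_features)
        delta = ((x @ self.P.T) @ self.A.T) @ self.B.T
        return self.base(x) + delta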
1
vote
2
answers
181
views
How to fix “Expected all tensors to be on the same device” when running inference with Qwen3-VL-4B-Instruct?
I am trying to run the example inference code for the Qwen/Qwen3-VL-4B-Instruct model:
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
# default: Load the ...
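The usual cause of that error (an assumption here, since the excerpt is truncated) is processor output left on CPU while the model is sharded with device_map; moving the inputs to the model's device tends to fix it:
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

model_id = "Qwen/Qwen3-VL-4B-Instruct"
model = Qwen3VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [{"type": "text", "text": "Describe this."}]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)                     # the key step: match the model's device

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True))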
-1
votes
2
answers
99
views
LangChain HuggingFace ChatHuggingFace raises StopIteration with any model
I’m trying to use LangChain’s Hugging Face integration to chat with the model TinyLlama/TinyLlama-1.1B-Chat-v1.0 for the very first time, but I’m getting a StopIteration error when calling .invoke().
...
9
votes
2
answers
2k
views
RemoteEntryNotFoundError with downloading models from Hugging Face in Kaggle
Recently I have started to get some strange errors, for example RemoteEntryNotFoundError: 404 Client Error. (Request ID: Root=1-68e82630-293b962044bc3e6c1453ec73;43987a97-e033-4590-951e-829a3c87d2cb) ...
3
votes
2
answers
194
views
Multimodal embedding requires video first, then image - why?
I am working with OmniEmbed model (https://huggingface.co/Tevatron/OmniEmbed-v0.1), which is built on Qwen2.5 7B. My goal is to get a multimodal embedding for images and videos. I have the following ...
0
votes
0
answers
91
views
How to solve device mismatch issue when using offloading with QwenImageEditPlus pipeline and GGUF weights
After failing to make the QwenImageEditPlus run (https://huggingface.co/spaces/discord-community/README/discussions/9#68d260e32053323e6bfab30c), I tried a different approach (thanks to all the example ...
0
votes
0
answers
103
views
pippy examples: torch._dynamo.exc.UserError: It looks like one of the outputs with type <class transformers.cache_utils.DynamicCache> is not supported
When the program starts to initialize the pipeline object, an unexpected error is thrown:
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/anaconda3/envs/polar/lib/python3.12/site-...
2
votes
1
answer
153
views
RuntimeError: Expected all tensors to be on the same device when using local HuggingFace model in LangChain Agent
I'm building a simple agent using LangChain that leverages a locally-hosted HuggingFace model (gpt-oss-20b). I'm using the transformers pipeline and wrapping it in LangChain's HuggingFacePipeline.
The ...
3
votes
0
answers
59
views
Azure ML Endpoint Fails with HFValidationError even after using pathlib.Path
I am trying to deploy a fine-tuned Mistral-7B model on an Azure ML Online Endpoint. The deployment repeatedly fails during the init() phase of the scoring script with a huggingface_hub.errors....
0
votes
1
answer
87
views
PermissionError: [Errno 13] Permission denied: 'Qwen3-0.6B-SFT'
I am getting the following error when running training, using the TRL library in the following HuggingFace space: vishaljoshi24/trl-4-dnd.
My SDK is Docker and as far as I'm aware there are not ...
-1
votes
1
answer
597
views
ModuleNotFoundError for transformers.pipeline after installing PyTorch for CUDA
I'm a bit stumped on an issue that just popped up. My code, which uses the transformers library, was running perfectly fine until I tried to install a CUDA-compatible version of PyTorch.
Everything ...
1
vote
1
answer
92
views
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
When I try ...
0
votes
0
answers
226
views
Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead
Description:
I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
1
vote
0
answers
65
views
ValueError when resuming LoRA fine-tuning with sentence-transformers CrossEncoderTrainer: "Unrecognized model" error
I'm fine-tuning a CrossEncoder model with LoRA using sentence-transformers library on Kaggle (12-hour limit). I need to resume training from a checkpoint, but I'm getting a ValueError when trying to ...
0
votes
0
answers
61
views
How do I compute validation loss for a fine-tuned Qwen model in Hugging Face Transformers during evaluation?
I trained a Qwen model on my own dataset. Now I need to evaluate my trained model using the loss function, but I don’t know how to do it. I saw examples for other metrics such as accuracy and ...
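For a causal LM, passing the input ids as labels makes the forward pass return the cross-entropy loss directly; a minimal sketch, assuming the question's model and tokenizer and a placeholder list val_texts of held-out strings:
import torch

model.eval()
total, n = 0.0, 0
for text in val_texts:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # labels = input_ids: the model shifts internally and returns the loss
        loss = model(**enc, labels=enc["input_ids"]).loss
    total, n = total + loss.item(), n + 1

avg = total / n
print(f"val loss {avg:.4f}, perplexity {torch.exp(torch.tensor(avg)):.2f}")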
0
votes
0
answers
272
views
ModuleNotFoundError: 'triton.ops' when loading 4-bit quantized model with bitsandbytes on Kaggle
I have this code:
import os
import torch
from datasets import Dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
TrainingArguments,
)
from peft ...
1
vote
2
answers
399
views
How can I match the token count used by BGE-M3 embedding model before embedding?
For my particular project, it would be very helpful to know how many tokens the BGE-M3 embedding model would break a string down into before I embed the text. I could embed the string and count the ...
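The embedding model's tokenizer can be loaded standalone, so the count is available without embedding anything:
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
text = "How many tokens is this?"
# add_special_tokens=True matches what the model actually consumes
n = len(tok(text, add_special_tokens=True)["input_ids"])
print(n)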
2
votes
0
answers
90
views
How to Run an Open-Source 20B Model locally? [closed]
I have the gpt-oss-20b model's weights locally.
What are the necessary steps to run a 20B model using transformers?
The files I downloaded include multiple safetensors files and also a .bin file.
Which one of ...
2
votes
1
answer
109
views
How to stop a Hugging Face pipeline operation
I need to stop a Hugging Face pipeline operation. I tried to achieve this using a method from the following question, but it didn't work. I set a breakpoint on the return flag line and expected ...
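One way that works in principle is a custom StoppingCriteria whose flag another thread can flip; a sketch, assuming pipe is the question's text-generation pipeline (which forwards extra keyword arguments to generate):
from transformers import StoppingCriteria, StoppingCriteriaList

class ExternalStop(StoppingCriteria):
    def __init__(self):
        self.stop = False                 # flip to True from elsewhere to abort
    def __call__(self, input_ids, scores, **kwargs):
        return self.stop                  # checked after every generated token

stopper = ExternalStop()
result = pipe(prompt, stopping_criteria=StoppingCriteriaList([stopper]))
# setting stopper.stop = True from another thread ends generation at the next step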
0
votes
0
answers
164
views
optuna, huggingface-transformers: RuntimeError, "Tensor.item() cannot be called on meta tensors" when n_jobs > 1
I'm trying to use optuna to find good hyperparameters for a fine-tuning task I'm doing with some different language models. My actual code is more complex, but here's a MWE:
import torch
import optuna
...
0
votes
0
answers
49
views
The data type of the llava model uncontrollably changes to float32
I am using the llama-8b-llava model. I have made some modifications to the model, which are non-structural and do not introduce any parameters. During the model loading process, I used the torch....
2
votes
1
answer
1k
views
mutex.cc : 452 RAW: Lock blocking in HuggingFace/sentence-transformers [closed]
I'm on Python 3.11.13 with these versions:
huggingface-hub 0.31.4
transformers 4.52.4
sentence-transformers 5.1.0
And this OS (Mac):
Darwin G9XFDK7K6J 24....
1
vote
2
answers
172
views
How to interpret cosine similarity using EmbeddingSimilarityEvaluator
I am reading about text embeddings in LLMs in the book Hands-On Large Language Models. It is mentioned as follows:
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from ...
1
vote
0
answers
807
views
KeyError when loading GPT-OSS-20B locally with transformers on CPU
I’m trying to load gpt-oss-20b locally using Hugging Face transformers with CPU only. Minimal code:
from transformers import pipeline
model_path = "/mnt/d/Projects/models/gpt-oss-20b"
pipe = ...
0
votes
0
answers
92
views
RuntimeError: Failed to import transformers.training_args due to missing module 'triton.ops' when using bitsandbytes with PEFT and TRL
I'm trying to perform LoRA fine-tuning using the transformers, trl, and peft libraries in a Google Colab environment with a T4 GPU. My goal is to load the model in 8-bit using bitsandbytes.
I ...
2
votes
3
answers
172
views
FastAPI endpoint stream LLM output word for word
I have a FastAPI endpoint (/generateStreamer) that generates responses from an LLM model. I want to stream the output so users can see the text as it’s being generated, rather than waiting for the ...
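A common pattern for this is TextIteratorStreamer: generate runs in a background thread while the response iterates the streamer. A sketch under those assumptions (gpt2 is a placeholder model):
from threading import Thread
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

app = FastAPI()
tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

@app.get("/generateStreamer")
def generate_streamer(prompt: str):
    streamer = TextIteratorStreamer(tok, skip_prompt=True, skip_special_tokens=True)
    inputs = tok(prompt, return_tensors="pt")
    # generate blocks, so it runs in a thread while the client reads the stream
    Thread(target=model.generate,
           kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128)).start()
    return StreamingResponse(streamer, media_type="text/plain")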
1
vote
0
answers
53
views
BLIP Fine-Tuning: Special Token Always Biased to One Class in Generated Caption
I'm trying to fine-tune Hugging Face BLIP (Bootstrapped Language-Image Pretraining) to classify pizza boxes as either recyclable (clean) or non-recyclable (contaminated) by generating captions that ...
3
votes
1
answer
233
views
Why does the ProtBERT model generate identical embeddings for all non-whitespace-separated (single token?) inputs?
Why do non-identical inputs to ProtBERT generate identical embeddings when non-whitespace-separated?
I've looked at answers here etc. but they appear to be different cases where the slicing of the out....
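This is consistent with ProtBERT's vocabulary being one symbol per amino acid: an unseparated sequence matches no vocabulary entry and collapses to [UNK], so every such input embeds identically. A quick check:
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Rostlab/prot_bert")
print(tok.tokenize("MKTAYIAK"))          # no whitespace: falls to [UNK]
print(tok.tokenize("M K T A Y I A K"))   # one token per residue, as intended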
0
votes
0
answers
47
views
TypeError: 'NoneType' object is not iterable when using ChatHuggingFace with TinyLlama/TinyLlama-1.1B-Chat-v1.0 in LangChain
I'm trying to use the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model from Hugging Face with LangChain using the langchain_huggingface integration. My goal is to get a simple response from the model using ...
0
votes
0
answers
70
views
SPECTER2 similarity performs poorly
I'm trying to compute a measure of semantic similarity between titles of scientific publications using SPECTER2, but the model performs poorly.
Here is my code:
from transformers import AutoTokenizer
...
0
votes
1
answer
318
views
Hugging Face sentence-transformers model not loading
I'm trying to load in a huggingface sentence transformers model like this:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2") ##I've also ...
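If the short name fails to resolve, the fully qualified hub repo id is the usual fallback; a sketch:
from sentence_transformers import SentenceTransformer

# Fully qualified hub id; the short alias relies on library-side resolution.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = model.encode(["hello world"])
print(emb.shape)   # (1, 384) for this model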
0
votes
0
answers
40
views
How can I add more embeddings to T5?
I’m using Hugging Face’s T5ForConditionalGeneration and want to add a per‑token NE‑type embedding alongside the standard token embeddings.
tok_embeds = model.encoder.embed_tokens(input_ids)
ne_embeds ...
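Since T5's forward accepts inputs_embeds, the summed embeddings can bypass the token-embedding lookup entirely; a sketch where ne_embedding is a hypothetical nn.Embedding over NE-type ids shaped like input_ids:
import torch.nn as nn

ne_embedding = nn.Embedding(num_ne_types, model.config.d_model)  # hypothetical

tok_embeds = model.encoder.embed_tokens(input_ids)
ne_embeds = ne_embedding(ne_type_ids)       # same (batch, seq) shape as input_ids

outputs = model(
    inputs_embeds=tok_embeds + ne_embeds,   # replaces input_ids
    attention_mask=attention_mask,
    labels=labels,
)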
0
votes
1
answer
241
views
RuntimeError: CUDA error: named symbol not found when using TorchAoConfig with Qwen2.5-VL-7B-Instruct model
I'm trying to load the Qwen2.5-VL-7B-Instruct model from Hugging Face with 4-bit weight-only quantization using TorchAoConfig (similar to how it's mentioned in the documentation here), but I'm getting ...
1
vote
1
answer
27
views
ModuleNotFoundError: No module named 'bert_opinion' after hf_hub_download
I'm trying to import modules from bert_opinion.py and post.py after downloading them from the Hugging Face Hub using hf_hub_download, as described for my chosen model on the Hugging Face website. Here'...
0
votes
0
answers
69
views
Adding EarlyStopping() to transformers Trainer() error
I'm using this code for fine-tuning a LoRA model:
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
...
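For reference, EarlyStoppingCallback requires evaluation to run and a best-model metric to be configured, which is the usual source of this error; a sketch with placeholder datasets (the argument name eval_strategy follows recent transformers; older versions call it evaluation_strategy):
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",            # evaluation must actually run
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,      # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model, args=args,
    train_dataset=train_ds, eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)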
0
votes
0
answers
70
views
How can I get the pooled projected output from CLIP in the transformers library when I don't have token embeddings?
I want to use text_embeddings and combine them with the output of an intermediate layer of the text_encoder of CLIP. My input to the text_encoder is a learnable prompt embedding which is initialized ...
1
vote
1
answer
67
views
How come tokenization and generation of a model behave differently across different versions of transformers?
I downloaded an old custom model based on Llava that runs on transformers 4.31.0, and I tried to use it together with a Qwen model which uses transformers 4.53.1. After updating transformers, the Llava ...
0
votes
0
answers
169
views
How to transcribe local audio File/Blob with Transformers.js pipeline? (JSON.parse error)
I'm working on a browser-based audio transcription app using Transformers.js by Xenova. I'm trying to transcribe a .wav file selected by the user using the following code:
import { pipeline } from '@...
1
vote
1
answer
101
views
Trained Huggingface EncoderDecoderModel.generate() produces only bos-tokens
I am working on a Huggingface transformers EncoderDecoderModel consisting of a frozen BERT-Encoder (answerdotai-ModernBERT-base) and a trainable GPT2-Decoder. Due to the different architectures for ...