3,394 questions
0
votes
0
answers
56
views
How can I run Flux2 inference on 2 GPUs?
I am trying to run Flux2 inference on 2 GPUs as follows:
import torch
from diffusers import Flux2Pipeline
from accelerate import PartialState
import argparse
from pathlib import Path
def main():
parser ...
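For what it's worth, one approach that often works for splitting a diffusers pipeline across two GPUs is device_map="balanced", which asks accelerate to spread the pipeline's components over all visible devices. A minimal sketch, reusing the Flux2Pipeline import from the excerpt (the checkpoint id is an assumption):
import torch
from diffusers import Flux2Pipeline

# "balanced" lets accelerate place the pipeline's components
# (transformer, text encoders, VAE) across all visible GPUs.
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",   # assumed checkpoint id
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)
image = pipe("a photo of a forest at dawn").images[0]
image.save("out.png")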
0
votes
0
answers
26
views
AttributeError when using Hugging Face transformers Trainer with FSDP
I am trying to use FSDP with the Hugging Face transformers Trainer. The training script looks something like:
train_dataset = Mydataset(...)
args = TrainingArguments(...)
model = LlamaForCausalLM....
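As an aside, a minimal FSDP setup with Trainer usually goes through TrainingArguments plus a distributed launcher; a sketch under those assumptions (the layer class name matches the Llama model in the excerpt, and the fsdp_config key follows recent transformers releases):
from transformers import TrainingArguments

# Launch with a distributed launcher, e.g.:
#   torchrun --nproc_per_node=2 train.py
args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    fsdp="full_shard auto_wrap",
    fsdp_config={"transformer_layer_cls_to_wrap": ["LlamaDecoderLayer"]},
)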
0
votes
0
answers
42
views
Hugging Face /hf-inference/v1/chat/completions returns 422 in Eclipse plugin [closed]
I'm building an Eclipse plugin for code completion using the Hugging Face API. My plugin sends a prompt to the endpoint:
https://router.huggingface.co/hf-inference/v1/chat/completions
I replaced the ...
2
votes
1
answer
63
views
Transformers LlamaForCausalLM class: base_model Attribute Mystery
Question:
I'm running into an issue with the transformers library, specifically with pipeline initialization. When I access the base_model attribute of a LlamaForCausalLM model, it seems to ...
0
votes
0
answers
78
views
IndexError: index -1 is out of bounds for dimension 0 with size 0
I am currently experimenting with modifying the KV cache of the LLaVA model in order to perform controlled interventions during generation (similar to cache-steering methods in recent research). The ...
1
vote
0
answers
163
views
Transformers 'could not import module pipeline' in Jupyter notebook
I need to run a series of pre-trained, fine-tuned models from Hugging Face in a Jupyter notebook. I have updated to the latest versions of both PyTorch and Transformers, but when I run the code
from ...
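A quick sanity check (not from the question itself) is to confirm the notebook kernel actually sees the upgraded packages before importing pipeline:
import sys
import transformers

print(sys.executable)              # which Python interpreter the kernel runs
print(transformers.__version__)   # which transformers install it imports

from transformers import pipeline
clf = pipeline("sentiment-analysis")   # downloads a small default model
print(clf("It finally imports."))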
1
vote
1
answer
79
views
Xcode Can't Find swift-transformers Package
I'm trying to implement Speech-to-Text transcription in my Swift app using Hugging Face's swift-transformers package to run Whisper models locally.
I've added the package to my Xcode project, but when ...
0
votes
1
answer
74
views
Generating response with KV Cached System Prompt throws error when Input Tokens are less than Prompt Tokens
I am trying to run Mistral-7B-Instruct-v0.2.
Each run is PROMPT + details[i].
PROMPT has instructions on how to generate JSON based on details.
As the prefix part of each input is the same, kind of like a ...
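For reference, recent transformers versions support prefilling a shared prompt once and reusing its KV cache across runs; a sketch of that pattern, assuming the model id from the question (PROMPT and details are the question's own variables):
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, DynamicCache

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

# Prefill the shared PROMPT once and keep its KV cache.
prompt_inputs = tok(PROMPT, return_tensors="pt").to(model.device)
with torch.no_grad():
    prompt_cache = model(**prompt_inputs, past_key_values=DynamicCache()).past_key_values

# Each run passes PROMPT + detail in full, plus a *copy* of the cache,
# so the cached prefix is never longer than the new input.
full = tok(PROMPT + details[0], return_tensors="pt").to(model.device)
out = model.generate(**full, past_key_values=copy.deepcopy(prompt_cache), max_new_tokens=256)
print(tok.decode(out[0], skip_special_tokens=True))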
0
votes
0
answers
103
views
Transformers with Python 3.12.3 produce lots of errors
I have Python 3.12.3 on an Ubuntu server. I tried to install transformers, tokenizers, datasets, and accelerate to use the Seq2SeqTrainer from transformers.
I used a virtual environment for the ...
0
votes
0
answers
36
views
T5-small generates only padding tokens during validation/test in PyTorch Lightning
I'm fine-tuning T5-small using PyTorch Lightning and encountering a strange issue during validation and test steps.
The Problem:
During validation_step and test_step, model.generate() consistently ...
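One thing worth isolating (an assumption, not a diagnosis) is whether generate itself behaves outside Lightning; T5 legitimately starts every decoded sequence from the pad token, so decoding without skipping specials shows whether real tokens ever follow it. A sketch using the question's model, tokenizer, and batch:
with torch.no_grad():
    gen = model.generate(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        max_new_tokens=64,
    )
# <pad> as the first token is expected (it is T5's decoder start token);
# the question is whether anything non-pad comes after it.
print(tokenizer.batch_decode(gen, skip_special_tokens=False))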
3
votes
0
answers
112
views
How does one log the operations done on a GPU during the execution of Python code?
I have encountered a particular problem while executing a function from the transformers library of huggingface on an Intel GPU wheel of torch. Since I am doing something I normally shouldn't be ...
1
vote
0
answers
68
views
How to pass P_map: dict[str, torch.Tensor] to PEFT (LoRA)?
My proxy goal is to change LoRA from h = (W + BA)x to h = (W + BAP)x. Preliminary code is attached below for reference.
My actual goal is to train a model with the following loss: Θ̃ = arg min_Δ̂ ‖f_(...
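PEFT has no built-in hook for inserting an extra fixed projection, so one plain-PyTorch way to prototype h = (W + BAP)x is a custom wrapper layer; everything below is a hypothetical sketch, not peft API:
import torch
import torch.nn as nn

class LoRAWithP(nn.Module):
    # Hypothetical adapter computing h = (W + B A P) x with P fixed.
    # P has shape (k, in_features); A is (r, k); B is (out_features, r).
    def __init__(self, base: nn.Linear, P: torch.Tensor, r: int = 8):
        super().__init__()
        self.base = base                      # frozen W (and bias)
        self.register_buffer("P", P)          # fixed projection, not trained
        self.A = nn.Parameter(torch.randn(r, P.shape[0]) * 0.02)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))

    def forward(self, x):
        # x @ P^T -> (..., k); @ A^T -> (..., r); @ B^T -> (..., out_features)
        delta = ((x @ self.P.T) @ self.A.T) @ self.B.T
        return self.base(x) + delta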
1
vote
2
answers
181
views
How to fix “Expected all tensors to be on the same device” when running inference with Qwen3-VL-4B-Instruct?
I am trying to run the example inference code for the Qwen/Qwen3-VL-4B-Instruct model:
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
# default: Load the ...
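The usual cause of that error (an assumption here, since the excerpt is truncated) is processor output left on CPU while the model is sharded with device_map; moving the inputs to the model's device tends to fix it:
from transformers import Qwen3VLForConditionalGeneration, AutoProcessor

model_id = "Qwen/Qwen3-VL-4B-Instruct"
model = Qwen3VLForConditionalGeneration.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [{"type": "text", "text": "Describe this."}]}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)                     # the key step: match the model's device

out = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(out, skip_special_tokens=True))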
-1
votes
2
answers
99
views
LangChain HuggingFace ChatHuggingFace raises StopIteration with any model
I’m trying to use LangChain’s Hugging Face integration to chat with the model TinyLlama/TinyLlama-1.1B-Chat-v1.0 for the very first time, but I’m getting a StopIteration error when calling .invoke().
...
9
votes
2
answers
2k
views
RemoteEntryNotFoundError with downloading models from Hugging Face in Kaggle
Recently I have started to get some strange errors, for example RemoteEntryNotFoundError: 404 Client Error. (Request ID: Root=1-68e82630-293b962044bc3e6c1453ec73;43987a97-e033-4590-951e-829a3c87d2cb) ...
3
votes
2
answers
194
views
Multimodal embedding requires video first, then image - why?
I am working with OmniEmbed model (https://huggingface.co/Tevatron/OmniEmbed-v0.1), which is built on Qwen2.5 7B. My goal is to get a multimodal embedding for images and videos. I have the following ...
0
votes
0
answers
91
views
How to solve device mismatch issue when using offloading with QwenImageEditPlus pipeline and GGUF weights
After failing to make the QwenImageEditPlus run (https://huggingface.co/spaces/discord-community/README/discussions/9#68d260e32053323e6bfab30c), I tried a different approach (thanks to all the example ...
0
votes
0
answers
103
views
pippy examples: torch._dynamo.exc.UserError: It looks like one of the outputs with type <class transformers.cache_utils.DynamicCache> is not supported
When the program starts to initialize the pipeline object, an unexpected error is thrown:
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/anaconda3/envs/polar/lib/python3.12/site-...
2
votes
1
answer
153
views
RuntimeError: Expected all tensors to be on the same device when using local HuggingFace model in LangChain Agent
I'm building a simple agent using LangChain that leverages a locally-hosted HuggingFace model (gpt-oss-20b). I'm using the transformers pipeline and wrapping it in LangChain's HuggingFacePipeline.
The ...
3
votes
0
answers
59
views
Azure ML Endpoint Fails with HFValidationError even after using pathlib.Path
I am trying to deploy a fine-tuned Mistral-7B model on an Azure ML Online Endpoint. The deployment repeatedly fails during the init() phase of the scoring script with a huggingface_hub.errors....
0
votes
1
answer
87
views
PermissionError: [Errno 13] Permission denied: 'Qwen3-0.6B-SFT'
I am getting the following error when running training, using the TRL library in the following HuggingFace space: vishaljoshi24/trl-4-dnd.
My SDK is Docker and as far as I'm aware there are not ...
-1
votes
1
answer
597
views
ModuleNotFoundError for transformers.pipeline after installing PyTorch for CUDA
I'm a bit stumped on an issue that just popped up. My code, which uses the transformers library, was running perfectly fine until I tried to install a CUDA-compatible version of PyTorch.
Everything ...
1
vote
1
answer
92
views
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'
When I try ...
0
votes
0
answers
226
views
Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead
Description:
I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
1
vote
0
answers
65
views
ValueError when resuming LoRA fine-tuning with sentence-transformers CrossEncoderTrainer: "Unrecognized model" error
I'm fine-tuning a CrossEncoder model with LoRA using sentence-transformers library on Kaggle (12-hour limit). I need to resume training from a checkpoint, but I'm getting a ValueError when trying to ...
0
votes
0
answers
61
views
How do I compute validation loss for a fine-tuned Qwen model in Hugging Face Transformers during evaluation?
I trained a Qwen model on my own dataset. Now I need to evaluate my trained model using the loss function, but I don’t know how to do it. I saw examples for other metrics such as accuracy and ...
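For a causal LM, passing the input ids as labels makes the forward pass return the cross-entropy loss directly; a minimal sketch, assuming the question's model and tokenizer and a placeholder list val_texts of held-out strings:
import torch

model.eval()
total, n = 0.0, 0
for text in val_texts:
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        # labels = input_ids: the model shifts internally and returns the loss
        loss = model(**enc, labels=enc["input_ids"]).loss
    total, n = total + loss.item(), n + 1

avg = total / n
print(f"val loss {avg:.4f}, perplexity {torch.exp(torch.tensor(avg)):.2f}")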
0
votes
0
answers
272
views
ModuleNotFoundError: 'triton.ops' when loading 4-bit quantized model with bitsandbytes on Kaggle
I have this code:
import os
import torch
from datasets import Dataset
from transformers import (
AutoModelForCausalLM,
AutoTokenizer,
BitsAndBytesConfig,
TrainingArguments,
)
from peft ...
1
vote
2
answers
399
views
How can I match the token count used by BGE-M3 embedding model before embedding?
For my particular project, it would be very helpful to know how many tokens the BGE-M3 embedding model would break a string down into before I embed the text. I could embed the string and count the ...
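The embedding model's tokenizer can be loaded standalone, so the count is available without embedding anything:
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("BAAI/bge-m3")
text = "How many tokens is this?"
# add_special_tokens=True matches what the model actually consumes
n = len(tok(text, add_special_tokens=True)["input_ids"])
print(n)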
2
votes
0
answers
90
views
How to Run an Open-Source 20B Model locally? [closed]
I have the gpt-oss-20b model's weights locally.
What are the necessary steps to run a 20B model using transformers?
The files I downloaded include multiple safetensors files and also a .bin file.
Which one of ...
2
votes
1
answer
109
views
How to stop a Hugging Face pipeline operation
I need to stop a Hugging Face pipeline operation. I tried to achieve this using a method from the following question, but it didn't work. I set a breakpoint on the return flag line and expected ...
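One way that works in principle is a custom StoppingCriteria whose flag another thread can flip; a sketch, assuming pipe is the question's text-generation pipeline (which forwards extra keyword arguments to generate):
from transformers import StoppingCriteria, StoppingCriteriaList

class ExternalStop(StoppingCriteria):
    def __init__(self):
        self.stop = False                 # flip to True from elsewhere to abort
    def __call__(self, input_ids, scores, **kwargs):
        return self.stop                  # checked after every generated token

stopper = ExternalStop()
result = pipe(prompt, stopping_criteria=StoppingCriteriaList([stopper]))
# setting stopper.stop = True from another thread ends generation at the next step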
0
votes
0
answers
164
views
optuna, huggingface-transformers: RuntimeError, "Tensor.item() cannot be called on meta tensors" when n_jobs > 1
I'm trying to use optuna to find good hyperparameters for a fine-tuning task I'm doing with some different language models. My actual code is more complex, but here's a MWE:
import torch
import optuna
...
0
votes
0
answers
49
views
The data type of the llava model uncontrollably changes to float32
I am using the llama-8b-llava model. I have made some modifications to the model, which are non-structural and do not introduce any parameters. During the model loading process, I used the torch....
2
votes
1
answer
1k
views
mutex.cc : 452 RAW: Lock blocking in HuggingFace/sentence-transformers [closed]
I'm on Python 3.11.13 with these versions:
huggingface-hub 0.31.4
transformers 4.52.4
sentence-transformers 5.1.0
And this OS (Mac):
Darwin G9XFDK7K6J 24....
1
vote
2
answers
172
views
How to interpret cosine similarity using EmbeddingSimilarityEvaluator
I am reading about text embeddings in LLMs in the book Hands-On Large Language Models. It is mentioned as follows:
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
from ...
1
vote
0
answers
807
views
KeyError when loading GPT-OSS-20B locally with transformers on CPU
I’m trying to load gpt-oss-20b locally using Hugging Face transformers with CPU only. Minimal code:
from transformers import pipeline
model_path = "/mnt/d/Projects/models/gpt-oss-20b"
pipe = ...
0
votes
0
answers
92
views
RuntimeError: Failed to import transformers.training_args due to missing module 'triton.ops' when using bitsandbytes with PEFT and TRL
I'm trying to perform LoRA fine-tuning using the transformers, trl, and peft libraries in a Google Colab environment with a T4 GPU. My goal is to load the model in 8-bit using bitsandbytes.
I ...
2
votes
3
answers
172
views
FastAPI endpoint stream LLM output word for word
I have a FastAPI endpoint (/generateStreamer) that generates responses from an LLM model. I want to stream the output so users can see the text as it’s being generated, rather than waiting for the ...
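A common pattern for this is TextIteratorStreamer: generate runs in a background thread while the response iterates the streamer. A sketch under those assumptions (gpt2 is a placeholder model):
from threading import Thread
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

app = FastAPI()
tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

@app.get("/generateStreamer")
def generate_streamer(prompt: str):
    streamer = TextIteratorStreamer(tok, skip_prompt=True, skip_special_tokens=True)
    inputs = tok(prompt, return_tensors="pt")
    # generate blocks, so it runs in a thread while the client reads the stream
    Thread(target=model.generate,
           kwargs=dict(**inputs, streamer=streamer, max_new_tokens=128)).start()
    return StreamingResponse(streamer, media_type="text/plain")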
1
vote
0
answers
53
views
BLIP Fine-Tuning: Special Token Always Biased to One Class in Generated Caption
I'm trying to fine-tune Hugging Face BLIP (Bootstrapped Language-Image Pretraining) to classify pizza boxes as either recyclable (clean) or non-recyclable (contaminated) by generating captions that ...
3
votes
1
answer
233
views
Why does the ProtBERT model generate identical embeddings for all non-whitespace-separated (single token?) inputs?
Why do non-identical inputs to ProtBERT generate identical embeddings when non-whitespace-separated?
I've looked at answers here etc. but they appear to be different cases where the slicing of the out....
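This is consistent with ProtBERT's vocabulary being one symbol per amino acid: an unseparated sequence matches no vocabulary entry and collapses to [UNK], so every such input embeds identically. A quick check:
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Rostlab/prot_bert")
print(tok.tokenize("MKTAYIAK"))          # no whitespace: falls to [UNK]
print(tok.tokenize("M K T A Y I A K"))   # one token per residue, as intended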
0
votes
0
answers
47
views
TypeError: 'NoneType' object is not iterable when using ChatHuggingFace with TinyLlama/TinyLlama-1.1B-Chat-v1.0 in LangChain
I'm trying to use the TinyLlama/TinyLlama-1.1B-Chat-v1.0 model from Hugging Face with LangChain using the langchain_huggingface integration. My goal is to get a simple response from the model using ...
0
votes
0
answers
70
views
SPECTER2 similarity performs poorly
I'm trying to compute a measure of semantic similarity between titles of scientific publications using SPECTER2, but the model performs poorly.
Here is my code:
from transformers import AutoTokenizer
...
0
votes
1
answer
318
views
Hugging Face sentence-transformers model not loading
I'm trying to load in a huggingface sentence transformers model like this:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2") ##I've also ...
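If the short name fails to resolve, the fully qualified hub repo id is the usual fallback; a sketch:
from sentence_transformers import SentenceTransformer

# Fully qualified hub id; the short alias relies on library-side resolution.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
emb = model.encode(["hello world"])
print(emb.shape)   # (1, 384) for this model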
0
votes
0
answers
40
views
How can I add more embeddings to T5?
I’m using Hugging Face’s T5ForConditionalGeneration and want to add a per‑token NE‑type embedding alongside the standard token embeddings.
tok_embeds = model.encoder.embed_tokens(input_ids)
ne_embeds ...
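Since T5's forward accepts inputs_embeds, the summed embeddings can bypass the token-embedding lookup entirely; a sketch where ne_embedding is a hypothetical nn.Embedding over NE-type ids shaped like input_ids:
import torch.nn as nn

ne_embedding = nn.Embedding(num_ne_types, model.config.d_model)  # hypothetical

tok_embeds = model.encoder.embed_tokens(input_ids)
ne_embeds = ne_embedding(ne_type_ids)       # same (batch, seq) shape as input_ids

outputs = model(
    inputs_embeds=tok_embeds + ne_embeds,   # replaces input_ids
    attention_mask=attention_mask,
    labels=labels,
)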
0
votes
1
answer
241
views
RuntimeError: CUDA error: named symbol not found when using TorchAoConfig with Qwen2.5-VL-7B-Instruct model
I'm trying to load the Qwen2.5-VL-7B-Instruct model from Hugging Face with 4-bit weight-only quantization using TorchAoConfig (similar to how it's mentioned in the documentation here), but I'm getting ...
1
vote
1
answer
27
views
ModuleNotFoundError: No module named 'bert_opinion' after hf_hub_download
I'm trying to import modules from bert_opinion.py and post.py after downloading them from the Hugging Face Hub using hf_hub_download, as described for my chosen model on the Hugging Face website. Here'...
0
votes
0
answers
69
views
Adding EarlyStopping() to transformers Trainer() error
I'm using this code for fine-tuning a LoRA model:
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_compute_dtype=torch.float16,
bnb_4bit_use_double_quant=True,
...
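For reference, EarlyStoppingCallback requires evaluation to run and a best-model metric to be configured, which is the usual source of this error; a sketch with placeholder datasets (the argument name eval_strategy follows recent transformers; older versions call it evaluation_strategy):
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

args = TrainingArguments(
    output_dir="out",
    eval_strategy="steps",            # evaluation must actually run
    eval_steps=200,
    save_strategy="steps",
    save_steps=200,
    load_best_model_at_end=True,      # required by EarlyStoppingCallback
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
trainer = Trainer(
    model=model, args=args,
    train_dataset=train_ds, eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)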
0
votes
0
answers
70
views
How can I get the pooled projected output from CLIP in the transformers library when I don't have token embeddings?
I want to use text_embeddings and combine them with the output of an intermediate layer of the text_encoder of CLIP. My input to the text_encoder is a learnable prompt embedding which is initialized ...
1
vote
1
answer
67
views
How come tokenization and generation of a model behave differently across different versions of transformers?
I downloaded an old custom model based on Llava that runs on transformers 4.31.0, and I tried to use it together with a Qwen model which uses transformers 4.53.1. After updating transformers, the Llava ...
0
votes
0
answers
169
views
How to transcribe local audio File/Blob with Transformers.js pipeline? (JSON.parse error)
I'm working on a browser-based audio transcription app using Transformers.js by Xenova. I'm trying to transcribe a .wav file selected by the user using the following code:
import { pipeline } from '@...
1
vote
1
answer
101
views
Trained Huggingface EncoderDecoderModel.generate() produces only bos-tokens
I am working on a Huggingface transformers EncoderDecoderModel consisting of a frozen BERT-Encoder (answerdotai-ModernBERT-base) and a trainable GPT2-Decoder. Due to the different architectures for ...