1 vote
1 answer
715 views

I want to use DeepSpeed for training LLMs along with the Hugging Face Trainer. But when I use DeepSpeed with the Trainer I get the error "AttributeError: 'DummyOptim' object has no attribute 'step'"...
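For context on the DummyOptim error above: Accelerate substitutes placeholder DummyOptim/DummyScheduler objects when the DeepSpeed JSON config declares its own "optimizer"/"scheduler" sections, so calling .step() on them directly fails. A minimal sketch of a DeepSpeed config of that shape (all values are illustrative placeholders, not taken from the question):

```json
{
  "train_micro_batch_size_per_gpu": "auto",
  "zero_optimization": { "stage": 2 },
  "optimizer": {
    "type": "AdamW",
    "params": { "lr": "auto", "weight_decay": "auto" }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": { "warmup_num_steps": "auto" }
  }
}
```

With a config like this, the optimizer step has to go through the Trainer/Accelerate machinery rather than being called on the optimizer object directly; alternatively, dropping the "optimizer" section lets the Trainer create a real optimizer itself.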
0 votes
3 answers
5k views

If I use the following Dockerfile: FROM python:3.11-bullseye ENV APP_HOME /app WORKDIR $APP_HOME COPY requirements.txt /app RUN pip install uv && uv pip install --system --no-cache -r ...
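Laid out on separate lines, the Dockerfile from the excerpt above reads as follows (the truncated final argument to -r is an assumption based on the COPY line, not quoted from the question):

```dockerfile
FROM python:3.11-bullseye

ENV APP_HOME /app
WORKDIR $APP_HOME

COPY requirements.txt /app

# Install uv, then use it to install the dependencies system-wide.
# "requirements.txt" here is assumed from the COPY line above.
RUN pip install uv && \
    uv pip install --system --no-cache -r requirements.txt
```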
1 vote
1 answer
70 views

When I tried to accelerate model training with DeepSpeed, a problem occurred when evaluating the model on the validation dataset. Here is the problem code snippet: def evaluate(self, ...
0 votes
0 answers
158 views

I'm trying to train a small LLM on my local computer, which has a single GPU with 16 GB of VRAM. I kept encountering VRAM OOM errors, so I was looking for a way to reduce VRAM use. DeepSpeed seemed interesting, so ...
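For the single-GPU OOM question above: a common DeepSpeed approach on a 16 GB card is ZeRO stage 2 with optimizer-state offload to CPU, combined with a small micro-batch and gradient accumulation. A minimal sketch of such a ds_config.json (all numbers are illustrative, not from the question):

```json
{
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu", "pin_memory": true }
  },
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true }
}
```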
0 votes
0 answers
93 views

I want to log my model's accuracy after each epoch and its final accuracy at the end but I cannot find a simple way of doing this. I am following this tutorial: https://www.youtube.com/watch?v=...
1 vote
0 answers
492 views

I'm training my model with the accelerate package, which uses DeepSpeed internally. But I can't understand the gradient_accumulation_steps param in its configuration. To my knowledge, ...
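As background for the gradient_accumulation_steps question: under accumulation, gradients from several micro-batches are combined before each optimizer step, so the effective batch size multiplies up. A tiny sketch of the arithmetic (function names are illustrative, not part of the accelerate/DeepSpeed API):

```python
# Gradients from several micro-batches are summed before a single
# optimizer step, so the effective batch size is the product below.

def effective_batch_size(micro_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_processes: int) -> int:
    return micro_batch_size * gradient_accumulation_steps * num_processes

def optimizer_steps(num_micro_batches: int,
                    gradient_accumulation_steps: int) -> int:
    # One optimizer step per `gradient_accumulation_steps` micro-batches.
    return num_micro_batches // gradient_accumulation_steps

print(effective_batch_size(4, 8, 2))  # 64
print(optimizer_steps(100, 8))        # 12
```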
3 votes
1 answer
6k views

Currently, I am trying to fine-tune the Korean Llama model (13B) on a private dataset through DeepSpeed, Flash Attention 2, and the TRL SFTTrainer. I am using 2 * A100 80G GPUs for the fine-tuning; however, ...
0 votes
0 answers
2k views

When I try to install the deepspeed library in a conda virtual environment, the following error occurs: Collecting deepspeed Using cached deepspeed-0.12.6.tar.gz (1.2 MB) Preparing metadata (...
2 votes
0 answers
863 views

I am using accelerate launch with DeepSpeed ZeRO stage 2 for multi-GPU training and inference and am struggling to free up GPU memory. Basically, my program has three parts: Load first model... -...
2 votes
0 answers
342 views

This is my first time writing on this platform; I apologise if there is any issue with the way the question is asked. I am trying to run a Python file with certain DeepSpeed configurations, such ...
2 votes
1 answer
640 views

I am training dolly2.0. When I do so, I get the following output from the terminal: If I use DeepSpeed to perform this training, I note that the learning rate didn't improve: Why didn't the learning ...
0 votes
1 answer
136 views

I want to come up with a very simple Lightning example using DeepSpeed, but it refuses to parallelize layers even when set to stage 3. I'm just blowing up the model by adding FC layers in the hope ...
1 vote
1 answer
530 views

I am training the Llama model in a multi-node environment using huggingface/accelerate, and if I run it as follows to profile it, the program dies due to a problem with the SSH connection to ...
0 votes
1 answer
3k views

I have installed a package (the llava model from GitHub) as python install -e . In my conda env, I have loaded llava as: >>python >>import llava I put the import in a .py file; when I used "...
1 vote
1 answer
3k views

Is there any way to load a Hugging Face model in multi GPUs and use those GPUs for inferences as well? Like, there is this model which can be loaded on a single GPU (default cuda:0) and run for ...
1 vote
0 answers
849 views

DeepSpeed fails to offload operations to the CPU, as I thought it should do when it runs out of GPU memory. I guess I have some setting wrong. When the batch size is increased, it gives an error like ...
2 votes
0 answers
226 views

Hi, I am trying to train the dolly-v2-12b model (or any of the Dolly models) on a custom dataset using an A10 GPU. I am coding in PyCharm on Windows. The task is similar to Q&A. I am trying to use this ...
1 vote
1 answer
454 views

I am wondering if Vertex AI Training can be used for distributed training with the Hugging Face Trainer and DeepSpeed. All I have seen are examples with the native torch distribution strategy. It would be ...
1 vote
0 answers
530 views

I tried to use DeepSpeed to run tensor parallelism on StarCoder, as I have multiple small GPUs, none of which can hold the whole model on its own. from transformers import AutoModelForCausalLM, ...
1 vote
0 answers
118 views

The example provided in Memory Requirements - DeepSpeed 0.10.1 documentation is as follows: python -c 'from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_cold; \...
1 vote
0 answers
103 views

I am new to DeepSpeed and have some experience in deep learning. I want to know how to set the maximum GPU memory to use for each device when using DeepSpeed. I have done nothing. I have no thoughts my ...
1 vote
0 answers
69 views

When I try to use a DeepSpeed example to finetune an OPT 1.3b model on my local machine, I get an unexpected error related to the following code snippet: template <typename T> __global__ ...