1 vote · 1 answer · 70 views

When I want to accelerate model training by using DeepSpeed, a problem occurred when I evaluate the model on the validation dataset. Here is the problem code snippet: def evaluate(self, ...
— asked by external
0 votes · 0 answers · 158 views

I'm trying to train a small LLM on my local computer, which has a single GPU with 16 GB of VRAM. I kept encountering VRAM OOM errors, so I was looking for a way to reduce VRAM usage. DeepSpeed seemed interesting, so ...
— asked by Peppermint Addict
0 votes · 0 answers · 93 views

I want to log my model's accuracy after each epoch, and its final accuracy at the end, but I cannot find a simple way of doing this. I am following this tutorial: https://www.youtube.com/watch?v=...
— asked by user22631788
1 vote · 0 answers · 492 views

I'm training my model with the accelerate package, which uses DeepSpeed internally, but I can't understand the gradient_accumulation_steps param in its configuration. To my knowledge, ...
— asked by Yaoming Xuan
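The gradient_accumulation_steps semantics the question above asks about can be illustrated without DeepSpeed or accelerate at all: gradients from N consecutive micro-batches are summed, and the optimizer steps once every N micro-batches, emulating an N-times-larger effective batch. A minimal pure-Python sketch of that schedule (function name and counters are illustrative, not any library's API):

```python
# Illustrative sketch of gradient accumulation: the optimizer steps once
# every `gradient_accumulation_steps` micro-batches, so the effective batch
# size is micro_batch_size * gradient_accumulation_steps (* num_gpus).

def optimizer_step_indices(num_micro_batches, gradient_accumulation_steps):
    """Return the 1-based micro-batch indices at which the optimizer steps."""
    steps = []
    accumulated = 0  # stands in for gradients summed by backward()
    for i in range(1, num_micro_batches + 1):
        accumulated += 1  # each backward() adds this micro-batch's gradients
        if i % gradient_accumulation_steps == 0:
            steps.append(i)  # here a real loop calls optimizer.step(); zero_grad()
            accumulated = 0
    return steps

print(optimizer_step_indices(8, 4))  # → [4, 8]: two optimizer steps for eight micro-batches
```

In DeepSpeed's JSON config the same relation is enforced as train_batch_size = train_micro_batch_size_per_gpu × gradient_accumulation_steps × number of GPUs.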
1 vote · 1 answer · 715 views

I want to use DeepSpeed to train LLMs along with the Hugging Face Trainer, but when I use DeepSpeed with the Trainer I get the error "AttributeError: 'DummyOptim' object has no attribute 'step'" ...
— asked by Refinath
2 votes · 0 answers · 863 views

I am using accelerate launch with DeepSpeed ZeRO stage 2 for multi-GPU training and inference, and am struggling to free up GPU memory. Basically, my program has three parts: Load first model... -...
— asked by Llmw123
2 votes · 0 answers · 342 views

This is my first time writing on this platform; I apologise if there is any issue with the way the question is asked. I am trying to run a Python file with certain DeepSpeed configurations such ...
— asked by Eliza
0 votes · 1 answer · 136 views

I want to put together a very simple Lightning example using DeepSpeed, but it refuses to parallelize layers even when set to stage 3. I'm just blowing up the model by adding FC layers in the hope ...
— asked by Romeo Kienzler
0 votes · 3 answers · 5k views

If I use the following Dockerfile: FROM python:3.11-bullseye ENV APP_HOME /app WORKDIR $APP_HOME COPY requirements.txt /app RUN pip install uv && uv pip install --system --no-cache -r ...
— asked by BioGeek
1 vote · 1 answer · 530 views

I am training the Llama model in a multi-node environment using huggingface/accelerate, and if I run it as follows to profile it, the program dies due to a problem with the SSH connection to ...
— asked by 상현박
0 votes · 0 answers · 2k views

When I try to install the deepspeed library in a conda virtual environment, the following error occurs: Collecting deepspeed Using cached deepspeed-0.12.6.tar.gz (1.2 MB) Preparing metadata (...
— asked by TTTyz
0 votes · 1 answer · 3k views

I have installed a package (the LLaVA model from GitHub) with pip install -e . In my conda env, I can run python and then import llava successfully. But when I put the import in a .py file and used "...
— asked by Mohbat Tharani
3 votes · 1 answer · 6k views

Currently, I am trying to fine-tune the Korean Llama model (13B) on a private dataset using DeepSpeed, Flash Attention 2, and the TRL SFTTrainer. I am using 2 × A100 80G GPUs for the fine-tuning; however, ...
— asked by kopilot100
1 vote · 0 answers · 849 views

DeepSpeed fails to offload operations to the CPU, as I thought it should do when it runs out of GPU memory. I guess I have some setting wrong. When the batch size is increased, it gives an error like ...
— asked by paragon00
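A likely source of the surprise in the question above: CPU offload in DeepSpeed is opt-in rather than automatic. ZeRO only spills optimizer state (stage 2 and up) or parameters (stage 3) to host memory when the config asks for it; it does not fall back to the CPU on OOM by itself. A minimal ds_config.json sketch that enables optimizer-state offload (the batch-size values are illustrative, not tuned):

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 8,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    }
  }
}
```

Under stage 3, an analogous "offload_param" block moves the parameters themselves to CPU as well.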
1 vote · 1 answer · 3k views

Is there any way to load a Hugging Face model across multiple GPUs and use those GPUs for inference as well? For example, there is a model which can be loaded on a single GPU (default cuda:0) and run for ...
— asked by NeuralAI
1 vote · 1 answer · 454 views

I am wondering whether Vertex AI Training can be used for distributed training with the Hugging Face Trainer and DeepSpeed. All I have seen are examples with the native torch distribution strategy. It would be ...
— asked by esdy
1 vote · 0 answers · 530 views

I tried to use DeepSpeed for tensor parallelism on StarCoder, as I have multiple small GPUs, none of which can hold the whole model on its own. from transformers import AutoModelForCausalLM, ...
— asked by ddaa
1 vote · 0 answers · 118 views

The example provided in the Memory Requirements section of the DeepSpeed 0.10.1 documentation is as follows: python -c 'from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_cold; \...
— asked by Shawn Yuxuan Tong
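For readers without DeepSpeed installed, the kind of number that estimator reports can be approximated by hand from the ZeRO paper's model-state accounting for mixed-precision Adam: 2 bytes/param for fp16 weights, 2 bytes/param for fp16 gradients, and 12 bytes/param for optimizer states (fp32 weight copy, momentum, variance), with gradients and optimizer states partitioned across GPUs under stage 2. A back-of-envelope sketch, not DeepSpeed's exact estimator (it excludes activations and runtime buffers):

```python
# Rough ZeRO-2 model-state memory per GPU, following the ZeRO paper's
# accounting for mixed-precision Adam (an approximation, NOT DeepSpeed's
# own estimator):
#   fp16 params: 2 bytes/param, replicated on every GPU under stage 2
#   fp16 grads:  2 bytes/param, partitioned across GPUs under stage 2
#   Adam states: 12 bytes/param (fp32 copy + momentum + variance), partitioned

def zero2_gb_per_gpu(total_params, num_gpus):
    params_bytes = 2 * total_params            # replicated fp16 weights
    grads_bytes = 2 * total_params / num_gpus  # partitioned fp16 gradients
    optim_bytes = 12 * total_params / num_gpus # partitioned optimizer states
    return (params_bytes + grads_bytes + optim_bytes) / 1024**3

# e.g. a 1.3B-parameter model sharded over 8 GPUs:
print(round(zero2_gb_per_gpu(1.3e9, 8), 2))  # → 4.54 (GB of model states per GPU)
```

Activation memory, which grows with batch size and sequence length, comes on top of this figure.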
2 votes · 0 answers · 226 views

Hi, I am trying to train dolly-v2-12b (or any of the Dolly models) on a custom dataset using an A10 GPU. I am coding in PyCharm on Windows. The task is similar to Q&A. I am trying to use this ...
— asked by Sneha T S
1 vote · 0 answers · 103 views

I am new to DeepSpeed and have some experience in deep learning. I want to know how to set the maximum GPU memory to use on each device when using DeepSpeed. I have done nothing so far, and I have no ideas; my ...
— asked by hjc
2 votes · 1 answer · 640 views

I am training Dolly 2.0. When I do so, I get the following output from the terminal. If I use DeepSpeed to perform this training, I notice that the learning rate doesn't improve. Why doesn't the learning ...
— asked by AndyLinOuO
1 vote · 0 answers · 69 views

When I try to use a DeepSpeed example to finetune an OPT-1.3b model on my local machine, I get an unexpected error related to the following code snippet: template <typename T> __global__ ...
— asked by coderLMN