
ChatGLM2-6B Fine-tuning Notes [1]

  • Reference GitHub repo: https://github.com/liucongg/ChatGLM-Finetuning
  • Server: 8× A100 (80 GB)
  • Download the chatglm2-6b model from Hugging Face; a handy mirror is https://hf-mirror.com/THUDM/chatglm2-6b (a download sketch follows the training command below)
  • Install the dependencies first, then run:
CUDA_VISIBLE_DEVICES=4,5,6,7 deepspeed --master_port 21400 train.py \
    --train_path data/spo_0.json \
    --model_name_or_path chatglm2-6b/ \
    --per_device_train_batch_size 1 \
    --max_len 1560 \
    --max_src_len 1024 \
    --learning_rate 1e-4 \
    --weight_decay 0.1 \
    --num_train_epochs 2 \
    --gradient_accumulation_steps 4 \
    --warmup_ratio 0.1 \
    --mode glm2 \
    --lora_dim 16 \
    --lora_alpha 64 \
    --lora_dropout 0.1 \
    --lora_module_name "query_key_value,dense_h_to_4h,dense_4h_to_h,dense" \
    --seed 1234 \
    --ds_file ds_zero2_no_offload.json \
    --gradient_checkpointing \
    --show_loss_step 10 \
    --output_dir ./output-glm2
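If the mirror is easier to reach than huggingface.co, the model can also be pulled programmatically; a minimal sketch with huggingface_hub (an assumption, not part of the original post; HF_ENDPOINT is the mirror's documented switch and must be set before the library is imported):

# Download sketch (assumed workflow): route huggingface_hub through
# hf-mirror.com, then snapshot the chatglm2-6b repo into a local folder.
import os
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # read at import time

from huggingface_hub import snapshot_download  # pip install huggingface_hub

snapshot_download(repo_id="THUDM/chatglm2-6b", local_dir="chatglm2-6b")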
  • Problems and fixes
    • Problem 1: port conflict, caused by binding the server socket to a privileged port (<1024) without administrator rights.
    • Fix 1: replace the originally specified port 520 with 21400.
    • Problem 2: ImportError: cannot import name 'log' from 'torch.distributed.elastic.agent.server.api'
    • Fix 2: run pip install deepspeed --upgrade
    • Problem 3: AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'
    • Fix 3: downgrading transformers gets the run going:
      pip uninstall transformers
      pip install -i https://pypi.tuna.tsinghua.edu.cn/simple transformers==4.33.2
      Alternatively, update tokenization_chatglm.py from the Hugging Face repo (try both approaches if one fails). A quick sanity check is sketched after this list.
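To confirm the fix took, a minimal check (a hypothetical snippet, not from the original post; it assumes the local model directory used in the command above):

# Hypothetical sanity check: the tokenizer should load cleanly once
# transformers==4.33.2 is pinned (or tokenization_chatglm.py is updated).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("chatglm2-6b/", trust_remote_code=True)
print(tok.pad_token, tok.eos_token)  # the training log below prints <unk> and </s>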
  • Command-line run log
[2024-11-08 17:06:02,050] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/transformers/utils/generic.py:311: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.torch.utils._pytree._register_pytree_node(
2024-11-08 17:06:03.949823: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2 AVX512F AVX512_VNNI FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2024-11-08 17:06:04.070586: I tensorflow/core/util/port.cc:104] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-11-08 17:06:04.621454: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.8/lib64:/usr/local/cuda-11.8/lib64:
2024-11-08 17:06:04.621553: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-11.8/lib64:/usr/local/cuda-11.8/lib64:
2024-11-08 17:06:04.621563: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/transformers/utils/generic.py:311: FutureWarning: `torch.utils._pytree._register_pytree_node` is deprecated. Please use `torch.utils._pytree.register_pytree_node` instead.torch.utils._pytree._register_pytree_node(
[2024-11-08 17:06:05,017] [WARNING] [runner.py:215:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
Detected VISIBLE_DEVICES=4,5,6,7: setting --include=localhost:4,5,6,7
[2024-11-08 17:06:05,017] [INFO] [runner.py:607:main] cmd = /data/user23262833/.conda/envs/chatglm/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbNCwgNSwgNiwgN119 --master_addr=127.0.0.1 --master_port=21400 --enable_each_rank_log=None train.py --train_path data/spo_0.json --model_name_or_path chatglm2-6b/ --per_device_train_batch_size 1 --max_len 1560 --max_src_len 1024 --learning_rate 1e-4 --weight_decay 0.1 --num_train_epochs 2 --gradient_accumulation_steps 4 --warmup_ratio 0.1 --mode glm2 --lora_dim 16 --lora_alpha 64 --lora_dropout 0.1 --lora_module_name query_key_value,dense_h_to_4h,dense_4h_to_h,dense --seed 1234 --ds_file ds_zero2_no_offload.json --gradient_checkpointing --show_loss_step 10 --output_dir ./output-glm2
(... the launcher subprocess repeats the same ds_accelerator, pytree-deprecation, and TensorFlow/TensorRT warnings; omitted ...)
[2024-11-08 17:06:09,168] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [4, 5, 6, 7]}
[2024-11-08 17:06:09,168] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=4, node_rank=0
[2024-11-08 17:06:09,168] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1, 2, 3]})
[2024-11-08 17:06:09,168] [INFO] [launch.py:164:main] dist_world_size=4
[2024-11-08 17:06:09,168] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=4,5,6,7
[2024-11-08 17:06:09,187] [INFO] [launch.py:256:main] process 3443023 spawned with command: ['/data/user23262833/.conda/envs/chatglm/bin/python', '-u', 'train.py', '--local_rank=0', '--train_path', 'data/spo_0.json', '--model_name_or_path', 'chatglm2-6b/', '--per_device_train_batch_size', '1', '--max_len', '1560', '--max_src_len', '1024', '--learning_rate', '1e-4', '--weight_decay', '0.1', '--num_train_epochs', '2', '--gradient_accumulation_steps', '4', '--warmup_ratio', '0.1', '--mode', 'glm2', '--lora_dim', '16', '--lora_alpha', '64', '--lora_dropout', '0.1', '--lora_module_name', 'query_key_value,dense_h_to_4h,dense_4h_to_h,dense', '--seed', '1234', '--ds_file', 'ds_zero2_no_offload.json', '--gradient_checkpointing', '--show_loss_step', '10', '--output_dir', './output-glm2']
[2024-11-08 17:06:09,206] [INFO] [launch.py:256:main] process 3443024 spawned with command: ['/data/user23262833/.conda/envs/chatglm/bin/python', '-u', 'train.py', '--local_rank=1', '--train_path', 'data/spo_0.json', '--model_name_or_path', 'chatglm2-6b/', '--per_device_train_batch_size', '1', '--max_len', '1560', '--max_src_len', '1024', '--learning_rate', '1e-4', '--weight_decay', '0.1', '--num_train_epochs', '2', '--gradient_accumulation_steps', '4', '--warmup_ratio', '0.1', '--mode', 'glm2', '--lora_dim', '16', '--lora_alpha', '64', '--lora_dropout', '0.1', '--lora_module_name', 'query_key_value,dense_h_to_4h,dense_4h_to_h,dense', '--seed', '1234', '--ds_file', 'ds_zero2_no_offload.json', '--gradient_checkpointing', '--show_loss_step', '10', '--output_dir', './output-glm2']
[2024-11-08 17:06:09,227] [INFO] [launch.py:256:main] process 3443025 spawned with command: ['/data/user23262833/.conda/envs/chatglm/bin/python', '-u', 'train.py', '--local_rank=2', '--train_path', 'data/spo_0.json', '--model_name_or_path', 'chatglm2-6b/', '--per_device_train_batch_size', '1', '--max_len', '1560', '--max_src_len', '1024', '--learning_rate', '1e-4', '--weight_decay', '0.1', '--num_train_epochs', '2', '--gradient_accumulation_steps', '4', '--warmup_ratio', '0.1', '--mode', 'glm2', '--lora_dim', '16', '--lora_alpha', '64', '--lora_dropout', '0.1', '--lora_module_name', 'query_key_value,dense_h_to_4h,dense_4h_to_h,dense', '--seed', '1234', '--ds_file', 'ds_zero2_no_offload.json', '--gradient_checkpointing', '--show_loss_step', '10', '--output_dir', './output-glm2']
[2024-11-08 17:06:09,238] [INFO] [launch.py:256:main] process 3443026 spawned with command: ['/data/user23262833/.conda/envs/chatglm/bin/python', '-u', 'train.py', '--local_rank=3', '--train_path', 'data/spo_0.json', '--model_name_or_path', 'chatglm2-6b/', '--per_device_train_batch_size', '1', '--max_len', '1560', '--max_src_len', '1024', '--learning_rate', '1e-4', '--weight_decay', '0.1', '--num_train_epochs', '2', '--gradient_accumulation_steps', '4', '--warmup_ratio', '0.1', '--mode', 'glm2', '--lora_dim', '16', '--lora_alpha', '64', '--lora_dropout', '0.1', '--lora_module_name', 'query_key_value,dense_h_to_4h,dense_4h_to_h,dense', '--seed', '1234', '--ds_file', 'ds_zero2_no_offload.json', '--gradient_checkpointing', '--show_loss_step', '10', '--output_dir', './output-glm2']
[2024-11-08 17:06:10,688] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 17:06:10,852] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 17:06:10,894] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-11-08 17:06:10,926] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
(... each of the four ranks prints the same pytree-deprecation, oneDNN, and TensorRT warnings seen above; omitted ...)
[2024-11-08 17:06:13,547] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-08 17:06:13,547] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-11-08 17:06:13,960] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-08 17:06:14,088] [INFO] [comm.py:652:init_distributed] cdb=None
[2024-11-08 17:06:14,089] [INFO] [comm.py:652:init_distributed] cdb=None
tokenizer.pad_token: <unk>
tokenizer.eos_token: </s>
Loading checkpoint shards:   0%|                                                                             | 0/7 [00:00<?, ?it/s]
(... each rank emits the same torch.load FutureWarning about weights_only=False while loading the shards; omitted ...)
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████| 7/7 [00:09<00:00,  1.34s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████| 7/7 [00:10<00:00,  1.43s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████| 7/7 [00:10<00:00,  1.44s/it]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████| 7/7 [00:10<00:00,  1.44s/it]
the number of skipping data is 0
len(train_dataloader) = 361
len(train_dataset) = 1441
num_training_steps = 182
num_warmup_steps = 18
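These counts line up with the launch flags; a back-of-the-envelope check (an assumption about how the script derives optimizer steps, not code from the repo):

# 1441 samples sharded over 4 ranks at batch 1 -> 361 dataloader steps;
# grad-accum 4 over 2 epochs -> 182 optimizer steps; warmup 10% -> 18.
import math

dataset_len, world_size, per_device_batch = 1441, 4, 1
grad_accum, epochs, warmup_ratio = 4, 2, 0.1

steps_per_epoch = math.ceil(dataset_len / (world_size * per_device_batch))  # 361
optim_steps = math.ceil(steps_per_epoch / grad_accum) * epochs              # 91 * 2 = 182
warmup_steps = int(warmup_ratio * optim_steps)                              # 18
print(steps_per_epoch, optim_steps, warmup_steps)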
base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.0.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.0.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.0.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.0.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.0.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.0.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.0.mlp.dense_4h_to_h.lora_B.default.weight
(... the same eight lora_A/lora_B entries repeat for layers 1 through 27; omitted ...)
trainable params: 29646848 || all params: 6273230848 || trainable%: 0.47259297032647635
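The trainable-parameter count also checks out against ChatGLM2-6B's published shapes (a sketch under those assumptions: hidden size 4096, fused QKV of width 4096 + 512 under multi-query attention, gated FFN with intermediate size 13696 so dense_h_to_4h outputs 2×13696, 28 layers, LoRA rank 16):

# Sanity check (assumes the published ChatGLM2-6B config): each LoRA
# adapter adds r*(fan_in + fan_out) weights per wrapped Linear layer.
r, n_layers = 16, 28
targets = {
    "query_key_value": (4096, 4096 + 2 * 256),  # 4096 q + 512 kv (multi-query)
    "dense":           (4096, 4096),
    "dense_h_to_4h":   (4096, 2 * 13696),       # gated activation doubles fan_out
    "dense_4h_to_h":   (13696, 4096),
}
per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in targets.values())
print(per_layer * n_layers)  # 29646848, matching "trainable params" above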
[2024-11-08 17:07:29,871] [INFO] [logging.py:129:log_dist] [Rank 0] DeepSpeed info: version=0.15.3, git-hash=unknown, git-branch=unknown
[2024-11-08 17:07:29,872] [INFO] [comm.py:677:init_distributed] Distributed backend already initialized
[2024-11-08 17:07:29,872] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 4
(... the remaining ranks repeat the same data statistics and LoRA parameter listing; truncated here ...)
base_model.model.transformer.encoder.layers.20.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.20.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.20.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.20.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.20.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.20.mlp.dense_4h_to_h.lora_B.default.weight
base_model.model.transformer.encoder.layers.21.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.21.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.21.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.21.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.21.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.21.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.21.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.21.mlp.dense_4h_to_h.lora_B.default.weight
base_model.model.transformer.encoder.layers.22.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.22.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.22.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.22.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.22.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.22.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.22.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.22.mlp.dense_4h_to_h.lora_B.default.weight
base_model.model.transformer.encoder.layers.23.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.23.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.23.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.23.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.23.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.23.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.23.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.23.mlp.dense_4h_to_h.lora_B.default.weight
base_model.model.transformer.encoder.layers.24.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.24.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.24.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.24.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.24.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.24.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.24.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.24.mlp.dense_4h_to_h.lora_B.default.weight
base_model.model.transformer.encoder.layers.25.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.25.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.25.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.25.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.25.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.25.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.25.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.25.mlp.dense_4h_to_h.lora_B.default.weight
base_model.model.transformer.encoder.layers.26.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.26.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.26.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.26.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.26.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.26.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.26.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.26.mlp.dense_4h_to_h.lora_B.default.weight
base_model.model.transformer.encoder.layers.27.self_attention.query_key_value.lora_A.default.weight
base_model.model.transformer.encoder.layers.27.self_attention.query_key_value.lora_B.default.weight
base_model.model.transformer.encoder.layers.27.self_attention.dense.lora_A.default.weight
base_model.model.transformer.encoder.layers.27.self_attention.dense.lora_B.default.weight
base_model.model.transformer.encoder.layers.27.mlp.dense_h_to_4h.lora_A.default.weight
base_model.model.transformer.encoder.layers.27.mlp.dense_h_to_4h.lora_B.default.weight
base_model.model.transformer.encoder.layers.27.mlp.dense_4h_to_h.lora_A.default.weight
base_model.model.transformer.encoder.layers.27.mlp.dense_4h_to_h.lora_B.default.weight
trainable params: 29646848 || all params: 6273230848 || trainable%: 0.47259297032647635
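That trainable-parameter count can be sanity-checked by hand. With `--lora_dim 16` and the four target modules, each LoRA pair adds r·(d_in + d_out) weights to a `Linear(d_in, d_out)`. Using the chatglm2-6b dimensions (hidden size 4096, multi-query qkv output 4096 + 2×256 = 4608, gated MLP with ffn size 13696 — values worth double-checking against the model's `config.json`), the total works out exactly:

```python
# Back-of-envelope check of the "trainable params: 29646848" log line.
# Dimensions are assumed from the chatglm2-6b config (verify against config.json).
r = 16        # --lora_dim
layers = 28   # chatglm2-6b transformer layers

# each LoRA pair adds r * (d_in + d_out) parameters to a Linear(d_in, d_out)
modules = {
    "query_key_value": (4096, 4608),       # multi-query attention: 4096 + 2*256
    "dense":           (4096, 4096),
    "dense_h_to_4h":   (4096, 2 * 13696),  # gated (SwiGLU) MLP doubles the output
    "dense_4h_to_h":   (13696, 4096),
}
per_layer = sum(r * (d_in + d_out) for d_in, d_out in modules.values())
print(per_layer * layers)  # 29646848 -- matches the log exactly
```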
[2024-11-08 17:07:29,969] [INFO] [config.py:733:__init__] Config mesh_device None world_size = 4
the number of skipping data is 0
(each of the four ranks prints the identical LoRA parameter list, "trainable params" summary, and config line; the duplicated per-rank output is omitted here)
[2024-11-08 17:07:34,800] [INFO] [logging.py:129:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False
Using /data/user23262833/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Creating extension directory /data/user23262833/.cache/torch_extensions/py38_cu121/fused_adam...
Using /data/user23262833/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Using /data/user23262833/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Using /data/user23262833/.cache/torch_extensions/py38_cu121 as PyTorch extensions root...
Detected CUDA files, patching ldflags
Emitting ninja build file /data/user23262833/.cache/torch_extensions/py38_cu121/fused_adam/build.ninja...
/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/utils/cpp_extension.py:1965: UserWarning: TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'].
  warnings.warn(
Building extension module fused_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/3] /usr/local/cuda-11.8/bin/nvcc --generate-dependencies-with-compile --dependency-output multi_tensor_adam.cuda.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/deepspeed/ops/csrc/adam -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include/TH -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.8/include -isystem /data/user23262833/.conda/envs/chatglm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_80,code=compute_80 -gencode=arch=compute_80,code=sm_80 --compiler-options '-fPIC' -O3 -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -lineinfo --use_fast_math -gencode=arch=compute_80,code=sm_80 -gencode=arch=compute_80,code=compute_80 -DBF16_AVAILABLE -U__CUDA_NO_BFLOAT16_OPERATORS__ -U__CUDA_NO_BFLOAT162_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ -std=c++17 -c /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/multi_tensor_adam.cu -o multi_tensor_adam.cuda.o 
[2/3] c++ -MMD -MF fused_adam_frontend.o.d -DTORCH_EXTENSION_NAME=fused_adam -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -I/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/deepspeed/ops/csrc/includes -I/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/deepspeed/ops/csrc/adam -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include/TH -isystem /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda-11.8/include -isystem /data/user23262833/.conda/envs/chatglm/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -O3 -std=c++17 -g -Wno-reorder -DVERSION_GE_1_1 -DVERSION_GE_1_3 -DVERSION_GE_1_5 -DBF16_AVAILABLE -c /data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/deepspeed/ops/csrc/adam/fused_adam_frontend.cpp -o fused_adam_frontend.o 
[3/3] c++ fused_adam_frontend.o multi_tensor_adam.cuda.o -shared -L/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/usr/local/cuda-11.8/lib64 -lcudart -o fused_adam.so
Loading extension module fused_adam...
Time to load fused_adam op: 22.472938776016235 seconds
Loading extension module fused_adam...
Loading extension module fused_adam...
Time to load fused_adam op: 22.532176971435547 seconds
Time to load fused_adam op: 22.533612728118896 seconds
[2024-11-08 17:07:57,341] [INFO] [logging.py:129:log_dist] [Rank 0] Using DeepSpeed Optimizer param name adamw as basic optimizer
[2024-11-08 17:07:57,341] [INFO] [logging.py:129:log_dist] [Rank 0] Removing param_group that has no 'params' in the basic Optimizer
Loading extension module fused_adam...
Time to load fused_adam op: 22.533676862716675 seconds
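Note: the ~22 s "Time to load fused_adam op" is a one-time cost. DeepSpeed JIT-compiles its FusedAdam CUDA kernel with ninja on first use and caches the result under `~/.cache/torch_extensions` (hence the "Creating extension directory" line above), so subsequent runs load the op in well under a second. If you would rather pay this cost at install time, DeepSpeed also supports pre-building the op, e.g. installing with the `DS_BUILD_FUSED_ADAM=1` environment variable set.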
[2024-11-08 17:07:57,402] [INFO] [logging.py:129:log_dist] [Rank 0] DeepSpeed Basic Optimizer = FusedAdam
[2024-11-08 17:07:57,402] [INFO] [utils.py:59:is_zero_supported_optimizer] Checking ZeRO support for optimizer=FusedAdam type=<class 'deepspeed.ops.adam.fused_adam.FusedAdam'>
[2024-11-08 17:07:57,402] [INFO] [logging.py:129:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer
[2024-11-08 17:07:57,402] [INFO] [stage_1_and_2.py:149:__init__] Reduce bucket size 500000000
[2024-11-08 17:07:57,402] [INFO] [stage_1_and_2.py:150:__init__] Allgather bucket size 500000000
[2024-11-08 17:07:57,402] [INFO] [stage_1_and_2.py:151:__init__] CPU Offload: False
[2024-11-08 17:07:57,402] [INFO] [stage_1_and_2.py:152:__init__] Round robin gradient partitioning: False
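The four lines above summarize the ZeRO stage-2 setup from `ds_zero2_no_offload.json`. A minimal config consistent with them might look like the sketch below (an illustration only — the repo's actual file may set additional fields such as loss scaling):

```python
import json

# Sketch of a ZeRO-2, no-offload DeepSpeed config matching the log:
# fp16 ("Creating torch.float16 ZeRO stage 2 optimizer"), 5e8 buckets, no CPU offload.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,   # --per_device_train_batch_size
    "gradient_accumulation_steps": 4,      # --gradient_accumulation_steps
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "reduce_bucket_size": 500000000,     # "Reduce bucket size 500000000"
        "allgather_bucket_size": 500000000,  # "Allgather bucket size 500000000"
        "offload_optimizer": {"device": "none"},  # "CPU Offload: False"
    },
}
print(json.dumps(ds_config, indent=2))
```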
[2024-11-08 17:07:57,429] [WARNING] [lr_schedules.py:671:__init__] Using unknown warmup_type: cosine. The increasing function is set to default (log)
[2024-11-08 17:07:57,429] [WARNING] [lr_schedules.py:683:get_lr] Attempting to get learning rate from scheduler before it has started
  0%|          | 0/361 [00:00<?, ?batch/s]
[2024-11-08 17:07:57,469] [WARNING] [lr_schedules.py:671:__init__] Using unknown warmup_type: cosine. The increasing function is set to default (log)
[2024-11-08 17:07:57,469] [WARNING] [lr_schedules.py:683:get_lr] Attempting to get learning rate from scheduler before it has started
  0%|          | 0/361 [00:00<?, ?batch/s]
[2024-11-08 17:07:57,520] [WARNING] [lr_schedules.py:671:__init__] Using unknown warmup_type: cosine. The increasing function is set to default (log)
[2024-11-08 17:07:57,520] [WARNING] [lr_schedules.py:683:get_lr] Attempting to get learning rate from scheduler before it has started
  0%|          | 0/361 [00:00<?, ?batch/s]
[2024-11-08 17:07:57,647] [INFO] [utils.py:781:see_memory_usage] Before initializing optimizer states
[2024-11-08 17:07:57,648] [INFO] [utils.py:782:see_memory_usage] MA 11.74 GB         Max_MA 11.75 GB         CA 11.79 GB         Max_CA 12 GB 
[2024-11-08 17:07:57,648] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 71.46 GB, percent = 7.1%
[2024-11-08 17:07:57,774] [INFO] [utils.py:781:see_memory_usage] After initializing optimizer states
[2024-11-08 17:07:57,775] [INFO] [utils.py:782:see_memory_usage] MA 11.74 GB         Max_MA 11.77 GB         CA 11.81 GB         Max_CA 12 GB 
[2024-11-08 17:07:57,775] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 71.47 GB, percent = 7.1%
[2024-11-08 17:07:57,775] [INFO] [stage_1_and_2.py:544:__init__] optimizer state initialized
/data/user23262833/.conda/envs/chatglm/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py:600: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
  return fn(*args, **kwargs)
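This `UserWarning` (originally printed once per rank) comes from gradient checkpointing inside the model: recent PyTorch versions want `use_reentrant` passed explicitly to `torch.utils.checkpoint.checkpoint`. The real fix belongs in the model/trainer code that invokes checkpointing; a minimal standalone sketch of the explicit form (`block` is a stand-in layer, not code from the repo):

```python
# Passing use_reentrant explicitly silences the warning; use_reentrant=False
# is the variant PyTorch recommends going forward.
import torch
from torch.utils.checkpoint import checkpoint

def block(x: torch.Tensor) -> torch.Tensor:
    return torch.relu(x @ x.T)  # stand-in for a transformer layer

x = torch.randn(4, 4, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)
y.sum().backward()  # recomputes block() during backward as usual
```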
[2024-11-08 17:07:57,905] [INFO] [utils.py:781:see_memory_usage] After initializing ZeRO optimizer
[2024-11-08 17:07:57,905] [INFO] [utils.py:782:see_memory_usage] MA 11.74 GB         Max_MA 11.74 GB         CA 11.81 GB         Max_CA 12 GB 
[2024-11-08 17:07:57,905] [INFO] [utils.py:789:see_memory_usage] CPU Virtual Memory:  used = 71.73 GB, percent = 7.1%
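In these `see_memory_usage` lines, `MA`/`Max_MA`/`CA`/`Max_CA` appear to map onto PyTorch's allocator counters (allocated vs. reserved/cached memory), showing roughly 11.7 GB allocated per GPU before any training step. A small helper that prints the same style of report (a sketch, not DeepSpeed's own implementation):

```python
# Rough equivalent of the logged numbers: MA ~ memory_allocated,
# Max_MA ~ max_memory_allocated, CA ~ memory_reserved (cached),
# Max_CA ~ max_memory_reserved.
import torch

def see_memory(tag: str) -> None:
    gb = 1024 ** 3
    print(f"{tag}: MA {torch.cuda.memory_allocated() / gb:.2f} GB  "
          f"Max_MA {torch.cuda.max_memory_allocated() / gb:.2f} GB  "
          f"CA {torch.cuda.memory_reserved() / gb:.2f} GB  "
          f"Max_CA {torch.cuda.max_memory_reserved() / gb:.2f} GB")

see_memory("Before initializing optimizer states")
```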
[2024-11-08 17:07:57,907] [INFO] [logging.py:129:log_dist] [Rank 0] DeepSpeed Final Optimizer = DeepSpeedZeroOptimizer
[2024-11-08 17:07:57,907] [WARNING] [lr_schedules.py:671:__init__] Using unknown warmup_type: cosine. The increasing function is set to default (log)
[2024-11-08 17:07:57,907] [WARNING] [lr_schedules.py:683:get_lr] Attempting to get learning rate from scheduler before it has started
[2024-11-08 17:07:57,907] [INFO] [logging.py:129:log_dist] [Rank 0] DeepSpeed using configured LR scheduler = WarmupDecayLR
[2024-11-08 17:07:57,907] [INFO] [logging.py:129:log_dist] [Rank 0] DeepSpeed LR Scheduler = <deepspeed.runtime.lr_schedules.WarmupDecayLR object at 0x7f3f7f36bfd0>
[2024-11-08 17:07:57,907] [INFO] [logging.py:129:log_dist] [Rank 0] step=0, skipped=0, lr=[1e-05], mom=[(0.9, 0.95)]
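The repeated `Using unknown warmup_type: cosine` warning means `WarmupDecayLR` only recognizes `log` and `linear` warmup curves, so the configured `cosine` silently falls back to `log`. If that fallback is not intended, the scheduler block in the DeepSpeed config should name a supported type; an illustrative block (the step counts are inferred from the 361-step run and `--warmup_ratio 0.1`, not copied from the repo's file):

```python
# Illustrative WarmupDecayLR scheduler config with a supported warmup_type.
scheduler_config = {
    "scheduler": {
        "type": "WarmupDecayLR",
        "params": {
            "warmup_min_lr": 0.0,
            "warmup_max_lr": 1e-4,   # matches --learning_rate
            "warmup_num_steps": 36,  # ~10% of 361 total steps (assumed)
            "total_num_steps": 361,
            "warmup_type": "log",    # or "linear"; "cosine" is not recognized
        },
    }
}
```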
[2024-11-08 17:07:57,910] [INFO] [config.py:999:print] DeepSpeedEngine configuration:
[2024-11-08 17:07:57,910] [INFO] [config.py:1003:print]   activation_checkpointing_config  {"partition_activations": false, "contiguous_memory_optimization": false, "cpu_checkpointing": false, "number_checkpoints": null, "synchronize_checkpoint_boundary": false, "profile": false}
[2024-11-08 17:07:57,910] [INFO] [config.py:1003:print]   aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True, 'use_gds': False}
[2024-11-08 17:07:57,910] [INFO] [config.py:1003:print]   amp_enabled .................. False
[2024-11-08 17:07:57,910] [INFO] [config.py:1003:print]   amp_params ................... False
[2024-11-08 17:07:57,910] [INFO] [config.py:1003:print]   autotuning
