{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":685351050,"defaultBranch":"main","name":"Megatron-DeepSpeed","ownerLogin":"imh966","currentUserCanPush":false,"isFork":true,"isEmpty":false,"createdAt":"2023-08-31T03:24:57.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/97744372?v=4","public":true,"private":false,"isOrgOwned":false},"refInfo":{"name":"","listCacheKey":"v0:1713945215.0","currentOid":""},"activityList":{"items":[{"before":"508aa8c54c6f423f0e555bb71161ee2dd8d0922b","after":"8ac10d63d1d9aaec1a929fd5fa33c027e09fd3bd","ref":"refs/heads/github_main","pushedAt":"2024-04-24T07:58:24.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"fix loading distributed checkpoint when enable auto-detect-ckpt-format but disable use-dist-ckpt","shortMessageHtmlLink":"fix loading distributed checkpoint when enable auto-detect-ckpt-forma…"}},{"before":"ccfeda47cb5ca10ee3c4efd9b78c6bb15c2cd3d2","after":"508aa8c54c6f423f0e555bb71161ee2dd8d0922b","ref":"refs/heads/github_main","pushedAt":"2024-04-24T07:56:45.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"fix loading distributed checkpoint when enable auto-detectckpt-format but disable use-dist-ckpt","shortMessageHtmlLink":"fix loading distributed checkpoint when enable auto-detectckpt-format…"}},{"before":null,"after":"ccfeda47cb5ca10ee3c4efd9b78c6bb15c2cd3d2","ref":"refs/heads/github_main","pushedAt":"2024-04-24T07:53:35.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"Merge branch 'fix_overlap_param_gather' into 'main'\n\nfix EP distopt with overlap param gather\n\nSee merge request ADLR/megatron-lm!1345","shortMessageHtmlLink":"Merge branch 'fix_overlap_param_gather' into 'main'"}},{"before":"a4f807982d7e3552d2d7f1ff2f37a89efd529759","after":"bcedecd1ff788d4d363f3365fd396053a08d65be","ref":"refs/heads/main","pushedAt":"2024-04-24T07:44:34.000Z","pushType":"push","commitsCount":21,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"Support MoE for GPTModelPipe (#373)\n\n* MOE: Support MoE layers creation for GPTModelPipe\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n* MOE: Support MoE aux loss for GPTModelPipe\r\n\r\nPropagate aux loss along GPTModelPipe layers by forwarding the aggregated loss\r\nfrom each transformer layer to the next transformer layer.\r\n\r\nIn addition, add a layer to GPTModelPipe, after the last transformer layer, to\r\ncatch the final aggregated aux loss and cache it for use in the loss function.\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n* MOE: Support display of MoE loss for GPTModelPipe\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n* MOE: Verify MoE with no pipe/grad partitioned\r\n\r\nCurrently PipelineEngine supports only a single tensor partitioning with grad.\r\nMoE model requires to forward with grad both the activations and the aux_loss.\r\nTherefore, until PilelineEngine limitation is removed, verify no partitioning\r\nwhen using MoE.\r\n\r\nSigned-off-by: Moshe Island \r\n\r\n---------\r\n\r\nSigned-off-by: 
Moshe Island \r\nCo-authored-by: Moshe Island ","shortMessageHtmlLink":"Support MoE for GPTModelPipe (microsoft#373)"}},{"before":"c08e2cc2579cf7ef5ce6ad58453e76a5710b7b6b","after":"a4f807982d7e3552d2d7f1ff2f37a89efd529759","ref":"refs/heads/main","pushedAt":"2024-01-13T05:58:54.000Z","pushType":"push","commitsCount":43,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"bug fix on args.deepspeed_config_dict (#328)","shortMessageHtmlLink":"bug fix on args.deepspeed_config_dict (microsoft#328)"}},{"before":null,"after":"b3608c4b80fd8b121fa3268924fca17799d217b4","ref":"refs/heads/fix_attention_mask","pushedAt":"2024-01-13T05:58:11.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"optimize the generation of attention mask","shortMessageHtmlLink":"optimize the generation of attention mask"}},{"before":null,"after":"5bd97efcd5f5c5ef6d4889c5f977bafdfd349ef3","ref":"refs/heads/activation_checkpointing","pushedAt":"2023-09-09T17:01:08.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"some modification for activation checkpointing","shortMessageHtmlLink":"some modification for activation checkpointing"}},{"before":"7787079a92f5491330e9cbc3837023e5841167cf","after":"f45571a8cc85b0d75a32e17bb98b8e617b24783b","ref":"refs/heads/fix_activation_checkpointing","pushedAt":"2023-09-01T09:46:53.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"add activation checkpointing arguments for llama pretrain scripts","shortMessageHtmlLink":"add activation checkpointing arguments for llama pretrain scripts"}},{"before":"c80d7acc6bab15f411dcc4f7f42808ce0232a7e4","after":"7787079a92f5491330e9cbc3837023e5841167cf","ref":"refs/heads/fix_activation_checkpointing","pushedAt":"2023-09-01T09:44:06.000Z","pushType":"force_push","commitsCount":0,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"add activation checkpointing arguments for llama pretrain scripts","shortMessageHtmlLink":"add activation checkpointing arguments for llama pretrain scripts"}},{"before":null,"after":"c80d7acc6bab15f411dcc4f7f42808ce0232a7e4","ref":"refs/heads/fix_activation_checkpointing","pushedAt":"2023-08-31T07:23:05.000Z","pushType":"branch_creation","commitsCount":0,"pusher":{"login":"imh966","name":null,"path":"/imh966","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/97744372?s=80&v=4"},"commit":{"message":"add activation checkpointing arguments for llama pretrain scripts","shortMessageHtmlLink":"add activation checkpointing arguments for llama pretrain scripts"}}],"hasNextPage":false,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEONx3mAA","startCursor":null,"endCursor":null}},"title":"Activity · imh966/Megatron-DeepSpeed"}
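The aux-loss plumbing described in the MoE commit body lends itself to a short illustration. The following is a minimal sketch, not the actual Megatron-DeepSpeed code: the class names `MoETransformerLayer` and `AuxLossCatcher` are hypothetical, and the per-layer aux loss is a stand-in for the real MoE gate's load-balancing loss.

```python
import torch
import torch.nn as nn

class MoETransformerLayer(nn.Module):
    """Hypothetical stand-in for a pipeline transformer layer with MoE.

    Takes and returns a (hidden_states, aggregated_aux_loss) pair so the
    auxiliary load-balancing loss can ride along the pipeline activations.
    """
    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)  # stand-in for attention + MoE FFN

    def forward(self, inputs):
        hidden, aux_loss = inputs
        hidden = torch.relu(self.proj(hidden))
        layer_aux_loss = hidden.abs().mean() * 1e-2  # stand-in for the gate's aux loss
        return hidden, aux_loss + layer_aux_loss     # aggregate and forward downstream

class AuxLossCatcher(nn.Module):
    """Placed after the last transformer layer: strips the aggregated aux
    loss off the pipeline activations and caches it for the loss function."""
    def __init__(self):
        super().__init__()
        self.cached_aux_loss = None

    def forward(self, inputs):
        hidden, aux_loss = inputs
        self.cached_aux_loss = aux_loss
        return hidden

layers = nn.Sequential(MoETransformerLayer(16), MoETransformerLayer(16))
catcher = AuxLossCatcher()

x = torch.randn(4, 16)
hidden = catcher(layers((x, torch.zeros(()))))
lm_loss = hidden.pow(2).mean()                    # stand-in for the real LM loss
total_loss = lm_loss + catcher.cached_aux_loss    # aux loss rejoins the objective
total_loss.backward()
```

The real GPTModelPipe integration differs (DeepSpeed's PipelineEngine moves the tensors between pipeline stages), but the aggregate-and-forward pattern, plus a final catcher layer feeding the loss function, is the scheme the commit message describes.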
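The `fix_attention_mask` commit message says only "optimize the generation of attention mask", so the sketch below shows one common form such an optimization takes: building the causal mask once per shape and caching it, rather than re-materializing an O(seq_len²) tensor on every microbatch. Whether this matches the branch's actual change is an assumption.

```python
import torch

_MASK_CACHE: dict = {}  # hypothetical module-level cache, keyed by (seq_len, device)

def causal_attention_mask(seq_len: int, device: torch.device) -> torch.Tensor:
    """Return a cached boolean causal mask; True marks disallowed positions."""
    key = (seq_len, str(device))
    if key not in _MASK_CACHE:
        # Upper triangle above the diagonal: token i may not attend to j > i.
        _MASK_CACHE[key] = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=device),
            diagonal=1,
        )
    return _MASK_CACHE[key]

mask = causal_attention_mask(8, torch.device("cpu"))
scores = torch.randn(8, 8).masked_fill(mask, float("-inf"))
```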
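For the activation checkpointing entries, the sketch below illustrates the underlying technique with the generic `torch.utils.checkpoint` API rather than Megatron-DeepSpeed's own arguments, which the commit messages do not enumerate: activations inside the wrapped block are dropped after the forward pass and recomputed during backward, trading compute for memory.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class CheckpointedBlock(nn.Module):
    """Transformer-style block whose inner activations are recomputed in
    backward instead of being kept alive through the whole forward pass."""
    def __init__(self, d_model: int, checkpoint_activations: bool = True):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.checkpoint_activations = checkpoint_activations

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.checkpoint_activations and self.training:
            # Only the block's input is stored; the intermediates inside
            # self.ffn are recomputed when backward reaches this block.
            return checkpoint(self.ffn, x, use_reentrant=False)
        return self.ffn(x)

block = CheckpointedBlock(64).train()
out = block(torch.randn(8, 64, requires_grad=True))
out.sum().backward()
```

The llama pretrain scripts presumably enable this via command-line flags; the exact arguments the `fix_activation_checkpointing` branch adds are not visible in this feed.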