
contrib: support modelscope community #12664

Open · wants to merge 16 commits into master

Conversation

@tastelikefeet commented Mar 31, 2025

Support GGUF models from the ModelScope community.
HF/MS downloads tested on Linux, macOS, and Windows.

@ngxson (Collaborator) left a comment

I think we need to rethink the implementation a bit.

From a UX perspective, adding 4 more args may not be intuitive for most users, especially for people outside China who won't use this (I initially understood HF was not accessible from China, but I checked with the HF team: HF is still accessible in China; still, we should allow switching the model host in a way that is less confusing).

What I'm thinking is to handle the case where the user can add a protocol prefix to the existing -m, like what we have in llama-run. For example (a rough sketch follows the list):

  • Hugging Face: -m hf://user/model:quant (equivalent to -hf user/model:quant)
  • ModelScope: -m ms://user/model:quant
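A minimal sketch of how such prefix dispatch could look, assuming hypothetical helper and enum names (this is not the PR's actual code):

```cpp
#include <string>
#include <utility>

// Hypothetical: strip a known protocol prefix from the -m argument and
// remember which host to download from; a bare path keeps today's behavior.
enum model_host { HOST_NONE, HOST_HF, HOST_MS };

static model_host detect_model_host(std::string & model) {
    const std::pair<const char *, model_host> prefixes[] = {
        { "hf://", HOST_HF },
        { "ms://", HOST_MS },
    };
    for (const auto & p : prefixes) {
        const std::string proto = p.first;
        if (model.rfind(proto, 0) == 0) { // starts_with
            model = model.substr(proto.size());
            return p.second;
        }
    }
    return HOST_NONE; // plain local path, -m semantics unchanged
}
```

With this, `-m hf://user/model:quant` would behave like `-hf user/model:quant`, while a plain path is passed through unchanged.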

@ngxson (Collaborator) commented Mar 31, 2025

> What I'm thinking is to handle the case where the user can add a protocol prefix to the existing -m, like what we have in llama-run. For example:
>
> * Hugging Face: `-m hf://user/model:quant` (equivalent to `-hf user/model:quant`)
> * ModelScope: `-m ms://user/model:quant`

Btw, we also have hf-mirror.com, so I think it's safe to say that adding one set of 4 arguments per host is not a scalable solution. I would prefer to go with the protocol://... format.

@foldl (Contributor) commented Mar 31, 2025

The abbreviation MS looks weird to me.

@tastelikefeet (Author) commented Mar 31, 2025

> What I'm thinking is to handle the case where the user can add a protocol prefix to the existing -m, like what we have in llama-run. For example:
>
> * Hugging Face: `-m hf://user/model:quant` (equivalent to `-hf user/model:quant`)
> * ModelScope: `-m ms://user/model:quant`
>
> Btw, we also have hf-mirror.com, so I think it's safe to say that adding one set of 4 arguments per host is not a scalable solution. I would prefer to go with the protocol://... format.

This change was previously considered too significant for contributors (adding a new protocol argument for llama-cli and llama-server), so I chose a more conservative approach. Should we switch to this approach?

@ngxson (Collaborator) commented Mar 31, 2025

> This change was previously considered too significant for contributors (adding a new protocol argument for llama-cli and llama-server), so I chose a more conservative approach. Should we switch to this approach?

I still think having the protocol:// will be more extensible because:

  • It's easier to add a third provider in the future
  • We can also make it a positional arg (an idea for the future), so we can eliminate -m and simply do llama-cli protocol://user/model

Also, please correct me if I'm wrong, but aren't ModelScope and HF URLs using the same structure? The format is {base_domain}/resolve/main/{file}; in that case I think a lot of code in this PR is redundant, as we could simply make the base_domain replaceable.

@ngxson (Collaborator) left a comment

I would prefer a simpler approach. At the moment this feature is kind of "nice-to-have", so we should start with something simple and improve it later when more users adopt it.

@tastelikefeet (Author) commented

> > This change was previously considered too significant for contributors (adding a new protocol argument for llama-cli and llama-server), so I chose a more conservative approach. Should we switch to this approach?
>
> I still think having the protocol:// will be more extensible because:
>
> • It's easier to add a third provider in the future
> • We can also make it a positional arg (an idea for the future), so we can eliminate -m and simply do llama-cli protocol://user/model
>
> Also, please correct me if I'm wrong, but aren't ModelScope and HF URLs using the same structure? The format is {base_domain}/resolve/main/{file}; in that case I think a lot of code in this PR is redundant, as we could simply make the base_domain replaceable.

No, the URLs are a little different: HF uses resolve/main, while MS uses resolve/master.
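If that is the only divergence, both hosts could still share one code path, with the base domain and the default branch both replaceable; a rough sketch under that assumption (names are illustrative, not the PR's code):

```cpp
#include <string>

// Sketch: build the download URL from replaceable parts. Per the thread,
// HF resolves against "main" while ModelScope resolves against "master".
static std::string make_resolve_url(const std::string & base_domain, // e.g. "https://huggingface.co/"
                                    const std::string & repo,        // "user/model"
                                    const std::string & branch,      // "main" (HF) or "master" (MS)
                                    const std::string & file) {      // e.g. "model-Q4_K_M.gguf"
    return base_domain + repo + "/resolve/" + branch + "/" + file;
}
```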

@yingdachen commented

I think a lot of the discussion in this thread is quite constructive, and it may result in a cleaner implementation that helps llama.cpp users access the model files they need from ModelScope.

At the same time, there is obvious value in having another alternative community. For one thing, unlike lightweight open-source code, large model weights require much more than a "mirror" site for efficient, high-speed access. ModelScope is dedicated to making models more accessible to our over 15 million users, and integration into popular tools/frameworks like llama.cpp can greatly facilitate that effort. We also believe easier access to a wider model selection benefits users of llama.cpp and the open-source community as a whole. After all, the easy access to GGUF models on Hugging Face via model id must have been implemented out of the same belief that such integration connects llama.cpp better with the model ecosystems; otherwise, standalone operations like "wget" can fetch GGUF models as well :)

We will take today's discussion back to see how we can coordinate the server-side protocol to better accommodate client-side implementations that have been built around the HF ecosystem. @tastelikefeet

* master: (123 commits)
  cuda : add f32 to bf16 copy op (ggml-org#12806)
  llava: improve clip_ctx destructor to not memleak load_image_size (ggml-org#12834)
  llama : fix FA when KV cache is not used (i.e. embeddings) (ggml-org#12825)
  server : fix thread.join() on exit (ggml-org#12831)
  llava: add more helper functions to check projector types in clip context (ggml-org#12824)
  arg : Including limits file on AIX (ggml-org#12822)
  server : webui : Improve Chat Input with Auto-Sizing Textarea (ggml-org#12785)
  Revert "sycl:remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor" (ggml-org#12812)
  gguf-py : support lazy tensor splitting (ggml-org#12809)
  llama : Support llama 4 text-only (ggml-org#12791)
  opencl: better identify Adreno GPU (ggml-org#12760)
  hellaswag: display estimated score confidence interval (ggml-org#12797)
  cuda : fix HIP and MUSA BF16 (#0)
  sync : ggml
  ggml : simplify Arm fp16 CPU logic (ggml/1177)
  CUDA: don't convert BF16 weights to FP32 (ggml/1174)
  cpu: move all the operators into a separate c++ file (except mul_mat) (ggml/1167)
  sycl: remove redundant memcopy in function ggml_backend_sycl_buffer_set_tensor (ggml-org#12734)
  ci : no curl on ggml-ci (ggml-org#12796)
  cmake : enable curl by default (ggml-org#12761)
  ...

# Conflicts:
#	common/arg.cpp
#	common/common.cpp
#	common/common.h
@tastelikefeet (Author) commented

@ngxson @ggerganov We resolved the conversations and updated our backend code. We also fixed some code that hard-coded the Hugging Face endpoint, so that ModelScope and other communities using the same download protocol are supported, and renamed HF_ENDPOINT to MODEL_ENDPOINT to signal to developers that multiple communities are supported. cc @yingdachen
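With that rename, pointing the downloader at another community would come down to setting one variable, e.g. `MODEL_ENDPOINT=https://modelscope.cn/ llama-cli -hf user/model:quant` (the ModelScope endpoint URL and the exact invocation here are my assumptions for illustration, not taken from this thread).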

@@ -1027,6 +1027,16 @@ struct common_init_result common_init_from_params(common_params & params) {
     return iparams;
 }
 
+std::string get_model_endpoint() {
+    std::string model_endpoint = "https://huggingface.co/";
+    const char * model_endpoint_env = getenv("MODEL_ENDPOINT");

I think we can (and should) support both HF_ENDPOINT and MODEL_ENDPOINT, otherwise this may be a breaking change; something like:

const char * model_endpoint_env = getenv("MODEL_ENDPOINT");
const char * hf_endpoint_env = getenv("HF_ENDPOINT");
const char * endpoint_env = model_endpoint_env ? model_endpoint_env : hf_endpoint_env;
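Folded into the function from the diff above, the suggestion might read roughly like this (a sketch of the suggested change, not necessarily the code as merged; the trailing-slash normalization is my assumption):

```cpp
#include <cstdlib>
#include <string>

std::string get_model_endpoint() {
    // Prefer the new variable, but fall back to HF_ENDPOINT so that
    // existing workflows keep working.
    const char * model_endpoint_env = std::getenv("MODEL_ENDPOINT");
    const char * hf_endpoint_env    = std::getenv("HF_ENDPOINT");
    const char * endpoint_env       = model_endpoint_env ? model_endpoint_env : hf_endpoint_env;

    std::string model_endpoint = "https://huggingface.co/";
    if (endpoint_env && *endpoint_env) {
        model_endpoint = endpoint_env;
        if (model_endpoint.back() != '/') {
            model_endpoint += '/'; // normalize trailing slash (assumption)
        }
    }
    return model_endpoint;
}
```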


We have thought about that as well, but since the introduction of HF_ENDPOINT is fairly recent:

c6ff5d2

we think it may be better to keep a single env variable, weighing long-term maintainability and user-friendliness.

But do let us know if you have a strong opinion on this...

@ngxson (Collaborator) commented Apr 10, 2025

From a UX perspective, I don't feel comfortable breaking other people's workflows without proper communication about it. Supporting both variable names is just 2 more lines of code anyway.
