Use Azure OpenAI to Run WebArena Evaluation #6372

wxl-lxw · 2025-01-21T00:04:00Z

I'm trying to run the WebArena evaluation following this guide. However, the guide only covers the OpenAI API, and I want to evaluate WebArena with the Azure OpenAI API. Are there any instructions I can follow?

Thanks.

enyst · 2025-01-21T00:24:21Z

I think the problem there is that WebArena/browsergym requires an OpenAI key for some internal functions.

As I understand it, apart from that you can set up an Azure LLM for the agent as described in the linked guide here.

Cc: @adityasoni9998 I believe you hit the same issue?

adityasoni9998 · 2025-01-23T19:21:00Z

Yes, this is a bit annoying, and I encountered a similar issue when trying to use a LiteLLM proxy for VisualWebArena evaluation. There is a two-step dependency here: OpenHands relies on BrowserGym for evaluation on the WebArena benchmark, and BrowserGym internally relies on WebArena functions to compute resolve rates. If the model name mentioned in the code linked above matches your Azure OpenAI model, you can try setting the OPENAI_BASE_URL environment variable to the Azure API base URL in your sandbox here.
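For illustration, here is a minimal Python sketch of preparing those environment variables. It relies on the OpenAI Python SDK reading `OPENAI_BASE_URL` and `OPENAI_API_KEY` from the environment. Everything else is an assumption: `build_openai_env` is a hypothetical helper, the resource name, URL path, and key are placeholders, and whether your Azure resource accepts requests from a plain OpenAI client at that URL has to be checked against your own deployment.

```python
import os
from urllib.parse import urlparse


def build_openai_env(base_url: str, api_key: str) -> dict[str, str]:
    """Return the environment variables the OpenAI Python SDK reads.

    Hypothetical helper: it only validates the inputs and normalizes the
    trailing slash. It does not contact Azure or check the deployment name.
    """
    parsed = urlparse(base_url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"expected an https URL, got {base_url!r}")
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "OPENAI_BASE_URL": base_url.rstrip("/") + "/",
        "OPENAI_API_KEY": api_key,
    }


# Placeholder values (assumptions): substitute your own endpoint and key.
env = build_openai_env(
    "https://my-resource.openai.azure.com/openai/v1",
    "placeholder-key",
)

# This changes the current process only. The sandbox that runs the
# WebArena/BrowserGym code needs the same variables passed to it by
# whatever mechanism your setup uses, before any OpenAI client is created.
os.environ.update(env)
```

As noted above, this can only help if the model name hard-coded in the WebArena evaluation code matches a model your Azure resource serves; otherwise the requests will still fail even with the base URL redirected.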

mamoodi added the documentation and evaluation labels · Jan 21, 2025
