Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DisallowedHost exception for Django instrumentation #1781

Closed
leohahn opened this issue May 3, 2023 · 23 comments · Fixed by #2912 · May be fixed by #1903 or #1825
Closed

DisallowedHost exception for Django instrumentation #1781

leohahn opened this issue May 3, 2023 · 23 comments · Fixed by #2912 · May be fixed by #1903 or #1825
Assignees
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@leohahn
Copy link

leohahn commented May 3, 2023

Describe your environment

  • Django 4.1.7
  • Python 3.11.1
  • opentelemetry-instrumentation-django 0.38b0

Steps to reproduce
Create basic django app:

mkdir django-test && cd django-test
python -m venv .venv
source .venv/bin/activate
pip install django
django-admin startproject django_test .
python manage.py migrate
cat <<EOF > gunicorn.config.py
from opentelemetry.instrumentation.django import DjangoInstrumentor

def post_fork(server, worker):
    DjangoInstrumentor().instrument()
EOF

Only allow hosts from a specific origin. This is best practice in terms of security.
Change the following variables in django_test/settings.py:

DEBUG = False
ALLOWED_HOSTS = ['example.com']

Create a health check middleware. The middleware is added at the beginning of the middleware
array, since we don't want the allowed hosts rule to apply to the health check, since we'll
likely be receiving health checks from the load balancer, which does not have the correct host.

cat <<EOF > django_test/middleware.py
from django.http import HttpResponse


class HealthCheckMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.path == "/api/health":
            return HttpResponse("ok\n")
        return self.get_response(request)
EOF

Add to the array in django_test.settings.py:

MIDDLEWARE = [
    'django_test.middleware.HealthCheckMiddleware',
    # ... other middlewares here
]

Now the health check route works as expected when we run the server:

python manage.py runserver
❯ curl localhost:8000/api/health
ok

Now we need to add open telemetry django and gunicorn:

pip install opentelemetry-instrumentation-django
pip install gunicorn

Now we run django with instrumentation:

DJANGO_SETTINGS_MODULE=django_test.settings OTEL_SERVICE_NAME=TestApi gunicorn django_test.wsgi:application -c gunicorn.config.py

The healthcheck endpoint fails with 400 now, because of the DisallowedHost exception. More specifically, the error is raised here on the request.build_absolute_uri call.

What is the expected behavior?

The healthcheck endpoint should not fail when telemetry is enabled.

I'm not sure what the best solution for this is, but I guess we want to build the absolute URI without calling request.build_absolute_uri, since it will throw an exception.

EDIT: since I'm new to the library, I'm not sure what the best solution would be. But if you have any pointers feel free to say them that I can try to fix the issue as well.

@leohahn leohahn added the bug Something isn't working label May 3, 2023
@srikanthccv srikanthccv added the help wanted Extra attention is needed label May 5, 2023
@TheAnshul756
Copy link
Contributor

Hey @srikanthccv. Can you please assign this to me.

@TheAnshul756
Copy link
Contributor

TheAnshul756 commented May 6, 2023

Hello @leohahn, thank you for bringing this issue to our attention. After conducting some research, I have found that the request.build_absolute_uri method is internally calling the get_host method, which checks for allowed hosts and throws an exception if the host is not allowed. This behavior is not ideal, as it is affecting the execution of other middlewares even though it has been added to the top of the middleware stack.

To address this issue, I suggest that we write our own build_absolute_uri method to bypass the Middleware and ensure that the execution of other middlewares is not affected. However, before implementing this solution, we should thoroughly test it to ensure that it does not introduce any new bugs or issues.

I would appreciate your thoughts on this approach, @srikanthccv. Thank you.

@srikanthccv
Copy link
Member

How about giving the option to configure where in the middleware stack Django instrumentation be placed? I believe that will also help potential issues that arise because of the order of the execution. For instance, the user has a number of middleware that should be executed before any instrumentation work begins.

@TheAnshul756
Copy link
Contributor

Many frameworks provide functionality for getting middleware priority and sorting them by it. However, I'm not sure how we could implement this on the instrumentation level. It's an interesting idea to give users the option to configure where in the middleware stack Django instrumentation should be placed, as this could help prevent potential issues that arise due to the order of execution. We could explore this further and see if there are any feasible solutions.

@srikanthccv
Copy link
Member

_middleware_setting = _get_django_middleware_setting()
settings_middleware = []
try:
settings_middleware = getattr(settings, _middleware_setting, [])
except ImproperlyConfigured as exception:
_logger.debug(
"DJANGO_SETTINGS_MODULE environment variable not configured. Defaulting to empty settings: %s",
exception,
)
settings.configure()
settings_middleware = getattr(settings, _middleware_setting, [])
except ModuleNotFoundError as exception:
_logger.debug(
"DJANGO_SETTINGS_MODULE points to a non-existent module. Defaulting to empty settings: %s",
exception,
)
settings.configure()
settings_middleware = getattr(settings, _middleware_setting, [])
# Django allows to specify middlewares as a tuple, so we convert this tuple to a
# list, otherwise we wouldn't be able to call append/remove
if isinstance(settings_middleware, tuple):
settings_middleware = list(settings_middleware)
is_sql_commentor_enabled = kwargs.pop("is_sql_commentor_enabled", None)
if is_sql_commentor_enabled:
settings_middleware.insert(0, self._sql_commenter_middleware)
settings_middleware.insert(0, self._opentelemetry_middleware)
setattr(settings, _middleware_setting, settings_middleware)

@TheAnshul756
Copy link
Contributor

I appreciate the code you provided to set the order of middleware. However, my concern is how we would decide which middleware should execute before the OpenTelemetry middleware.

I think providing a decorator for middlewares that executes before the OpenTelemetry middleware is a potential solution to give developers more control over the middleware stack. However, I understand that this would require customers to make code-level changes to import and use the decorator. What do you think about this solution?

@srikanthccv
Copy link
Member

my concern is how we would decide which middleware should execute before the OpenTelemetry middleware.

We don't; the user decides. The configuration option I mentioned earlier is the order index where the OTEL middleware should go.

@TheAnshul756
Copy link
Contributor

Asking this just to check if we are on the same page, Django takes a tuple of middleware in which the customer provides their middleware, and our code by default inserts the OpenTelemetry middleware on top. Could you please explain how the user would specify where to put the OpenTelemetry middleware in this tuple? Would they need to provide an index as input to indicate where the OpenTelemetry middleware should be added?"

@srikanthccv
Copy link
Member

it could be the index or name of the middleware to come after/before or by some other way to place it in the right position.

@leohahn
Copy link
Author

leohahn commented Jul 24, 2023

Hey folks, do we have any updates here?

@TheAnshul756
Copy link
Contributor

Hello @leohahn
I sincerely apologize for the delay in addressing the issue. Due to unforeseen circumstances, I couldn't work on it as promptly as I had hoped. However, I want to assure you that I am now actively working on resolving the problem and will dedicate my efforts to completing it within next one week.

@leohahn
Copy link
Author

leohahn commented Jul 25, 2023

Hey @TheAnshul756, thanks for the response. No worries! Not sure if I can help, but feel free to point something that I could be useful as well if necessary.

@shinkawk
Copy link

@leohahn @TheAnshul756 Hi guys,

Just ran into the same issue today. I'm using AWS ECS and implemented HealthCheckMiddleware exacly the way described to bypass Django's security check.

Here is my quick workaround for the issue. Add this to your settings.py (originally from https://stackoverflow.com/questions/37031749/django-allowed-hosts-ips-range

from socket import gethostbyname_ex, gethostname

ALLOWED_HOSTS += [ gethostname(),] + list(set(gethostbyname_ex(gethostname())[2]))

This will add the private ip address of ECS container to each app instance which get_host validate against.
Adding these hosts to ALLOWED_HOSTS shouldn't impact your security since they are private addresses.

@jeremydvoss
Copy link
Contributor

I am also having trouble avoiding ALLOWED_HOSTS errors running on Linux specifically. Even after adding a wildcard to ALLOWED_HOSTS to allow everything, I don't seem to get any spans. I am guessing it is related to this. When running with "python manage.py runserver --noreload" on Windows it works, but gunicorn on Linux does not.

@jeremydvoss
Copy link
Contributor

I want to make sure that whatever solution we come up with here works intuitively for auto-instrumentation. That functionality seems to have been a bit overlooked in the past. So, for instance, we should prioritizing finding a fix that works natively and doesn't require additional user input. If use input is needed, as is the case for the settings module env var, then we need to make sure it's configurable via environment variable so as to not break codeless scenarios.

@jeremydvoss
Copy link
Contributor

Potentially related: #2043

@Puneet140500
Copy link

Hey @srikanthccv. Can you please assign this to me.

@TheAnshul756 TheAnshul756 removed their assignment Aug 13, 2024
@emdneto
Copy link
Member

emdneto commented Aug 13, 2024

@Puneet140500 Please open a PR if you have some work done.

@saichander17
Copy link
Contributor

saichander17 commented Sep 11, 2024

saichander17#1
saichander17#2

Do any of these make sense? I haven't tested or did much. I literally updated those files quickly to get an idea of if any one of them are follows correct approach or not.

cc: @lzchen

@emdneto
Copy link
Member

emdneto commented Sep 12, 2024

@saichander17, please open the PR against this repository

@saichander17
Copy link
Contributor

I just don't know if it makes any sense or not. That's why I didn't open the PR against this one. Let me go ahead and do it anyways

@saichander17
Copy link
Contributor

Done.
#2866
#2867

@saichander17
Copy link
Contributor

Seems like #2834 is already in progress and is probably in the right direction. I'll go ahead close my PRs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment