Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't store request specific data into the Layer object #113

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Congelli501
Copy link

@Congelli501 Congelli501 commented Feb 2, 2024

Issue

Currently, the path & params of the latest request is stored into the last matched layer.
The "Layer" object should be immutable: a request should not modify its attributes.

This means that parameters are kept into memory after the end a request.

Fix

This PR simply removes the path & params parameters of the Layer object, and return them as the result of the match function.

Test case

A test case is included, to check that the params attribute is not available anymore in the Layer.

History

An initial PR was made on the express project: expressjs/express#5426

Example

For example, this code will show the latest value of the parameter, as the value is stored until a new request is made:

const express = require('./index.js');
const app = express();
const port = 3000;

app.get('/secret/:token', (req, res) => {
	res.send('Hello World!');
});

app.listen(port, () => {
	console.log(`Example app listening on port ${port}`);
});

setInterval(() => {
	console.log(app._router.stack.at(2).params);
}, 1000);
curl localhost:3000/secret/super_secret

Result:

Example app listening on port 3000
undefined
undefined
undefined
{ token: 'super_secret' }
{ token: 'super_secret' }

@dougwilson dougwilson changed the title [Security] Don't store request specific data into the Layer object Don't store request specific data into the Layer object Feb 2, 2024
Copy link
Contributor

@dougwilson dougwilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just looking from my phone so not indepth code review yet. I made a comment and we probably should figure out if match being two different shapes causes v8 bailouts or not and if so what to do. We can land this in 2.0 since folks reach into these layer parameters currently.

index.js Outdated Show resolved Hide resolved
@wesleytodd
Copy link
Member

wesleytodd commented Feb 2, 2024

I just want clarify one thing: putting a secret in the URL is never acceptable. Every proxy layer, access log system, and tracing implementation will store those in plain text and often ship them off the instance to log aggregation systems. I am not opposed to the general idea of not keeping around information about completed requests, but if the purpose of this request is some "security" thing like it was tagged with when originally opened then I think we need to be clear that nothing in the URL is considered secure or secret.

@Congelli501
Copy link
Author

I just want clarify one thing: putting a secret in the URL is never acceptable. Every proxy layer, access log system, and tracing implementation will store those in plain text and often ship them off the instance to log aggregation systems. I am not opposed to the general idea of not keeping around information about completed requests, but if the purpose of this request is some "security" thing like it was tagged with when originally opened then I think we need to be clear that nothing in the URL is considered secure or secret.

Putting a secret in a URL should be avoided when possible, but there are a lot of use cases where you don't have a choice. Any reset password or magic connection link sent by email for example will have a token in the URL, and there is no other way to do it.

Even if there was no legitimate use cases for putting secrets in an URL, mitigating user mistakes (developers are a user of the lib) is always a good thing.
Take this far fetched example: It would be like saying that storing hashed password is not important because users should use a password manager and a different password for each service.

I agree that this is a very low security concern, anyone having the capability to read a token from the express service would obviously have others means to do so at disposal.

FYI, I was looking for memory leak in an application when I found this behaviour in express, as some new string / objects are stored on each request (replacing the previous one, so this is not a memory leak, it was just flagged by the memory profiler).

@wesleytodd
Copy link
Member

Any reset password or magic connection link sent by email for example will have a token in the URL

This is an application design choice, and if you are going to make that choice then you need to be aware of the potential consequences. In this case, the consequences go far beyond Express keeping a reference around in memory. For the sake of this PR any security discussion points are straw man arguments. If an attacker has an RCE or access to the process memory then they have more than your individual users password reset link as well. This is just not at all a security concern for the express project even in a "protect users from themselves" sort of way.

I am not saying we should not clean this up when the request is finished. Sorry for being so pedantic, we just don't want someone who doesn't know what is going on to see this and be concerned for no reason (or worse, file a CVE we need to fight).

@wesleytodd
Copy link
Member

I should add, all of the above comments still need to be addressed to consider moving this forward. Please clean up all of the extra changes and slim this down to just the change to clean up the state.

@Congelli501
Copy link
Author

@wesleytodd, do you mean this comment?

We can land this in 2.0 since folks reach into these layer parameters currently.

I don't really see a way to clean the sate without breaking these use cases.

An obvious way would be to just reset the state (this.params & this.path on Layer) back to empty values at some point, but I don't see where that could be added: these attributes are currently kept with the last request value as is as long as non async code is run (as they can be overwritten by another request).
This means that, even if we add a cleanup code after this line (https://github.com/pillarjs/router/blob/master/index.js#L350), it can still break existing usage of this.params & this.path.

I'm not familiar with the express codebase so there might be something I'm missing here.

@wesleytodd
Copy link
Member

wesleytodd commented Feb 5, 2024

EDIT: I didn't notice you had pushed some of the things which were requested to be changed. With this slimmed down part I see maybe those swaps were because you set match to false. Let me re-visit the code now that you did take a cleanup pass, sorry I didn't notice that at first since there were other conversation threads I was focused on.

do you mean this comment?

No I meant all the unrelated stuff like swapping !== checks to === checks and stuff that doesn't have anything to do with the change you are proposing.

An obvious way would be to just reset the state (this.params & this.path on Layer) back to empty values at some point, but I don't see where that could be added

Yeah, I agree we might not be able to find a place to clear those out that is not a breaking change. Either way, I and other contributors are not going to spend a bunch of time helping find that while the PR could never be merged because of all the unrelated changes. Once this is slimmed down to just the changes you need to achieve the goal we can discuss options for landing it.

return {
path: path,
params: {'0': decode_param(path)}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code has been around in this state for a very long time. My worry is that this new way creates garbage to collect (the new object added) on a hot path, which might be the reason it was done this way in the first place. Maybe @dougwilson will remember because this predates me by quite a bit. Without more context I would like to see us do some sort of performance pass on this since I don't think the feature addition would be worth it if we impacted performance in a negative way.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big part was preventing bail out due to the same variable being different shapes (i.e. here we now have something that can be a bool or an object, and v8 will struggle to make a bytecode and may be falling back to slow interpreter mode). Ideally if we want to keep things in hot paths fast they should definitely keep the same shapes for the byte code generator.

I don't remember off hand all the details for why it was mutating properties, but it was very likely to reduce generating a bunch of little objects only to trash them immediately. That can be addressed if an object pool is used for them, however.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't run through a perf yet as I am away, but I suspect using null instead of false would fix any bail outs since those would both be objects

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that is a good point as well. I think ideally we have some real world benchmarks to base these decisions off of, but I think in theory both the de-opt from different shapes and increased GC from these little objects will be perf regressions. It would be nice to comment these with links to the benchmarks at some point so that the context is kept in the code for future contributors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far what I have done is run the perf with the test suite, which the deopts from there usually pretty low hanging fruit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But ya, if it needs a shaped object can make a no match object at the top level and return that reference for the no match

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change to use null instead of false for the no match scenario (I added a new commit to help reading the new changes, but I can squash them if you prefer).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I see how that addresses the worry here. The change you made does mean you could return some of those boolean checks back to their original form I think, which is overall good to reduce the change for change's sake, but I think this comment thread here is about the method returning a boolean and not producing unnecessary garbage in the process.

I am not suggesting removing the last changes, but I think ideally we find a way to not create these objects yet still clean up the state after the layers are finished being handled.

@Congelli501
Copy link
Author

No I meant all the unrelated stuff like swapping !== checks to === checks and stuff that doesn't have anything to do with the change you are proposing.

I re-checked the PR, and all the changes I made are only related to the removal of the state.
The match function now returns either an object on success or false on failure (no match), so I need to check for === false instrad of !== true.

@wesleytodd
Copy link
Member

@Congelli501 yeah sorry I had not fully re-reviewed the code before making that comment. I edited as soon as I noticed that. Sorry if that was confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants