-
Notifications
You must be signed in to change notification settings - Fork 105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async version, remove depenency on RazorLight, implemented IDisposable #260
base: master
Are you sure you want to change the base?
Async version, remove depenency on RazorLight, implemented IDisposable #260
Conversation
DoRequest made public. Set maxRedirects = 10 (since there exists some incompatible sites that needed scraping...)
Remove calls to HtmlLogFormatter depending on RazorLight.
Moved logging to use injected service so dependency could be outside the core package.
Added Async versions of all calls, made Browser Disposable, moved logging outside core dll
Finally got the tests to run. Had to do some adjustments to handle Async in the "Mockers" and update url to jkorpela.fi/cgi-bin/echo.cgi (missing https so we got redirected when submitting the forms). Would be very glad if you could merge this PR and make a new nuget release (I already bumped the nuget version for it to be ready). BR |
@kevingy Can You please check it? |
@takielias I created a forked branch here: https://github.com/mikaelliljedahl/SimpleHeadlessBrowser and nuget here: https://www.nuget.org/packages/SimpleHeadlessBrowser/ |
|
||
public class Browser | ||
public class Browser : IDisposable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? Browser does not have any unmanaged resources to require implementing IDisposable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It helps the caller who forget to call Close to free up resources. It also fixes a memory leak if there is an Exception before the call to Close is done (or if the caller forgets). If not all frames are cleared there will be pointers to those hanging around. The Dispose method fixes those without the caller needing to know what is going on behind the hood.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikaelliljedahl SimpleBrowser doesn't have any unmanaged resources, therefore IDisposable is not appropriate. In the case of an Exception, the clean up that is in Close occurs when the instance of Browser is garbage collected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was to solve this issue:
#121
Even if there are no unmanaged resources, if you have nested references inside a static variable that are never disposed the GC will never be able to do it's job. Since you have this static variable any references inside _allContext will prohibit the GC from freeing up resources.
This is inside Browser class.
private static readonly List<List> _allContexts = new List<List>();
The dispose method automatically calls the Close method even if exceptions happen. Maybe the Close method needs additional cleanup code to remove references from _allContexts. The important part is to make sure there are no references left from the static list to the disposed object (even if the user missed (eg due to exception) calling Close().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to Microsoft docs:
There are additional reasons for implementing Dispose, such as undoing something that was previously done. For example, freeing memory that was allocated, removing an item from a collection that was added, signaling the release of a lock that was acquired, and so on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a PR to be implemented shortly, I'm going to fix the "memory leak" permanently by re-working the logging. The user will be able to configure how many pages they would like to navigate back - from 0 (no history) to the amount of memory they have. In my applications, I never use back. I'll probably set mine to 0.
SimpleBrowser/Browser.cs
Outdated
@@ -1027,7 +1099,9 @@ internal bool DoRequest(Uri uri, string method, NameValueCollection userVariable | |||
|
|||
try | |||
{ | |||
using (IHttpWebResponse response = req.GetResponse()) | |||
System.Threading.Thread.Sleep(100); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does the application need to sleep here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops, I used this in a project where the server wasn't fast enough. Should have put it in the program calling the code, not here. I am removing this.
SimpleBrowser/HtmlLogFormatter.cs
Outdated
var engine = new RazorLight.RazorLightEngineBuilder() | ||
.UseMemoryCachingProvider() | ||
.Build(); | ||
//var engine = new RazorLight.RazorLightEngineBuilder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please delete this commented code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
public string RenderToString<TModel>(string template, string title, TModel model) | ||
{ | ||
|
||
return RazorEngine.Engine.Razor.RunCompile(template, title, model.GetType(), model); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this solution. It looks like RazorLight has simply been replaced by RazorEngine. Was that the intention? I thought you wanted to be able to create a production library that didn't include a dependency on Razor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to remove the dependency from the library, so I used a dependency injection pattern (using the IViewRenderService interface that only implements this method). In the sample program (which is not part of the nuget) I just show that RazorEngine could be used to render the log to string, but anything that implements the interface can be used. And if you like to use RazorLight, go ahead. The Interface doesn't care.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this as a step to move forward. I removed the dependency on RazorLight from SimpleBrowser and put all of the rendering stuff into a separate library by implementing the interface and setting the renderer in SimpleBrowser. This way, any renderer can be implemented - razor, json, email, etc.
@mikaelliljedahl Can you please fix the above issues? |
…FC2068, Section 10.3. Removed commented code from HtmlLogFormatter
I have pushed the changes to the PreparePRToMainProject branch. Do I need to do something else to update the PR? (I am used to Microsoft DevOps) |
I'll check, but that is all you should have to do. |
@mikaelliljedahl Did you check all the comments of @kevingy ? |
Yea, but I guess we kind of disagree on the Browser object needed to implement IDisposable pattern. Another way to eliminate the memory leak problem is to completly remove the static variable containing the list of active browsers. That will of course change the behavior a bit. It has been done already in the separate fork branch that I created waiting for this PR to be accepted. BR |
@kevingy Could you please consider the PR? |
@takielias You are free to use the fork, I made a fresh nuget for it too with the static list of browsers causing the memory leaks removed: https://www.nuget.org/packages/SimpleHeadlessBrowser/ |
@mikaelliljedahl Implementing IDisposable will not fix the memory leak! You either don't have a full understanding of the problem internal to SimpleBrowser or you don't have a full understanding of IDisposable - or both. SimpleBrowser maintains an infinitely growing collection of HTML nodes. So long as a Browser instance exists, the collection will grow every time a navigation occurs. (This is why I have always suggested transient Browser instances.) None of your changes address this real problem. Simply implementing IDisposable certainly won't fix it. Furthermore, IDisposable "Provides a mechanism for releasing unmanaged resources." (https://docs.microsoft.com/en-us/dotnet/api/system.idisposable?view=net-6.0). The collection contains all managed resources. In fact, I don't believe there are any unmanaged resources in SimpleBrowser. Everything is managed. Therefore, there is no need for IDisposable in the project at all. There is good work in this PR. I wish you would have separated the work out so that I could have accepted parts of it at a time rather than a single huge PR. Additionally, your changes break backwards compatibility. I don't want to have to rewrite my SimpleBrowser applications to support your changes. @takielias What is it in the PR that you need, exactly. Are you wanting the asynchronous methods? A "fix" for the memory leak? |
@kevingy async is awesome but memory leak is a big issue. |
Async is awesome! I would welcome it with open arms, so long as it was implemented in a way that didn't break backwards compatibility. To avoid the memory leak, change your code to use transient Browser instances. That is, don't do this:
Do this:
The infinitely growing collection leak was added by, in my opinion, a bad contribution that was not fully vetted before being merged. To fix it, significant changes are required to the way that SimpleBrowser handles it's navigation history. The change was added so that when the user "clicks" the Back button, the SimpleBrowser loads the previous page from memory rather than navigating to the previous URL. So, the "leak" is an intentional "feature". The HTML element collection grows infinitely so that the user can click Back all the way to the first navigation. That feature (clicking the Back button) is probably little used and probably the greatest problem SimpleBrowser has. |
@kevingy Thanks for the explanation. I used this Package in several projects but I did not need to use the Back button yet. |
The "leak" you are describing is not what I would call a leak, it is just a consequence of how you use the browser if you open an infinite number of windows. The implementation is not breaking the old behavior, you can just ignore the Disposable pattern and also ignore there is a Close method. BR |
This PR has been around forever. I'm going to start pulling pieces of it in at a time, then ultimately deny this pull request in favor of the individual pieces. |
How's it going pulling the pieces? |
No description provided.