Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async version, remove depenency on RazorLight, implemented IDisposable #260

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

mikaelliljedahl
Copy link

No description provided.

mikaelliljedahl and others added 8 commits May 5, 2021 07:46
DoRequest made public.
Set maxRedirects = 10 (since there exists some incompatible sites that needed scraping...)
Remove calls to HtmlLogFormatter depending on RazorLight.
Moved logging to use injected service so dependency could be outside the core package.
Added Async versions of all calls, made Browser Disposable, moved logging outside core dll
@mikaelliljedahl
Copy link
Author

Finally got the tests to run. Had to do some adjustments to handle Async in the "Mockers" and update url to jkorpela.fi/cgi-bin/echo.cgi (missing https so we got redirected when submitting the forms).

Would be very glad if you could merge this PR and make a new nuget release (I already bumped the nuget version for it to be ready).

BR
Mikael

@takielias
Copy link

@kevingy Can You please check it?

@mikaelliljedahl
Copy link
Author

@kevingy Can You please check it?

@takielias I created a forked branch here: https://github.com/mikaelliljedahl/SimpleHeadlessBrowser and nuget here: https://www.nuget.org/packages/SimpleHeadlessBrowser/
waiting for this PR to completed ;)


public class Browser
public class Browser : IDisposable
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? Browser does not have any unmanaged resources to require implementing IDisposable.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It helps the caller who forget to call Close to free up resources. It also fixes a memory leak if there is an Exception before the call to Close is done (or if the caller forgets). If not all frames are cleared there will be pointers to those hanging around. The Dispose method fixes those without the caller needing to know what is going on behind the hood.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikaelliljedahl SimpleBrowser doesn't have any unmanaged resources, therefore IDisposable is not appropriate. In the case of an Exception, the clean up that is in Close occurs when the instance of Browser is garbage collected.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was to solve this issue:
#121

Even if there are no unmanaged resources, if you have nested references inside a static variable that are never disposed the GC will never be able to do it's job. Since you have this static variable any references inside _allContext will prohibit the GC from freeing up resources.
This is inside Browser class.
private static readonly List<List> _allContexts = new List<List>();

The dispose method automatically calls the Close method even if exceptions happen. Maybe the Close method needs additional cleanup code to remove references from _allContexts. The important part is to make sure there are no references left from the static list to the disposed object (even if the user missed (eg due to exception) calling Close().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to Microsoft docs:
There are additional reasons for implementing Dispose, such as undoing something that was previously done. For example, freeing memory that was allocated, removing an item from a collection that was added, signaling the release of a lock that was acquired, and so on

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a PR to be implemented shortly, I'm going to fix the "memory leak" permanently by re-working the logging. The user will be able to configure how many pages they would like to navigate back - from 0 (no history) to the amount of memory they have. In my applications, I never use back. I'll probably set mine to 0.

SimpleBrowser/Browser.cs Outdated Show resolved Hide resolved
@@ -1027,7 +1099,9 @@ internal bool DoRequest(Uri uri, string method, NameValueCollection userVariable

try
{
using (IHttpWebResponse response = req.GetResponse())
System.Threading.Thread.Sleep(100);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does the application need to sleep here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, I used this in a project where the server wasn't fast enough. Should have put it in the program calling the code, not here. I am removing this.

var engine = new RazorLight.RazorLightEngineBuilder()
.UseMemoryCachingProvider()
.Build();
//var engine = new RazorLight.RazorLightEngineBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please delete this commented code.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

public string RenderToString<TModel>(string template, string title, TModel model)
{

return RazorEngine.Engine.Razor.RunCompile(template, title, model.GetType(), model);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this solution. It looks like RazorLight has simply been replaced by RazorEngine. Was that the intention? I thought you wanted to be able to create a production library that didn't include a dependency on Razor.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intention was to remove the dependency from the library, so I used a dependency injection pattern (using the IViewRenderService interface that only implements this method). In the sample program (which is not part of the nuget) I just show that RazorEngine could be used to render the log to string, but anything that implements the interface can be used. And if you like to use RazorLight, go ahead. The Interface doesn't care.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this as a step to move forward. I removed the dependency on RazorLight from SimpleBrowser and put all of the rendering stuff into a separate library by implementing the interface and setting the renderer in SimpleBrowser. This way, any renderer can be implemented - razor, json, email, etc.

@takielias
Copy link

@mikaelliljedahl Can you please fix the above issues?

…FC2068, Section 10.3.

Removed commented code from HtmlLogFormatter
@mikaelliljedahl
Copy link
Author

I have pushed the changes to the PreparePRToMainProject branch. Do I need to do something else to update the PR? (I am used to Microsoft DevOps)

@kevingy
Copy link
Contributor

kevingy commented Sep 28, 2021

I have pushed the changes to the PreparePRToMainProject branch. Do I need to do something else to update the PR? (I am used to Microsoft DevOps)

I'll check, but that is all you should have to do.

@takielias
Copy link

@mikaelliljedahl Did you check all the comments of @kevingy ?

@mikaelliljedahl
Copy link
Author

@mikaelliljedahl Did you check all the comments of @kevingy ?

Yea, but I guess we kind of disagree on the Browser object needed to implement IDisposable pattern. Another way to eliminate the memory leak problem is to completly remove the static variable containing the list of active browsers. That will of course change the behavior a bit. It has been done already in the separate fork branch that I created waiting for this PR to be accepted.

BR
Mikael

@takielias
Copy link

@mikaelliljedahl Did you check all the comments of @kevingy ?

Yea, but I guess we kind of disagree on the Browser object needed to implement IDisposable pattern. Another way to eliminate the memory leak problem is to completly remove the static variable containing the list of active browsers. That will of course change the behavior a bit. It has been done already in the separate fork branch that I created waiting for this PR to be accepted.

BR Mikael

@kevingy Could you please consider the PR?

@mikaelliljedahl
Copy link
Author

@takielias You are free to use the fork, I made a fresh nuget for it too with the static list of browsers causing the memory leaks removed: https://www.nuget.org/packages/SimpleHeadlessBrowser/

@kevingy
Copy link
Contributor

kevingy commented Jun 7, 2022

@mikaelliljedahl Implementing IDisposable will not fix the memory leak! You either don't have a full understanding of the problem internal to SimpleBrowser or you don't have a full understanding of IDisposable - or both.

SimpleBrowser maintains an infinitely growing collection of HTML nodes. So long as a Browser instance exists, the collection will grow every time a navigation occurs. (This is why I have always suggested transient Browser instances.) None of your changes address this real problem. Simply implementing IDisposable certainly won't fix it.

Furthermore, IDisposable "Provides a mechanism for releasing unmanaged resources." (https://docs.microsoft.com/en-us/dotnet/api/system.idisposable?view=net-6.0). The collection contains all managed resources. In fact, I don't believe there are any unmanaged resources in SimpleBrowser. Everything is managed. Therefore, there is no need for IDisposable in the project at all.

There is good work in this PR. I wish you would have separated the work out so that I could have accepted parts of it at a time rather than a single huge PR. Additionally, your changes break backwards compatibility. I don't want to have to rewrite my SimpleBrowser applications to support your changes.

@takielias What is it in the PR that you need, exactly. Are you wanting the asynchronous methods? A "fix" for the memory leak?

@takielias
Copy link

@kevingy async is awesome but memory leak is a big issue.

@kevingy
Copy link
Contributor

kevingy commented Jun 7, 2022

@kevingy async is awesome but memory leak is a big issue.

@takielias

Async is awesome! I would welcome it with open arms, so long as it was implemented in a way that didn't break backwards compatibility.

To avoid the memory leak, change your code to use transient Browser instances. That is, don't do this:

// This will leak, with an infinitely growing collection of HTML elements.
var browser = new Browser();
do 
{
    browser.Navigate("someurl");
} while (true);

Do this:

// This will still leak, but will be cleaned up when browser goes out of scope at the end of each loop.
do 
{
    var browser = new Browser();
    browser.Navigate("someurl");
} while (true);

The infinitely growing collection leak was added by, in my opinion, a bad contribution that was not fully vetted before being merged. To fix it, significant changes are required to the way that SimpleBrowser handles it's navigation history. The change was added so that when the user "clicks" the Back button, the SimpleBrowser loads the previous page from memory rather than navigating to the previous URL. So, the "leak" is an intentional "feature". The HTML element collection grows infinitely so that the user can click Back all the way to the first navigation. That feature (clicking the Back button) is probably little used and probably the greatest problem SimpleBrowser has.

@takielias
Copy link

@kevingy Thanks for the explanation. I used this Package in several projects but I did not need to use the Back button yet.

@mikaelliljedahl
Copy link
Author

mikaelliljedahl commented Jun 7, 2022

The "leak" you are describing is not what I would call a leak, it is just a consequence of how you use the browser if you open an infinite number of windows.
The problematic behavior is that the system can never free resources if there is an exception before calling Close();
The reason for this is that a static list of all browser windows are stored and those references are never removed unless you call Close().
IMO having a library that requires a consumer to call the Close method to remove the reference from the static list of all browsers is bad design. Having a static list of references to other browser windows stops the GC from freeing up memory causing the leak. The leak I am referring to is the one you have reproduced here:
#121
That is why I implemented an IDisposable pattern to make sure the cleanup is done automatically for you when your browser object goes out of scope (or due to an exception).

The implementation is not breaking the old behavior, you can just ignore the Disposable pattern and also ignore there is a Close method.
The Async pattern is neither a breaking change, you can still call the old non-async methods however you will get a build warning saying these methods are obsolete.

BR
Mikael

@kevingy
Copy link
Contributor

kevingy commented Nov 23, 2023

This PR has been around forever. I'm going to start pulling pieces of it in at a time, then ultimately deny this pull request in favor of the individual pieces.

@mikaelliljedahl
Copy link
Author

This PR has been around forever. I'm going to start pulling pieces of it in at a time, then ultimately deny this pull request in favor of the individual pieces.

How's it going pulling the pieces?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants