Skip to content

Commit

Permalink
Fix to use custom straight forward lock, read and write functions to …
Browse files Browse the repository at this point in the history
…lower memory usage
  • Loading branch information
iliajie committed Aug 3, 2024
1 parent c3a6564 commit c3098fe
Show file tree
Hide file tree
Showing 2 changed files with 68 additions and 5 deletions.
69 changes: 65 additions & 4 deletions stats-lib-funcs.pl
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
# Licensed under MIT (https://github.com/authentic-theme/authentic-theme/blob/master/LICENSE)
#
use strict;
use Fcntl ':flock';

our $json;
eval "use JSON::XS";
Expand Down Expand Up @@ -37,6 +38,68 @@
virt => acl_system_status('virt'),
};

# Read file contents
sub stats_read_file_contents {
my ($filename) = @_;
# Check if file is readable
return undef unless -r $filename;
# Read file contents
my $fh;
if (!open($fh, '<', $filename)) {
error_stderr("Failed to open file '$filename' for reading: $!");
return undef;
}
# Acquire lock
if (!flock($fh, LOCK_SH)) {
error_stderr("Failed to acquire shared lock on file '$filename': $!");
close($fh);
return undef;
}
# Read file contents
local $/;
my $contents = <$fh>;
# Release lock
flock($fh, LOCK_UN);
# Close file
if (!close($fh)) {
error_stderr("Failed to close file '$filename': $!");
return undef;
}
# Return file contents
return $contents;
}

# Write file contents
sub stats_write_file_contents {
my ($filename, $contents) = @_;
# Open file for writing
my $fh;
if (!open($fh, '>', $filename)) {
error_stderr("Failed to open file '$filename' for writing: $!");
return 0;
}
# Acquire lock the file for writing
if (!flock($fh, LOCK_EX)) {

This comment has been minimized.

Copy link
@jcameron

jcameron Aug 8, 2024

Collaborator

flock is great, but doesn't this mean that the file has already been opened for writing and thus truncated before the lock is taken? This is why Webmin's lock_file function uses a .lock file to prevent concurrent access before opening the file.

This comment has been minimized.

Copy link
@iliajie

iliajie Aug 8, 2024

Author Collaborator

but doesn't this mean that the file has already been opened for writing and thus truncated before the lock is taken?

Yep! Though it doesn't strictly matter for this particular case scenario, as the stats.pl process is the only process that can write to the file. We could avoid file locking completely.

This is why Webmin's lock_file function uses a .lock file to prevent concurrent access before opening the file.

Though, I agree we should consider potential race condition. I have improved it according to your suggestion! And, I think writing a .lock file to the disk is overhead nowadays.

Check this polished version 910ae5a please.

error_stderr("Failed to acquire exclusive lock on file '$filename': $!");
close($fh);
return 0;
}
# Write to file
if (!print($fh $contents)) {
error_stderr("Failed to write to file '$filename': $!");
close($fh);
return 0;
}
# Unlock the file
flock($fh, LOCK_UN);
# Close file
if (!close($fh)) {
error_stderr("Failed to close file '$filename': $!");
return 0;
}
return 1;
}

# Check if feature is enabled
sub get_stats_option
{
Expand Down Expand Up @@ -181,7 +244,7 @@ sub get_stats_history
my ($noempty) = @_;
my $file = "$var_directory/modules/$current_theme".
"/real-time-monitoring.json";
my $graphs = jsonify(read_file_contents($file));
my $graphs = jsonify(stats_read_file_contents($file));
# No data yet
if (!keys %{$graphs}) {
unlink($file);
Expand Down Expand Up @@ -267,9 +330,7 @@ sub save_stats_history
# Save data
my $file = "$var_directory/modules/$current_theme".
"/real-time-monitoring.json";
lock_file($file);
write_file_contents($file, $json->encode($graphs));
unlock_file($file);
stats_write_file_contents($file, $json->encode($graphs));
# Release memory
undef($graphs_chunk);
undef($all_stats_histoy);
Expand Down
4 changes: 3 additions & 1 deletion stats.cgi
Original file line number Diff line number Diff line change
Expand Up @@ -94,12 +94,14 @@ my $statsserver_cmd = "$config_directory/$current_theme/$server_name";
create_wrapper($statsserver_cmd, $current_theme, $server_name)
if (!-r $statsserver_cmd);

# Launch the server
# Launch the server in a sub-process (no fork)
my $logfile = $get_logfile->($port);
my $rs = system_logged(
"ulimit -v 384000; ". # Set sanity limit for memory usage to 384 MB

This comment has been minimized.

Copy link
@jcameron

jcameron Aug 8, 2024

Collaborator

This seems risky if the memory needed by stats.pl ever legitimately exceeds 384 MB, as it would cause and endless crash loop. Is it really needed?

This comment has been minimized.

Copy link
@iliajie

iliajie Aug 8, 2024

Author Collaborator

Is it really needed?

No, it actually not needed. I have removed it in this 7c89110 patch.

"SESSION_ID=$main::session_id ".
"$statsserver_cmd @{[quotemeta($port)]} ".
">$logfile 2>&1 </dev/null &");

# Return the result
print_json({ success => !$rs, port => $port,
socket => $get_socket->($port),
Expand Down

11 comments on commit c3098fe

@jcameron
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do these locking and file reading functions fix the memory issue exactly?

@iliajie
Copy link
Collaborator Author

@iliajie iliajie commented on c3098fe Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do these locking and file reading functions fix the memory issue exactly?

I didn't dig deeply, so I cannot tell you exactly how. However, my assumption is that there is something in the lock_file, unlock_file, read_file_contents, and write_file_contents subs and everything they call subsequently that created this issue.

Nevertheless, it's understandable because there has never been such an application for the Webmin API before — it was always meant to interrupt the process in a short period of time.

@jcameron
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I'm not comfortable with a fix that we don't understand how it works! Also, reading/writing and locking/unlocking are very different operations, so it would be useful to know which one of these is causing the memory leak.

@iliajie
Copy link
Collaborator Author

@iliajie iliajie commented on c3098fe Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't understand how it works!

We clearly understand how it work! Check this comment please.

What we don't understand (yet) is why the current implementation of lock_file and unlock_file doesn't release resources. I have an off-hand guess that it has everything to do with garbage collection and how things are meant to be cleared when terminating the whole process. In stats.pl, we don't have any of that since this is a long-running process.

Also, reading/writing and locking/unlocking are very different operations, so it would be useful to know which one of these is causing the memory leak.

Sure, I can dig into that, but it shouldn't change anything regarding the current local implementation.

@jcameron
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we don't understand (yet) is why the current implementation of lock_file and unlock_file doesn't release resources.

That's pretty important!!! I would really rather we don't add new features if they have memory leaks when using the standard APIs in Webmin. Implementing a work-around to fix it isn't solving the root problem, and could mean that we're not truly fixing the issue properly. I know you're excited about this new real-time stats feature, but it has to be done correctly, even if that means fixing bugs in the Webmin API.

Also, why is the save_stats_history function needed at all? I though that the long-running stats.pl process was just serving data from memory..

Also, is there only one stats.pl process running ever? If so, maybe locking isn't even needed?

@iliajie
Copy link
Collaborator Author

@iliajie iliajie commented on c3098fe Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementing a work-around to fix it isn't solving the root problem, and could mean that we're not truly fixing the issue properly.

This is correct, and I agree! However, the Webmin API has never been used like this before, i.e. in infinitely running process.

I created this simple and effective solution to perhaps help you rethink how we manage file locking and unlocking. I wouldn't risk changing these core subroutines right before the new Webmin release! I needed a solution that worked locally, and I made it happen.

I know you're excited about this new real-time stats feature, but it has to be done correctly, even if that means fixing bugs in the Webmin API.

I absolutely agree. However, the Webmin API doesn't need an immediate fix for this. As I've mentioned before, stats.pl is a never-ending, long-running process. It's entirely new, and make take months to address properly. We need a new Webmin release now!

Also, why is the save_stats_history function needed at all? I though that the long-running stats.pl process was just serving data from memory..

Yes, it does, but we don't want to show empty graphs when the user returns to the Dashboard or logs back in. We want the user to see the last 20 minutes of server activity. So, we save stats to a file on the disk every 30 seconds and trim the file to keep only 20 minutes of data (this duration is fixed and can't be changed in the UI, though can be changed using browser's console but no more than to 60 minutes).

Also, is there only one stats.pl process running ever? If so, maybe locking isn't even needed?

Yes, as I mentioned earlier, it's not strictly necessary. However, since I've already implemented it, I'd prefer to keep it for a potentially easier integration with the Webmin API in the future.

@jcameron
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because a solution "works" doesn't mean it is correct. I'd actually prefer to disable this new stats.pl feature by default until we can get it implemented properly, than to push out a new release that contains a work-around solution that we don't understand. We really have to focus on doing things right, not on new features or rushing out new releases. Because that's where bugs and security holes come from.

I understand why we need to save the stats history though, so we shouldn't remove that.

The real question is where the memory leak comes from.. if you were to revert all this code and just use read_file_contents and write_file_contents with no locking, does the leak still happen?

@iliajie
Copy link
Collaborator Author

@iliajie iliajie commented on c3098fe Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because a solution "works" doesn't mean it is correct.

Jamie, I completely agree and understand! Though, what I don't understand yet, is what exactly am I doing wrong in my currently checked-in code? Does my current code introduce a bug? If so, how exactly? If there’s anything wrong with my code, I’d be happy to fix it right here and now.

I'd actually prefer to disable this new stats.pl feature by default

This won’t be possible to disable it without overriding each and every user preference — both when disabling it now and later when re-enabling it.

Until we can get it implemented properly

What is implemented wrong exactly? I would love to fix it, right here and right now. If you told me what's wrong.

Than to push out a new release that contains a work-around solution

Which workaround? I'm just reading and writing a file to the disk.

That we don't understand

Are you referring to a memory leak problem that occurs when I don’t use locally implemented functions to lock, read, and write a file? Well, in that case, I don’t see how this relates to stats.pl in any way .. But I agree that we don’t yet understand why it happens when the other subs are used, though this is not related to the currently checked-in code.

We really have to focus on doing things right, not on new features or rushing out new releases. Because that's where bugs and security holes come from.

I cannot agree more!

I understand why we need to save the stats history though, so we shouldn't remove that.

Excellent, thanks!

The real question is where the memory leak comes from

To my knowledge, with the currently checked-in code, it isn’t coming from anywhere in stats.pl or any related stats libraries. Or is it?

If you were to revert all this code and just use read_file_contents and write_file_contents with no locking, does the leak still happen?

This is a good question! I’m not sure, as I replaced those all at once.

Since you asked, I’ll run deeper tests today and let you know! Thank you!

@iliajie
Copy link
Collaborator Author

@iliajie iliajie commented on c3098fe Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you were to revert all this code and just use read_file_contents and write_file_contents with no locking, does the leak still happen?

This is a good question! I’m not sure, as I replaced those all at once.

Since you asked, I’ll run deeper tests today and let you know!

Alright, from what I can observe, I don't see any clear indication of a memory leak when using the stock read_file_contents and write_file_contents subs.

However, after adding lock_file and unlock_file, I clearly noticed memory usage for the stats.pl process climbing up.

Now, I'm trying lock_file($file, undef, undef, 1); instead to see how it goes to narrow down the problem ..

@iliajie
Copy link
Collaborator Author

@iliajie iliajie commented on c3098fe Aug 9, 2024 via email

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcameron
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, using no-diff upon locking a file with Webmin API seems to
help

Yes!! That would explain it ... when logging is enabled, the lock_file / unlock_file functions store diffs in RAM so they can be included if and when webmin_log is eventually called. And since stats.pl runs forever, they would just accumulate and consume unlimited memory! Glad you were able to find this.

Although lock_file has a param to turn off logging, there's also a global variable $no_log_file_changes that you should set instead at the top of stats.pl so that all lock_file functions called indirectly don't do any logging of diffs into memory.

To answer your original question, THIS is the fix I was looking for because now we truly understand what is going wrong, and it only requires a single line of code to fix it. That's clearly the better option than writing a custom locking function for this case.

Please sign in to comment.