Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SWEEP: Add string overloads to eliminate unnecessary allocations of FileInfo and DirectoryInfo objects #832

Open
NightOwl888 opened this issue Apr 13, 2023 · 2 comments · May be fixed by #1060
Assignees
Labels
is:enhancement New feature or request pri:normal

Comments

@NightOwl888
Copy link
Contributor

In Java it is common to use the File or Path object as a parameter to methods, as they are accepted as parameters of stream constructors and many other places.

However, in .NET, the corresponding objects (TextWriter, Stream, StreamWriter, etc) don't accept a FileInfo as a parameter, they accept strings. As such, we are commonly newing up these FileInfo objects, passing them through our API and reading the original string back as FileInfo.FullName. The FileInfo object is not even used in many cases.

To keep the API the same as Lucene, we should keep the overloads that accept FileInfo, but add overloads that are identical in every way except that they accept a string for a file name instead of a FileInfo. The FileInfo overloads can then just pass FileInfo.FullName to the string overloads.

For example, we currently have

        /// <summary>
        /// Returns an <see cref="Stream"/> over the requested file, identifying
        /// the appropriate <see cref="Stream"/> instance similar to <see cref="GetInputStream(FileInfo)"/>.
        /// </summary>
        public static Stream GetOutputStream(FileInfo file)
        {
            // First, create a FileInputStream, as this will be required by all types.
            // Wrap with BufferedInputStream for better performance
            Stream os = new FileStream(file.FullName, FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
            return GetFileType(file).GetOutputStream(os);
        }

And we should change it to

        /// <summary>
        /// Returns an <see cref="Stream"/> over the requested <paramref name="file"/>, identifying
        /// the appropriate <see cref="Stream"/> instance similar to <see cref="GetInputStream(FileInfo)"/>.
        /// </summary>
        public static Stream GetOutputStream(FileInfo file)
            => GetOutputStream(file?.FullName);

        /// <summary>
        /// Returns an <see cref="Stream"/> over the requested <paramref name="filePath"/>, identifying
        /// the appropriate <see cref="Stream"/> instance similar to <see cref="GetInputStream(string)"/>.
        /// </summary>
        public static Stream GetOutputStream(string filePath)
        {
            // First, create a FileInputStream, as this will be required by all types.
            // Wrap with BufferedInputStream for better performance
            Stream os = new FileStream(filePath, FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
            return GetFileType(filePath).GetOutputStream(os);
        }

There are several places throughout the project where we are instantiating and then discarding FileInfo objects without even making use of their functionality (example). Some could instead use static methods (i.e. File.Exists). In some other cases, these FileInfo and DirectoryInfo instances are appropriate.

@NightOwl888 NightOwl888 added up-for-grabs This issue is open to be worked on by anyone good-first-issue Good for newcomers is:enhancement New feature or request pri:normal labels Apr 13, 2023
@NightOwl888 NightOwl888 self-assigned this Apr 16, 2023
@NightOwl888 NightOwl888 removed up-for-grabs This issue is open to be worked on by anyone good-first-issue Good for newcomers labels Apr 16, 2023
@ljcollins25
Copy link

ljcollins25 commented Dec 19, 2023

@NightOwl888, what do you think creating a new struct type (named 'FilePath') which is implicitly convertible from FileInfo and String, but only implicitly convertible to String? That's a strategy I've used in the past to simplify API migrations like this. It does have the downside of not being runtime compatible, so any consumers of the updated API will need to recompile.

@NightOwl888
Copy link
Contributor Author

@ljcollins25

We have considered that approach for char sequences. In Java, String, char[], and StringBuilder all implement the CharSequence interface, but in .NET there is no common interface between them (and we should also deal with ReadOnlySpan<char>), so it was up for consideration. But ultimately we decided against such an approach due to the confusion it may cause - if an API says it accepts a CharSequence struct, it may not be obvious that you can pass in a string. Besides, ReadOnlySpan<char> now implicitly converts both String and char[], which is far more intuitive to use.

However, being that we are only passing in FileInfo and DirectoryInfo to match the Java API and they ultimately are just allocations we don't need because FileStream and other downstream APIs don't accept them as parameters, we have to read out the strings to concatenate, etc., they aren't really needed in .NET. We only want to keep these overloads around to match Java Lucene.

That being said, FileInfo and DirectoryInfo do more than just allocate memory. When the string is passed into the constructor of FileInfo and DirectoryInfo, it is resolved to handle relative paths and symbolic links. However, this logic is also exposed in .NET publicly as static methods, so we still can eliminate the allocations without changing the behavior, it just takes a bit more work analyzing what is going on when FileInfo and DirectoryInfo are used and duplicating it for the string overloads. Sure, we could handle that in a custom struct to make our job easier, but IMO it makes the end user's job harder, which is not worth the tradeoff.

@paulirwin paulirwin added this to the 4.8.0-beta00018 milestone Oct 28, 2024
@paulirwin paulirwin self-assigned this Dec 5, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Dec 5, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Dec 5, 2024
paulirwin added a commit to paulirwin/lucene.net that referenced this issue Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
is:enhancement New feature or request pri:normal
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants