Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SOLR-16470: Create V2 equivalent of V1 Replication: Get files/{filePath} #2734

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

mlbiscoc
Copy link
Contributor

@mlbiscoc mlbiscoc commented Sep 30, 2024

https://issues.apache.org/jira/browse/SOLR-16470

Description

Create a V2 equivalent API for the replication "GET files/{filePath}"

V1
/solr/{coreName}/replication?command=filecontent&file=_0_Lucene99_0.tmd&wt=filestream

V2
/api/cores/{coreName}/replication/files/{filePath}?dirType=file

Other sample requests

/api/cores/{coreName}/replication/files/./tlog.0000000000000000000?dirType=tlogFile
/api/cores/{coreName}/replication/files/./solrconfig.xml?dirType=cf

Solution

Moved getFileStream() logic into ReplicationAPIBase class and updated the V1 endpoint to use the CoreReplicationAPI class so that the ReplicationHandler can be deprecated in the future.

For the V2 api, made dirType a required parameters specifying what file directory (cf/tlogFile/file) to fetch a file from and better expressed the optional parameters that exist for the api

  • offset
  • compression
  • checksum
  • maxWriteMBPerSec
  • generation

Tests

Wrote new unit test in CoreReplicationAPITest class testFetchFile()

Checklist

Please review the following and check all that apply:

  • I have reviewed the guidelines for How to Contribute and my code conforms to the standards described there to the best of my ability.
  • I have created a Jira issue and added the issue ID to my pull request title.
  • I have given Solr maintainers access to contribute to my PR branch. (optional but recommended, not available for branches on forks living under an organisation)
  • I have developed this patch against the main branch.
  • I have run ./gradlew check.
  • I have added tests for my changes.
  • I have added documentation for the Reference Guide

@mlbiscoc
Copy link
Contributor Author

Hey @gerlowskija, should be the last API for SOLR-16470 whenever you get time to review. Thanks!

@gerlowskija
Copy link
Contributor

Awesome - thanks @mlbiscoc . Sorry for the delay getting to this - took me awhile to catch back up after the travel last week. Reviewing now...

@@ -68,6 +71,45 @@ public FileListResponse fetchFileList(
return doFetchFileList(gen);
}

@GET
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all correct and in line with the v2 APIs you did back in 2023 (PR #1620 PR #1704).

But we've made a slight tweak since then to support the OpenAPI code generation stuff. In short - we needed the JAX-RS annotations to live somewhere that gets compiled before "core", so that "core" can use all the nice generated code. We managed this by creating a new Gradle module - "api" - where the JAX-RS annotations live on interfaces. (Code in "core" can then implement those interfaces with methods containing the API logic).

Long story short - we'll want to do the same here. There's some instructions here that might be helpful. Though in your case most of the steps are already done - the pieces just need moved around a bit.

(Slightly my fault for not calling out more explicitly on SOLR-16470 that the setup had changed since your earlier commits, so I'm willing to make the necessary changes here. You may beat me to it, but if not I'll try and update this eventually.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good to know. No worries though, I just got back from vacation so will find some bandwidth soon to make the necessary adjustments.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll leave you to it, but if you have any questions or get stuck, lmk and I'll try to help out!

(Hope you had a great vacation!)

@GET
@Path("/files/{filePath}")
@Produces({MediaType.TEXT_PLAIN, BINARY_CONTENT_TYPE_V2})
public String fetchFile(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[?] Can this really return a String? Since it can return raw files I would've expected this to have to return InputStream or Object or something more generic in order to handle the potentially-binary content coming back?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String seems to work fine but if I remember correctly I wasn't sure what was a better return Object. There is a higher level indexFetcher test that seems to pass just with the string but let me do more digging.

@mlbiscoc
Copy link
Contributor Author

mlbiscoc commented Nov 1, 2024

Hey @gerlowskija made the changes but ended up kind of inflating this PR to support the OpenAPI code generation and moving files. Lmk if this went the right direction

Comment on lines 24 to 36
public class ReplicationFileResponse extends SolrJerseyResponse implements StreamingOutput {

public StreamingOutput dfs;

public ReplicationFileResponse(StreamingOutput stream) {
this.dfs = stream;
}

@Override
public void write(OutputStream outputStream) throws IOException, WebApplicationException {
dfs.write(outputStream);
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So instead of String, I found jax-rs supports StreamingOutput but the code generation didn't like how it compiled just returning StreamingOutput without a SolrJerseyResponse so I went with this approach for now. This might look a bit hacky, but wasn't sure what other ways. Maybe there is an obvious fix here but lmk if this is or isn't right.

@gerlowskija
Copy link
Contributor

Hey @gerlowskija made the changes but ended up kind of inflating this PR to support the OpenAPI code generation and moving files

Awesome! I see a lot of the increase comes from moving some of the other replication APIs ("list files", "get index version") from core over to the "api" module -- I've actually got another PR that takes a stab at some of those changes as well, so if we rebase one on top of the other, that should shrink this PR back down a good bit. Will take a crack at that shortly, and check back in when that's done!

Very curious about the "StreamingOutput" type you found. I've used InputStream in a few similar cases with other APIs (e.g. there's a "filestore" API that returns raw files, similar to this /replication one), though maybe StreamingOutput is better? Will have to do some reading...

The original 5014270 was temporarily reverted on branch to ease a merge
with 'main'.  Now that main is merged in, this commit replays all of
Matt's work for the file-fetching API on top of it.
@gerlowskija
Copy link
Contributor

Alright, merged in the latest 'main' and synced it with the changes here. The only real change of note is that the interface file is now named 'ReplicationApis.java' (I went with the name on 'main' since it was already there. The merge conflicts were a bit hairy, so if anything other than that seems "off", it's likely a mistake so please point it out 😆

Hoping to dig into the StreamingOutput thing shortly - thanks @mlbiscoc !

@mlbiscoc
Copy link
Contributor Author

Alright, merged in the latest 'main' and synced it with the changes here. The only real change of note is that the interface file is now named 'ReplicationApis.java' (I went with the name on 'main' since it was already there. The merge conflicts were a bit hairy, so if anything other than that seems "off", it's likely a mistake so please point it out 😆

Hoping to dig into the StreamingOutput thing shortly - thanks @mlbiscoc !

Great, thanks Jason. Will go back through it and take a look at the merge soon. Also need to look that precommit failure…

@gerlowskija
Copy link
Contributor

After a bit of reading, I agree with Matt that StreamingOutput seems to be the "preferred" approach.

The best explanation I could find on "why" comes from this SO post. To summarize: StreamingOutput is easier for various JAX-RS supported interceptor hooks (MessageBodyWriter, ResponseFilter, etc.). The example given is an interceptor that implements gzip-encoding - apparently that's tough to do on an OutputStream, but easier in some way for JAX-RS to handle when wrapped as a 'StreamingOutput'.

That said - supporting this type in our Java code-generation might take a bit more effort than we want to bite off here and now. The API should be fine as far as Javascript and Python clients are concerned, but the custom template we're using for Java makes that one a bit trickier.

I'll exempt fetchFile from Java code generation (we have an annotation for that purpose), and create a followup JIRA ticket around standardizing on and supporting StreamingOutput.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants