Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tolerate loading empty linkset files into IMS #62

Open
randykerber opened this issue Nov 27, 2017 · 19 comments
Open

Tolerate loading empty linkset files into IMS #62

randykerber opened this issue Nov 27, 2017 · 19 comments
Assignees
Milestone

Comments

@randykerber
Copy link
Contributor

Currently, when attempt to load a linkset file into IMS, if the linkset file is empty an Exception is thrown and the load terminates. emty means there are zero links in the linkset file. Here is contents of such a file:

@randykerber
Copy link
Contributor Author

file: PDB/LINKSET_EXACT_PDB20171110.ttl

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ops: <http://chemistry.openphacts.org//> .
@prefix : <http://chemistry.openphacts.org/download/20171110/PDB/LINKSET_EXACT_PDB20171110.ttl#> .
<http://chemistry.openphacts.org/download/20171110/PDB/LINKSET_EXACT_PDB20171110.ttl> void:inDataset <http://chemistry.openphacts.org/download/20171110/void_2017-11-10.ttl#pdb_exactMatch> .

@randykerber
Copy link
Contributor Author

The only triple in the file is a void:inDataset statement. There are no "link" triples.

@randykerber
Copy link
Contributor Author

I remember running into this issue earlier in the year. When I looked into it I recall the Exception was caused by some sort of pre-check phase performed on each file, where some piece of code attempted to read the first link triple/line of the file and verify that it had the correct link-predicate (and perhaps other checks). However, when there was no "first record", it resulted in an Exception. I could not see a way to fix without possibly wrecking other functionality.

This was probably 6+ months ago, so memory is a bit fuzzy.

@randykerber
Copy link
Contributor Author

The correct behavior is probably to issue some sort of WARNING message that an empty linkset file was encountered, but to otherwise allow the process to proceed. Basically, an empty linkset file should be allowed and proceed with the load.

@AlasdairGray
Copy link
Member

Can you justify why we should allow loading an empty linkset file?

It is a deliberate quality assurance step that the loader performs these test.

@egonw
Copy link
Member

egonw commented Nov 27, 2017

We were thinking about automating things more, and in an automated process it's easy enough to have a data set where there are no links... they found three such link sets right now... is an empty link set semantically wrong?

@AlasdairGray
Copy link
Member

We took the view that it is wrong. You have metadata saying there are links between these two data sources but then there are no links.

As such, we wanted the automated loading process to fail so that we would have to fix the issue rather than just seeing a bunch of warnings that are easy to ignore and pass over.

@stain
Copy link
Member

stain commented Nov 28, 2017

Agree with @AlasdairGray that a linkset with no links is not really a linkset (and so should not declare itself as that), and a warning would be easily ignored (as IMS has loads of them) - and so you would no longer detect the probably more common case that the void declares the wrong void:linkPredicate.

But perhaps it could be allowed if the void also says void:triples 0?

@egonw
Copy link
Member

egonw commented Nov 28, 2017

I'm in favor of allowing empty sets. I don't think the IMS loading is the right place or time to complain about this. The IMS function does not break after loading an empty set (which is still a set). The predicate cannot be wrong either, with zero links, so an exception for that sounds semantically wrong to me.

I see a link set as the result of a calculation (process) to determine the links. In fact, for many of our link sets these are in fact calculated. For others, the process is an analysis of an external data set...

An explicit claim that given that process that there are zero links makes a lot of sense to me. Without any VoID we also loose the observation that there were no links.

@AlasdairGray
Copy link
Member

I think your final point is more about the outcome of the linking process rather than the use of the linksets.

I would argue that from the use of linksets, having a set with no links doesn't make sense and will have an impact on the performance of the system, as we could easily generate thousands of linksets with no links, e.g. UniProt:Protein to aers:AdverseEvent.

@stain
Copy link
Member

stain commented Nov 30, 2017

I don't think performance will be an issue, but there might be an issue if the empty linkset causes a false transitive path possibility. Not tested!

But we can just add an --ignore-empty flag then and see how it goes? To the command line or loader XML file?

@AlasdairGray
Copy link
Member

I was not referring the performance of the loader, but of the computation of links.

@egonw can you give a concrete use case where you have a linkset with no links that should be loaded into the IMS?

@egonw
Copy link
Member

egonw commented Dec 3, 2017

@AlasdairGray, Randy/Ian are trying to finish the OPS 2.2 API release and running into empty link sets. Since many link sets are autogenerated, as part of a update release flow, some end up empty. See the aforementioned example by Randy...

@AlasdairGray
Copy link
Member

From the data governance perspective, I think it is better that we weed out the empty linksets as part of the loading process. If they are truely meant to be empty then we don't need to load them, they would generate no query answers. If they are not meant to be empty then we need to find out why they are empty.

@egonw egonw added this to the 2.3.0 milestone Feb 5, 2018
@egonw
Copy link
Member

egonw commented Feb 5, 2018

@randykerber, what command line call did you make on the above (empty) link set that we can use to reproduce the exception? (And thus, test the --ignore-empty option to be implemented...)

@randykerber
Copy link
Contributor Author

The load command line would have used a load.xml that loaded all openphacts linksets (more than 100). I'm guessing you want something more concise?

The second entry in this issue has a complete linkset file that triggered this exception. I can fetch the void file that goes along with this. That enough?

@randykerber
Copy link
Contributor Author

I've attached the void file (below) that references the zero-link linkset listed in the second entry of this 'issues' page.

I had to rename the file to void_2017-11-10.ttl.txt, else GitHub would not accept it. Should be named void_2017-11-10.ttl.

void_2017-11-10.ttl.txt

@egonw
Copy link
Member

egonw commented Apr 12, 2018

Stacktrace:

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.simontuffs.onejar.Boot.run(Boot.java:340)
        at com.simontuffs.onejar.Boot.main(Boot.java:166)
Caused by: org.bridgedb.utils.BridgeDBException: Error loading 
        at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.main(RunLoader.java:241)
        ... 6 more
Caused by: org.bridgedb.utils.BridgeDBException: Found 0 where 1 expected
        at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.getObjectURI(LinksetLoader.java:169)
        at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.getParser(LinksetLoader.java:132)
        at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.load(LinksetLoader.java:91)
        at uk.ac.manchester.cs.openphacts.ims.loader.LinksetLoader.load(LinksetLoader.java:69)
        at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.loadLinkset(RunLoader.java:102)
        at uk.ac.manchester.cs.openphacts.ims.loader.RunLoader.main(RunLoader.java:225)
        ... 6 more

@egonw
Copy link
Member

egonw commented Apr 17, 2018

OK, turns out that to test this, I need to bake at least a new .war and possibly a new Docker... I am working on this, but Tina will likely want to make a new PathVisio release and Anders a new pvjs release, so pushing this to the next release... which will, in all fairness, follow really soon... Oh, and Jonathan is waiting for a release too... #rsro

@egonw egonw modified the milestones: 2.3.0, 2.4.0 Apr 17, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants