make load-data apparently not loading data to cluster #20

arminus opened this issue Apr 16, 2021 · 0 comments

arminus commented Apr 16, 2021

This is the output when running make load-data (I ran make before that, a 384.4 MB sansa-examples-spark.jar is present in examples/jars, and my setup otherwise appears to be running fine):

make load-data
docker run -it --rm -v /home/www/bde/SANSA-Notebooks/sansa-notebooks/examples/data:/data --net spark-net -e "CORE_CONF_fs_defaultFS=hdfs://namenode:8020" bde2020/hadoop-namenode:1.1.0-hadoop2.8-java8 hdfs dfs -copyFromLocal /data /data
Configuring core
 - Setting fs.defaultFS=hdfs://namenode:8020
Configuring hdfs
 - Setting dfs.namenode.name.dir=file:///hadoop/dfs/name
Configuring yarn
Configuring httpfs
Configuring kms
Configuring for multihomed network
docker exec -it namenode hdfs dfs -ls /data
Found 1 items
drwxr-xr-x   - root supergroup          0 2021-04-16 13:45 /data/data

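This was the second time I ran make load-data. Besides apparently not uploading any data, the second run created another data directory inside /data (/data/data), because /data already existed in HDFS and copyFromLocal places the source directory inside an existing destination directory.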

Navigating to http://localhost:8088/filebrowser/#/data, I can see the nested data directory but nothing else.

-> As a workaround, I extracted (un-jared) sansa-examples-spark.jar into examples/data so that make load-data picks up the data, but that extraction step seems to be missing from one of the build targets.

Related to that, the Zeppelin RDF notebook references a file hdfs://namenode:8020/data/rdf.nt, but that file is not present in sansa-examples-spark.jar, so I wonder whether some other issue is at play here.

As a side note, copying the data now seems to run forever (on a reasonably fast Linux box).