From 8a7dbb35adcd0209492212a9a1b4bdc53df0d2da Mon Sep 17 00:00:00 2001 From: Bongjin Koo Date: Sun, 15 May 2022 14:23:32 -0700 Subject: [PATCH] Issue #74: Update download_and_upload_with_osf.ipynb. --- notebooks/download_and_upload_with_osf.ipynb | 210 ++++++------------- 1 file changed, 62 insertions(+), 148 deletions(-) diff --git a/notebooks/download_and_upload_with_osf.ipynb b/notebooks/download_and_upload_with_osf.ipynb index a81cbae..ef51b77 100644 --- a/notebooks/download_and_upload_with_osf.ipynb +++ b/notebooks/download_and_upload_with_osf.ipynb @@ -11,15 +11,11 @@ "source": [ "# Download and Upload with OSF\n", "\n", - "This tutorial shows how to download and upload cryo-EM datasets using the `datasets` module from `ioSPI`, that interact with the [Open Science Foundation (OSF)](https://osf.io/) framework.\n", + "This tutorial shows how to download and upload cryo-EM datasets using the `datasets` module from `ioSPI`, which interacts with the [Open Science Framework (OSF)](https://osf.io/). We will also learn how to list and remove files.\n", "\n", "OSF is an initiative that aims to increase the openness, reproducibility and integrity of scientific research. Among other functionalities, it is possible to upload scientific data which can be accessed by an Application Programming Interface (API). \n", "\n", - "``ioSPI`` offers functionalities that allow uploading and accessing cryo-EM data using:\n", - "- either, in order to get started: using the class `Project` that leverages the package `osfclient`\n", - "- or, if the user requires finer control: using the class `OSFUpload` which follows `OSF APIv2`. \n", - "\n", - "This tutorial introduces both options." + "`ioSPI` offers functionality for uploading and accessing cryo-EM data through the class `OSFProject`, which leverages the package `osfclient`."
] }, { @@ -31,7 +27,7 @@ } }, "source": [ - "# Set-up" + "# Setup" ] }, { @@ -43,9 +39,9 @@ } }, "source": [ - "First, you will need to get setup with osf.\n", + "First, you will need to get set up with OSF.\n", "\n", - "- Create an account on https://osf.io/ and save the email address you use.\n", + "- Create an account on https://osf.io/ and save the email address you use.\n", "- On this account, create a personal token in [Settings](https://osf.io/settings/tokens) and save it.\n", "\n", "The email address and the token will be needed to connect to different OSF projects.\n", @@ -55,7 +51,7 @@ }, { "cell_type": "code", - "execution_count": 1, + "execution_count": 3, "id": "e74dd634", "metadata": { "pycharm": { @@ -105,20 +101,20 @@ "source": [ "Find the OSF project from which you wish to download your data. \n", "\n", - "In this tutorial, we use a project called \"cryoEM simulated\" which contains simulated images from the 80s human ribosome. This project is on osf at the url: \"https://osf.io/7g42j/\".\n", + "In this tutorial, we use a project called \"simSPI\", which is a dummy project for testing. This project is on OSF at the url: .\n", "\n", "- Save the ID of the project of interest, which appears in the project's url.\n", "\n", - "In our case, the project ID is `7g42j`.\n", + "In our case, the project ID is `xbr2m`.\n", "\n", - "- Create an object from the class `Project` using:\n", - " - your credentials from the set up: email address and token,\n", + "- Create an object from the class `OSFProject` using:\n", + " - your credentials from the setup: email address and token,\n", " - the project ID that you just saved."
] }, { "cell_type": "code", - "execution_count": 2, + "execution_count": 4, "id": "febfdedd", "metadata": { "pycharm": { @@ -135,7 +131,7 @@ } ], "source": [ - "cryoem_simulated_project = datasets.OSFProject(\n", + "osf_project = datasets.OSFProject(\n", " username=\"ninamio78@gmail.com\", \n", " token=\"HBGGBOJcLYQfadEKIOyXJiLTum3ydXK4nGP3KmbkYUeBuYkZma9LPBSYennQn92gjP2NHn\",\n", " project_id=\"xbr2m\")" @@ -152,14 +148,14 @@ "source": [ "You have successfully set up the configuration of the OSF project!\n", "\n", - "## List Files on the OSF Project\n", + "## List Files in the OSF Project\n", "\n", - "Now you can list the files available on this OSF project. Note that this code can take a few minutes to run." + "Now you can list the files available in this OSF project." ] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 9, "id": "f4f5b054", "metadata": { "pycharm": { @@ -172,12 +168,12 @@ "output_type": "stream", "text": [ "Listing files from OSF project: xbr2m...\n", - "osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\n" + "osfstorage/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\n" ] } ], "source": [ - "cryoem_simulated_project.ls()" + "osf_project.ls()" ] }, { @@ -189,18 +185,18 @@ } }, "source": [ - "We observe that this project contains many files, organized in different folders.\n", + "We observe that this project contains one file, in a directory called `osfstorage`. This is the default storage location of an OSF project; we don't need to worry about it for now.\n", "\n", " ## Download Files from the OSF Project\n", "\n", - "We can download one of these files, e.g.
choosing from the above list the following txt file:\n", + "We can download this file, giving its name without the `osfstorage/` prefix:\n", "\n", - "- `osfstorage/randomrot1D_nodisorder/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt`.\n" + "- `4v6x_randomrot_copy0_defocus3.0_yes_noise.txt`.\n" ] }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 10, "id": "6f5270f4", "metadata": { "pycharm": { @@ -212,7 +208,7 @@ "name": "stdout", "output_type": "stream", "text": [ - "Downloading osfstorage/randomrot1D_nodisorder/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt to 4v6x_randomrot_copy0_defocus3.0_yes_noise.txt...\n", + "Downloading osfstorage/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt to 4v6x_randomrot_copy0_defocus3.0_yes_noise.txt...\n", "Done!\n" ] }, @@ -220,13 +216,13 @@ "name": "stderr", "output_type": "stream", "text": [ - "100%|██████████| 4.22k/4.22k [00:00<00:00, 19.1Mbytes/s]\n" + "100%|██████████| 4.22k/4.22k [00:00<00:00, 18.0Mbytes/s]\n" ] } ], "source": [ - "cryoem_simulated_project.download(\n", - " remote_path=\"osfstorage/randomrot1D_nodisorder/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\", \n", + "osf_project.download(\n", + " remote_path=\"4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\",\n", " local_path=\"4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\")" ] }, @@ -239,7 +235,9 @@ } }, "source": [ - "## Upload Files to an OSF Project" + "You can check that the file was downloaded to the current working directory.\n", + "\n", + "## Upload Files to the OSF Project" ] }, { @@ -251,25 +249,23 @@ } }, "source": [ - "Importantly, OSF will not let you upload data to any folder: authorization is requested.\n", + "Importantly, OSF will not let you upload data to just any project: authorization is required.\n", "\n", - "To test this functionality, you can create a new project through osf.io (https://osf.io/myprojects/) by clicking: `Create project`.\n", + "To test this functionality, you can create a new project through osf.io
by clicking `Create Project`.\n", "\n", "This will create a new project page, as the one we are using here.\n", "- Save the project ID of the project you just created!\n", "\n", - "You should then create a new `my_project` object of the class `datasets.Project` with the new project ID.\n", + "You should then create a new `my_project` object of the class `datasets.OSFProject` with the new project ID.\n", "\n", - "For the purpose of this tutorial, however, we will stay with our original project cryoEM simulated and use our object `cryoem_simulated_project`.\n", + "For the purpose of this tutorial, however, we will stay with our original project and use our object `osf_project`.\n", "\n", - "We re-upload the file that we just downloaded, renaming it by adding a `new_version` prefix to its name. We will first create a child node inside the parent node which corresponds to our root directory. Then, we will upload the file to ths child node.\n", - "\n", - "To do this, let's create an instance of ``OSFUpload`` class which takes care of uploading data to ``osf.io``. Provide the personal token and project ID." + "We re-upload the file that we just downloaded, renaming it by adding a `new_version_` prefix to its name."
] }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 11, "id": "74c6125e", "metadata": { "pycharm": { @@ -281,87 +277,43 @@ "name": "stdout", "output_type": "stream", "text": [ - "Uploading ../tests/data/test_upload.txt to osfstorage/test_upload.txt...\n", + "Uploading 4v6x_randomrot_copy0_defocus3.0_yes_noise.txt to osfstorage/new_version_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt...\n", "Done!\n" ] } ], "source": [ - "cryoem_simulated_project.upload(\"../tests/data/test_upload.txt\", \"test_upload.txt\")\n" + "osf_project.upload(\"4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\", \"new_version_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\")" ] }, { - "cell_type": "code", - "execution_count": 4, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Removing osfstorage/test_upload.txt in the project...\n", - "Done!\n" - ] - } - ], - "source": [ - "\n", - "cryoem_simulated_project.remove(\"test_upload.txt\")" - ], - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], + "cell_type": "markdown", "source": [ - "\n", - "\n", - "cryoem_simulated_project.upload(\"4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\",\"osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\")\n" + "Let's check if the upload was successful by listing the files in the OSF project." 
], "metadata": { "collapsed": false, "pycharm": { - "name": "#%%\n" + "name": "#%% md\n" } } }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 12, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Deleting osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt in the project...\n", - "Done!\n" + "Listing files from OSF project: xbr2m...\n", + "osfstorage/new_version_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\n", + "osfstorage/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\n" ] } ], "source": [ - "\n", - "cryoem_simulated_project.delete(\"osfstorage/new_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\")\n" - ], - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "\n", - "osf = datasets.OSFUpload(token=cryoem_simulated_project.token, data_node_guid=cryoem_simulated_project.project_id)\n", - "print(osf.headers)" + "osf_project.ls()" ], "metadata": { "collapsed": false, @@ -373,7 +325,11 @@ { "cell_type": "markdown", "source": [ - "Now we create a child node inside the parent node, for representing the dataset with 80s ribosome data. We will use the pdb id as the name for this new node, and the function will return its ID. Since the child is also a node, it can be accessed separately from the parent node." 
+ "You should see the file `new_version_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt` in the list.\n", + "\n", + "## Remove Files in the OSF Project\n", + "\n", + "Finally, let's tidy up the project by removing the file we uploaded.\n" ], "metadata": { "collapsed": false, @@ -384,20 +340,19 @@ }, { "cell_type": "code", - "execution_count": 6, + "execution_count": 13, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "8wz6g\n" + "Removing osfstorage/new_version_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt in the project...\n", + "Done!\n" ] } ], "source": [ - "pdb_id = '4v6x'\n", - "child_guid = osf.write_child_node(parent_guid=cryoem_simulated_project.project_id, title= pdb_id)\n", - "print(child_guid)" + "osf_project.remove(\"new_version_4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\")" ], "metadata": { "collapsed": false, @@ -409,73 +364,30 @@ { "cell_type": "markdown", "source": [ - "Now, we finally upload the files related to the 80s ribosome to the child node using ``write_files`` function. Note the ``file_paths`` must be a list, thus ``[]`` is needed around the filepaths." + "Check if the file was removed." 
], "metadata": { "collapsed": false, "pycharm": { "name": "#%% md\n" } - }, - "outputs": [] + } }, { "cell_type": "code", - "execution_count": 7, + "execution_count": 14, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ - "Uploaded 4v6x_randomrot_copy0_defocus3.0_yes_noise.txt \n" + "Listing files from OSF project: xbr2m...\n", + "osfstorage/4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\n" ] - }, - { - "data": { - "text/plain": "True" - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "osf.write_files(child_guid, [\"4v6x_randomrot_copy0_defocus3.0_yes_noise.txt\"])" - ], - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - } - }, - { - "cell_type": "markdown", - "source": [ - "We can now check if it was uploaded correctly using the function `read_structure_guid` which will return an ID corresponding to the pdb id passed as a parameter." - ], - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%% md\n" - } - } - }, - { - "cell_type": "code", - "execution_count": 8, - "outputs": [ - { - "data": { - "text/plain": "'ezh4k'" - }, - "execution_count": 8, - "metadata": {}, - "output_type": "execute_result" } ], "source": [ - "osf.read_structure_guid(pdb_id)" + "osf_project.ls()" ], "metadata": { "collapsed": false, @@ -487,7 +399,9 @@ { "cell_type": "markdown", "source": [ - "Congratulations! You have successfully downloaded and uploaded data from/to OSF." + "You should see that the file was removed from the project.\n", + "\n", + "Brilliant! Now you know how to upload, download, list and remove files in an OSF project!" ], "metadata": { "collapsed": false,