Is Crawling required again, for changing the folder structure path in Windows #1370
-
Hi @NivethikaM, there's no such feature in FSCrawler. I think you were almost there. When you change the path to a file, you need to compute again:
This last one represents the folder where the file is, so it needs to be updated accordingly. Also, I think you need to update the folder index.
{
  "_index": "fscrawler_fs_crawler_test_sub_dirs_i_t_test_subdirs_deep_tree_folder",
  "_type": "_doc",
  "_id": "5e6e88529a54132361ca34bc2c6b",
  "_score": 1,
  "_source": {
    "path": {
      "root": "68d15b87d1545c8425478dd9b7b59ac3",
      "virtual": "/",
      "real": "/var/folders/xn/47mdpxd12vq4zrjhkwbhd5_r0000gn/T/junit1606348347133366394/resources/test_subdirs_deep_tree"
    },
    "file": {
      "content_type": "text/directory",
      "filename": "test_subdirs_deep_tree"
    }
  }
}
You need to update the document. If you succeed in making that happen, I'd really appreciate it if you could document exactly what you did.
-
Thanks for your quick reply. I have updated path.root in the index as you suggested. Although I have set index_folders to true in '_settings.json', I am not able to update path.root and _id in the folder index (Index_folder). The folder index has the mapping you showed above, and Index_Folder/_settings shows the number of shards and replicas as 1. But I get no data when I query Index_folder/_search. I have almost 30 indices, and none of their folder indices contains any data (total shows 0).
-
Now I am getting data in Index_Folder. The problem is that when we give the path "C:/index/test/" in fs.url, there must be subfolders under it. If the path contains only files, we get no data in Index_Folder. So subdirectories need to be present under the fs.url path. In Index_folder, the values of _id and root are unique for each file location, and this data is provided by FSCrawler.
Test 1
According to my testing, the _id and root values in Index_folder are pushed by FSCrawler for each path, and this value is stored on the machine.
-
I have created 3 indices, namely tmp_both, tmp_a and tmp_b. The tmp_both_folder index has 2 records in it.
The tmp_a_folder index has 1 record in it.
The tmp_b_folder index has 1 record in it.
So each path has its own unique _id and root values. If the same path is used in multiple indices, the _id is still the same. Documents in the main index take the _id of the corresponding folder-index document as their path.root, based on the indexed path.
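The observation that the same path always yields the same _id, while different paths yield different ones, is what you would expect if the id is a deterministic hash of the path string. The thread does not say how FSCrawler actually computes these values, so the sketch below is only an illustration: `path_signature` is a hypothetical helper, and MD5 over the UTF-8 path is an assumption, so the results will not necessarily match the ids stored in the folder index.

```python
import hashlib


def path_signature(path: str) -> str:
    """Hypothetical stand-in for a path-derived id.

    Assumption: the id is a hex digest of the path string (MD5 here).
    FSCrawler's real scheme is not confirmed in this thread.
    """
    return hashlib.md5(path.encode("utf-8")).hexdigest()


# Same folder name on two drives: the path strings differ, so the ids differ.
old_id = path_signature("D:/index/test/verify")
new_id = path_signature("C:/index/test/verify")

# Hashing the same path again gives the same id, whatever index it is used in.
same_id = path_signature("D:/index/test/verify")

print(old_id != new_id)   # ids change when the drive letter changes
print(old_id == same_id)  # ids are stable for an unchanged path
```

Under this assumption, moving a folder to another drive changes every id derived from its path, which is why the existing documents no longer line up with what a new crawl produces.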
-
To move a folder from D:/index/test/verify to C:/index/test/verify, I did the following. Since it is not possible to update the _id in Elasticsearch, I deleted the document with _id 47cce77fc87eb153015eb17e1553579 from tmp_a_folder. I then added a document with _id 6dd2bfd4ae85d09ffc4c6f3a44cad81 and root 78486e9dba9752b4d5b8fcbec74d73c to the tmp_a_folder index, making sure that file.filename, path.root, path.virtual and path.real are all correct. I updated path.root, path.real and file.url in the tmp_a index, and changed the url in the _settings.json file. I also manually added the document present in the tmp_a index to the new url path. I have tested this three times in different ways. Please guide me in solving this.
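The delete-then-add step for the folder document could be sent as a single bulk request. The sketch below only builds the request body and does not contact a cluster. The ids, root and index name are the ones quoted in this post; `build_folder_move_bulk` is a hypothetical helper, the `path.virtual` value is a placeholder, and the `_source` shape is assumed to follow the folder document shown earlier in the thread.

```python
import json


def build_folder_move_bulk(index, old_id, new_id, new_source, doc_type="_doc"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint that
    deletes the old folder document and indexes one under the new _id."""
    lines = [
        {"delete": {"_index": index, "_type": doc_type, "_id": old_id}},
        {"index": {"_index": index, "_type": doc_type, "_id": new_id}},
        new_source,  # the source line must directly follow its "index" action
    ]
    # The bulk API expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


new_folder_doc = {
    "path": {
        "root": "78486e9dba9752b4d5b8fcbec74d73c",
        "virtual": "/",  # placeholder: use the value that fits your fs.url
        "real": "C:/index/test/verify",
    },
    "file": {"content_type": "text/directory", "filename": "verify"},
}

bulk_body = build_folder_move_bulk(
    index="tmp_a_folder",
    old_id="47cce77fc87eb153015eb17e1553579",
    new_id="6dd2bfd4ae85d09ffc4c6f3a44cad81",
    new_source=new_folder_doc,
)
print(bulk_body)
```

The body would be posted to `/_bulk` with the `Content-Type: application/x-ndjson` header. The `_type` entry is included because the thread targets Elasticsearch 6.8.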
-
I have created a folder, E:/index/test/, and added 10 files to it. I have indexed all the files in Elasticsearch (fs.url: E:/index/test/). Now I want to change the path of the folder structure from E:/index/test/ to C:/index/test/. I have changed fs.url to C:/index/test and copied all the files from E:/index/test/ to C:/index/test/.
In Elasticsearch I have changed path.real and file.url from E:/index/test/${filename} to C:/index/test/${filename} using update_by_query. I also made sure path.virtual is present as required.
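For reference, an update_by_query request of the kind described above might look like the following sketch. It only builds the JSON body. `build_path_rewrite_body` is a hypothetical helper, and the script assumes that path.real and file.url are stored as plain strings starting with or containing the old prefix exactly as written here. Check a stored document first, since separators and URL prefixes on Windows may differ.

```python
import json

# Painless script: rewrite the prefix on path.real and file.url,
# and skip (noop) any document whose path.real has a different prefix.
PAINLESS_SOURCE = """
def real = ctx._source.path?.real;
if (real != null && real.startsWith(params.old_prefix)) {
  ctx._source.path.real = params.new_prefix + real.substring(params.old_prefix.length());
  def url = ctx._source.file?.url;
  if (url != null) {
    ctx._source.file.url = url.replace(params.old_prefix, params.new_prefix);
  }
} else {
  ctx.op = 'noop';
}
""".strip()


def build_path_rewrite_body(old_prefix: str, new_prefix: str) -> dict:
    """Build the body for POST /<index>/_update_by_query."""
    return {
        "query": {"match_all": {}},
        "script": {
            "lang": "painless",
            "source": PAINLESS_SOURCE,
            "params": {"old_prefix": old_prefix, "new_prefix": new_prefix},
        },
    }


body = build_path_rewrite_body("E:/index/test/", "C:/index/test/")
print(json.dumps(body, indent=2))
```

A request like this rewrites fields inside `_source` only. It leaves each document's `_id` and `path.root` untouched, so documents whose ids were derived from the old path keep those ids.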
Now when I crawl the same index from the new path C:/index/test/, all the files are added again and the number of files has increased from 10 to 20. Although the content is the same, and path.real, path.virtual and file.url are the same for the files, they are indexed again.
Is it possible to change the folder structure path for the same index in Windows?
FSCrawler version: 2.9
Elasticsearch: 6.8.0