Setting a small enough maximum file size would force one page per WARC file. That said, I wouldn't recommend this, since you may run into file system problems on large crawls.
Another option is to use the FILESYSTEM data formats. They do create one file per URL, but they don't support the WARC format yet.
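If post-processing is acceptable, a third route is to keep the default combined WARC output and split it after the crawl. The sketch below is only an illustration, not a crawler feature: `iter_warc_records` and `split_warc` are hypothetical helper names, it uses only the Python standard library, and it assumes an uncompressed WARC file whose records each carry a `Content-Length` header. It also writes one file per WARC record rather than per page, so a page stored as several records (request, response, metadata) ends up in several files.

```python
from pathlib import Path
from typing import BinaryIO, Iterator, List


def iter_warc_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each raw record (header + block) from an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of stream
        if line in (b"\r\n", b"\n"):
            continue  # skip the blank lines that separate records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")

        header = [line]
        length = None
        while True:
            line = stream.readline()
            if not line:
                raise ValueError("truncated WARC header")
            header.append(line)
            if line in (b"\r\n", b"\n"):
                break  # a blank line ends the header
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("record has no Content-Length header")

        # Read exactly Content-Length bytes so blank lines inside the
        # block are never mistaken for record separators.
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("truncated WARC block")
        yield b"".join(header) + block + b"\r\n\r\n"


def split_warc(source: Path, out_dir: Path) -> List[Path]:
    """Write every record of `source` to its own numbered file in `out_dir`."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with source.open("rb") as stream:
        for index, record in enumerate(iter_warc_records(stream)):
            target = out_dir / f"{source.stem}-{index:06d}.warc"
            target.write_bytes(record)
            written.append(target)
    return written
```

Calling `split_warc(Path("crawl.warc"), Path("split"))` (both paths are placeholders) would produce files such as `split/crawl-000000.warc`. The same file system caveat applies here: a large crawl yields a very large number of small files.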
Is there a config option for writing each downloaded file to its own WARC file instead of putting them all in the same one?
This would make it easier to extract data for individual items.