|
| 1 | +# warc |
| 2 | +[](http://github.com/datatogether) |
| 3 | +[](https://archivers-slack.herokuapp.com/) |
| 4 | +[](http://godoc.org/github.com/datatogether/warc) |
| 5 | +[](./LICENSE) |
| 6 | + |
| 7 | +warc is an implementation of ISO 28500 1.0, the Web ARChive (WARC) specification. |
| 8 | +It provides readers, writers, and structs for working with WARC records. |
| 9 | + |
| 10 | +From the spec: |
| 11 | +> The WARC (Web ARChive) file format offers a convention for concatenating |
| 12 | +multiple resource records (data objects), each consisting of a set of |
| 13 | +simple text headers and an arbitrary data block into one long file. The |
| 14 | +WARC format is an extension of the ARC File Format [ARC] that has |
| 15 | +traditionally been used to store "web crawls" as sequences of content |
| 16 | +blocks harvested from the World Wide Web. Each capture in an ARC file is |
| 17 | +preceded by a one-line header that very briefly describes the harvested |
| 18 | +content and its length. This is directly followed by the retrieval |
| 19 | +protocol response messages and content. The original ARC format file is |
| 20 | +used by the Internet Archive (IA) since 1996 for managing billions of |
| 21 | +objects, and by several national libraries. |
| 22 | + |
| 23 | + |
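The record layout described in the excerpt above (a version line, a set of simple text headers, a blank line, then a data block) can be sketched with only the Go standard library. This is an illustrative sketch, not this package's API: the `record` type and the `readRecord` and `readAll` helpers are hypothetical names made up for the example. It assumes each record carries a `Content-Length` header and ends with two CRLF pairs.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net/textproto"
	"strconv"
	"strings"
)

// record is a hypothetical type for this sketch; it is not the struct
// exported by the warc package.
type record struct {
	Version string               // e.g. "WARC/1.0"
	Header  textproto.MIMEHeader // named fields such as WARC-Type
	Block   []byte               // the arbitrary data block
}

// readRecord reads one record: version line, headers, blank line,
// Content-Length bytes of block, then the two trailing CRLF pairs.
func readRecord(br *bufio.Reader) (*record, error) {
	tp := textproto.NewReader(br)
	version, err := tp.ReadLine()
	if err != nil {
		return nil, err // io.EOF here means there are no more records
	}
	if !strings.HasPrefix(version, "WARC/") {
		return nil, fmt.Errorf("unexpected version line %q", version)
	}
	header, err := tp.ReadMIMEHeader() // also consumes the blank line
	if err != nil {
		return nil, err
	}
	n, err := strconv.Atoi(header.Get("Content-Length"))
	if err != nil || n < 0 {
		return nil, fmt.Errorf("bad Content-Length %q", header.Get("Content-Length"))
	}
	block := make([]byte, n)
	if _, err := io.ReadFull(br, block); err != nil {
		return nil, err
	}
	// Skip the "\r\n\r\n" that terminates every record.
	if _, err := br.Discard(4); err != nil {
		return nil, err
	}
	return &record{Version: version, Header: header, Block: block}, nil
}

// readAll parses every record concatenated in data.
func readAll(data string) ([]*record, error) {
	br := bufio.NewReader(strings.NewReader(data))
	var recs []*record
	for {
		rec, err := readRecord(br)
		if err == io.EOF {
			return recs, nil
		}
		if err != nil {
			return nil, err
		}
		recs = append(recs, rec)
	}
}
```

To try it, call `readAll` from a `main` function with a string holding one or more concatenated records and inspect each record's `Header.Get("WARC-Type")` and `Block`. Header lookups go through `textproto.MIMEHeader.Get`, which canonicalizes the key, so the lookup is case-insensitive.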
| 24 | +## License & Copyright |
| 25 | + |
| 26 | +[Affero General Public License v3](http://www.gnu.org/licenses/agpl.html) |
| 27 | + |
| 28 | +## Getting Involved |
| 29 | + |
| 30 | +We would love involvement from more people! If you notice any errors or would like to submit changes, please see our [Contributing Guidelines](./.github/CONTRIBUTING.md). |
| 31 | + |
| 32 | +We use GitHub issues for [tracking bugs and feature requests](https://github.com/datatogether/warc/issues) and Pull Requests (PRs) for [submitting changes](https://github.com/datatogether/warc/pulls). |
| 33 | + |
| 34 | +## Usage |
| 35 | +`import "github.com/datatogether/warc"` |