diff --git a/index.md b/index.md
index 8c700e1..981960b 100644
--- a/index.md
+++ b/index.md
@@ -97,4 +97,97 @@ Choose one of the languages listed above to see the language-specific leaderboar
 [Java]: https://github.com/gunnarmorling/1brc
+## The challenge
+
+Your mission, should you choose to accept it, is to write a program that retrieves temperature measurement values from a text file and calculates the min, mean, and max temperature per weather station. There's just one caveat: the file has 1,000,000,000 rows! That's more than 10 GB of data! 😱
+
+The text file has a simple structure with one measurement value per row:
+
+```
+Hamburg;12.0
+Bulawayo;8.9
+Palembang;38.8
+Hamburg;34.2
+St. John's;15.2
+Cracow;12.6
+... etc. ...
+```
+
+The program should print out the min, mean, and max values per station, alphabetically ordered. The expected format varies slightly from language to language, but the following example shows the expected output for three of the stations:
+
+```
+Bulawayo;8.9;22.1;35.2
+Hamburg;12.0;23.1;34.2
+Palembang;38.8;39.9;41.0
+```
+
+Oh, and this `input.txt` is different for each submission since it's generated on demand. So no hard-coding the results! 😉
+
+## Rules and limits
+
+- No external library dependencies may be used. That means no lodash, no numpy, no Boost, no nothing. You're limited to the standard library of your language.
+
+- Implementations must be provided as a single source file. Try to keep it relatively short; don't copy-paste a library into your solution as a cheat.
+
+- The computation must happen _at application runtime_; you cannot process the measurements file at build time.
+
+- Input value ranges are as follows:
+
+  - **Station name:** non-null UTF-8 string with a minimum length of 1 character and a maximum length of 100 bytes (i.e. this could be 100 one-byte characters, or 50 two-byte characters, etc.)
+  - **Temperature value:** non-null double between -99.9 (inclusive) and 99.9 (inclusive), always with one fractional digit
+
+- There is a maximum of 10,000 unique station names.
+
+- Implementations must not rely on specifics of a given data set. Any valid station name as per the constraints above and any data distribution (number of measurements per station) must be supported.
+
+## Entering the challenge
+
+Some languages have special instructions, but in general here's what you can expect:
+
+1. Create a fork of the 1BRC repository for your language on your own GitHub profile. This will let you submit your solution via a pull request.
+
+2. Create a new implementation file in the repository. How you do this varies by language: for example, in JavaScript you might create a new `src/.js` file, while in C++ you might create a new `src/.cpp` file. It's recommended to copy the default reference solution and modify it from there.
+
+3. Make that implementation fast. Really fast.
+
+4. Test & benchmark your solution! There are usually language-specific instructions on how to do this, but in general you run ` bench ` to run your solution against the reference implementation. If you see any differences, fix them before submitting your implementation.
+
+5. Create a pull request against the upstream repository! 🎉 There are usually some additional instructions in the pull request template about information you should include, such as how long your solution took on your computer and your computer's specs.
+
+6. Someone or some robot will run your solution "officially" on the same hardware as everyone else's solution (so no hardware differences) and report the results. If you're the fastest, you win! 🏆 If not, you'll still probably go on the leaderboard. 🥉
+
+If you'd like to discuss any potential ideas for implementing 1BRC with the community, you can use the GitHub Discussions of this [@1brc](https://github.com/1brc) GitHub organization or the language-specific repository discussions. Please keep it friendly and civil.
+
+The challenge runs until Jan 31 2024. \
+Any submissions (i.e. pull requests) created after Jan 31 2024 23:59 UTC will not be considered.
+
+## Prize 🎁
+
+If you enter this challenge, you may learn something new, get to inspire others, and take pride in seeing your name listed in the scoreboard above.
+Rumor has it that the winner of the [Java] competition (the original challenge language) may receive a unique 1️⃣🐝🏎️ t-shirt, too!
+
+## FAQ
+
+Make sure you check your language-specific FAQ as well. 😉
+
+###### What is the encoding of the measurements.txt file?
+
+The file is encoded as UTF-8.
+
+###### Can I make assumptions about the names of the weather stations showing up in the data set?
+
+No. While only a fixed set of station names is used by the data set generator, any solution should work with arbitrary UTF-8 station names. For the sake of simplicity, names are guaranteed to contain no `;` character.
+
+###### Can I copy code from other submissions?
+
+Yes, you can. The primary focus of the challenge is learning something new, rather than "winning". When you do so, please give credit to the relevant source submissions. Please don't re-submit other entries with no or only trivial improvements.
+
+###### My solution runs in 2 sec on my machine. Am I the fastest 1BRC-er in the world?
+
+Probably not. 😊 1BRC results are reported in wall-clock time, so results of different implementations are only comparable when obtained on the same machine. If, for instance, an implementation is faster on a 32-core workstation than on the 8-core evaluation instance, this doesn't allow for any conclusions. When sharing 1BRC results, you should also always share the result of running the baseline implementation on the same hardware.
+
+###### Why 1️⃣🐝🏎️?
+
+It's the abbreviation of the project name: the **One** **B**illion **R**ow **C**hallenge.
+
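
For anyone who wants a feel for the task before opening a language repository, here is a minimal, unoptimized Python sketch of the computation described under "The challenge". It is not the reference implementation of any 1BRC repository. The function name `summarize` is hypothetical, the `name;min;mean;max` row layout is borrowed from the example output on this page, and both the rounding (Python's `:.1f` formatting) and the sort order (plain code-point order) are assumptions, since the expected format varies by language.

```python
from typing import Dict, Iterable, List


def summarize(lines: Iterable[str]) -> List[str]:
    """Return one 'name;min;mean;max' row per station, sorted by name.

    Hypothetical helper for illustration only; the exact output format
    and rounding rules are assumptions and differ between languages.
    """
    # station name -> [min, max, sum, count]
    stats: Dict[str, List[float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate a trailing blank line
        # Station names are guaranteed not to contain ';',
        # so the first ';' always separates name and value.
        name, _, value = line.partition(";")
        temp = float(value)
        entry = stats.get(name)
        if entry is None:
            stats[name] = [temp, temp, temp, 1.0]
        else:
            entry[0] = min(entry[0], temp)
            entry[1] = max(entry[1], temp)
            entry[2] += temp
            entry[3] += 1.0
    # Names are unique keys, so sorting the items orders rows by name only.
    return [
        f"{name};{lo:.1f};{total / count:.1f};{hi:.1f}"
        for name, (lo, hi, total, count) in sorted(stats.items())
    ]
```

To try it on a real file, you could pass an open file object (opened with `encoding="utf-8"`) to `summarize` and print the returned rows. Expect a single-threaded loop like this to be slow on a billion rows; it is only meant to pin down what is being computed.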