Should we benchmark languages other than Fortran, why, and how? #10
Comments
I would consider this benchmark repository also as an example repository to learn about possible usage of the language to write performant code. Having different languages in the benchmarks could serve as kind of Rosetta stone for users proficient in optimizing code in another language trying to learn about optimization in Fortran. |
Copying my comment from #9. We discussed quite a few times, e.g. in #2, to include other languages. Why: to actually have a comparison across languages. I expect that across Fortran/C/C++ perhaps even Julia one can fiddle with the code to eventually get similar speed. And we should have that final code. But also what would be interesting (for me) would be code that you would reasonably write as a domain expert, say a physicist. And we should have those codes too. How: that has to be discussed, for now let's have it in some form, such as in #9. I expect we'll have benchmarks for smaller arrays, that have to be run many times. I have a runner for that somewhere, I'll see if I can contribute it. Then I expect to have longer running tests, such as #9, which are not as sensitive to how it is timed. |
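As a rough illustration of the kind of runner mentioned above for small kernels that have to be run many times, here is a minimal sketch. It is not the runner referred to in the comment; `time_kernel`, `small_dot`, and the repeat counts are hypothetical names and values chosen for illustration. It times batches of calls and reports the best per-call time, which tends to be less sensitive to timer resolution and system noise than timing a single call.

```python
import time


def time_kernel(kernel, repeats=5, inner_loops=1000):
    """Return the best per-call time (seconds) over `repeats` batches.

    Each batch calls `kernel` `inner_loops` times so that the measured
    interval is long compared with the timer resolution.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(inner_loops):
            kernel()
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / inner_loops)
    return best


def small_dot():
    # Hypothetical small kernel: dot product of two 16-element vectors.
    a = [float(i) for i in range(16)]
    b = [2.0] * 16
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    print(f"best per-call time: {time_kernel(small_dot):.3e} s")
```

Longer-running tests such as the one in #9 could probably be timed with a single batch (`inner_loops=1`), since they are less sensitive to how the timing is done.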
@certik I guess I'm asking for more clarity beyond what's been already discussed. I interpret this as meaning that there will be at least two versions of each problem in each language:
Is the above aligned with your view? Is there some other comparison that is missing, beside these two? |
I would expect to have many versions of the same benchmark. Besides the two you mentioned, also one that allows assembly intrinsics and one that does not. One that perhaps uses more array operations, one that does not. If you look at the benchmarks in https://benchmarksgame-team.pages.debian.net/benchmarksgame/, they have many versions for the same language. I would expect that every time someone contributes an improvement, we can have a new version. Also, we each have a different "taste" for what constitutes "nice code", so making sure we each have a version that we like there would drive the point home, I think. See also: |
I think having multiple languages is desirable, in as many languages as possible, so that the reader can compare both the verbosity and performance. For example, in #9 one surprising feature was that the C compiler does a relatively good job of optimizing (probably inlining?) function calls, whereas Fortran suffers from a bigger performance penalty in this case (probably it doesn't inline the function calls?) At any rate, such comparisons are interesting, at least to me. Perhaps we could define some set of languages as "minimal" and strive to have each benchmark have implementations in at least those languages. For example, C/C++, Fortran, Python and Julia could be candidates, since they are all popularly used for scientific computing. |
@arunningcroc I agree 100%. |
Modern languages have diverged to such a great extent in philosophy and design that direct comparisons are exceedingly difficult, especially for comparisons based on performance.
Parallelism
Fortran is an inherently parallel language. For a direct language-to-language comparison, you will have to handicap Fortran by comparing only serial code, leaving a ton of performance on the table even on modest platforms (e.g., an 8-core laptop), which defeats the purpose of a performance comparison. Or you can compare parallel codes, but then the comparison is between Fortran and Language X + Parallel Programming Model Y.
Vectorization, Multithreading, and GPU Offloading
The Fortran committee intended for Other implicitly parallel Fortran features include array statements,
Libraries
Idiomatic C++ and Python typically rely upon external libraries even for such basic things as multidimensional array functionality. A C++ programmer wanting performance portability across a range of heterogeneous hardware architectures, for example, is likely to hand off as much performance-critical computation as possible to a library like kokkos. If you write self-contained C++ without exploiting such libraries, that alone might disqualify the code from being idiomatic C++. If instead you incorporate libraries, then you're comparing Fortran to Language X + Library Y. It seems unlikely that the proposed effort will match the performance of highly optimized libraries.
Generic Programming
I suspect that any modern C++ library is by definition generic (using templates), but Fortran's generic programming features are still under development, making an apples-to-apples comparison impossible without a ton of clunky
Bottom Line
Let's all write code in the languages that make us feel most productive while providing acceptable performance. Fortunately, there could be an ultimate convergence as the language developers learn from each other. Fortran 202Y will support generic programming. C++23 will support multidimensional arrays. I'm not sure any of the languages named will ever support parallel programming in the way that Fortran does, however, because too small a sliver of the languages' programmers require scalable parallelism for the languages to add something that so fundamentally changes the language's execution model. Julia is an exception to the latter statement because its designers are targeting high-performance computing. Lastly, performance analysis and tuning is a challenging research topic in its own right. It's probably not a great idea to wade into this area unless one is taking it on as a subject of research and planning to dive deeply into it. |
@rouson, @milancurcic I see you are both a little bit reserved about the purpose of the I still think we should benchmark against other languages. We should benchmark in parallel and compare against C++ and Kokkos, among other things. As a user that is exactly what I would like to see. |
@certik my "Bottom Line" section accurately summarizes where I stand. I worry that raw performance comparisons between the languages distract from other important considerations such as programmer productivity and fundamental differences in the languages' design philosophy. I also worry that the comparisons could devolve into debates about whether what is written is truly idiomatic in the given languages if the focus is performance. Most importantly, I worry about what happens when the goals of writing high-performing code and idiomatic code diverge. I cringe when I see deeply nested In addition to all the other features that I've mentioned so far, I would hope that idiomatic Fortran would make extensive use of intrinsic functions, which also can replace multiple lines of custom logic, but the performance of intrinsic functions is another area in which I would expect considerable variation across compilers, compiler versions, and compiler flags. Consider, for example, that some compilers can be directed to call some user-chosen optimized BLAS implementation to support |
@rouson I think objectively you're quite right that comparisons between the performance of languages frequently make little sense. Nevertheless, people are interested in them, as can be seen from the continued popularity of websites like the benchmarks game, and the recent discussions in the discourse forum surrounding e.g. Julia. Julia, for instance, includes benchmarks quite prominently on its page. I know I certainly sometimes look for benchmarks when evaluating languages, with the full understanding that this can't give me a complete picture. I don't think a benchmark or any other marker of language performance has to be 100% fair and perfect to be useful or just plain fun to look at. Indeed, I think Julia's performance page says it best:
|
@certik If performance comparisons are the primary aim, then I recommend contributing to Jeff Hammond's Parallel Research Kernels (PRK). Parallelism is an essential ingredient in any performance discussion in the multicore/manycore/GPU era. PRK contains Fortran, C++11, Julia, Python, Ruby, UPC, and more. I would love to contribute to refactoring some of the Fortran code to be more idiomatic along the lines of what I wrote above. For example, PRK's Fortran kernels contain 11 different matrix-transpose implementations. In my view, idiomatic Fortran simply calls the |
@arunningcroc I have to admit that the whole time I've been writing responses, I've been thinking it would be fun to look at the proposed code. :) From that perspective, it's definitely a valuable exercise. I like the analogy someone made to the result being a Rosetta Stone. |
@certik @arunningcroc @milancurcic having now looked at the first example, I feel even more strongly that this effort could do more harm than good. First, I urge you to not call the repository "benchmarks." If you do that, you're diving into a field with a long and controversial history. Consider the following language near the top of the README for PRK repository, a multi-language comparison effort very similar to yours except for the parallelism: "These programs should not be used as benchmarks. They are operations to explore features of a hardware platform, but they do not define fixed problems that can be used to rank systems. Furthermore they have not been optimimzed for the features of any particular system." You could adopt similar language, replacing "hardware platform" with languages. |
Second, @certik, I strongly disagree with the idea of launching a new project to write "code that you would reasonably write as a domain expert, say a physicist" unless you're going to have comparison code to demonstrate much more modern practices. The majority of domain experts are writing a narrow subset of a 31-year-old version of Fortran. As mentioned above, I cringe every time I see nested do loops doing what an array statement could do as I suspect is the case twice in just the 55-line Poisson solver. I call such code Cortran: it's the Fortran program that a C programmer would write because they think they have to loop over all the elements of an array just to initialize the array. Moreover, the program uses fixed-size arrays despite Even the "optimized.f90" Poisson solver appears at first glance to be standard-conforming Fortran 90. In fact, with some reformatting, much of it would be standard-conforming Fortran 77. If this effort moves forward, I hope that every code will have a "modern.f90" comparison that separates interfaces from implementations, uses array statements and intrinsic procedures wherever possible, and uses Moving into the 21st century, I would hope that any effort to write a substantial amount of new code would decompose the problem into procedures and then separate the procedure interfaces (in modules) from procedure definitions (in submodules), and use Fortran's facilities for parallelism and concurrency. With the exception of submodules, every feature I've named has potential runtime performance implications -- usually positive implications with a sufficiently advanced compiler -- so using these features fits perfectly with the goals of the repository. And even submodules have potentially positive compile-time performance implications, which might matter less for small kernels, but having procedure interfaces sure makes for a nice introduction to the high-level goals of different parts of the program as expressed. |
@rouson, thanks for the feedback. I think your concerns can be alleviated:
Whether we like it or not, people will keep doing such comparisons and posting them online. Such comparisons influence people's choices. I know from personal interaction that people watch the "benchmarks game" site. We don't have to call the repository "benchmarks". We can call it "Rosetta". Or we can call it "how to solve a given problem in Fortran and other languages", so that people can learn what the options are: how to write idiomatic code, how to write the "simplest" code, how to write high-performing code, and what is (currently) the top-performing code+compiler+platform combination. I think the harm comes from drawing (wrong) conclusions. But having codes that solve a problem can't harm. In fact, we have different opinions on what "idiomatic" means in Fortran. For example, I do not like object-oriented code for numerics. I know others do. So we should have both. A third person comes and says "I don't like either of these!", so they can write a third version that they think is the best and we should include it too. Then we should have automatic tooling that can compile all such codes with different compilers and compiler options and time them. We should present it in a nice way, so that you can find the code version that you personally like the most, and you can see how it stacks up using current compilers. Then we should have a conversation about how to improve compilers (or whether that is even possible for a given version). If it is not possible, then maybe that should not be the "idiomatic" way to write such code. And so on. |
Regarding your comment at #10 (comment), what you described should absolutely be one version of a Fortran code that solves a given problem. Depending on the problem, I might agree or I might not, I would have to see the code. If I do not agree, then I can submit another version. I expect we will have 10 versions easily. Then we can see them side by side, see how they perform, see how easy they are to read, to maintain, etc. A physicist can absolutely learn how to program in modern style. That is what I meant. But it needs to be simple to learn. |
@certik I agree with everything you wrote and if I can do it quickly, I'll contribute one modern.f90 companion to the poisson2d subdirectory. It might also be nice to have a poisson3d version to nudge things a bit closer to the kind of problem likely to appear in applications and the kind of problem for which performance matters more. While we're at it, I wonder if there should be Matlab versions or some open-source Matlab equivalent such as GNU Octave. A surprising amount of real science happens in Matlab and the performance differences can be even more significant relative to Fortran than with the other languages mentioned so far. |
@rouson yes, I am thinking the initial set of languages could be Fortran, C++, Python/NumPy, Julia and Matlab/Octave. |
@rouson Just for the record, many versions of the Fortran codes were proposed in the Discourse topic, and I only included the ones that ran fastest on my machine. That includes a vectorized version. As for allocatable arrays, I don't really understand why that would be more modern. I'm after all dealing with a fixed size calculation. Any pointers on that? Anyway, I hope you contribute a modern.f90 version as well, and if you do, I can also run that on my machine with the same settings, then we can get the timings on it. I also welcome the name change, but I'm dubious that any code we post just to compare similar algorithms in different languages could really do a lot of harm. We could call the repository "the instructions for world domination", and it would not change the content one bit. I think some trust in the intelligence of the reader is warranted here. |
@arunningcroc I personally like fixed size arrays as well, as they are also automatically deallocated (just like allocatable arrays) and for a simple example like the one you did I think they are a perfect fit. But as I mentioned in #22 (comment), I don't want to argue now about what the "modern idiomatic" style is. For now I just want to have all approaches in, and we should have that discussion later. |
I followed the discussion here and related threads a bit and I'm somewhat disappointed about the overall tone. I know that coding style can be a somewhat loaded topic, but let's not judge each other by the way we write small code examples, please. Let's keep this a place for respectful collaboration such that we all can enjoy working on this project together. |
@awvwgk apologies for any judgement. I do worry that a large part of what turns so many people away from Fortran is the older Fortran that they've seen. If we're still writing code that is effectively Fortran 77 plus a tiny subset of Fortran 90, it's going to be very hard to attract new people to the language. Moreover, it's worth noting precisely what some of the older constructs communicate to the reader and to the compiler. A |
I am not against benchmarking implementations in different languages. My reservation is the same as with any other Fortran-lang project (stdlib, fpm, website etc.): Let's build things mindfully and with intention rather than just throwing things in there and seeing what happens. Now, I understand and recognize that nobody here argued that we should just throw things in and see what happens. But, I also haven't seen a clear goal on what exactly we want to compare between the languages. Take for example #9: There, I'd like to have seen documented (in the README perhaps):
My point is, if you ask and answer the question what exactly you're trying to compare, you'll have a better chance to make a meaningful (fair) comparison. Ultimately, I want to avoid a benchmarks repository where Fortran implementations are inadvertently fine-tuned to demonstrate or even imply language superiority. As this is a common criticism of other benchmarks like those on the Julia website and some recent blog posts, I expect that we wouldn't want to repeat the same, with tables turned. All of us here are responsible for preventing language wars from happening. |
It's a chicken and egg problem: we can't design and show what we are trying to do without first having a few benchmarks in, but we can't get the few benchmarks in because we do not have criteria to judge them. I think we all understand the dangers of mindlessly benchmarking. Also nobody wants fine tuned benchmarks here (as the only thing or the main thing). I think we have all explained enough what we do not want. So let's now discuss what we want. I proposed a vision in #22. That vision answers your questions:
Nothing is being directly measured. This is an example ("idiom") how to solve a 2D Poisson equation with certain boundary conditions using a first order finite difference scheme and relaxation method. We already have two such examples contributed. I would like to see even more. Yes, we are interested in timings and benchmarks for this too, as one of the many other criteria, such as readability, and how hard it is to write.
As proposed in #11, we need tests to ensure any submitted example / idiom returns exactly the same answer. Regarding libraries to use, there will be 10 codes in C++ let's say, so some can use other libraries, some might only use "pure C++". We can look at timings and other pros and cons and compare and everybody can make their own opinion which one is better.
With as many compilers / options that our infrastructure allows. NumPy can be installed using Conda (probably how a lot of people would install it), so a version number should be enough to identify. This has been discussed previously: #2 (comment) @milancurcic and others let me know if you agree / disagree with the vision I presented. If it is too early to tell, then let's simply at least try, and if it is not going in a direction we want, we can always remove this repository from fortran-lang later. If you have a different vision for what this particular effort could become, then please share it. |
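To make the idea of "several versions of one idiom, checked to return the same answer" concrete, here is a small sketch. It is not the code contributed in #9 and not the test proposed in #11: it assumes the standard five-point stencil with Jacobi relaxation, zero boundary values, and a uniform source term, and every name in it (`sweep_loops`, `sweep_rows`, `relax`, `max_abs_diff`) and the grid size are hypothetical. Two stylistically different versions of the same sweep are run to convergence and their results are compared element by element.

```python
def sweep_loops(u, rhs, h):
    """One Jacobi sweep written with explicit nested loops."""
    n = len(u)
    new = [row[:] for row in u]  # boundary values are copied unchanged
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                + u[i][j - 1] + u[i][j + 1]
                                - h * h * rhs[i][j])
    return new


def sweep_rows(u, rhs, h):
    """The same sweep, written one row at a time with a comprehension."""
    n = len(u)
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        new[i][1:n - 1] = [0.25 * (u[i - 1][j] + u[i + 1][j]
                                   + u[i][j - 1] + u[i][j + 1]
                                   - h * h * rhs[i][j])
                           for j in range(1, n - 1)]
    return new


def max_abs_diff(a, b):
    """Largest element-wise difference between two equally sized grids."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def relax(sweep, rhs, h, tol=1e-10, max_iter=10_000):
    """Repeat `sweep` until the largest update falls below `tol`."""
    n = len(rhs)
    u = [[0.0] * n for _ in range(n)]  # assumed: u = 0 on the boundary
    for iteration in range(1, max_iter + 1):
        new = sweep(u, rhs, h)
        delta = max_abs_diff(new, u)
        u = new
        if delta < tol:
            break
    return u, iteration


if __name__ == "__main__":
    n = 9                                  # hypothetical grid size
    h = 1.0 / (n - 1)
    rhs = [[1.0] * n for _ in range(n)]    # hypothetical uniform source term
    u_a, it_a = relax(sweep_loops, rhs, h)
    u_b, it_b = relax(sweep_rows, rhs, h)
    print("iterations:", it_a, it_b)
    print("max difference:", max_abs_diff(u_a, u_b))
```

Across languages and compilers the results would likely agree only to within a floating-point tolerance rather than bit for bit, so a cross-language check would probably compare `max_abs_diff` against a small threshold instead of requiring equality.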
The vision is great, I like it a lot. In this issue we're discussing the last bullet point specifically.
Why can't we? I very much think we can. We just need to ask the question. For the 2-D Poisson problem, for example, the question could be: How fast are the executables produced by Fortran and C compilers given idiomatic (no matter how we define this) and semantically equivalent Fortran and C code? Is the above not a meaningful, interesting, and simple enough question to ask? I think it is, and it's something we can measure. Then, if we like the question and agree to start there, contribute the Fortran and C implementations like those in #9. Meanwhile, ensure they're both correct and produce the same results (#11). Then, we can look at the timings. Now we have a minimal framework to expand upon, and add other languages to the mix. (I suggested Fortran and C to start because they have companion compilers, so it's likely easier to make a meaningful benchmark).
This is not what I meant. I meant: If NumPy is not compiled with I'm happy that the discourse in this thread is shifting from "we're doing benchmarks" to "we're doing idioms, and maybe we'll do some benchmarks", but it doesn't make for a fair discourse because I originally asked "Should we benchmark other languages", and not "Should we include other languages at all". |
What I proposed above as the "minimal framework" or "the first question" is just my idea of where to start, I'm not sure that it's the best way to go and I don't have a lot of experience in that area. Is there interest in discussing what would be the minimal framework to start with? In other words, what to compare and measure? I would like that. Or would you prefer to just focus on idioms (source code) for now, and not worry about benchmarks until later? |
I as well don't think of allocatable and static arrays as modern or archaic. They're just different. When I don't need an allocatable array, I don't use it. However, we should plan ahead, to facilitate easier testing and timing, problem size, like |
Thanks @milancurcic, good points. Yes, if we can constructively figure out criteria, then by all means let's do that. I like what you started with the question. My comment would be that I think I want to answer more questions than just Fortran and C. All I was saying is that I don't want this effort to die just because we do not have well fleshed out criteria. The same with benchmarks. I very much want to focus on benchmarking. However, the objection was that only focusing on benchmarking would be harmful, so I am willing to let benchmarking go for now, and focus on idioms and code, and worry about benchmarking later. In terms of what I care about long term, I would like to have the various maintained codes (in different languages) that solve the given problem (and where each of us can find that one version that we personally really like), and that could be benchmarked, and the infrastructure that allows this. For example, I want to have a NumPy version (or versions) that work. So that once I get to benchmarking, I don't have to worry about writing or debugging the codes, I already have them, and can concentrate on actually getting meaningful benchmarks out of it (as you said, then things should perhaps be compiled with the same options, although not in all cases, such as if you just want to compare the "default experience"). |
Great! I'd actually like benchmarks to move forward together with idioms, but with all these cautions that we discussed in mind. We have a good start with the 2-D Poisson, and I'd like us to work on it and polish it--I think mainly document it, ensure it's correct, and that it clearly conveys a message that we want to send. I think we'll have a minimal framework then to re-use for other problems. |
I absolutely do not intend to kill this effort :). |
A source of cross-language benchmarks is the Benchmarks section of the Fortran code on GitHub list I am maintaining. |
On Tue, Jun 29, 2021 at 6:49 PM Milan Curcic ***@***.***> wrote:
For the 2-D Poisson problem, for example, the question could be: How fast
are the executables produced by Fortran and C compilers given idiomatic (no
matter how we define this) and semantically equivalent Fortran and C code?
Based on my extremely limited experience with performance analysis, I
suspect this is a much, much deeper, thornier issue than one might
immediately think and it's one that's likely to unnecessarily step on toes
and lead to unproductive discussions. I'm not just saying this
hypothetically. Without going into specifics, I'm watching these problems
happen in real-time elsewhere right now. In that case, there exist some
reasonably widely known codes that could easily be misinterpreted as
answering exactly the question you're posing and yet experts in one of the
languages involved can immediately identify fundamental problems in the way
that language's code is written that severely disadvantage it relative to
the other languages. That's harmful. The harm wouldn't be a big deal if
it weren't for the facts that (1) the people involved do not have the
freedom to chase after every such occurrence and volunteer time to fix the
problem because they have jobs with high-stakes deliverables, (2) the codes
in question live for years with no one fixing the issues, and (3) anyone
who naively encounters and runs the codes will reach conclusions that
incorrectly shed a bad light on other people's life work in developing the
languages and compilers and such.
Even setting aside the need for deep expertise in the language(s) in
question, there are so many issues related to the interplay between problem
size, hardware architecture (cache size, memory bandwidth, etc.), compiler
choice, compiler version, compiler options, algorithm choice, etc.
Computing enough cases to broadly range across the options along any one of
the variables just named will take a considerable amount of time and still
leave a high-dimensional parameter space to judiciously explore along all
of the other axes. I recommend staying away from drawing any performance
conclusions that apply beyond one person's particular choice for each of
the aforementioned factors, and I doubt that my list was exhaustive.
Moreover, most compilers these days are lowering any given language down to
some language-independent, intermediate representation, at which point I
assume that all languages are functionally equivalent anyway, which makes
the whole exercise pointless to some extent.
And as I have pointed out at length, there are fundamental differences in
the design of the languages that prevent such comparisons. Most of the
languages we're discussing have no semantic equivalent to coarrays so, at
best, you'll have to leave out one of the most important factors in
performance (parallelization) to do any language-to-language performance
comparison. That defeats the purpose.
I think the best purpose of this repository is to serve as a sort of
Rosetta Stone for translating across various languages and across various
idioms within a language.
Damian
|
Thanks @rouson, this is exactly the kind of perspective that we need. I didn't have sufficient foresight to recognize these issues when posing my question. I agree with pursuing only idioms and not benchmarks until we convince ourselves otherwise. |
Thank you. Looks like we are in agreement now. I still want the timings though! Just not presented as benchmarks initially. |
I see great value in implementing a variety of simple yet real-world algorithms in Fortran and benchmarking them along multiple axes:
How about different languages? What would be the main purpose of that?
Are we interested in comparing the performance of Fortran and other language implementations, using idiomatic, naive code (i.e. the code that a novice would write), and thus comparing the compilers capability to optimize?
Or are we interested in writing code in different languages that produces the same (or as similar as possible) assembly, and then compare the source code?