New benchmarks and applications #140

Open
mcopik opened this issue Mar 19, 2023 · 7 comments
Labels
enhancement New feature or request

Comments

@mcopik
Collaborator

mcopik commented Mar 19, 2023

In SeBS, we provide a representative set of functions and have developed a set of serverless workflows that will be included in the upcoming release. However, the serverless field is constantly changing, and new types of applications are being "FaaS-ified". SeBS will benefit from new types of functions, new workflows, and new applications - the last category has not been considered for SeBS yet.

Functions

The current list of functions is available in the documentation.

New ideas (all should be rather simple to implement thanks to a large number of open-source implementations):

  • ML - More ML examples (different ResNet versions, BERT, etc.) - we can use MLPerf as a reference.
  • Utilities - PDF generation and conversion seem to be a common use case: headless Chrome to generate a PDF of a webpage (maybe with Browsershot?), or converting PDFs to different formats (Calibre? pdfkit for Node.js?). Example article with Browsershot.
  • Utilities - scanning object storage for viruses.
  • Webpage/Utilities - QR code generator.
  • Two collections of serverless snippets - there might be some interesting applications there. ServerlessLand and FunctionResources.

Workflows

The current list of workflows is in PR #88 and in the related thesis. In the PR, we have workflows for MapReduce, video analysis, ExCamera, and ML fitting. The thesis also documents the abstract language we use to specify each workflow.

To extend SeBS, we want to cover new application types and rich workflows with new computational patterns.

Potential new ideas:

  • ServerlessLand snippets
  • Simple webapp application - todos.
  • Another webapp - airline flight booking
  • Pywren has several linear algebra applications. They are not written as workflows, but turning a Cholesky factorization or matrix-matrix multiplication into a workflow with a pre-defined schedule would be very interesting.
  • AFCL has two interesting benchmarks - we should analyze if they can be expressed properly in our system to run on AWS Step Functions or Azure Durable Functions.
  • cbl-translate is a very interesting, rich and complex ML inference workflow that utilizes several different models. The workflow uses external APIs such as DeepL, but we should be able to avoid relying on them - either deploy an existing open-source model for this task instead of an external API, or remove that feature entirely.
  • maskopy - another interesting utility application that runs on AWS Step Functions.
  • Compilation benchmark - examples are in the gg paper. We do not need generality, but an example of offloading compilation steps.
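
To make the Pywren item above more concrete, here is a minimal sketch of a blocked matrix-matrix multiplication expressed as a workflow with a pre-defined schedule. It is only an illustration and not code from Pywren or SeBS: all function names (`split_blocks`, `multiply_tile`, `add_tiles`, `build_schedule`, `run_workflow`) are hypothetical, the matrices are plain Python lists, the matrix size is assumed to be divisible by the block size, and the tasks run locally in a loop where a real workflow would invoke serverless functions.

```python
from itertools import product


def split_blocks(matrix, block):
    """Cut a square matrix (list of lists) into tiles keyed by (row, col) block index."""
    n = len(matrix)
    tiles = {}
    for bi, bj in product(range(0, n, block), repeat=2):
        tiles[(bi // block, bj // block)] = [
            row[bj:bj + block] for row in matrix[bi:bi + block]
        ]
    return tiles


def multiply_tile(a, b):
    """Map task: multiply two tiles. Stands in for one function invocation."""
    return [
        [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
        for row in a
    ]


def add_tiles(tiles):
    """Reduce task: sum the partial products that belong to one output tile."""
    result = tiles[0]
    for t in tiles[1:]:
        result = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(result, t)]
    return result


def build_schedule(num_blocks):
    """Pre-defined schedule: for each output tile, the (A tile, B tile) pairs to multiply."""
    return {
        (i, j): [((i, k), (k, j)) for k in range(num_blocks)]
        for i, j in product(range(num_blocks), repeat=2)
    }


def run_workflow(a, b, block):
    """Run the schedule locally; assumes len(a) is divisible by block."""
    n = len(a)
    a_tiles, b_tiles = split_blocks(a, block), split_blocks(b, block)
    schedule = build_schedule(n // block)
    c = [[0] * n for _ in range(n)]
    for (bi, bj), pairs in schedule.items():
        # All multiplications for one output tile are independent (map phase).
        partials = [multiply_tile(a_tiles[p], b_tiles[q]) for p, q in pairs]
        tile = add_tiles(partials)  # reduce phase
        for r, row in enumerate(tile):
            for col, value in enumerate(row):
                c[bi * block + r][bj * block + col] = value
    return c
```

Because the schedule is computed before any task runs, the same dictionary could in principle be translated into a static state machine definition; whether that maps cleanly onto the SeBS workflow language would need to be checked.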

Applications

Our benchmark suite contains functions and workflows, but it does not contain full applications that are not written as workflows. These can be standalone applications that offload certain tasks to serverless, or fully serverless applications.

  • Black-Scholes from PARSEC benchmark suite - we already have code for a serverless invocation that should be integrated soon.
  • Microservices are an extremely important cloud workload, and several authors have attempted to port them to serverless. This usually requires adding a database for state, and replacing RPC with queues and triggers. DeathStarBench is a great source of microservices - SocialNetwork might be a great candidate and it's in C++, for which we have provisional support. Another paper describes porting several microservices - the paper does not come with code, but some of these applications should be open source, e.g., Overleaf.
  • Another work is focused on benchmarking applications built on top of serverless triggers - one of the examples might also be new to SeBS and interesting. So far, we have not explored asynchronous triggers in SeBS.
  • Serverless Google Maps is open source.
  • Simple webapp application - todos.
  • Another webapp - airline flight booking
  • Pywren has several linear algebra applications.
  • cbl-translate is a very interesting, rich and complex ML inference workflow that utilizes several different models. We can represent this as an application and remove the dependency on an external ML API.
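
For readers unfamiliar with the Black-Scholes item above, the following is a rough sketch of what offloading such a computation could look like. It is not the existing SeBS code mentioned in the list and not the PARSEC implementation: it uses the textbook closed-form formula for European options, and the `handler` entry point, the event layout, and the `chunked` helper are all assumptions made for illustration.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, rate, volatility, maturity, is_call=True):
    """Closed-form Black-Scholes price of a European call or put."""
    sigma_sqrt_t = volatility * math.sqrt(maturity)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * maturity) / sigma_sqrt_t
    d2 = d1 - sigma_sqrt_t
    discounted_strike = strike * math.exp(-rate * maturity)
    if is_call:
        return spot * norm_cdf(d1) - discounted_strike * norm_cdf(d2)
    return discounted_strike * norm_cdf(-d2) - spot * norm_cdf(-d1)


def handler(event):
    """Hypothetical function entry point: price one chunk of options."""
    return {"prices": [black_scholes(**option) for option in event["options"]]}


def chunked(items, size):
    """Split the portfolio so that each chunk maps to one invocation."""
    return [items[i:i + size] for i in range(0, len(items), size)]
```

A standalone application would call `chunked` on its portfolio and send each chunk to a separate invocation of `handler`; the chunk size then controls the trade-off between per-invocation overhead and parallelism.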
@mcopik mcopik added the enhancement New feature or request label Mar 19, 2023
@Rajiv2605

@mcopik Thanks for creating this issue. I have some doubts:

  1. What is the expected number of benchmarks you are looking for in the GSoC period?
  2. In how much detail should a GSoC proposal address the addition of new benchmarks?

@mcopik
Collaborator Author

mcopik commented Mar 19, 2023

@Rajiv2605 It depends on their type - functions should be fairly straightforward, while applications can take a lot of time to port and test for correctness. I think that once you create a schedule with milestones, it will be much more clear.

When it comes to the second question, it should be clear from the proposal how you are planning to approach the transition to SeBS - is there an open-source implementation with an appropriate license, do you plan to implement it from scratch, how much work will be involved, do you foresee any potential issues, etc. In the proposal, there's no need for deep technical details about each benchmark, but there should be an assessment that the application is interesting as a benchmark, novel for SeBS, and technically viable as a benchmark, together with an estimate of the difficulty and time commitment.

@Rajiv2605

@mcopik Will we be working on our deliverables during the community bonding period, or does the coding start only after it? I am not sure whether I can count the community bonding period as part of the coding period when deciding on the milestones and the amount of work in the proposal.

@mcopik
Collaborator Author

mcopik commented Mar 22, 2023

@Rajiv2605 Working on deliverables during the community bonding period is not required, but it's a great time to do research, work with other libraries/projects and ensure they work as expected.

@mcopik
Collaborator Author

mcopik commented Mar 22, 2023

Adding one more interesting application - serverless maps from HackerNews.

@lawrence910426
Contributor

I think Black-Scholes from the PARSEC benchmark suite could be very interesting for me. I have some experience in derivative pricing and HPC. QuantLib also implements the Black-Scholes model (and similar models).

@mcopik
Collaborator Author

mcopik commented Apr 2, 2023

@lawrence910426 As I said above, we already have code for that :-) However, other Monte Carlo simulations might be a great addition! If you know some other examples from QuantLib, particularly those with different I/O and computational intensity, and the library's code allows us to integrate an example into SeBS, then I think this might be a great and interesting addition.

@mcopik mcopik mentioned this issue Mar 22, 2024
10 tasks
3 participants