Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Smaller Docker image size #1

Merged
merged 1 commit into from
Nov 21, 2024
Merged

Smaller Docker image size #1

merged 1 commit into from
Nov 21, 2024

Conversation

TinyMarsh
Copy link
Collaborator

@TinyMarsh TinyMarsh commented Nov 21, 2024

User description

This pull request includes updates to the Dockerfile and README.md to improve the build process and result in a smaller image size. The base image has been switched from Debian to Ubuntu, the package management commands have been updated, and the Docker run command in the documentation has been modified.


PR Type

enhancement, documentation


Description

  • Changed the base image in the Dockerfile from Debian to Ubuntu to optimize the Docker image size.
  • Updated package management commands to ensure packages are up to date and necessary locales are configured.
  • Modified the DIA-NN download and setup process to align with the new base image.
  • Updated the Docker run command in README.md to reflect changes in the directory structure.

Changes walkthrough 📝

Relevant files
Enhancement
Dockerfile
Optimize Dockerfile for smaller image size and better configuration

Dockerfile

  • Changed base image from Debian to Ubuntu.
  • Updated package installation and locale configuration.
  • Added metadata and environment variables for locale settings.
  • Modified DIA-NN download and setup process.
  • +24/-16 
    Documentation
    README.md
    Update Docker run command in documentation                             

    README.md

    • Updated Docker run command to reflect new directory structure.
    +1/-1     

    💡 PR-Agent usage: Comment /help "your question" on any pull request to receive relevant information

    Summary by CodeRabbit

    • New Features

      • Updated Docker container to use Ubuntu 22.04.
      • Enhanced metadata labels for better documentation and clarity.
      • Included the latest DIA-NN binary for Linux.
    • Bug Fixes

      • Clarified licensing implications for different DIA-NN versions in the README.
    • Documentation

      • Updated README with new execution command for running the Docker container.
      • Provided clearer guidance on compliance with DIA-NN licensing.

    @TinyMarsh TinyMarsh requested a review from ypriverol November 21, 2024 16:56
    Copy link

    coderabbitai bot commented Nov 21, 2024

    Walkthrough

    The changes involve significant updates to both the Dockerfile and README.md for the DIA-NN project. The Dockerfile transitions the base image from Debian to Ubuntu, adds various metadata labels, modifies the installation process, and updates the method for downloading DIA-NN software. The README.md clarifies licensing implications for different DIA-NN versions, confirms the software version, and updates the command for running the Docker container.

    Changes

    File Change Summary
    Dockerfile - Changed base image from debian:12.8-slim to ubuntu:22.04.
    - Added metadata labels for software details.
    - Introduced USER directive and set DEBIAN_FRONTEND to noninteractive.
    - Split package installation into separate update and install commands.
    - Adjusted locale settings to generate en_US.UTF-8.
    - Updated DIA-NN download method and set permissions for the DIA-NN folder.
    README.md - Updated license section to clarify distribution implications for DIA-NN versions.
    - Confirmed DIA-NN version as 1.9.2.
    - Modified command in "Running the Container" section to reflect new execution path for DIA-NN binary.

    Sequence Diagram(s)

    sequenceDiagram
        participant User
        participant DockerContainer
        participant DIA-NN
    
        User->>DockerContainer: Run container
        DockerContainer->>DIA-NN: Execute ./diann-1.9.2/diann-linux
        DIA-NN-->>DockerContainer: Process data
        DockerContainer-->>User: Return results
    
    Loading

    🐇 "In the garden, we hop and play,
    With Docker changes brightening the day.
    Ubuntu now, with labels galore,
    DIA-NN ready, let’s explore!
    License clear, the path is set,
    A leap in code, we won't forget!" 🥕


    Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

    ❤️ Share
    🪧 Tips

    Chat

    There are 3 ways to chat with CodeRabbit:

    • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
      • I pushed a fix in commit <commit_id>, please review it.
      • Generate unit testing code for this file.
      • Open a follow-up GitHub issue for this discussion.
    • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
      • @coderabbitai generate unit testing code for this file.
      • @coderabbitai modularize this function.
    • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
      • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
      • @coderabbitai read src/utils.ts and generate unit testing code.
      • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
      • @coderabbitai help me debug CodeRabbit configuration file.

    Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

    CodeRabbit Commands (Invoked using PR comments)

    • @coderabbitai pause to pause the reviews on a PR.
    • @coderabbitai resume to resume the paused reviews.
    • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
    • @coderabbitai full review to do a full review from scratch and review all the files again.
    • @coderabbitai summary to regenerate the summary of the PR.
    • @coderabbitai resolve resolve all the CodeRabbit review comments.
    • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
    • @coderabbitai help to get help.

    Other keywords and placeholders

    • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
    • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
    • Add @coderabbitai anywhere in the PR title to generate the title automatically.

    CodeRabbit Configuration File (.coderabbit.yaml)

    • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
    • Please see the configuration documentation for more information.
    • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

    Documentation and Community

    • Visit our Documentation for detailed information on how to use CodeRabbit.
    • Join our Discord Community to get help, request features, and share feedback.
    • Follow us on X/Twitter for updates and announcements.

    Copy link

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    ⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
    🧪 No relevant tests
    🔒 Security concerns

    Root execution:
    The Dockerfile explicitly sets USER root but never changes to a non-privileged user, which is a security best practice. The container will run all processes as root by default, potentially exposing the host system to greater risk if the container is compromised.

    ⚡ Recommended focus areas for review

    Missing Cleanup
    The package installation doesn't clean up apt cache files which can increase the image size. Consider adding apt-get clean and removing /var/lib/apt/lists

    Missing Workdir
    The WORKDIR directive was removed from the new version. This could affect how the container runs and where files are created by default

    Missing PATH
    The PATH environment variable setup was removed. This could affect the ability to run the diann executable from any directory

    Copy link

    PR Code Suggestions ✨

    Explore these optional code suggestions:

    CategorySuggestion                                                                                                                                    Score
    General
    Set environment variables to ensure proper executable access

    Add PATH and DIA_NN_PATH environment variables to ensure the DIA-NN executable is
    accessible from any directory.

    Dockerfile [40]

     RUN chmod -R 775 /diann-1.9.2
    +ENV PATH="$PATH:/diann-1.9.2"
    +ENV DIA_NN_PATH="/diann-1.9.2"
    • Apply this suggestion
    Suggestion importance[1-10]: 9

    Why: Adding PATH and DIA_NN_PATH variables is essential for proper functionality, as these were present in the old version but missing in the new one. This ensures the DIA-NN executable can be called from any directory.

    9
    Clean up package manager cache to reduce Docker image size

    Add cleanup commands after package installation to reduce image size. Use 'apt-get
    clean' and remove package lists.

    Dockerfile [22]

    -RUN apt-get install wget unzip libgomp1 locales -y
    +RUN apt-get install wget unzip libgomp1 locales -y && \
    +    apt-get clean && \
    +    rm -rf /var/lib/apt/lists/*
    • Apply this suggestion
    Suggestion importance[1-10]: 8

    Why: Adding cleanup commands after package installation is a crucial best practice in Docker to reduce image size significantly. This directly impacts deployment efficiency and resource usage.

    8
    Define a consistent working directory for container operations

    Set a working directory for the container to ensure consistent file operations.

    Dockerfile [42-43]

     # NOTE: It is entirely the user's responsibility to ensure compliance with DIA-NN license terms.
     # Please review the licensing terms for DIA-NN before using or distributing this Docker image.
     
    +WORKDIR /data
    +
    • Apply this suggestion
    Suggestion importance[1-10]: 7

    Why: Setting WORKDIR is important for container consistency and was present in the old version but removed. This helps standardize file operations and provides a clear context for container execution.

    7

    💡 Need additional feedback ? start a PR chat

    Copy link

    @coderabbitai coderabbitai bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Actionable comments posted: 2

    🧹 Outside diff range and nitpick comments (4)
    README.md (1)

    Line range hint 19-21: Base image information needs to be updated

    The README still shows debian:12.8-slim as the base image, but according to the PR changes, the base image has been changed to ubuntu:22.04. This needs to be updated to maintain accuracy.

    Apply this diff to fix the inconsistency:

    - **Base Image**: `debian:12.8-slim`
    + **Base Image**: `ubuntu:22.04`
    Dockerfile (3)

    Line range hint 8-8: Parameterize the DIA-NN version to improve maintainability

    Defining the DIA-NN version as a build argument prevents duplication and simplifies future updates. Using a variable for the version reduces the risk of inconsistencies.

    Apply this diff to parameterize the DIA-NN version:

    +ARG DIANN_VERSION=1.9.2
    
     LABEL software.version="1.9.2"
    +LABEL software.version="${DIANN_VERSION}"
    
    -RUN wget https://github.com/vdemichev/DiaNN/releases/download/1.9.2/diann-1.9.2.Linux.zip -O diann-1.9.2.Linux.zip
    +RUN wget https://github.com/vdemichev/DiaNN/releases/download/${DIANN_VERSION}/diann-${DIANN_VERSION}.Linux.zip -O diann-${DIANN_VERSION}.Linux.zip
    
    -RUN unzip diann-1.9.2.Linux.zip
    +RUN unzip diann-${DIANN_VERSION}.Linux.zip
    
    -RUN chmod -R 775 /diann-1.9.2
    +RUN chmod -R 775 /diann-${DIANN_VERSION}

    Also applies to: 34-34, 37-37, 40-40


    34-37: Remove unnecessary files and combine RUN commands to reduce image size

    After unzipping the DIA-NN package, the zip file is no longer needed. Combining commands into a single RUN statement reduces the number of layers and overall image size.

    Apply this diff to clean up unnecessary files and optimize layering:

    -RUN wget https://github.com/vdemichev/DiaNN/releases/download/${DIANN_VERSION}/diann-${DIANN_VERSION}.Linux.zip -O diann-${DIANN_VERSION}.Linux.zip
    
    -RUN unzip diann-${DIANN_VERSION}.Linux.zip
    
    +RUN wget https://github.com/vdemichev/DiaNN/releases/download/${DIANN_VERSION}/diann-${DIANN_VERSION}.Linux.zip -O diann-${DIANN_VERSION}.Linux.zip && \
    +    unzip diann-${DIANN_VERSION}.Linux.zip && \
    +    rm diann-${DIANN_VERSION}.Linux.zip

    40-40: Set appropriate permissions and consider adding DIA-NN to PATH

    Ensure that the DIA-NN binary is executable and consider adding it to the PATH for easier execution.

    Apply this diff to update permissions and set the PATH:

     RUN chmod -R 775 /diann-${DIANN_VERSION}
    +ENV PATH="/diann-${DIANN_VERSION}:$PATH"
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL

    📥 Commits

    Reviewing files that changed from the base of the PR and between 63b7248 and e0e88c8.

    📒 Files selected for processing (2)
    • Dockerfile (2 hunks)
    • README.md (1 hunks)
    🔇 Additional comments (3)
    README.md (2)

    42-42: LGTM: Docker run command updated correctly

    The Docker run command has been updated to use the correct path to the DIA-NN binary, which aligns with the new container structure.


    Line range hint 9-11: Verify the license file reference

    The documentation references a license file DIANN1.9.2-LICENSE.txt but we should verify its existence and accessibility.

    ✅ Verification successful

    License file reference is correctly linked in the documentation

    The verification confirms that the DIANN1.9.2-LICENSE.txt file exists in the repository and is accessible, making the documentation reference accurate.

    🏁 Scripts executed

    The following scripts were executed for the analysis:

    Script:

    #!/bin/bash
    # Check if the referenced license file exists
    if [ -f "DIANN1.9.2-LICENSE.txt" ]; then
        echo "License file exists"
    else
        echo "Warning: Referenced license file DIANN1.9.2-LICENSE.txt not found"
    fi
    
    # Check for any other license files that might be misnamed
    fd -t f -e txt -e md -e pdf -i license
    

    Length of output: 148

    Dockerfile (1)

    42-43: ⚠️ Potential issue

    Verify compliance with DIA-NN licensing terms before distributing the Docker image

    DIA-NN's licensing terms may restrict the distribution of Docker images containing the software. Ensure that including DIA-NN in this image complies with the license terms to avoid legal issues.

    Run the following script to retrieve the DIA-NN license text for review:

    Please review the license to ensure compliance.

    ✅ Verification successful

    Let's search for any license files in the repository that might contain DIA-NN's license terms, since the direct URL fetch failed. This will help us verify the licensing compliance requirements.


    DIA-NN license has specific restrictions on commercial and cloud usage

    Based on the discovered license terms in the repository, DIA-NN has important usage restrictions:

    1. Commercial/for-profit use is not permitted
    2. Cloud usage is only allowed under collaborative use with the Demichev laboratory
    3. Usage requires proper attribution in any derived materials
    4. Experimental functionality requires collaboration with the Demichev laboratory

    The warning comment in the Dockerfile is appropriate and necessary, as users need to carefully review these terms before using or distributing the Docker image.

    🏁 Scripts executed

    The following scripts were executed for the analysis:

    Script:

    #!/bin/bash
    # Description: Retrieve the DIA-NN license text for version 1.9.2
    
    curl -sSL https://github.com/vdemichev/DiaNN/raw/main/LICENSE.txt
    

    Length of output: 102512


    Script:

    #!/bin/bash
    # Search for DIA-NN license information in the repository
    fd -t f -i "license|copying" -X cat {}
    

    Length of output: 16927

    @@ -1,5 +1,6 @@
    FROM debian:12.8-slim AS builder
    FROM ubuntu:22.04
    Copy link

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    🛠️ Refactor suggestion

    Consider using a smaller base image to achieve the goal of reducing image size

    Switching from debian:12-slim to ubuntu:22.04 may increase the Docker image size, as Ubuntu images are generally larger than Debian slim images. To meet the PR objective of reducing the image size, consider using debian:12-slim or another minimal base image.

    Comment on lines +19 to +22
    RUN apt-get update && apt-get upgrade -y

    RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && \
    locale-gen && update-locale LANG=en_US.UTF-8
    # Install necessary packages including locales
    RUN apt-get install wget unzip libgomp1 locales -y
    Copy link

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    🛠️ Refactor suggestion

    Optimize package installation to reduce image size and avoid unnecessary upgrades

    Running apt-get upgrade is generally discouraged in Dockerfiles, as it can increase image size and may introduce unintended changes. Combining apt-get update and apt-get install in a single RUN command reduces image layers and avoids cache issues.

    Apply this diff to optimize package installation:

    -RUN apt-get update && apt-get upgrade -y
    
    -# Install necessary packages including locales
    -RUN apt-get install wget unzip libgomp1 locales -y
    +RUN apt-get update && \
    +    apt-get install -y --no-install-recommends wget unzip libgomp1 locales && \
    +    rm -rf /var/lib/apt/lists/*
    📝 Committable suggestion

    ‼️ IMPORTANT
    Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

    Suggested change
    RUN apt-get update && apt-get upgrade -y
    RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && \
    locale-gen && update-locale LANG=en_US.UTF-8
    # Install necessary packages including locales
    RUN apt-get install wget unzip libgomp1 locales -y
    RUN apt-get update && \
    apt-get install -y --no-install-recommends wget unzip libgomp1 locales && \
    rm -rf /var/lib/apt/lists/*

    Copy link

    @coderabbitai coderabbitai bot left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    Actionable comments posted: 2

    🧹 Outside diff range and nitpick comments (3)
    README.md (1)

    Line range hint 14-14: Base image information needs to be updated

    The README states the base image as debian:12.8-slim, but according to the PR objectives, the base image has been switched to Ubuntu. Please update this information to reflect the actual base image being used.

    Dockerfile (2)

    1-1: Consider using a slimmer base image to further reduce the Docker image size.

    Switching to ubuntu:22.04 is a good step, but you might achieve a smaller image by using ubuntu:22.04-slim or ubuntu:22.04-minimal as the base image. This aligns with your goal of reducing the Docker image size without sacrificing functionality.


    40-40: Restrict permissions on the DIA-NN directory for security.

    Setting permissions to 775 grants write access to group members, which may not be necessary and could pose a security risks. It's advisable to set permissions to 755 to grant execute permissions without unnecessary write access.

    Apply the following change:

    -RUN chmod -R 775 /diann-1.9.2
    +RUN chmod -R 755 /diann-1.9.2
    📜 Review details

    Configuration used: CodeRabbit UI
    Review profile: CHILL

    📥 Commits

    Reviewing files that changed from the base of the PR and between 63b7248 and e0e88c8.

    📒 Files selected for processing (2)
    • Dockerfile (2 hunks)
    • README.md (1 hunks)
    🔇 Additional comments (2)
    README.md (1)

    42-42: Verify the new binary path exists in the container

    The Docker run command has been updated to use ./diann-1.9.2/diann-linux. Let's verify this path exists in the built container.

    Dockerfile (1)

    25-26: Verify locale generation to prevent runtime errors.

    The locale-gen command may not fully generate locales in Ubuntu 22.04. Ensuring that en_US.UTF-8 is properly generated is crucial to avoid locale-related runtime errors.

    Run the following script to confirm the locale is correctly generated:

    If the locale is not listed, consider installing locales-all or adjusting the locale generation commands.

    ✅ Verification successful

    Locale configuration is properly set up

    The Dockerfile follows the correct sequence for locale configuration:

    1. Installs the locales package via apt-get
    2. Generates the en_US.UTF-8 locale using locale-gen
    3. Updates the system locale configuration
    4. Sets the required environment variables LANG and LANGUAGE
    🏁 Scripts executed

    The following scripts were executed for the analysis:

    Script:

    #!/bin/bash
    # Description: Verify that the en_US.UTF-8 locale is generated.
    
    # Test: List available locales and check for en_US.UTF-8
    locale -a | grep -i 'en_US\.utf8'
    

    Length of output: 132


    Script:

    #!/bin/bash
    # Let's check if the Dockerfile has the necessary dependencies for locale generation
    rg -A 5 -B 5 "locale-gen" 
    
    # Also check for any locale-related package installations
    rg -i "apt.*install.*locale"
    

    Length of output: 532

    g++ build-essential cmake zlib1g-dev libbz2-dev libboost-all-dev wget locales unzip && \
    apt-get clean && rm -rf /var/lib/apt/lists/*
    # Update package lists and ensure package versions are up to date
    RUN apt-get update && apt-get upgrade -y
    Copy link

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    🛠️ Refactor suggestion

    Optimize package installation and avoid unnecessary apt-get upgrade.

    Running apt-get upgrade in a Dockerfile can lead to larger image sizes and potential unpredictability. It's generally advisable to avoid it. Also, combining apt-get update and apt-get install in a single RUN command reduces the number of layers and prevents caching issues.

    Consider applying the following changes:

    -RUN apt-get update && apt-get upgrade -y
    
    -RUN apt-get install wget unzip libgomp1 locales -y
    
    +RUN apt-get update && apt-get install -y \
    +    wget \
    +    unzip \
    +    libgomp1 \
    +    locales && \
    +    rm -rf /var/lib/apt/lists/*

    This approach installs the necessary packages and cleans up the apt cache to minimize the image size.

    Also applies to: 22-22

    Comment on lines +42 to +43
    # NOTE: It is entirely the user's responsibility to ensure compliance with DIA-NN license terms.
    # Please review the licensing terms for DIA-NN before using or distributing this Docker image.
    Copy link

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    ⚠️ Potential issue

    Ensure compliance with DIA-NN licensing terms to avoid legal issues.

    DIA-NN version 1.9.2 may have licensing restrictions that prohibit the distribution of Docker images containing the software. Building and sharing this Docker image could violate the license agreement. It's crucial to review the license terms and obtain the necessary permissions before proceeding.

    Would you like assistance in reviewing the licensing terms or exploring alternative methods to comply with the license, such as instructing users to download DIA-NN during the build process?

    @ypriverol ypriverol merged commit 6156052 into main Nov 21, 2024
    2 checks passed
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants