---

copyright:
  years: 2023
lastupdated: "2023-09-05"

subcollection: AnalyticsEngine

---
{:new_window: target="_blank"} {:shortdesc: .shortdesc} {:codeblock: .codeblock} {:screen: .screen} {:note: .note} {:pre: .pre}
# Best practices
{: #best-practices-serverless}
Use the following set of recommended guidelines when provisioning and managing your serverless instances and when running Spark applications.
{: shortdesc}
Best Practice | Description | Reference Link |
---|---|---|
Use separate {{site.data.keyword.iae_full_notm}} service instances for your development and production environments. | This is a general best practice. By creating separate {{site.data.keyword.iae_full_notm}} instances for different environments, you can test any configuration and code changes before applying them on the production instance. | NA |
Upgrade to the latest Spark version | As open source Spark versions are released, they are made available in {{site.data.keyword.iae_full_notm}} after a time interval required for internal testing. Watch for the announcement of new Spark versions in the Release Notes section and upgrade the runtime of your instance to move your applications to the latest Spark runtime. Older runtimes are deprecated and eventually removed as newer versions are released. Make sure that you test your applications on the new runtime before making changes on the production instances. | - Release notes for {{site.data.keyword.iae_full_notm}} serverless instances |
Grant role-based access | Grant role-based access to all users on the {{site.data.keyword.iae_full_notm}} instances based on their requirements. For example, only your automation team should have permission to submit applications because it has access to secrets, while your DevOps team should only be able to see the list of all applications and their states. | - Granting permissions to users |
Choose the right {{site.data.keyword.cos_full_notm}} configuration | - Disaster Recovery (DR) Resiliency: Use the {{site.data.keyword.cos_full_notm}} Cross Regional resiliency option, which backs up your data across several different cities in a region. In contrast, the Regional resiliency option backs up data in a single data center. \n- Encryption: {{site.data.keyword.cos_full_notm}} comes with default built-in encryption. You can also configure {{site.data.keyword.cos_short}} to work with the BYOK Key Protect service. \n- Service credentials: By default, {{site.data.keyword.cos_full_notm}} uses IAM-style credentials. If you want to work with AWS-style credentials, you need to use the "Include HMAC Credential" option as described in Service credentials. \n- Direct endpoints for {{site.data.keyword.cos_full_notm}}: Always use direct endpoints for connectivity to the {{site.data.keyword.cos_full_notm}} instance. This applies to the {{site.data.keyword.cos_full_notm}} home instance as well as to the endpoints used from your applications (either in your code or in the parameters that you pass in the configurations at the instance or application level). Direct endpoints provide better performance than public endpoints and do not incur charges for any outgoing or incoming bandwidth. A configuration sketch that combines direct endpoints with separate bucket credentials follows this table. | - Disaster Recovery (DR) Resiliency: {{site.data.keyword.cos_full_notm}} documentation. \n- Encryption: Getting started with encryption keys and {{site.data.keyword.cos_short}} manage encryption \n- Service credentials: Service credentials \n- Direct endpoints for {{site.data.keyword.cos_full_notm}}: Endpoints and storage locations |
Use private endpoints for the external Hive metastore | If you are using Spark SQL and want to use an external metastore, such as {{site.data.keyword.databases-for-postgresql_full_notm}}, as your Hive metastore, use the private endpoint for the database connection for better performance and cost savings (a configuration sketch follows this table). | - Working with Spark SQL and an external metastore |
Avoid resource overcommitment when running applications | There is a quota associated with each Analytics Engine Serverless instance. When applications are submitted on an instance, they are allocated resources from the instance quota. If an application requests resources beyond the available quota, the application either does not start or runs with fewer than the requested resources, which might result in the application running slower than expected or, in some cases, failing. Always monitor the current resource consumption on an instance to ensure that your applications are running comfortably within the given limits. You can adjust the limits through a support ticket if required. | - Default limits and quotas \n- Get current resource consumption |
Static allocation of resources versus autoscaling | When you submit applications, you can specify the number of executors upfront (static allocation) or use the autoscaling option (dynamic allocation). Before you decide whether to use static allocation or autoscaling, you might want to run a few benchmarking tests with different data sets using both options to find the right configuration. A sketch contrasting the two modes follows this table. General considerations: \n- If you know the amount of resources (cores and memory) required by your application and it doesn't vary across different stages of the application run, it is recommended to allocate resources statically for better performance. \n- If you want to optimize resource utilization, you can opt for autoscaling of executors, where executors are allocated based on the application's actual demand. Note that autoscaling might introduce a slight delay in applications. | - Enabling application autoscaling |
Enable and fine-tune forward logging | - Enable forward logging for your service instance to help troubleshoot, show progress, and print or show outputs of your applications. Note that log forwarding incurs a cost based on the quantity of logs forwarded or retained in the {{site.data.keyword.la_full_notm}} instance. Based on your use case, decide on the optimal settings. \n- When you enable log forwarding using the default API, only the driver logs are enabled. If you need executor logs as well, for example, if there are errors that you would see only on executors, customize logging to enable executor logging too. Executor logs can become very large, so balance the amount of logs that is forwarded to your logging instance against the information you need for troubleshooting. \n- Follow the best practices of {{site.data.keyword.la_full_notm}} when choosing the right configuration and search techniques. For example, you might want to configure the {{site.data.keyword.la_full_notm}} instance plan for a 7-day search with archival of logs to {{site.data.keyword.cos_full_notm}} to save on costs. Also refer to the {{site.data.keyword.la_full_notm}} documentation for techniques on searching for logs of interest based on keywords, point in time, and so on. | - Configuring and viewing logs |
Customize your service instance | - You might need to customize your service instance to bring in Python or conda packages that are not preinstalled, or to bring in files (certificates or configuration files) that must be made available to Spark applications. Based on your needs, customize your instance by using library sets and use these library sets when submitting applications. \n- The size of your library set has a bearing on the application startup time and the executor startup time (when you autoscale applications). Also note that there is an upper limit of 2 GB for the size of a library set. If different applications need different libraries, it is better to create separate library sets so that they can be specified individually when the application is submitted. \n- Use customization only to bring in files that cannot be brought in through the application details parameters. See Parameters for submitting Spark applications. Prefer the standard spark-submit equivalent parameter options, such as the `files`, `jars`, `packages`, and `pyFiles` options, if they fit your use case (see the sketch of these options after this table). Use the "customization for file download" option only if you need files that don't fit into any of these categories, for example, a self-signed certificate, a JAAS configuration file, or a `.so` file. | - Customization options |
Apply filters when retrieving the list of applications | When you retrieve the list of applications, whether in the UI or by using the API or CLI, apply the appropriate filters so that you retrieve only the set that you need. | - Spark application commands |
Use other services or tools for supporting functions | Apart from using an {{site.data.keyword.la_full_notm}} and {{site.data.keyword.cos_full_notm}} instance, and depending on your use case, you might want to use other supporting tools and services. For instance, you can use Apache Airflow (managed by you) for orchestrating, scheduling, and automating your applications. You can also use IBM Secrets Manager to store the secrets required by your applications and have your automation scripts read the secrets from Secrets Manager before submitting your applications. You can even pass a token as an application argument so that your application reads the secrets it needs from Secrets Manager directly. | - Configuring Secrets Manager |
Use instances in alternate regions for backup and disaster recovery | Currently, {{site.data.keyword.iae_full_notm}} serverless instances can be created in two regions, namely Dallas (`us-south`) and Frankfurt (`eu-de`). Although it is advisable to create your instances in the same region where your data is located, it is always useful to create a backup instance in an alternate region with the same set of configurations as your primary instance, in case the primary instance becomes unavailable or unusable. Your automation should enable switching application submissions between the two regions if required. | NA |
Use separate buckets and service credentials for application files, data files, and the home instance | Apply the "separation of concerns" principle to separate access between different resources. \n- Do not store data or application files in the home instance bucket. \n- Use separate buckets for data and application files. \n- Use separate access credentials (IAM key based) with restricted access for the bucket with application files and the bucket that contains your data (see the {{site.data.keyword.cos_full_notm}} configuration sketch after this table). | - Assigning access to an individual bucket |
Applications must run within 72 hours | There is a limit on the number of hours an application or kernel can run. For security and compliance patching, all runtimes that run for more than 72 hours are stopped. If you have a large application, break it into smaller chunks that run within 72 hours. If you run Spark streaming applications, make sure that you configure checkpoints and have monitoring in place to restart your applications if they are stopped (see the checkpointing sketch after this table). | - Application limits |
Start and stop the Spark history server only when needed | Always stop the Spark history server when you no longer need it. Keep in mind that the Spark history server consumes CPU and memory resources continuously for as long as it is in the started state. | - Spark history server |
{: caption="Best practices when using serverless instances including detailed descriptions and reference links" caption-side="top"}
{: #table-1}
{: row-headers}
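
The following sketch illustrates the {{site.data.keyword.cos_full_notm}} recommendations from the table, that is, direct endpoints combined with separate, restricted credentials for the application-file bucket and the data bucket. It is a minimal PySpark example that assumes the `fs.cos` (Stocator) connector; the service names `appfiles` and `data`, the endpoint host, the bucket names, and the API keys are placeholders, so substitute your own values and confirm the exact property names against the documentation linked in the table.

```python
from pyspark.sql import SparkSession

# Placeholders: replace with your own direct endpoints, API keys, and bucket names.
DATA_ENDPOINT = "s3.direct.us-south.cloud-object-storage.appdomain.cloud"
APP_ENDPOINT = "s3.direct.us-south.cloud-object-storage.appdomain.cloud"

spark = (
    SparkSession.builder.appName("cos-direct-endpoints")
    # "data" service: an IAM API key scoped to the data bucket only.
    .config("spark.hadoop.fs.cos.data.endpoint", DATA_ENDPOINT)
    .config("spark.hadoop.fs.cos.data.iam.api.key", "<DATA_BUCKET_API_KEY>")
    # "appfiles" service: a different key, restricted to the application-file bucket.
    .config("spark.hadoop.fs.cos.appfiles.endpoint", APP_ENDPOINT)
    .config("spark.hadoop.fs.cos.appfiles.iam.api.key", "<APP_BUCKET_API_KEY>")
    .getOrCreate()
)

# Each bucket is addressed as cos://<bucket>.<service>/<path>, so it resolves
# to its own endpoint and credentials.
df = spark.read.parquet("cos://my-data-bucket.data/events/2023/")
df.show(5)
```
{: codeblock}

Because each bucket is addressed through its own service name, the restricted key for one bucket never needs access to the other.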
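
For the external Hive metastore recommendation, this sketch shows the kind of Spark SQL and Hive metastore properties that are involved. The JDBC URL must point at the private endpoint of your {{site.data.keyword.databases-for-postgresql_full_notm}} deployment; the host, port, database name, user, and password are placeholders, and the exact property set that {{site.data.keyword.iae_full_notm}} expects is described in the topic linked in the table.

```python
from pyspark.sql import SparkSession

# Placeholder private-endpoint JDBC URL for the PostgreSQL-backed Hive metastore.
METASTORE_URL = (
    "jdbc:postgresql://<private-endpoint-host>:<port>/<metastore-db>?sslmode=verify-full"
)

spark = (
    SparkSession.builder.appName("external-hive-metastore")
    .config("spark.sql.catalogImplementation", "hive")
    # Standard Hive metastore connection properties, passed through the Hadoop conf.
    .config("spark.hadoop.javax.jdo.option.ConnectionURL", METASTORE_URL)
    .config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "org.postgresql.Driver")
    .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "<db-user>")
    .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "<db-password>")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("SHOW DATABASES").show()
```
{: codeblock}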
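
To illustrate the static allocation versus autoscaling trade-off, the following sketch shows two alternative Spark configuration dictionaries that you could pass when submitting an application. It uses the open-source Spark dynamic allocation properties for the autoscaling case; whether the {{site.data.keyword.iae_full_notm}} autoscaling option maps exactly to these properties or uses its own submit-time switch is described in the Enabling application autoscaling topic, so treat the property names as assumptions to verify.

```python
# Static allocation: the executor count is fixed up front. Use this when the
# resource needs are known and stable across the stages of the application.
static_conf = {
    "spark.executor.instances": "4",
    "spark.executor.cores": "2",
    "spark.executor.memory": "4g",
}

# Autoscaling (dynamic allocation): executors are added and removed between a
# minimum and a maximum based on the application's actual demand.
autoscaling_conf = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "1",
    "spark.dynamicAllocation.maxExecutors": "8",
    "spark.executor.cores": "2",
    "spark.executor.memory": "4g",
}
```
{: codeblock}

Benchmark both configurations against representative data sets before settling on one for production.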
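
The customization guidance in the table prefers the spark-submit equivalent options over instance customization whenever an application only needs extra files, JARs, packages, or Python modules. The sketch below uses the open-source Spark property names for these options; the bucket paths and the Maven coordinate are placeholders, and the corresponding submit-time parameter names are listed in Parameters for submitting Spark applications.

```python
# Open-source Spark property equivalents of the --py-files, --files, and
# --packages options of spark-submit. Paths and coordinates are placeholders.
submit_conf = {
    # Extra Python modules shipped to the driver and executors.
    "spark.submit.pyFiles": "cos://my-app-bucket.appfiles/deps.zip",
    # Plain files placed in each executor's working directory.
    "spark.files": "cos://my-app-bucket.appfiles/app-config.json",
    # Maven coordinates resolved at application startup.
    "spark.jars.packages": "org.postgresql:postgresql:42.6.0",
}
```
{: codeblock}

Reserve the "customization for file download" option for artifacts that none of these options can carry, such as a self-signed certificate or a `.so` file.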
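
Because runtimes are stopped after 72 hours, streaming applications need checkpoints so that a restarted application can resume where it left off. This is a minimal Structured Streaming sketch; the rate source is a stand-in for your real source, and the {{site.data.keyword.cos_full_notm}} bucket paths are placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("streaming-with-checkpoint").getOrCreate()

# A toy source; replace with your real stream (Kafka, files on COS, and so on).
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

query = (
    events.writeStream.format("parquet")
    .option("path", "cos://my-data-bucket.data/stream-output/")
    # The checkpoint lets a restarted application resume from where it stopped
    # after the 72-hour limit (or any other interruption).
    .option("checkpointLocation", "cos://my-data-bucket.data/stream-checkpoints/")
    .outputMode("append")
    .start()
)

query.awaitTermination()
```
{: codeblock}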