Yes, this typically happens when the system runs out of memory. The Linux OOM (Out Of Memory) Killer terminates processes to free up RAM. You can confirm this by searching the kernel log: dmesg | grep -i "killed process"
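The same check can be scripted. Below is a minimal sketch that extracts the PID and process name from kernel log text; it assumes the common "Killed process <pid> (<name>)" message format, and the sample log is illustrative rather than captured from a real host.

```python
import re
from typing import List, Tuple

# Matches kernel log lines such as:
#   "Out of memory: Killed process 4321 (java) total-vm:..."
OOM_PATTERN = re.compile(r"Killed process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(kernel_log: str) -> List[Tuple[int, str]]:
    """Return (pid, process_name) for every OOM kill found in the log text."""
    return [(int(pid), name) for pid, name in OOM_PATTERN.findall(kernel_log)]


# Illustrative sample, not output captured from a real host.
SAMPLE_LOG = """\
[12345.678901] Out of memory: Killed process 4321 (java) total-vm:8123456kB, anon-rss:4000000kB
[12346.000000] eth0: link up
"""

print(find_oom_kills(SAMPLE_LOG))
```

In practice the input would be the text returned by dmesg or journalctl -k.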
import psutil
# CPU usage
cpu_percent = psutil.cpu_percent(interval=1)
print(f"CPU Usage: {cpu_percent}%")
# RAM usage
memory = psutil.virtual_memory()
print(f"RAM Usage: {memory.percent}%")
import json
#### Read JSON file
with open('text.json', 'r') as f:
    data = json.load(f)
#### Access key-value pairs
for key, value in data.items():
    print(f"{key}: {value}")
#### Modify or add a new key-value pair
data['new_key'] = 'new_value'
#### Write back to the file
with open('text.json', 'w') as f:
    json.dump(data, f, indent=4)
sudo groupadd devteam
sudo usermod -aG devteam username
#### Verify
groups username
resource "aws_db_instance" "example" {
  identifier             = "mydb"
  engine                 = "mysql"
  instance_class         = "db.t3.micro"
  allocated_storage      = 20
  # "name" was deprecated in favor of "db_name" in newer AWS provider versions
  db_name                = "mydb"
  username               = "admin"
  # Example only: supply real credentials through a sensitive variable or a secrets manager
  password               = "admin123"
  vpc_security_group_ids = [aws_security_group.rds_sg.id]
  db_subnet_group_name   = aws_db_subnet_group.example.name
  skip_final_snapshot    = true
}
resource "aws_db_subnet_group" "example" {
  name       = "my-subnet-group"
  subnet_ids = [aws_subnet.subnet1.id, aws_subnet.subnet2.id]
}
A: Use Route 53 with failover routing policy and health checks. Primary region serves traffic; if it fails, Route 53 redirects to the secondary region.
A: Use CloudWatch:
- Metrics: auto-collected
- Logs: via agents or SDK
- Alarms: for thresholds
- Dashboards: for visualization
- Events/EventBridge: for automation
A:
- Tag all resources with Environment=sbox/uat/prod
- Use CloudWatch filters and dashboards per tag
- In Cost Explorer, filter by tags
- Enable Cost Allocation Tags and use AWS Budgets
A:
- Attach Lambda to VPC with private subnets
- Ensure security group rules allow traffic
- Lambda uses an elastic network interface (ENI) to communicate privately inside the VPC
A: AssumeRole lets a user or service temporarily get permissions of another IAM role using STS, ideal for cross-account access or secure delegation.
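For the cross-account case, the role being assumed needs a trust policy that names who may call sts:AssumeRole. A minimal sketch of such a policy document follows; the account ID 111122223333 is a placeholder, and trusting the whole account root is only one possible choice of principal.

```python
import json


def build_trust_policy(trusted_account_id: str) -> dict:
    """Build a role trust policy letting principals in another account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Trusting the account root delegates the final decision to
                # that account's own IAM policies.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# 111122223333 is a placeholder account ID.
print(json.dumps(build_trust_policy("111122223333"), indent=2))
```

The caller then requests temporary credentials from STS for that role and uses them until they expire.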
A:
#!/bin/bash
LOG_DIR="/path/to/logs"
PATTERN="error"
grep -iH "$PATTERN" "$LOG_DIR"/*.log
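The same search can be sketched in Python when line numbers or further processing are needed. This version does a plain case-insensitive substring match (not a regular expression like grep), and the demo directory and file are throwaway examples.

```python
import tempfile
from pathlib import Path
from typing import List, Tuple


def search_logs(log_dir: str, pattern: str) -> List[Tuple[str, int, str]]:
    """Return (file name, line number, line) for case-insensitive matches in *.log files."""
    needle = pattern.lower()
    matches = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        with log_file.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    matches.append((log_file.name, number, line.rstrip("\n")))
    return matches


# Demo against a temporary directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "app.log").write_text("started\nERROR: disk full\n")
    for name, number, line in search_logs(demo_dir, "error"):
        print(f"{name}:{number}:{line}")
```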
Answer: Possible reasons:
- API server is down or not reachable
- Port 6443 is blocked by a firewall
- TLS certificate mismatch (use -k to ignore)
- IP or DNS name is incorrect
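The first two causes can be separated from the TLS and DNS ones with a plain TCP check. A minimal sketch, assuming the API server listens on port 6443; the hostname in the usage comment is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example call (hypothetical hostname):
#   can_connect("k8s-api.example.internal", 6443)
```

If this returns True but the request still fails, the problem is more likely the certificate or the request itself than the network path.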
Answer:
The ELK Stack is a powerful log aggregation and analytics platform composed of three main open-source components:
| Component | Function |
|---|---|
| Elasticsearch | A distributed, RESTful search and analytics engine that stores and indexes logs. |
| Logstash | A data processing pipeline that collects, filters, and forwards logs to Elasticsearch. |
| Kibana | A web-based UI to visualize and explore data stored in Elasticsearch. |
ELK is used for centralized logging, monitoring, and visualizing logs from servers, applications, containers, and cloud infrastructure.
Answer:
Here is a simple Bash script that uses ssh to fetch memory utilization from two remote Linux servers:
#!/bin/bash
# List of servers (replace with actual IPs or hostnames)
servers=("server1.example.com" "server2.example.com")
# Loop through each server
for server in "${servers[@]}"; do
  echo "----- Memory usage on $server -----"
  ssh "user@$server" free -h
  echo ""
done
How it works:
Uses ssh to connect to each server.
Runs free -h to display memory usage in a human-readable format.
Prints the output with a header for each server.
Prerequisites:
Passwordless SSH access (using SSH keys) must be set up.
Replace user with your actual username and hostnames accordingly.
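If the numbers need to be compared or alerted on rather than read by eye, the output can be parsed. The sketch below assumes the column layout printed by recent procps versions of free (including an "available" column) and uses free -b so the values are plain integers; the sample text is illustrative.

```python
from typing import Dict


def parse_free(output: str) -> Dict[str, int]:
    """Map column names to values for the "Mem:" row of `free -b` output."""
    lines = output.strip().splitlines()
    header = lines[0].split()
    for line in lines[1:]:
        if line.startswith("Mem:"):
            values = [int(value) for value in line.split()[1:]]
            return dict(zip(header, values))
    raise ValueError("no 'Mem:' row found")


def used_percent(mem: Dict[str, int]) -> float:
    """Share of total memory that is not available to new workloads."""
    return round(100 * (mem["total"] - mem["available"]) / mem["total"], 1)


# Illustrative sample in the shape of `free -b` output.
SAMPLE = """\
              total        used        free      shared  buff/cache   available
Mem:     8000000000  3000000000  1000000000   100000000  4000000000  4000000000
Swap:    2000000000           0  2000000000
"""

print(used_percent(parse_free(SAMPLE)))
```

The input text would come from running ssh user@server free -b for each host.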
15Q. You have static & dynamic web apps using high EC2 + NGINX, causing high cost & low availability. What's your solution?
Answer:
Use S3 + CloudFront for static content. Run dynamic apps in containers on ECS or EKS (optionally with Fargate) and enable auto-scaling. Replace oversized EC2 instances with smaller instances in an Auto Scaling group (ASG). Use an ALB instead of standalone NGINX. Containerize the app for better resource usage.
Result: Lower cost, high availability, easier management.
Answer:
Run kubectl get pods to find the failed pod. Use kubectl logs and kubectl describe pod to diagnose. Check for resource issues or crash errors. Restart the workload with kubectl rollout restart. Roll back if a new image or code change caused the issue. Ensure the HPA and probes are correctly set.
Result: Service is restored quickly, root cause identified.
17Q. A client wants to implement a new system in 3 months, but your analysis shows it will take 6 months. How would you handle this situation?
Answer:
Communicate Transparently: Explain the findings and timeline based on a clear scope, technical complexity, and resource availability.
Break Down the Project: Propose a phased approach that delivers core features in 3 months, with additional phases after that.
Explore Alternatives: Identify options to accelerate delivery, such as increasing the team size, reducing scope (MVP), or using pre-built solutions.
Provide Evidence: Share data from similar past projects, effort estimates, and risk assessments to support your timeline.
Goal: Align client expectations with reality while still showing flexibility and commitment to delivery.
18Q. How would you assess whether an AI implementation would be beneficial for a specific business process?
Answer:
Understand the Process: Evaluate if the process is data-driven, repetitive, and can benefit from pattern recognition or prediction.
Identify Pain Points: Look for inefficiencies, manual work, or high error rates that AI can solve (e.g., forecasting, automation, classification).
Check Data Availability: Confirm if there's sufficient, clean, and labeled data to train AI models.
Estimate ROI: Compare AI implementation costs vs. potential benefits (time saved, error reduction, better decisions).
Pilot First: Propose a small-scale proof of concept (PoC) to validate feasibility and effectiveness.
Goal: Ensure AI adds real value, is technically feasible, and aligns with business goals.
A: AWS Fargate is a serverless compute engine for containers that works with ECS and EKS.
Key differences from EC2-based ECS:
• Fargate: No need to provision or manage EC2 instances.
• EC2: You manage the EC2 infrastructure, networking, patching, etc.
Use case: When you want to run containers without managing the underlying infrastructure.
A: AWS Glue is a fully managed ETL (Extract, Transform, Load) service used for:
• Discovering, cataloging, and transforming data.
• Preparing data for analytics and machine learning.
Components:
• Glue Crawlers – automatically detect schema and create metadata tables.
• Glue Jobs – run PySpark or Python scripts for ETL.
• Glue Data Catalog – central metadata repository.
A: Amazon EventBridge Scheduler is a fully managed scheduler, part of Amazon EventBridge (the service formerly known as CloudWatch Events), for running tasks or workflows at defined times or intervals.
Use cases:
• Schedule Lambda functions or Step Functions.
• Start/stop EC2 or RDS instances.
• Trigger Glue jobs or ECS tasks periodically.
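Example (minimal CreateSchedule input; the schedule name and role name are placeholders):
{
  "Name": "my-schedule",
  "ScheduleExpression": "rate(5 minutes)",
  "FlexibleTimeWindow": { "Mode": "OFF" },
  "Target": {
    "Arn": "arn:aws:lambda:region:account-id:function:MyFunction",
    "RoleArn": "arn:aws:iam::account-id:role/MySchedulerRole"
  }
}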
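To make the rate(...) syntax concrete, here is a small sketch that converts a rate expression into seconds. It only covers the rate form with minute, hour, and day units and is more lenient about singular versus plural than the service itself; cron(...) and at(...) expressions are out of scope.

```python
import re

_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}
_RATE_PATTERN = re.compile(r"^rate\((\d+) (minute|hour|day)s?\)$")


def rate_to_seconds(expression: str) -> int:
    """Convert a schedule expression like "rate(5 minutes)" to an interval in seconds."""
    match = _RATE_PATTERN.match(expression)
    if match is None:
        raise ValueError(f"not a rate expression: {expression!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * _UNIT_SECONDS[unit]


print(rate_to_seconds("rate(5 minutes)"))  # 300
```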
A: Amazon ECS is a fully managed container orchestration service that lets you run and scale Docker containers.
Key Concepts:
• Cluster – logical grouping of tasks or services.
• Task Definition – blueprint for your container (image, CPU, memory, etc.).
• Services – keep tasks running, support scaling and load balancing.
Modes:
• EC2 launch type
• Fargate launch type (serverless)
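As an illustration of the Task Definition concept, the sketch below builds a minimal task definition document for the Fargate launch type. The family name, image, and port are hypothetical, and a real definition often also needs an execution role (for example to pull from a private registry or ship logs).

```python
import json


def fargate_task_definition(family: str, image: str, cpu: int = 256, memory: int = 512) -> dict:
    """Build a minimal ECS task definition document for the Fargate launch type."""
    return {
        "family": family,
        "networkMode": "awsvpc",  # Fargate tasks use awsvpc networking
        "requiresCompatibilities": ["FARGATE"],
        "cpu": str(cpu),        # task-level CPU units, passed as a string
        "memory": str(memory),  # task-level memory in MiB, passed as a string
        "containerDefinitions": [
            {
                "name": family,
                "image": image,
                "essential": True,
                "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
            }
        ],
    }


# "web" and "nginx:latest" are placeholder values.
print(json.dumps(fargate_task_definition("web", "nginx:latest"), indent=2))
```

A Service would then reference this task definition and keep the desired number of copies running.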
A: Amazon ECR is a fully managed Docker container registry that makes it easy to store, manage, and deploy container images.
Features:
• Integrated with ECS, EKS, and Lambda.
• Secure access via IAM.
• Supports image versioning and scanning.
Common Commands:
#### Authenticate Docker to ECR
aws ecr get-login-password | docker login --username AWS --password-stdin <account_id>.dkr.ecr.<region>.amazonaws.com
#### Push Docker image
docker build -t my-image .
docker tag my-image:latest <account_id>.dkr.ecr.<region>.amazonaws.com/my-image:latest
docker push <account_id>.dkr.ecr.<region>.amazonaws.com/my-image:latest
#### 24Q. Do you know about UAT?
Answer: Yes. User Acceptance Testing is the final testing phase where end-users validate the software before going to production.
Answer: Minikube runs a local Kubernetes cluster inside a VM or container for learning and development purposes.
Q: What steps do you take when a Linux server runs out of disk space?
A: I check disk usage with df -h and identify large files or directories using du -sh * | sort -h. I clear logs (/var/log), old Docker images (docker image prune), and cached packages (apt clean, yum clean all). If necessary, I increase the volume size or mount additional storage.
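The first two steps can also be scripted for monitoring. A rough sketch using only the standard library; the percentage may differ slightly from df because df accounts for reserved blocks, and the demo path is simply the system temp directory.

```python
import os
import shutil
import tempfile
from typing import List, Tuple


def disk_used_percent(path: str = "/") -> float:
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return round(100 * usage.used / usage.total, 1)


def largest_files(root: str, limit: int = 5) -> List[Tuple[int, str]]:
    """Return up to `limit` (size in bytes, path) pairs under `root`, largest first."""
    sizes = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full_path = os.path.join(folder, name)
            try:
                sizes.append((os.path.getsize(full_path), full_path))
            except OSError:
                continue  # file vanished or is unreadable
    return sorted(sizes, reverse=True)[:limit]


print(f"{disk_used_percent(tempfile.gettempdir())}% used")
```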
Q: How do you troubleshoot SSH connection issues?
A: I check network reachability using ping or telnet, confirm correct IP, key permissions (chmod 400 for PEM), verify SSH service status, and review /var/log/auth.log or /var/log/secure. I also confirm the security group/firewall allows port 22.
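The key-permission check is easy to automate. The sketch below treats a key as acceptable when group and others have no access at all, which is the condition chmod 400 satisfies; the demo file is a temporary stand-in for a real key.

```python
import os
import stat
import tempfile


def key_permissions_ok(key_path: str) -> bool:
    """True when the key file grants no access to group or others."""
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
    return (mode & 0o077) == 0


# Demo with a temporary stand-in for a private key file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_key = handle.name
os.chmod(demo_key, 0o400)
print(key_permissions_ok(demo_key))
os.remove(demo_key)
```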
Answer: A CRD lets you define a custom resource (e.g., MySQLCluster) and use it like a native Kubernetes object. It extends Kubernetes capabilities without modifying the core.
Answer: • Scaling workloads
• Monitoring & logging
• Rolling updates
• Backup & restore (etcd, volumes)
• Debugging pods
• Resource limits and quota management
Answer: A Service Mesh manages communication between services. Features:
• Traffic management
• Security (mTLS)
• Observability (metrics, tracing)
Examples: Istio, Linkerd
Answer: Sidecar injection is the process of automatically adding a sidecar container (like an Envoy proxy) to pods. This is used in service meshes (e.g., Istio) for traffic interception.
Answer: Envoy is a high-performance proxy used in service meshes (e.g., Istio) for:
• Load balancing
• Traffic routing
• TLS termination
• Observability
Answer: A PodDisruptionBudget (PDB) ensures that a minimum number of pods remain available during voluntary disruptions (like a node drain). You define one of the following (a single PDB accepts only one of them):
• minAvailable
• maxUnavailable
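The arithmetic behind the two settings can be sketched as follows. This simplified version handles integer values only (percentages are not covered), and the function and parameter names are illustrative.

```python
from typing import Optional


def allowed_disruptions(
    expected_pods: int,
    healthy_pods: int,
    min_available: Optional[int] = None,
    max_unavailable: Optional[int] = None,
) -> int:
    """How many pods may be voluntarily evicted right now under the budget."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    # Never negative: an already-violated budget simply allows no evictions.
    return max(0, healthy_pods - desired_healthy)


print(allowed_disruptions(expected_pods=5, healthy_pods=5, min_available=4))  # 1
```

With 5 healthy replicas and minAvailable set to 4, a drain can evict one pod at a time and must wait for its replacement before evicting the next.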
Answer: Probes are used to check container health:
• Liveness Probe: Restarts container if it's stuck.
• Readiness Probe: Controls pod availability to services.
• Startup Probe: For slow-starting apps.
Answer: • Voluntary: Triggered by user (e.g., kubectl drain, rolling update).
• Involuntary: System-triggered (e.g., node crash, OOM).
Answer: • Safe Eviction: Graceful shutdown respecting Pod Disruption Budgets and lifecycle hooks.
• Hard Eviction: Forced eviction due to resource pressure or system errors.
Answer: (Deprecated in Kubernetes v1.21 and removed in v1.25) PodSecurityPolicies (PSPs) controlled security-related settings like:
• Privileged mode
• Host namespaces
• Volume types
• User IDs
Use Pod Security Admission (PSA) instead in newer versions.
Answer: This occurs when a container keeps crashing repeatedly. Causes:
• Application error
• Misconfiguration
• Unavailable dependencies
Fix:
• Check logs, describe pod, check readiness/liveness probes.
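Assuming the status being described is CrashLoopBackOff, affected containers can be listed from the JSON form of the pod list (kubectl get pods -o json). A minimal sketch; the pod and container names in the sample are made up.

```python
import json
from typing import List, Tuple


def crash_looping_containers(pods_json: str) -> List[Tuple[str, str, int]]:
    """Return (pod, container, restart count) for containers waiting in CrashLoopBackOff."""
    found = []
    for pod in json.loads(pods_json).get("items", []):
        pod_name = pod["metadata"]["name"]
        for status in pod.get("status", {}).get("containerStatuses", []):
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append((pod_name, status["name"], status["restartCount"]))
    return found


# Trimmed, made-up sample in the shape of `kubectl get pods -o json`.
SAMPLE_PODS = json.dumps({
    "items": [
        {
            "metadata": {"name": "api-7d9f"},
            "status": {"containerStatuses": [
                {"name": "api", "restartCount": 12,
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}}
            ]},
        },
        {
            "metadata": {"name": "web-5c2a"},
            "status": {"containerStatuses": [
                {"name": "web", "restartCount": 0, "state": {"running": {}}}
            ]},
        },
    ]
})

print(crash_looping_containers(SAMPLE_PODS))
```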
A: AWS Step Functions is a serverless service for orchestrating workflows. It models each workflow as a state machine: a series of steps that your application follows. These steps can call other AWS services or your own custom code.
Here are some of the key benefits of using Step Functions:
• Visual Workflows: Step Functions provides a drag-and-drop interface (Workflow Studio) to define your workflows, making it easier to design and understand complex processes.
• Simplified Automation: You can automate workflows across a variety of AWS services without having to write and maintain a lot of custom code.
• Error Handling and Resilience: Step Functions includes built-in error handling, retries, and timeouts to make your workflows more robust.
• Event-Driven Architecture: Step Functions integrates well with event-driven architectures, allowing you to define workflows that are triggered by events.
Overall, Step Functions helps you build and manage complex workflows in the AWS cloud.
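To show what a state machine definition looks like, here is a minimal sketch in Amazon States Language with one task state that has a timeout and a retry policy. The state name ProcessOrder and the Lambda ARN are hypothetical.

```python
import json


def build_state_machine(lambda_arn: str) -> dict:
    """Build a one-step state machine definition with retries and a timeout."""
    return {
        "Comment": "Single task with retries and a timeout",
        "StartAt": "ProcessOrder",
        "States": {
            "ProcessOrder": {
                "Type": "Task",
                "Resource": lambda_arn,
                "TimeoutSeconds": 30,
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 2,   # wait before the first retry
                        "MaxAttempts": 3,
                        "BackoffRate": 2.0,     # double the wait on each retry
                    }
                ],
                "End": True,
            }
        },
    }


# Placeholder ARN for illustration only.
definition = build_state_machine("arn:aws:lambda:region:account-id:function:MyFunction")
print(json.dumps(definition, indent=2))
```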