-
Notifications
You must be signed in to change notification settings - Fork 343
Integrated code lifecycle
: Retry missing build jobs
#11330
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrated code lifecycle
: Retry missing build jobs
#11330
Conversation
Integrated code lifecycle
: Retry missing build jobs
End-to-End (E2E) Test Results Summary
|
End-to-End (E2E) Test Results Summary
|
...ain/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIEventListenerService.java
Show resolved
Hide resolved
WalkthroughAdds a LocalCIMissingJobService to detect and retry MISSING build jobs, extends BuildJobRepository with a time-bounded query and retry increment, adds retry support to LocalCITriggerService and BuildJobQueueItem, removes the scheduled check from LocalCIEventListenerService, and updates tests to exercise retry behavior. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor Scheduler
participant MissingSvc as LocalCIMissingJobService
participant Repo as BuildJobRepository
participant Dist as DistributedDataAccessService
Scheduler->>MissingSvc: checkPendingBuildJobsStatus()
MissingSvc->>Repo: find jobs with status QUEUED or BUILDING
Note over MissingSvc: ignore jobs submitted < 5 minutes ago
MissingSvc->>Dist: getQueuedJobs() / getProcessingJobIds()
loop each pending job
alt job present in queue/processing
MissingSvc->>MissingSvc: no change
else job absent
MissingSvc->>Repo: update buildStatus -> MISSING
end
end
sequenceDiagram
autonumber
actor Scheduler
participant MissingSvc as LocalCIMissingJobService
participant Repo as BuildJobRepository
participant PartRepo as ParticipationRepository
participant Trigger as LocalCITriggerService
participant Queue as Build Queue
Scheduler->>MissingSvc: retryMissingJobs()
MissingSvc->>Repo: findMissingJobsToRetryInTimeRange(lastHour, page)
loop each MISSING job
alt retryCount < maxMissingJobRetries
MissingSvc->>PartRepo: load Participation
MissingSvc->>Trigger: retryBuildJob(job, participation)
Trigger->>Queue: enqueue build with retryCount+1
MissingSvc->>Repo: incrementRetryCount(buildJobId)
else
MissingSvc->>MissingSvc: skip (limit reached)
end
end
Note over MissingSvc: logs if more pages exist for next run
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal). Please share your feedback with us on this Discord post. 📜 Recent review detailsConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
🧹 Nitpick comments (9)
src/test/java/de/tum/cit/aet/artemis/programming/test_repository/BuildJobTestRepository.java (1)
21-22
: Prefer ordering by submission date over stringbuildJobId
for “latest” lookups
buildJobId
is a String; lexicographic order can be brittle.buildSubmissionDate
directly captures temporal order and already exists in the model/queries. Consider switching to (or adding) a method ordered bybuildSubmissionDate DESC
for robustness.Apply:
-Optional<BuildJob> findFirstByParticipationIdOrderByBuildJobIdDesc(Long participationId); +Optional<BuildJob> findFirstByParticipationIdOrderByBuildSubmissionDateDesc(Long participationId);If you keep both, document when each should be used (e.g., fallback when start date is null).
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCITriggerService.java (1)
174-176
: Cap/validate retryCount defensively in trigger pathEven if the scheduler enforces a max (e.g., 3), add a guard here to prevent accidental overflows from other callers.
Example:
-private void triggerBuild(ProgrammingExerciseParticipation participation, String commitHashToBuild, RepositoryType triggeredByPushTo, boolean triggerAll, int retryCount) +private void triggerBuild(ProgrammingExerciseParticipation participation, String commitHashToBuild, RepositoryType triggeredByPushTo, boolean triggerAll, int retryCount) throws LocalCIException { + retryCount = Math.max(0, Math.min(retryCount, 3));Also applies to: 227-235
src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java (2)
96-96
: Remove stale TODO.This file already uses @isolated; the note is outdated.
-// TODO re-enable tests. when Executed in isolation they work +
333-336
: Also clear processingJobs to avoid false negatives.If an item lingers in processing, the status check may skip marking MISSING.
queuedJobs.clear(); +processingJobs.clear(); localCIMissingJobService.checkPendingBuildJobsStatus();
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java (5)
75-78
: Make the 5-minute grace configurable.Externalize for ops tuning and tests.
- final int buildJobExpirationInMinutes = 5; // If a build job is older than 5 minutes, and it's status can't be determined, set it to missing + final int buildJobExpirationInMinutes = pendingJobGraceMinutes; // see configuration propertyAdd the field near the other @value fields:
@Value("${artemis.continuous-integration.max-missing-job-retries:3}") private int maxMissingJobRetries; +@Value("${artemis.continuous-integration.pending-job-grace-minutes:5}") +private int pendingJobGraceMinutes;
95-97
: Fix misleading comment.We don’t update build start date here.
- // If the build job is in an unknown state, set it to missing and update the build start date + // If the build job is in an unknown state, mark it as MISSING
101-106
: Clean up Javadoc (stray 'R' and phrasing).- /** - * Periodically retries missing build jobs. - * R - * retrieves a slice of missing build jobs from the last hour and attempts to retry them. - * If a build job has reached the maximum number of retries, it will not be retried again. - */ + /** + * Periodically retries missing build jobs. + * Retrieves a slice of MISSING jobs from the last hour and attempts to retry them. + * Jobs that reached the configured retry limit are skipped. + */
71-71
: Consider @transactional on scheduled updaters.Ensures repository writes are committed consistently per run.
- public void checkPendingBuildJobsStatus() { + @org.springframework.transaction.annotation.Transactional + public void checkPendingBuildJobsStatus() {- public void retryMissingJobs() { + @org.springframework.transaction.annotation.Transactional + public void retryMissingJobs() {Add import if not present:
import org.springframework.transaction.annotation.Transactional;Also applies to: 106-106
129-131
: Observability: add counters for marked-MISSING and retried jobs.Expose Micrometer counters/gauges to track how often jobs go MISSING and how many retries are issued.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
src/main/java/de/tum/cit/aet/artemis/programming/repository/BuildJobRepository.java
(1 hunks)src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIEventListenerService.java
(0 hunks)src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java
(1 hunks)src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCITriggerService.java
(3 hunks)src/test/java/de/tum/cit/aet/artemis/programming/AbstractProgrammingIntegrationLocalCILocalVCTestBase.java
(2 hunks)src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java
(6 hunks)src/test/java/de/tum/cit/aet/artemis/programming/test_repository/BuildJobTestRepository.java
(1 hunks)
💤 Files with no reviewable changes (1)
- src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIEventListenerService.java
🧰 Additional context used
📓 Path-based instructions (2)
src/test/java/**/*.java
⚙️ CodeRabbit configuration file
test_naming: descriptive; test_size: small_specific; fixed_data: true; junit5_features: true; assert_use: assertThat; assert_specificity: true; archunit_use: enforce_package_rules; db_query_count_tests: track_performance; util_service_factory_pattern: true; avoid_db_access: true; mock_strategy: static_mocks; context_restart_minimize: true
Files:
src/test/java/de/tum/cit/aet/artemis/programming/test_repository/BuildJobTestRepository.java
src/test/java/de/tum/cit/aet/artemis/programming/AbstractProgrammingIntegrationLocalCILocalVCTestBase.java
src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java
src/main/java/**/*.java
⚙️ CodeRabbit configuration file
naming:CamelCase; principles:{single_responsibility,small_methods,no_duplication}; db:{perf_queries,datetime_not_timestamp}; rest:{stateless,singleton,delegate_logic,http_only,minimal_dtos}; dtos:{java_records,no_entities,min_data,single_resp}; di:constructor_injection; kiss:simple_code; file_handling:os_indep_paths; practices:{least_access,avoid_transactions,code_reuse,static_member_ref,prefer_primitives}; sql:{param_annotation,uppercase,avoid_subqueries};java:avoid_star_imports
Files:
src/main/java/de/tum/cit/aet/artemis/programming/repository/BuildJobRepository.java
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCITriggerService.java
🧠 Learnings (4)
📚 Learning: 2025-08-08T08:56:49.931Z
Learnt from: tobias-lippert
PR: ls1intum/Artemis#11248
File: src/main/java/de/tum/cit/aet/artemis/programming/web/repository/RepositoryResource.java:288-291
Timestamp: 2025-08-08T08:56:49.931Z
Learning: In the Artemis codebase, the LocalCI profile is only active when LocalVC is active (LocalCI ⇒ LocalVC). Therefore, in RepositoryResource.commitChanges, guarding the LocalVCServletService invocation with profileService.isLocalCIActive() is correct and guarantees the bean’s presence.
Applied to files:
src/test/java/de/tum/cit/aet/artemis/programming/AbstractProgrammingIntegrationLocalCILocalVCTestBase.java
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java
📚 Learning: 2025-08-10T18:33:22.476Z
Learnt from: Wallenstein61
PR: ls1intum/Artemis#10989
File: src/test/java/de/tum/cit/aet/artemis/programming/AbstractProgrammingIntegrationLocalCILocalVCTest.java:3-9
Timestamp: 2025-08-10T18:33:22.476Z
Learning: In the Artemis test framework, `MockitoBean` from `org.springframework.test.context.bean.override.mockito` is the standard annotation used for mocking beans in test classes, not `MockBean`. This is used consistently across test base classes like `AbstractProgrammingIntegrationLocalCILocalVCTest`, `AbstractSpringIntegrationIndependentTest`, and `AbstractSpringIntegrationLocalVCSamlTest`. The project also uses `MockitoSpyBean` from the same package.
Applied to files:
src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java
📚 Learning: 2025-07-07T11:43:11.736Z
Learnt from: bassner
PR: ls1intum/Artemis#10705
File: src/test/java/de/tum/cit/aet/artemis/lecture/service/SlideUnhideServiceTest.java:54-57
Timestamp: 2025-07-07T11:43:11.736Z
Learning: In the Artemis test framework, the AbstractArtemisIntegrationTest base class provides common MockitoSpyBean fields like instanceMessageSendService as protected fields, making them available to all test subclasses through inheritance.
Applied to files:
src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java
📚 Learning: 2025-07-27T16:51:47.707Z
Learnt from: ekayandan
PR: ls1intum/Artemis#11119
File: src/test/java/de/tum/cit/aet/artemis/buildagent/service/BuildAgentIntegrationTest.java:97-98
Timestamp: 2025-07-27T16:51:47.707Z
Learning: In BuildAgentIntegrationTest.java, the BuildAgentInformation constructor call with parameters (buildAgentDTO, 2, 0, new ArrayList<>(), BuildAgentInformation.BuildAgentStatus.IDLE, null, null, pauseAfterConsecutiveFailures, 2) is correct and does not require an additional maxConcurrentBuildsAllowed parameter.
Applied to files:
src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCITriggerService.java
🧬 Code graph analysis (1)
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java (1)
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIEventListenerService.java (1)
Lazy
(42-187)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Codacy Static Code Analysis
🔇 Additional comments (9)
src/test/java/de/tum/cit/aet/artemis/programming/AbstractProgrammingIntegrationLocalCILocalVCTestBase.java (1)
37-37
: LGTM: Injecting LocalCIMissingJobService into the LocalCI test baseImport and autowiring are consistent with the new service and LocalCI profile usage.
Also applies to: 116-118
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCITriggerService.java (4)
140-141
: LGTM: Defaulting retries to 0 for triggerAll pathDelegation to the new 5-arg method with
retryCount = 0
is correct.
153-154
: LGTM: Defaulting retries to 0 for push-triggered pathConsistent with the new overloads.
161-173
: LGTM: Public overload with explicit retryCountClear API surface for callers that know the retry context.
227-229
: Constructor parameter order verified: the arguments in LocalCITriggerService match the BuildJobQueueItem record signature exactly; no changes needed.src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java (4)
55-56
: Good call on isolation.Using @isolated together with SAME_THREAD prevents cross-test interference for LocalCI.
Also applies to: 109-110
137-139
: Tests correctly pick up the configured retry limit.Property wiring mirrors the service default (3).
376-393
: Retry-limit test focuses well on the boundary.Asserting id/unchanged status/retryCount at the cap is sufficient here.
809-812
: LGTM on queue observation change.Peeking from queuedJobs aligns with the new queue wiring.
src/main/java/de/tum/cit/aet/artemis/programming/repository/BuildJobRepository.java
Show resolved
Hide resolved
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java
Show resolved
Hide resolved
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java
Outdated
Show resolved
Hide resolved
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCIMissingJobService.java
Show resolved
Hide resolved
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCITriggerService.java
Outdated
Show resolved
Hide resolved
src/main/java/de/tum/cit/aet/artemis/programming/service/localci/LocalCITriggerService.java
Show resolved
Hide resolved
src/test/java/de/tum/cit/aet/artemis/programming/icl/LocalCIIntegrationTest.java
Show resolved
Hide resolved
End-to-End (E2E) Test Results Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tested locally
This reverts commit 9d47fd9.
1f039f2
End-to-End (E2E) Test Results Summary
|
This reverts commit 68fa614.
End-to-End (E2E) Test Results Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reapprove code
End-to-End (E2E) Test Results Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code looks good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reapprove
Checklist
General
Server
Changes affecting Programming Exercises
Motivation and Context
There are scenarios where build jobs get lost in the LocalCI system. We already have a scheduled mechanism that checks persisted build jobs that got queued but neither completed or are still running and marks them as
MISSING
. When a build job is already marked as Missing we should try to run it again.Description
Retrigger build for
MISSING
jobs of the last hour for whose participation no newer job exists already (to avoid retrying a missing job that succeeded /failed finally in the meantime) in a schedule with incremented retry count up to 3 times.Move existing
checkPendingBuildJobsStatus
and the newretryMissingJobs
schedules to newLocalCIMissingJobService
For example, I run 250 jobs and killed build agent 3 in the dev cluster during the processing.

The jobs that run on agent 3 at the moment first appear lost:
After the schedules run the initial jobs get marked missing and jobs will be retrigered for this participation:

Steps for Testing
Test Locally:
artemis.continuous-integration.check-job-status-interval-seconds:30
,artemis.continuous-integration.check-job-status-delay-seconds:15
and
artemis.continuous-integration.retry-missing-jobs-interval-seconds:30
,artemis.continuous-integration.retry-missing-jobs-delay-seconds:30
(Or after restarting in 4. grab a coffee for 5 minutes)Testserver States
You can manage test servers using Helios. Check environment statuses in the environment list. To deploy to a test server, go to the CI/CD page, find your PR or branch, and trigger the deployment.
Review Progress
Code Review
Manual Tests
Test Coverage
Summary by CodeRabbit
New Features
Tests