Upon restart some stacks never start #647

Open
liquidfrollo opened this issue Oct 23, 2024 · 22 comments
Labels
bug Something isn't working

Comments

@liquidfrollo

Description

When the host is restarted (TrueNAS SCALE running jlmkr / Dockge), some stacks start but others show exited or do not come up completely. If I go into the stacks manually and click start, they start without issue.

[four screenshots of Dockge stack statuses]

👟 Reproduction steps

Restart the host

👀 Expected behavior

All stacks should start without issue.

😓 Actual Behavior

Some stacks do not start. My speculation is that the Docker socket exposed through the Traefik stack is not yet available when other stacks that require it start, and there is no way to declare a "depends on" relationship between stacks.

Dockge Version

1.4.2

💻 Operating System and Arch

TrueNAS SCALE (24.0.4.2.3) / jlmkr running Debian 12 (bookworm)

🌐 Browser

Firefox (most current)

🐋 Docker Version

20.10.24+dfsg1

🟩 NodeJS Version

No response

📝 Relevant log output

No response

@liquidfrollo added the bug label on Oct 23, 2024
@wsw70

wsw70 commented Oct 26, 2024

the docker sock exposed by traefik is not available

I am not sure I understand: the Docker socket is provided by Docker (and the OS), and Traefik just makes use of it.

@louislam
Owner

This might be the wrong-status bug, which I still don't know how to reproduce 100% of the time. The stacks may actually be up.

@liquidfrollo
Author

The apps themselves are NOT up. Take the *arr stack above, where some services show started and some do not. The Overseerr app is up, but Sonarr/Radarr are not and return a 404 when I attempt to access them. At a minimum, I wish I could rank or prioritize the start order of the compose files (assuming I can't depend on an app in a different stack) to buy time for the picky stacks such as my *arr stack.

With respect to the Docker socket: from a security perspective, the socket is not exposed directly; instead it is exposed through the socket-proxy service seen in the Traefik stack. I tried making other stacks depend on the status of another, but that doesn't work because (from what I gathered) you can't make one compose file depend on another.

@markwaters

I'm having the same issue, but I am new to Docker, so it may be that.
I have linkding, readeck, and uptime-kuma set up and working.
However, readeck doesn't start, or perhaps does start and then fails.
If I start it manually it works perfectly.

@liquidfrollo
Author

Same for me: if I start it manually, it works. It just never fully comes up on a server restart.

@markwaters

I created a new Proxmox LXC and installed Docker, just the command-line version this time, then copied the data directory across.
This starts readeck on host restart every time so far.

@liquidfrollo
Author

Glad your issue is resolved; however, mine is not, and I have no indication why those containers won't start without manual intervention.

@InnocentRain

I have pretty much the same issue: some containers don't start after a reboot, but only about 20% of the time.

@N0rga

N0rga commented Dec 27, 2024

I'm having the same issue as @liquidfrollo. Containers within Dockge don't automatically start when my server is rebooted. They're all in individual stacks as well, with *arr apps pointing back to Gluetun for VPN networking. (I understand this is probably not the ideal configuration.)

@liquidfrollo
Author

I'm wondering if it is quietly failing because there is no health-check / dependency ability between stacks? Just speculating, as I'm unsure why it wouldn't just come up as healthy. Also curious whether setting a priority for compose start order would resolve it. For instance, if I start and wait for the Traefik stack and the Authentik stack (reverse proxy + socket-proxy security service, and authentication service), would everything else start without issue?

@N0rga

N0rga commented Jan 15, 2025

I'm wondering if it is quietly failing because there is no health-check / dependency ability between stacks? Just speculating, as I'm unsure why it wouldn't just come up as healthy. Also curious whether setting a priority for compose start order would resolve it. For instance, if I start and wait for the Traefik stack and the Authentik stack (reverse proxy + socket-proxy security service, and authentication service), would everything else start without issue?

Yeah, it seemed to me that when the container was destroyed and then recreated, it was given another ID or something. I also saw this when certain stacks were updated through Dockge.
Since they were using the Gluetun service as their network mode, they for some reason couldn't find it any more, and each attempt to start said the container "whole string of numbers and letters" could not be found or doesn't exist.

I had the issue before and couldn't fix it that time; I had to recreate the stack, and then it worked no problem.

I fixed my issues by putting Gluetun and all of the *arrs into a single stack and using depends_on and healthcheck to make all of the *arrs wait until the Gluetun service was healthy before starting; see the sketch below.
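
For reference, a minimal sketch of that pattern, trimmed to one *arr service. It assumes the gluetun image's built-in healthcheck; the exact services and options are illustrative, not N0rga's actual file:

services:
  gluetun:
    image: qmcgaw/gluetun:latest
    container_name: gluetun
    cap_add:
      - NET_ADMIN
    restart: unless-stopped
    # the gluetun image ships its own HEALTHCHECK, so depends_on can gate
    # on its health status without an explicit healthcheck block (assumption)
  sonarr:
    image: lscr.io/linuxserver/sonarr:latest
    container_name: sonarr
    restart: unless-stopped
    network_mode: service:gluetun   # share gluetun's network namespace
    depends_on:
      gluetun:
        condition: service_healthy   # start only after gluetun reports healthy

Note that compose only honors depends_on between services in the same file, which is why this workaround requires merging the stacks.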

@liquidfrollo
Author

liquidfrollo commented Jan 15, 2025 via email

@DomiiBunn
Contributor

Hiya all,

Kinda late to the party.

What is the restart policy on all of the offending container stacks?

Dockge does not auto-restart the containers; Docker itself does, depending on the selected option (see "Start containers automatically" in the Docker docs).

By default, Docker won't auto-restart containers. You can change this behaviour by setting the policy on all containers to unless-stopped:

Similar to always, except that when the container is stopped (manually or otherwise), it isn't restarted even after the Docker daemon restarts.

or to always:

Always restart the container if it stops. If it's manually stopped, it's restarted only when the Docker daemon restarts or the container itself is manually restarted.

Be sure to set the option on all the containers in the stack.
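
In a compose file, this is the restart key on each service; a minimal sketch (the service name and image are placeholders, not from this thread):

services:
  myservice:
    image: nginx:alpine        # placeholder image
    restart: unless-stopped    # or: restart: always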

@liquidfrollo
Author

All offending stacks have "restart: unless-stopped" set on them yet still do not start. Thanks for the double-check!

@DomiiBunn
Contributor

In that case, could you please paste your compose file?

Still, the issue wouldn't likely be with Dockge, but it's worth a poke.

@liquidfrollo
Author

liquidfrollo commented Jan 27, 2025

Here is one of them. There are four that fail to start with the same behavior; if I go and manually click start, they all work. (Note: I attempted to use a code block, but it removed the formatting.)

version: "3.8"
services:
plex:
image: plexinc/pms-docker:plexpass
restart: unless-stopped
container_name: plexms
ports:
- 32400:32400/tcp
- 3005:3005/tcp
- 8324:8324/tcp
- 32469:32459/tcp
- 1900:1900/udp
- 32410:32410/udp
- 32412:32412/udp
- 32413:32413/udp
- 32414:32414/udp
environment:
- PUID=1000
- PGID=1000
- TZ=America/Denver
- PLEX_CLAIM=${PLEX_CLAIM}
- HOSTNAME="XXXX"
volumes:
- ./config:/config
- ./transcodes:/transcode
- ${NAS_DIR}/Anime:/Anime
- ${NAS_DIR}/TVShows:/TVShows
- ${NAS_DIR}/Videos:/Videos
- ${NAS_DIR}/Audiobooks:/Audiobooks
networks:
- proxy
labels:
- traefik.enable=true
- traefik.http.routers.plex.entryPoints=https
- traefik.http.routers.plex.rule=Host(XXX) ||
HostRegexp({subdomain:[A-Za-z0-9](?:[A-Za-z0-9\-]{0,61}[A-Za-z0-9])?}XXXX)
&& PathPrefix(/outpost.goauthentik.io/)
- traefik.http.routers.plex.tls.certresolver=cloudflare
- traefik.http.services.plex.loadbalancer.server.port=32400
- traefik.frontend.headers.SSLRedirect=true
- traefik.frontend.headers.STSSeconds=315360000
- traefik.frontend.headers.browserXSSFilter=true
- traefik.frontend.headers.contentTypeNosniff=true
- traefik.frontend.headers.forceSTSHeader=true
- traefik.frontend.headers.SSLHost=XXXX
- traefik.frontend.headers.STSIncludeSubdomains=true
- traefik.frontend.headers.STSPreload=true
- traefik.frontend.headers.frameDeny=true
networks:
proxy:
external: true

@InnocentRain

InnocentRain commented Jan 31, 2025

Here's one of mine that gives me the most trouble; most of the time it works just fine, but sometimes one or more containers fail to start:

services:
  gluetun:
    image: qmcgaw/gluetun:latest
    container_name: gluetun
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
    ports:
      - 8000:8000
    environment:
      - VPN_SERVICE_PROVIDER=
      - OPENVPN_USER=
      - OPENVPN_PASSWORD=
      - SERVER_COUNTRIES=
      - UPDATER_PERIOD=
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/gluetun:/gluetun
    devices:
      - /dev/net/tun:/dev/net/tun
    networks:
      - proxy
  sabnzbd:
    image: lscr.io/linuxserver/sabnzbd:latest
    container_name: sabnzbd
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/sabnzbd:/config
      - /mnt/SMB/HDD_Non_Encrypted/arr-stack:/data
    network_mode: service:gluetun
  prowlarr:
    image: lscr.io/linuxserver/prowlarr:latest
    container_name: prowlarr
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/prowlarr:/config
    network_mode: service:gluetun
  sonarr:
    image: lscr.io/linuxserver/sonarr:latest
    container_name: sonarr
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/sonarr:/config
      - /mnt/SMB/HDD_Non_Encrypted/arr-stack:/data
    network_mode: service:gluetun
  radarr:
    image: lscr.io/linuxserver/radarr:latest
    container_name: radarr
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/radarr:/config
      - /mnt/SMB/HDD_Non_Encrypted/arr-stack:/data
    network_mode: service:gluetun
  readarr:
    image: lscr.io/linuxserver/readarr:develop
    container_name: readarr
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/readarr:/config
      - /mnt/SMB/HDD_Non_Encrypted/arr-stack:/data
    network_mode: service:gluetun
  audiobookshelf:
    image: ghcr.io/advplyr/audiobookshelf:latest
    container_name: audiobookshelf
    restart: unless-stopped
    environment:
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/audiobookshelf/config:/config
      - /srv/data/audiobookshelf/metadata:/metadata
      - /mnt/SMB/HDD_Non_Encrypted/arr-stack:/data
    network_mode: service:gluetun
  bazarr:
    image: lscr.io/linuxserver/bazarr:latest
    container_name: bazarr
    restart: unless-stopped
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Europe/Vienna
    volumes:
      - /srv/data/bazarr:/config
      - /mnt/SMB/HDD_Non_Encrypted/arr-stack:/data
    network_mode: service:gluetun
networks:
  proxy:
    external: true

@DomiiBunn
Contributor

I cannot reproduce this at all.

I tried on a standalone instance, on a VM, on bare metal, and in a Proxmox CT; nothing seems to cause the issue.

One last question: does the issue happen when you start the stack using docker compose directly instead of via Dockge?

@liquidfrollo
Author

liquidfrollo commented Feb 3, 2025 via email

@InnocentRain

Same with me, but my Dockge is installed on bare-metal Debian.

@DomiiBunn
Contributor

Unless someone smarter than me says otherwise, this is not Dockge-related, at least until we can get reproducible steps that prove it's Dockge and not the Docker host itself.

@liquidfrollo
Author

@DomiiBunn, any suggestions on how I could capture logs or repeatable steps that would show whether it is Dockge or not? Others in this thread using Dockge experience the same issue. It seems to me as if some internal health check isn't being respected.

Since I'm not aware of any way to control the sequence in which containers start, or to make dependencies across stacks, I don't really have any control over startup ordering. I don't want to combine stacks, because core dependencies and common apps belong apart: for instance, the Docker socket is needed by most stacks, while one stack controls all the *arr apps. As an example of something I'd like to do but can't yet: for security, I would like to use a socket proxy instead of exposing the socket directly, but without cross-stack dependencies I haven't implemented it. If I could, the health checks on the Docker socket being available would effectively delay all other stacks, potentially preventing this issue.
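
A minimal sketch of what that could look like if the proxy and one consumer lived in the same stack. The image choice, port, and healthcheck command are illustrative assumptions, not the author's actual config:

services:
  socket-proxy:
    image: tecnativa/docker-socket-proxy   # one commonly used socket-proxy image (assumption)
    restart: unless-stopped
    environment:
      - CONTAINERS=1   # allow clients to query the containers API
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    healthcheck:
      # hypothetical check: this proxy permits /_ping by default, so succeed
      # once it answers on its API port (2375)
      test: ["CMD", "wget", "-qO-", "http://localhost:2375/_ping"]
      interval: 10s
      timeout: 5s
      retries: 5
  traefik:
    image: traefik:v2.11
    restart: unless-stopped
    depends_on:
      socket-proxy:
        condition: service_healthy   # wait for the proxy before starting Traefik

Since compose only honors depends_on within a single file, this gating can't currently reach across stacks, which is exactly the gap described above.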
