
Triaging Issues

Juan Cruz Viotti edited this page Jan 16, 2017 · 9 revisions

This document aims to serve as an ever-evolving list of pointers and advice on how to debug users' issues.

General guidelines:

  • Always ask for the operating system, its version, and the architecture
  • Always ask which Etcher version they are running, and if it isn't the latest one, ask them to try to reproduce the issue with the latest release
  • Try to reproduce the issue yourself
  • Check the list of issues to ensure the reported issue was not recently fixed on master and is not a duplicate. If an issue already exists for it, ensure the user runs any debugging steps we outlined on the existing issue.

Check the USER DOCUMENTATION document before triaging.

Drive doesn't boot

If the issue can't be identified, ask the user to upload the image file somewhere for further investigation, and save a copy on resin.io's Google Drive so we don't lose access to it.

Some images require special treatment that we don't currently provide. See https://github.com/resin-io/etcher/issues/210

  • Check if the image includes a partition table

Some images don't include a partition table. We've seen this on some VMware images. See https://github.com/resin-io/etcher/issues/553

  • Was the image downloaded completely?

Ask the user to calculate a checksum of the local file and compare it with the one provided by the publisher, if there is one.
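If the user is comfortable running a script, the comparison can be done as below. This is a minimal sketch, assuming the publisher lists a hex digest; SHA-256 is the default here, but `algorithm` can be switched to whatever the publisher uses (e.g. `"md5"`, `"sha1"`). The function names are hypothetical helpers, not part of Etcher.

```python
import hashlib


def file_digest_hex(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash the file at `path` in fixed-size chunks so large images don't fill memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as image:
        # iter() with a b"" sentinel keeps reading until EOF
        for chunk in iter(lambda: image.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex, algorithm="sha256"):
    """True if the local file's digest equals the checksum copied from the publisher."""
    # Normalize whitespace and case, since copied checksums often carry both
    return file_digest_hex(path, algorithm) == published_hex.strip().lower()
```

Without Python, the same digest can be obtained with sha256sum on GNU/Linux, shasum -a 256 on OS X, or certutil -hashfile <file> SHA256 on Windows. A mismatch means the download is incomplete or corrupted, and the user should download the image again before we investigate further.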

VBScript errors

  • The only place where we use VBScript is in drivelist

This usually indicates a coding bug in our Windows drive detection script. Take note of the VBScript error stack trace.

Uncaught errors during writing/validation

If the issue can't be identified, or it can be consistently reproduced with a certain image, ask the user to upload the file somewhere for further investigation, and save a copy on resin.io's Google Drive so we don't lose access to it.

  • Rule out decompression-related issues by asking the user whether their image is compressed and, if so, asking them to flash it again after decompressing it manually.

  • The etcher-image-write module exposes a CLI (see bin/cli.js) when installed globally (e.g: npm install -g etcher-image-write). Ask the user to try to reproduce the issue with it, so we can narrow the issue down further.

  • Make sure the user provides a screenshot of the uncaught error along with the full stack trace

  • Narrow the issue down by identifying at which point it usually happens (e.g: at the beginning of the write process, right after clicking "Flash", during the end of the validation phase)

    • If the issue happens right after clicking "Flash", before the elevation dialog is shown:

      • Elevating the child process is a complex procedure, so this usually points to a coding bug in that routine.
    • If the issue happens right after clicking "Flash", after the elevation dialog is shown:

      • If on GNU/Linux or OS X, the error might reside in the initial unmounting routine
        • Ask the user to unmount the drive manually first to confirm. If unmounting also fails outside Etcher, the issue is reproducible without our application; otherwise, it might indeed be an unmounting issue in Etcher
      • If on Windows, the error might reside in the routine that cleans up the drive (wipes its partition table)
        • Ask the user to try to wipe the drive manually (see the clean command of diskpart.exe)
      • It can be an error when spawning the writer process
        • We display the exact command that we run in DevTools. Check that the command has no obvious issues (e.g: quoting, special characters). If it doesn't, ask the user to open a terminal emulator with administrator/sudo permissions and run the command manually to see if that shows any other information. Since the child process communicates with an IPC server, ask the user to keep the main Etcher window open while running the command; otherwise the IPC server will be closed
    • If the issue happens right before finishing the write process

    • If the issue happens right after starting the validation process

    • If the issue happens before finishing the validation process

      • It might be an unmounting issue (confirm by asking the user to disable "unmounting after success" in the settings)
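For the decompression check at the top of this list, users without a suitable archive tool can decompress the image with a short script. This is a minimal sketch using only Python's standard library, assuming the image is a .gz, .xz, or .bz2 file (.zip archives can be extracted with the operating system's own archive tool). decompress_image is a hypothetical helper, not Etcher's own decompression code.

```python
import bz2
import gzip
import lzma
import shutil
from pathlib import Path

# Compressed-image suffixes mapped to the stdlib opener that can stream them
OPENERS = {
    ".gz": gzip.open,
    ".xz": lzma.open,
    ".bz2": bz2.open,
}


def decompress_image(src, dest=None):
    """Stream-decompress `src` (e.g. foo.img.xz) to `dest` (default: foo.img)."""
    src = Path(src)
    opener = OPENERS.get(src.suffix.lower())
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {src.suffix!r}")
    # Dropping the last suffix turns foo.img.xz into foo.img
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with opener(str(src), "rb") as fin, open(dest, "wb") as fout:
        # Copy in 1 MiB chunks so multi-gigabyte images never sit in memory
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
    return dest
```

The user then selects the resulting uncompressed file in Etcher. If flashing the uncompressed image succeeds while the compressed one fails, the problem is likely in our decompression path rather than in writing.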

Reproducible validation errors

  • This might indicate a bug in our validation routine (though this is unlikely, since it has been battle-tested for a while)
  • Check if the drive is getting mounted in the middle of the validation phase, causing the operating system to write dummy files like .DS_Store. This usually happens with images that contain partitions the OS can recognize (like FAT). Ask the user to try another image that their OS can't read directly to confirm the issue
  • Ask the user to reproduce with the https://github.com/resin-io-modules/etcher-image-write CLI
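To check for OS-written files on a flashed drive, the user can mount its readable partition and list suspicious entries. This is a minimal sketch: find_os_artifacts is a hypothetical helper, and the name list covers files that macOS and Windows commonly create on volumes they mount; it is illustrative, not exhaustive.

```python
import os

# Entries macOS and Windows commonly create on volumes they mount
OS_ARTIFACTS = {
    ".DS_Store",
    ".Spotlight-V100",
    ".fseventsd",
    ".Trashes",
    "System Volume Information",
}


def find_os_artifacts(mountpoint):
    """Return sorted paths under `mountpoint` whose name looks OS-generated."""
    hits = []
    # os.walk silently skips directories it can't read (onerror defaults to None)
    for root, dirs, files in os.walk(mountpoint):
        for name in dirs + files:
            # "._" prefixed files are macOS AppleDouble metadata on non-HFS volumes
            if name in OS_ARTIFACTS or name.startswith("._"):
                hits.append(os.path.join(root, name))
    return sorted(hits)
```

Hits are only meaningful if the same entries are absent from the original image, since some images legitimately ship with such files. Compare against the image's own partition contents before concluding that the OS modified the drive.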

Drive not detected

  • This usually means that a removable drive was incorrectly detected to be a system drive by https://github.com/resin-io-modules/drivelist. Ask the user to run the corresponding platform script inside scripts/ and confirm by checking that the drive in question is reported as system: true

    • On OS X, ask the user for the output of diskutil list, diskutil info /dev/diskN and mount. This is a bash script, so you can probably figure out the root cause easily
    • On GNU/Linux, ask the user for the output of lsblk -d --output NAME, df --output=source,target, lsblk -b -d /dev/<device> --output SIZE,RO,RM,MODEL, udevadm info --query=property --export --export-prefix=UDEV_ --name=/dev/<device>. This is also a bash script, so it should be easy to figure out what's going on
    • On Windows, this is trickier, since the script is a VBScript program. Read the script, inject Wscript.Echo calls to output things you think would be helpful for debugging, and ask the user to run your modified copy
  • The drive might be a non-removable drive

Ask the user to enable unsafe mode. For safety purposes we will not attempt to interpret a non-removable drive as a removable drive using any kind of heuristic.
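On GNU/Linux, the lsblk -b -d /dev/<device> --output SIZE,RO,RM,MODEL output the user pastes can be read quickly with a small parser. This is a minimal sketch assuming a header line followed by exactly one device line with the columns in that order; parse_lsblk_row is a hypothetical helper, not part of drivelist.

```python
def parse_lsblk_row(output):
    """Parse pasted `lsblk -b -d /dev/<device> --output SIZE,RO,RM,MODEL` output."""
    lines = [line for line in output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and one device line")
    # MODEL may contain spaces, so split at most three times
    fields = lines[1].split(maxsplit=3)
    return {
        "size": int(fields[0]),         # bytes, because of -b
        "read_only": fields[1] == "1",  # RO column
        "removable": fields[2] == "1",  # RM column, as reported by the kernel
        "model": fields[3].strip() if len(fields) > 3 else "",
    }
```

A removable value of False is consistent with the non-removable case described above, although drivelist may weigh other signals too, so treat it as a hint rather than proof.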

The user says their drive is corrupted

  • This is very unlikely. It usually happens because the image the user wrote contains partitions their host OS doesn't recognize (e.g: Linux partitions on Windows). If they say the drive looks fine when flashed by other programs, those programs might be treating the device specially or using a different flashing technique. The drive usually boots regardless.

  • If the user wants to reformat their drive back to a normal state (which can be tricky, especially on Windows), ask them to follow this guide: https://github.com/resin-io/etcher/blob/master/docs/USER-DOCUMENTATION.md#recovering-broken-drives

Application doesn't start on GNU/Linux

  • Check if the user is running from an AppImage. An AppImage has no way of declaring dependencies, and it requires certain things to be available on the host OS. Check the list here: https://github.com/resin-io/etcher/blob/master/docs/USER-DOCUMENTATION.md#runtime-gnulinux-dependencies. Ask the user to install any missing ones. The tricky part is that if the user opens the AppImage by double-clicking it, or from a desktop environment's application menu, the error output won't be visible, so ask the user to run the AppImage from the command line.

Users unable to run the app after installing from source

  • Check if the user is simply running npm install. That will not work, since NPM needs to be configured to build against the right Electron headers, etc. Ask the user to install dependencies with our build scripts.

EIO errors during flashing/validating

  • This usually happens due to malfunctioning SD cards or SD card adapters. Ask the user to try different ones. Since v1.0.0-beta.18, every I/O operation is retried up to 10 times on EIO, so this issue should be rarer now.

  • Find out whether the error happens during flashing or validating, so we know whether it's a write EIO or a read EIO.
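The retry behaviour described above, and the read/write distinction, can be illustrated as follows. This is a sketch of the idea in Python, not Etcher's actual implementation (which lives in the JavaScript writer); retry_on_eio and its parameters are hypothetical.

```python
import errno
import time


def retry_on_eio(operation, kind, attempts=10, delay=0.0):
    """Run operation(), retrying only when it fails with EIO.

    `kind` ("read" or "write") is folded into the final error so the
    report says whether validation (read) or flashing (write) hit EIO.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as error:
            if error.errno != errno.EIO:
                raise  # anything other than EIO is not retried
            if attempt == attempts:
                raise OSError(
                    errno.EIO, f"{kind} EIO after {attempts} attempts"
                ) from error
            time.sleep(delay)  # optional pause between attempts
```

For example, on GNU/Linux or OS X a block read could be wrapped as retry_on_eio(lambda: os.pread(fd, 512, offset), "read"). Only EIO is retried because other errors (such as ENOSPC) are not transient and retrying would just delay the report.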

AppImage doesn't run

  • AppImages require FUSE to work. Ensure the user has that dependency installed.
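A quick way to check the FUSE prerequisites is sketched below, assuming the usual pieces: the /dev/fuse device and a fusermount (or fusermount3) helper on the PATH. fuse_status is a hypothetical helper; it only checks presence, not whether the user has permission to use FUSE, which some distributions may restrict further.

```python
import os
import shutil


def fuse_status():
    """Quick presence check for the FUSE pieces an AppImage usually needs."""
    return {
        # Character device exposed by the fuse kernel module
        "dev_fuse": os.path.exists("/dev/fuse"),
        # Userspace helper used to mount and unmount FUSE filesystems
        "fusermount": any(
            shutil.which(tool) is not None for tool in ("fusermount", "fusermount3")
        ),
    }


if __name__ == "__main__":
    print(fuse_status())
```

The shell equivalents are ls -l /dev/fuse and which fusermount. If either check fails, ask the user to install FUSE through their distribution's package manager; the package name varies between distributions.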