Containerfile/Dockerfile parser #3970

Open
westurner opened this issue Mar 27, 2024 · 12 comments

@westurner

westurner commented Mar 27, 2024

STORY: Users can parse Dockerfile and Containerfile with universal-ctags in order to navigate and review with tool support.

@masatake
Member

I have not read this issue carefully yet.
Here is a .ctags file I wrote a while ago:

#
# containerfile.ctags --- regex parser for Containerfile and Dockerfile
#
#  Copyright (c) 2023, Red Hat, Inc.
#  Copyright (c) 2023, Masatake YAMATO
#
#  Author: Masatake YAMATO <[email protected]>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.
#
# Reference: https://docs.docker.com/engine/reference/builder/
# 
--langdef=Containerfile
--map-Containerfile=+(Containerfile)
--map-Containerfile=+(Dockerfile)

--kinddef-Containerfile=a,arg,arguments
--kinddef-Containerfile=e,env,environment variables
--kinddef-Containerfile=i,image,images
--_roledef-Containerfile.{image}=from,specified in FROM

--regex-Containerfile=/^ARG[[:space:]]+([^[:space:]=]+)/\1/a/{exclusive}
--regex-Containerfile=/^ENV[[:space:]]+([^[:space:]=]+)/\1/e/{exclusive}
--regex-Containerfile=/^FROM[[:space:]]+(--[^[:space:]]*[[:space:]]+)?([^[:space:]]+)([[:space:]]+(as|AS)[[:space:]]+([^[:space:]]+))?//{exclusive}{{
    \2 /image /from _reftag _commit pop
    \5 false ne{
        \5 /image _tag _commit \2 inherits:
    } if
}}

This can be used as a starting point.
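
For readers who want to experiment with the FROM handling outside ctags, here is a minimal Python sketch of what the FROM pattern above is meant to capture: the base image and, when present, the AS alias. The function name `parse_from` and the test inputs are illustrative assumptions, not part of the .ctags file, and Python's `re` dialect stands in for the POSIX regex that ctags uses.

```python
import re

# Approximate Python translation of the FROM pattern. An optional leading
# flag (e.g. --platform=...) is skipped together with its trailing whitespace.
FROM_RE = re.compile(
    r"^FROM[ \t]+(?:--\S+[ \t]+)?(\S+)(?:[ \t]+(?:as|AS)[ \t]+(\S+))?"
)


def parse_from(line):
    """Return (base_image, alias_or_None), or None if line is not a FROM."""
    m = FROM_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

Calling `parse_from("FROM --platform=linux/amd64 golang:1.22 AS builder")` separates the base image from the alias, which corresponds to the reference tag and the definition tag emitted by the optscript block. Only a single leading flag is handled in this sketch.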

@masatake
Member

The main task of ctags is to extract names newly introduced in a target file.
ctags extracts such names as definition tags.
Though we have extended the task to extract names referenced or used, extracting definition tags is a higher priority.

I don't think RUN introduces a new name.

As far as I can tell from https://www.tohoho-web.com/docker/dockerfile.html (Japanese), LABEL introduces names. So, the .ctags file should support it.

The critical issue is that the .ctags file doesn't support a command that spans multiple lines, like:

ENV DB_HOST="192.168.2.201" \
    DB_PORT="3306" \
    DB_USER="myapp" \
    DB_PASSWD="ZbGc7#adG87GBfVC" \
    DB_DATABASE="sample"

To extract DB_PORT, DB_USER, ..., we must switch from the line-oriented meta parser to the multi-table meta parser (https://docs.ctags.io/en/latest/optlib.html#advanced-pattern-matching-with-multiple-regex-tables).
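
To illustrate why a line-oriented matcher misses these names, here is a rough Python sketch that first joins backslash-continued lines into one logical line and only then extracts the keys. It is not how the multi-table meta parser works internally; `logical_lines`, `env_names`, and `SAMPLE` are names made up for this sketch, and only the `KEY=value` form of ENV is handled.

```python
import re

SAMPLE = r'''ENV DB_HOST="192.168.2.201" \
    DB_PORT="3306" \
    DB_USER="myapp" \
    DB_PASSWD="ZbGc7#adG87GBfVC" \
    DB_DATABASE="sample"'''

# A key is a run of characters other than whitespace and "=", followed by
# "=" and either a double-quoted value or an unquoted one.
KEY_RE = re.compile(r'([^\s=]+)=(?:"[^"]*"|\S*)')


def logical_lines(text):
    """Join physical lines that end in a backslash into one logical line."""
    buf = ""
    for raw in text.splitlines():
        if raw.endswith("\\"):
            buf += raw[:-1] + " "
        else:
            yield buf + raw
            buf = ""
    if buf:
        yield buf


def env_names(text):
    """Collect every key defined by ENV instructions, in order."""
    names = []
    for line in logical_lines(text):
        m = re.match(r"ENV[ \t]+(.*)", line)
        if m:
            names.extend(KEY_RE.findall(m.group(1)))
    return names
```

With the continuation lines joined, `env_names(SAMPLE)` sees all five keys instead of just DB_HOST. Escaped quotes inside values and the legacy `ENV KEY value` form are deliberately left out.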

In my experience, a Containerfile is not very large. The performance of the parser may not be important, so a regex-based optlib parser is enough for the purpose.

Do you want to implement such a parser yourself?
I don't want to take away your joyful hacking time :-P

@westurner
Author

westurner commented Mar 28, 2024

Thanks. I can't commit to owning a parser like this; but here's this for parsing from https://docs.docker.com/reference/dockerfile/ :

$$('article table:first-of-type tr code').map((el) => el.innerText).reduce((a,b) => a + "\n" + b)
"ADD
ARG
CMD
COPY
ENTRYPOINT
ENV
EXPOSE
FROM
HEALTHCHECK
LABEL
MAINTAINER
ONBUILD
RUN
SHELL
STOPSIGNAL
USER
VOLUME
WORKDIR"

@masatake
Member

I don't understand why you want to show all the commands.
Ctags is not a general navigation tool.
It focuses on definitions. We need a list of all commands that define names or introduce NEW names.

@masatake
Copy link
Member

#
# containerfile.ctags --- regex parser for Containerfile and Dockerfile
#
#  Copyright (c) 2023, 2024, Red Hat, Inc.
#  Copyright (c) 2023, 2024, Masatake YAMATO
#
#  Author: Masatake YAMATO <[email protected]>
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
# USA.
#
# Reference: https://docs.docker.com/engine/reference/builder/
# 
--langdef=Containerfile
--map-Containerfile=+(Containerfile)
--map-Containerfile=+(Dockerfile)

--kinddef-Containerfile=a,arg,arguments
--kinddef-Containerfile=e,env,environment variables
--kinddef-Containerfile=i,image,images
--_roledef-Containerfile.{image}=from,specified in FROM
--kinddef-Containerfile=l,label,labels

--_tabledef-Containerfile=main
--_tabledef-Containerfile=skipComment
--_tabledef-Containerfile=next
--_tabledef-Containerfile=arg
--_tabledef-Containerfile=env
--_tabledef-Containerfile=label

--_mtable-regex-Containerfile=skipComment/#[^\n]*//

--_mtable-regex-Containerfile=next/\\\n//{tleave}
--_mtable-regex-Containerfile=next/\n//{tleave}{_advanceTo=0start}
--_mtable-regex-Containerfile=next/[^\\\n]+//

--_mtable-extend-Containerfile=main+skipComment
--_mtable-regex-Containerfile=main/(ARG[ \t]+(\\\n)?|ARG\\\n)//{tenter=arg}
--_mtable-regex-Containerfile=main/(ENV[ \t]+(\\\n)?|ENV\\\n)//{tenter=env}
--_mtable-regex-Containerfile=main/(LABEL[ \t]+(\\\n)?|LABEL\\\n)//{tenter=label}
--_mtable-regex-Containerfile=main/FROM[ \t]+(--[^ \t]*[ \t]+)?([^ \t\n]+)([ \t]+(as|AS)[ \t]+([^ \t\n]+))?//{{
     \2 /image /from @2 _reftag _commit pop
     \5 false ne{
         \5 /image @5 _tag _commit \2 inherits:
     } if
}}
--_mtable-regex-Containerfile=main/[^\n]+//
--_mtable-regex-Containerfile=main/.//

--_mtable-regex-Containerfile=arg/[ \t]+//
--_mtable-regex-Containerfile=arg/([^[:space:]=]+)/\1/a/{tenter=next}
--_mtable-regex-Containerfile=arg/\n//{tleave}
--_mtable-regex-Containerfile=env/[ \t]+//
--_mtable-regex-Containerfile=env/([^[:space:]=]+)/\1/e/{tenter=next}
--_mtable-regex-Containerfile=env/\n//{tleave}
--_mtable-regex-Containerfile=label/[ \t]+//
--_mtable-regex-Containerfile=label/([^[:space:]=]+)/\1/l/{tenter=next}
--_mtable-regex-Containerfile=label/\n//{tleave}

@westurner
Author

My use case for [universal-]ctags (#354) is vim-tagbar [1], which is described as follows:

Tagbar is a Vim plugin that provides an easy way to browse the tags of the current file and get an overview of its structure. It does this by creating a sidebar that displays the ctags-generated tags of the current file, ordered by their scope. This means that for example methods in C++ are displayed under the class they are defined in.

(FWIW where tagbar doesn't get it, vim-voom [2] has Markdown and RST outline editing. I still have a custom config, but e.g. SpaceVim [3] has TagBar installed too)

[1] https://github.com/preservim/tagbar
[2] https://github.com/vim-voom/VOoM/blob/master/doc/voom.txt
[3] https://spacevim.org/use-vim-as-ide/

So IDK if just all of the tokens are worth indexing for Containerfile. RUN and ENTRYPOINT and HEALTHCHECK are probably significant enough tokens in the file to be useful for navigation with tagbar and similar for e.g. vscode.

@westurner
Author

westurner commented Mar 28, 2024

The jupyter-docker-stacks Dockerfiles aren't that long because they extend other Dockerfiles via FROM. As a demonstration of the utility of tagbar+ctags with a useful Dockerfile, there's docker-stacks-foundation/Dockerfile, which specifies e.g. the NB_USER arg and so on: https://jupyter-docker-stacks.readthedocs.io/en/latest/ https://github.com/jupyter/docker-stacks/blob/main/images/docker-stacks-foundation/Dockerfile

What's a better example of a gnarly Dockerfile where this functionality will be helpful?

@masatake
Member

Regarding languages for documentation, we violate the principle of "making a tag for definition."

However, for Containerfile/Dockerfile, I want to uphold the principle.
If I introduce a parser for the language, the parser should extract only definitions.
If you want to make tags for objects other than definitions, extend the built-in parser with --regex-Containerfile=... options in your .ctags.

https://github.com/containers/buildah/blob/main/tests/bud/multi-stage-builds/Dockerfile.extended

This Dockerfile is quite a good example. Thank you.

I am surprised at ENV "BUILD_LOGLEVEL"="5". The variable name on the left-hand side is enclosed in double quotes.
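
As a small illustration of what a pattern has to tolerate here, the following Python sketch accepts both a bare and a double-quoted key and returns the name without the quotes. The helper name `env_key` is hypothetical, and the sketch only looks at the first key on a line.

```python
import re

# The key may be bare (NAME) or double-quoted ("NAME"); either way the
# captured group is the name itself, without the quotes.
ENV_KEY_RE = re.compile(r'^ENV[ \t]+(?:"([^"=]+)"|([^\s="]+))=')


def env_key(line):
    """Return the first ENV key on the line, or None if there is none."""
    m = ENV_KEY_RE.match(line)
    if m is None:
        return None
    return m.group(1) or m.group(2)
```

The design choice is to strip the quotes so that the tag name is the variable name a user would search for.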

FROM is used more than once. An image name specified with FROM ... AS acts as a scope for ENV, ARG, and LABEL. If FROM is used without AS, the parser must generate an image name to fill the scope fields of ENV, ARG, and LABEL.
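
The scoping idea can be sketched in Python as follows. This is only an illustration under assumptions: the placeholder format `stageN` for a FROM without AS, the function name `scoped_definitions`, and the sample input are all made up here, and only the first name on each ARG/ENV/LABEL line is considered.

```python
import re

SAMPLE = '''ARG BASE=alpine
FROM alpine AS builder
ENV "BUILD_LOGLEVEL"="5"
FROM debian
LABEL maintainer="x"
'''

FROM_RE = re.compile(
    r"^FROM[ \t]+(?:--\S+[ \t]+)?(\S+)(?:[ \t]+(?:as|AS)[ \t]+(\S+))?"
)
# First name after ARG/ENV/LABEL, tolerating an opening double quote.
DEF_RE = re.compile(r'^(ARG|ENV|LABEL)[ \t]+"?([^\s="]+)')


def scoped_definitions(text):
    """Yield (instruction, name, scope) for each ARG/ENV/LABEL line.

    scope is the AS alias of the enclosing FROM, a generated placeholder
    such as "stage1" when that FROM has no alias, or None before any FROM.
    """
    scope = None
    stage_index = -1
    for line in text.splitlines():
        m = FROM_RE.match(line)
        if m:
            stage_index += 1
            scope = m.group(2) or f"stage{stage_index}"
            continue
        m = DEF_RE.match(line)
        if m:
            yield m.group(1), m.group(2), scope
```

In the sample, BASE has no scope because it precedes the first FROM, BUILD_LOGLEVEL is scoped to `builder`, and the label falls under the generated name for the second stage.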

https://docs.podman.io/en/stable/markdown/podman-build.1.html

Podman-build runs CPP. Therefore, #define DEF may appear in a Containerfile.
ctags should extract DEF as a CPP macro.
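
A rough Python sketch of picking out such macro names is shown below. It is an assumption-laden illustration, not the ctags CPP parser: `cpp_macros` is a hypothetical helper, and it cannot tell a real directive from an ordinary comment that happens to begin with "# define ...".

```python
import re

# "#define NAME" at the start of a line, allowing whitespace around "#".
# Caveat: a plain comment such as "# define the base image" also matches.
DEFINE_RE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+([A-Za-z_][A-Za-z0-9_]*)", re.MULTILINE
)


def cpp_macros(text):
    """Return the names introduced by #define lines, in order."""
    return DEFINE_RE.findall(text)
```

Because `#` also starts Containerfile comments, a real parser would need some rule for separating the two uses; this sketch does not attempt one.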

Can we satisfy these requirements with .ctags? To answer this question, I will implement the parser myself.

@westurner
Author

westurner commented Mar 30, 2024

  • A PEG and/or EBNF grammar for Containerfile / Dockerfile would probably make the job much easier. (Other projects could generate Dockerfile parsers in whatever language from such a grammar.)

Isn't it possible to roughly distill such a grammar from a number of examples, such as the already-referenced buildah and docker container builder test cases? Podman builds containers with Buildah. Nerdctl and Docker > 23.0 build containers with BuildKit.

Where are the Dockerfile syntax examples tested by BuildKit?

Buildah's test Dockerfiles appear to be the most complete set of test Dockerfiles / Containerfiles I'm aware of.

masatake added a commit to masatake/ctags that referenced this issue Mar 31, 2024
@hholst80

hholst80 commented May 12, 2024

It focuses on definitions. We need a list of all commands that define names or introduce NEW names.

FROM X AS Y is what you want to consider. Stages without AS are given implicit numeric names.

I do not see great value in trying to add references from commands like COPY, ADD, MOUNT, RUN to other layers. Having a way to navigate between FROM lines is more than enough. Tags support for Dockerfile seems fairly useless imho and just adds complexity to the tooling with few real use cases.

Example

FROM alpine:latest

# RUN ..

FROM debian:latest

# Here we copy from stage 0 (no symbolic name was given, so the automatic name is 0)
COPY --from=0 /build/artifact /usr/local/bin
ENTRYPOINT ["/usr/local/bin/artifact"] 
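
The implicit numbering in this example can be sketched in Python: each FROM gets the next index starting at 0, and an AS alias, if present, is recorded as an additional name for the same stage. The names `stage_table`, `copy_sources`, and `EXAMPLE` are hypothetical, and a `--from` value that names an external image rather than a stage simply resolves to None here.

```python
import re

EXAMPLE = '''FROM alpine:latest
FROM debian:latest
COPY --from=0 /build/artifact /usr/local/bin
'''

FROM_RE = re.compile(
    r"^FROM[ \t]+(?:--\S+[ \t]+)?(\S+)(?:[ \t]+(?:as|AS)[ \t]+(\S+))?"
)


def stage_table(text):
    """Map each stage reference (numeric index, plus alias) to its base image."""
    stages = {}
    index = 0
    for line in text.splitlines():
        m = FROM_RE.match(line)
        if m:
            image, alias = m.groups()
            stages[str(index)] = image
            if alias:
                stages[alias] = image
            index += 1
    return stages


def copy_sources(text):
    """Return (reference, resolved_base_image) for each COPY --from=..."""
    stages = stage_table(text)
    result = []
    for line in text.splitlines():
        if line.startswith("COPY"):
            m = re.search(r"--from=(\S+)", line)
            if m:
                result.append((m.group(1), stages.get(m.group(1))))
    return result
```

For the example, `--from=0` resolves to the first stage, the one based on alpine:latest.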
