New Tile Format & Better tilemap identifiers #3123

Open
6 tasks done
swagtoy opened this issue Dec 2, 2024 · 3 comments

swagtoy commented Dec 2, 2024

I've brought this up a few times, but I figured it would be healthy to move it to a GitHub issue so everyone can discuss it more formally. :-)

Anyway, many people creating levels or "plugins" might be aware of the nuances of the SuperTux level format:

  1. Tile IDs are... static (meaning tile 0 is "nothing", tile 1 may be this, tile 2 may be that... no exceptions). This was probably acceptable early in development, but as the game seems to want to work its way onto the internet, offering features such as downloadable libraries, tilesets, etc., this is becoming much less justifiable. In fact, I'd argue this is a blocker for many people wanting to develop "reusable tilesets". Even if you wanted to, there's no way to prevent these numbers from colliding besides claiming a "number namespace".
  2. The current format is just a bit gross! S-expression "level" files, which are actually fairly acceptable for objects and metadata, are littered with 0 0 0 0 0 0 0 0 0 123 456 1024 2048 0 0 0 0... This clearly can't stay. It's unnecessary work for the S-expression parser to deal with, and it just feels "disgusting". It clearly can't scale well with multiple sectors and/or background objects. It may even be responsible for level loading feeling a bit "sluggish" for large levels.

I'd like to start with problem two, because it rolls into problem one (a little unevenly, sorry!). The obvious solution would be inventing our own binary format. Well, almost... I'd also like to mention more established binary serialization formats such as BSON and UBJSON, but I won't get into the details; these are just some that I found.

Essentially, we just... store IDs in binary. How difficult is that? Seriously. Maybe 16-bit integers will do, but if we really want longevity, 32-bit integers couldn't hurt at all. (24-bit couldn't hurt either, but remember that 3-byte values don't line up with the 2-, 4-, or 8-byte boundaries that computers read efficiently, I think.)
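To make the "store IDs in binary" idea concrete, here is a minimal Python sketch that packs a row of tile IDs as fixed-width big-endian integers. The function names and the choice of big endian are assumptions for illustration only; nothing here is an existing SuperTux API.

```python
import struct

# Hypothetical helpers: pack/unpack tile IDs as fixed-width unsigned integers.
_FORMAT_CHARS = {16: "H", 32: "I"}  # struct codes for 16-bit and 32-bit unsigned


def pack_tiles(tile_ids, width_bits=32):
    """Pack tile IDs as big-endian unsigned integers of the given width."""
    code = _FORMAT_CHARS[width_bits]
    return struct.pack(f">{len(tile_ids)}{code}", *tile_ids)


def unpack_tiles(data, width_bits=32):
    """Inverse of pack_tiles: recover the list of tile IDs."""
    code = _FORMAT_CHARS[width_bits]
    count = len(data) // (width_bits // 8)
    return list(struct.unpack(f">{count}{code}", data))


row = [0, 0, 0, 123, 456, 1024, 2048, 0]
packed16 = pack_tiles(row, 16)  # 2 bytes per tile
packed32 = pack_tiles(row, 32)  # 4 bytes per tile
```

Every tile costs the same number of bytes regardless of its value, which is what makes strided access possible as long as no compression is layered on top.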

Now, for "empty" or just mass-repeated tiles, this can become more annoying, and unfortunately, I'm a little stuck on how to handle this efficiently; we CAN make our own """compression""" algorithm. It sounds overkill, but... I mean, we are only trying to eliminate unnecessary "nothing" repetitions. Maybe in the format, we'd reserve some bytes to flag a "repetitive grouping", meaning a run of the same tile repeated over and over in the array.

So, in hex for 32-bit numbers in big endian, we'd have EF0000AA 000000BA, which would mean that we repeat (EF tells us that), AA (170) times, the tile with ID "BA" (186). Seems simple? That's because I think it is. We can THEN roll a compression library like zlib on top of that, but even then, it might be unnecessary. Devices DO have lots of storage, and we've already shrunk it quite a lot compared to the monstrosity it is today.
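A rough Python sketch of that run-length scheme, under a few assumptions that the proposal does not spell out: the top byte 0xEF marks a "repeat" word, the remaining 24 bits hold the count, the next word is the tile ID, and real tile IDs never have 0xEF as their top byte. The function names and the `min_run` threshold are made up for illustration.

```python
import struct

RUN_MARKER = 0xEF    # top byte that flags a "repeat" word, as in the example
MAX_RUN = 0xFFFFFF   # assumption: the low 24 bits of that word are the count


def rle_encode(tiles, min_run=3):
    """Encode tile IDs as big-endian 32-bit words, collapsing long runs."""
    words = []
    i = 0
    while i < len(tiles):
        tile = tiles[i]
        if tile >> 24 == RUN_MARKER:
            raise ValueError("tile id collides with the run marker")
        run = 1
        while i + run < len(tiles) and tiles[i + run] == tile and run < MAX_RUN:
            run += 1
        if run >= min_run:
            # Two words: marker+count, then the repeated tile ID.
            words += [(RUN_MARKER << 24) | run, tile]
        else:
            words += [tile] * run
        i += run
    return struct.pack(f">{len(words)}I", *words)


def rle_decode(data):
    """Inverse of rle_encode."""
    words = struct.unpack(f">{len(data) // 4}I", data)
    tiles = []
    i = 0
    while i < len(words):
        if words[i] >> 24 == RUN_MARKER:
            tiles += [words[i + 1]] * (words[i] & MAX_RUN)
            i += 2
        else:
            tiles.append(words[i])
            i += 1
    return tiles


print(rle_encode([0xBA] * 170).hex())  # ef0000aa000000ba
```

Runs shorter than `min_run` are written out literally, since a two-word run record only saves space from three repeats upward.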

The downside of any sort of compression is that parsing a level directly at the file layer becomes impossible. There is no convenient strided access into the file anymore, because we added a dumb layer of compression. We don't do that anyway though, and generally speaking levels don't get so obnoxiously large that seeking within the file would ever help much (load once and move on... whatever).


Now, problem one isn't too tricky to solve, but there's room for other solutions. We can just create a "mapping" per level. We essentially push a new "tile name" format (alongside perhaps the old generic IDs for compatibility purposes), where tiles can have groups (and yes, even "namespaces" are up there on my proposal). However, parsing these strings and all that would be annoying.

A simple solution: for each level (with no true dependency on any particular algorithm here), we basically map a string to an integer, and this mapping is then used for file loading/saving; the bulk of the level's tile data would still be integers of some sort, but they get mapped through to their proper counterparts. This could very likely improve both level loading time and file size.
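As a sketch of what such a per-level mapping might look like in Python: each distinct tile name gets a small integer the first time it appears, and the layer itself is stored as those integers. The tile names (including the "namespace:name" style) and the function names are purely hypothetical.

```python
def build_mapping(tile_names):
    """Assign each distinct tile name a small per-level integer, in first-seen order."""
    mapping = {}
    for name in tile_names:
        if name not in mapping:
            mapping[name] = len(mapping)
    return mapping


def encode_layer(tile_names):
    """Return (table, ids): table[i] is the name behind the per-level id i."""
    mapping = build_mapping(tile_names)
    ids = [mapping[name] for name in tile_names]
    table = sorted(mapping, key=mapping.get)
    return table, ids


def decode_layer(table, ids):
    """Map per-level integers back to their tile names."""
    return [table[i] for i in ids]


# Hypothetical tile names for illustration.
layer = ["core:empty", "core:empty", "snow:ground", "mypack:lava", "snow:ground"]
table, ids = encode_layer(layer)
# table == ["core:empty", "snow:ground", "mypack:lava"], ids == [0, 0, 1, 2, 1]
```

The integers only mean something together with the table stored in the same level, so two tilesets can no longer collide on a global number.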

This would be hell to parse for people wanting to parse on their own (we can always offer convenience tools for this if this is truly something people want to do), but regardless, it will work. Happily, now we no longer have a hard dependency on magical numbers.

Overall we could end up with......... (in a terrible made up binary format that i hope you can piece together somehow)

{[S U P E R T U] + 0x01}    // array, magic header number for this format; 0x01 would mean "level format"; not necessary, but handy to reserve a byte
{origin level name}         // char[255], reference filename associated with the .stl file; unneeded but still handy
{zlib compressed}?          // data onwards will be compressed, minus the final 0xFF byte

{sizeof(tile mapping)}
{tile mapping}
{sizeof(compressed tile data)}
{compressed tile data}

{0xFF}                      // Helpful terminator byte
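One possible reading of that layout, sketched in Python. Several details are assumptions the pseudo-format leaves open: sizes are 32-bit big-endian, the name field is NUL-padded UTF-8, everything between the name and the terminator is a single zlib stream (the optional "compressed?" flag is skipped), and the mapping and tile data are treated as opaque byte blobs. `write_level` and `read_level` are hypothetical names.

```python
import struct
import zlib

MAGIC = b"SUPERTU" + bytes([0x01])  # 0x01 = "level format"
NAME_LEN = 255                      # char[255] origin level name
TERMINATOR = b"\xff"


def write_level(origin_name, mapping_blob, tile_blob):
    """Serialize one level container (assumed layout, see lead-in)."""
    name = origin_name.encode("utf-8")
    if len(name) > NAME_LEN:
        raise ValueError("origin name too long")
    body = (struct.pack(">I", len(mapping_blob)) + mapping_blob
            + struct.pack(">I", len(tile_blob)) + tile_blob)
    return MAGIC + name.ljust(NAME_LEN, b"\0") + zlib.compress(body) + TERMINATOR


def read_level(data):
    """Parse a container produced by write_level."""
    if data[:len(MAGIC)] != MAGIC:
        raise ValueError("bad magic header")
    if data[-1:] != TERMINATOR:
        raise ValueError("missing terminator byte")
    offset = len(MAGIC)
    name = data[offset:offset + NAME_LEN].rstrip(b"\0").decode("utf-8")
    body = zlib.decompress(data[offset + NAME_LEN:-1])
    (map_size,) = struct.unpack_from(">I", body, 0)
    mapping_blob = body[4:4 + map_size]
    (tile_size,) = struct.unpack_from(">I", body, 4 + map_size)
    tile_start = 8 + map_size
    tile_blob = body[tile_start:tile_start + tile_size]
    return name, mapping_blob, tile_blob


blob = write_level("example.stl", b"core:empty\0snow:ground", bytes(64))
```

Checking the magic header and terminator up front lets a loader reject truncated or unrelated files before it tries to decompress anything.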

I won't go into compatibility details right now because it's late and I am tired. The gist is "let's keep the old tile compatibility for a couple of versions", but still promote conversion in the editor. A warning when playing old levels can also be added. IMO compatibility is not a major roadblock for this, since these "compatibility" shifts happen all the time in games. I just ask that we make this transition well tested and 'not transparent, but easy' for people poking at old levels.

In the future we must (keyword: MUST) store level version data at a minimum, to account for any possible changes. New versions do not always mean upgrades, but we can use this to at least check for changes, as well as for "stacking level conversions". A more formal system for this can be programmed, i.e. if no version is mentioned, then we obviously must assume, and possibly even verify, that it's an old 0.6.3 (or pre-0.6.3) level. This game truly does not have enough levels to warrant absurd compatibility, but it honestly can't hurt to maintain old compatibility as long as we can confirm it still works fine.
I believe in conversion all the way, but I believe even more in storing version numbers in levels from here on out. There are just no excuses not to do this; I'm open to objections but be prepared to fight to the death with me about this.
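To illustrate "stacking level conversions", here is a small Python sketch. The level is modeled as a plain dict, the version numbers are arbitrary, and both converter steps are invented placeholders; the only idea taken from the text above is that a level with no stored version is assumed to be a legacy one and is upgraded one step at a time.

```python
LATEST_VERSION = 3  # hypothetical current format version


def _legacy_to_v2(level):
    # Hypothetical step: replace raw numeric tiles with named tiles.
    upgraded = dict(level)
    upgraded["tile_names"] = [f"legacy:{t}" for t in upgraded.pop("tiles", [])]
    upgraded["version"] = 2
    return upgraded


def _v2_to_v3(level):
    # Hypothetical step: record which tile-data encoding the level uses.
    upgraded = dict(level)
    upgraded["encoding"] = "rle"
    upgraded["version"] = 3
    return upgraded


# Each entry upgrades a level from that version to the next one.
CONVERTERS = {1: _legacy_to_v2, 2: _v2_to_v3}


def upgrade(level):
    """Apply converters in order until the level is at LATEST_VERSION."""
    version = level.get("version", 1)  # no version stored: assume a legacy level
    if version > LATEST_VERSION:
        raise ValueError(f"level version {version} is newer than this game")
    while version < LATEST_VERSION:
        level = CONVERTERS[version](level)
        version = level["version"]
    return level


old_level = {"name": "Example", "tiles": [0, 5]}
new_level = upgrade(old_level)
```

Because each converter only knows about one step, adding a new format version means writing one new function rather than touching every older path.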

Guidelines For Reporting Issues

  • I have read https://github.com/SuperTux/supertux/blob/master/CONTRIBUTING.md#bug-reports.
  • I have verified this isn't a request that's already been submitted as an issue.
  • I have verified this isn't a discussion, or an issue with the game, but rather an actual feature request - a currently non-existent, but desired feature.
  • In this request, I have only included details about one (1) desired feature.
  • If I make a mistake while submitting this request, I agree to use the "Edit" feature to correct it, instead of closing this issue and opening a new one.
  • I lied about the checkboxes above. Please keep this as an issue if possible, because it is both an "issue" and a feature that I don't think belongs in something like Discussions. We should conclude on something.

swagtoy commented Dec 2, 2024

tl;dr Storing tiles as numbers is bad. Storing tiles as strings of numbers is bad.

@biggeryetbetter

I don't have any comments right now about the level format (other than, why not just gzip the layer data, as it is?) but you touched on the tile system and I think that's something worth reworking in a future update post 7.0. I totally agree with the current system being ripe for incompatibility.


swagtoy commented Dec 4, 2024

"why not just gzip the layer data"

Well, Vankata has already implemented a compression system of some sort in #3124 :-)

But there is honestly no real need to gzip data whose structure we know inside and out. Also, not sure why you mention gzip when zstd is just really nice. As far as I'm aware, its trade-off between speed and compression ratio is absurdly good, and we can even "train" it on sample data (a dictionary) to get better compression results. If we really want to compress things, I'd personally go with zstd.
