Feature proposal: annotate source with token/character counts for functions #12
Comments
Currently, the minification steps modify both the node tree and tokens array. I'm not too happy about this either, and will probably improve this as part of a branch I'm (potentially) working on. Either way, you can traverse over the tokens in a node tree via `traverse_tokens` and add them to your own array, then count that.

I'm not sure how easy it is to split the code and minify each function separately, especially if you want to do renaming, as that's an inherently global operation. It might be better to minify the whole thing at once.

Currently, `minify_code` converts a node tree back to source at the end, using two approaches depending on whether `minify_wspace` was requested. Splitting this logic into a separate function would definitely be nice. (I may also do this in my branch.)

However, it sounds like you don't want to output minified code, just lightly modified code. The logic to do so would likely be somewhat similar (in structure) to the bottom else case of `minify_code` (minus the complications): traverse over tokens, append their contents and the wspace between them, except with changes to the comments as needed. (The comments are part of the wspace between tokens and aren't captured explicitly in the node tree or the token list.)
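A minimal sketch of the two ideas described above: collecting tokens under a node, and re-emitting source by appending token text plus the whitespace/comments between consecutive tokens. It assumes `traverse_tokens` takes a node plus a per-token callback and that tokens carry `idx`/`endidx` into the raw source (as the token dump below suggests); the real signatures in shrinko8 may differ, so treat this as illustrative only.

```python
def collect_tokens(node):
    # Assumes traverse_tokens(node, callback) visits every token under `node`
    # and calls callback(token) for each one; adjust to the real signature.
    tokens = []
    traverse_tokens(node, tokens.append)
    return tokens

def render(tokens, source_text):
    # Re-emit code by appending each token's text plus whatever sits between it
    # and the next token (whitespace and comments), recovered from the raw
    # source via idx/endidx, since comments aren't stored in the tree itself.
    out = []
    for i, tok in enumerate(tokens):
        out.append(source_text[tok.idx:tok.endidx])
        if i + 1 < len(tokens):
            out.append(source_text[tok.endidx:tokens[i + 1].idx])
    return "".join(out)
```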
Below are the simple cart I'm testing with and a dump of its parsed node tree:

```
pico-8 cartridge // http://www.pico-8.com
version 18
__lua__
-- comment
function funcname()
return true
end
```

```
{
"children": [ {
"body": { "py/id": 25 },
"children": [
{
"children": { "py/tuple": [] },
"endidx": 19,
"idx": 11,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": {
"cart": {
"code": "-- comment\nfunction funcname()\n return true\nend",
"code_map": [ {
"py/newargs": { "py/tuple": [ { "py/tuple": [ 0, "test.p8", "-- comment\nfunction funcname()\n return true\nend", 0, 3 ] } ] },
"py/object": "pico_cart.CodeMapping",
"py/seq": [ 0, "test.p8", "-- comment\nfunction funcname()\n return true\nend", 0, 3 ]
} ],
"code_rom": null,
"meta": { "default_factory": { "py/type": "builtins.list" }, "py/object": "collections.defaultdict" },
"name": "test.p8",
"platform": "l",
"py/object": "pico_cart.Cart",
"rom": { "py/object": "pico_defs.Memory" },
"screenshot": null,
"version_id": 18,
"version_tuple": { "py/tuple": [ 0, 1, 23, 0 ] }
},
"mappings": { "py/id": 9 },
"py/object": "pico_process.CartSource"
},
"type": { "py/object": "pico_tokenize.TokenType", "value": 3 },
"value": "function",
"vline": 1
}, {
"children": [ {
"children": { "py/tuple": [] },
"endidx": 28,
"idx": 20,
"modified": false,
"parent": { "py/id": 12 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/object": "pico_tokenize.TokenType", "value": 2 },
"value": "funcname",
"vline": 1
} ],
"endidx": 28,
"idx": 20,
"kind": { "py/object": "pico_parse.VarKind", "value": 1 },
"name": "funcname",
"new": false,
"parent": { "py/id": 2 },
"parent_scope": {
"depth": -1,
"funcdepth": 0,
"items": { "_ENV": { "implicit": true, "keys_kind": null, "name": "_ENV", "py/object": "pico_parse.Local", "scope": { "py/id": 19 } } },
"parent": null,
"py/object": "pico_parse.Scope"
},
"py/object": "pico_parse.Node",
"scopespec": null,
"source": { "py/id": 6 },
"type": { "py/object": "pico_parse.NodeType", "value": 0 },
"value": null,
"var": { "keys_kind": null, "name": "funcname", "py/object": "pico_parse.Global" },
"var_kind": null
}, {
"children": { "py/tuple": [] },
"endidx": 29,
"idx": 28,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/object": "pico_tokenize.TokenType", "value": 4 },
"value": "(",
"vline": 1
}, {
"children": { "py/tuple": [] },
"endidx": 30,
"idx": 29,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 23 },
"value": ")",
"vline": 1
}, {
"children": [ {
"children": [
{
"children": { "py/tuple": [] },
"endidx": 38,
"idx": 32,
"modified": false,
"parent": { "py/id": 27 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 5 },
"value": "return",
"vline": 2
}, {
"children": [ {
"children": { "py/tuple": [] },
"endidx": 43,
"idx": 39,
"modified": false,
"parent": { "py/id": 30 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 5 },
"value": "true",
"vline": 2
} ],
"endidx": 43,
"idx": 39,
"parent": { "py/id": 27 },
"py/object": "pico_parse.Node",
"scopespec": null,
"source": { "py/id": 6 },
"token": { "py/id": 32 },
"type": { "py/object": "pico_parse.NodeType", "value": 3 },
"value": null
}
],
"endidx": 43,
"idx": 32,
"items": [ { "py/id": 30 } ],
"parent": { "py/id": 25 },
"py/object": "pico_parse.Node",
"scopespec": null,
"source": { "py/id": 6 },
"type": { "py/object": "pico_parse.NodeType", "value": 24 },
"value": null
} ],
"endidx": 43,
"idx": 32,
"parent": { "py/id": 2 },
"py/object": "pico_parse.Node",
"scopespec": { "py/tuple": [ false, [] ] },
"source": { "py/id": 6 },
"stmts": [ { "py/id": 27 } ],
"type": { "py/object": "pico_parse.NodeType", "value": 29 },
"value": null
}, {
"children": { "py/tuple": [] },
"endidx": 47,
"idx": 44,
"modified": false,
"parent": { "py/id": 2 },
"py/object": "pico_tokenize.Token",
"source": { "py/id": 6 },
"type": { "py/id": 5 },
"value": "end",
"vline": 3
}
],
"endidx": 47,
"idx": 11,
"kind": null,
"name": "funcname",
"params": [],
"parent": { "py/id": 0 },
"py/object": "pico_parse.Node",
"scopespec": { "depth": 1, "funcdepth": 1, "items": {}, "parent": { "py/id": 19 }, "py/object": "pico_parse.Scope" },
"source": { "py/id": 6 },
"target": { "py/id": 12 },
"type": { "py/object": "pico_parse.NodeType", "value": 15 },
"value": null
} ],
"endidx": 47,
"globals": { "__dict__": {}, "funcname": { "py/id": 18 }, "py/object": "utils.LazyDict" },
"idx": 11,
"parent": null,
"py/object": "pico_parse.Node",
"scopespec": { "py/tuple": [ false, [] ] },
"source": { "py/id": 6 },
"stmts": [ { "py/id": 2 } ],
"type": { "py/id": 36 },
"value": null
}
```
Assuming I understood what you're asking for correctly, the `idx` and `endidx` attributes specify the range of the function's tokens within the source (e.g. `node.source.text[node.idx : node.endidx]`).
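A tiny sketch of how the character count falls out of that, using only the attributes mentioned above (`node.idx`, `node.endidx`, `node.source.text`):

```python
def node_text(node):
    # The exact source slice a node covers, per the idx/endidx explanation above.
    return node.source.text[node.idx:node.endidx]

def node_char_count(node):
    # Character count of the node's span in the original source.
    return node.endidx - node.idx
```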
OK, so, the Token with value "function" has this Node as its parent:

```
{
"body": { "py/id": 25 },
"children": [ "CUT" ],
"endidx": 47,
"idx": 11,
"kind": null,
"name": "funcname",
"params": [],
"parent": { "py/id": 0 },
"py/object": "pico_parse.Node",
"scopespec": { "depth": 1, "funcdepth": 1, "items": {}, "parent": { "py/id": 19 }, "py/object": "pico_parse.Scope" },
"source": { "py/id": 6 },
"target": { "py/id": 12 },
"type": { "py/object": "pico_parse.NodeType", "value": 15 },
"value": null
}
```

which spans idx 11-47. That seems like everything from `function` through `end`. The parent of that Node is another Node which has the same idx/endidx but different properties. I think it might represent the whole source, minus any leading or trailing comments, although I'll need to experiment more to figure that out.
Aha!
Yeah, `node.type == NodeType.function` for that node. And yeah, the parent of the function is the root node; it indeed spans the whole source minus any leading/trailing whitespace/comments.
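A minimal sketch of collecting all function nodes in a parsed tree, based on the `node.type == NodeType.function` check confirmed above. The recursive walk here is a hypothetical helper (it only assumes each Node/Token exposes `.children` and `.type`, as the dumps above show); shrinko8 may provide its own traversal API that would be preferable.

```python
from pico_parse import NodeType  # assumed import location, per the dump above

def find_function_nodes(root):
    # Hypothetical helper: walk the node tree and collect every node whose
    # type is NodeType.function. Tokens have empty children, so recursing
    # into them is harmless.
    found = []
    def walk(node):
        if getattr(node, "type", None) == NodeType.function:
            found.append(node)
        for child in getattr(node, "children", ()) or ():
            walk(child)
    walk(root)
    return found
```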
I want to add per-function token and character counts, pre and post minification, to https://github.com/sparr/pico8lib. My initial idea was to use a script to parse the functions out of a source file, then feed them separately to `shrinko8 -c`, parse the output, prepend it as a comment to the function, then reassemble the source file. However, it occurs to me that a lot of the necessary functionality to do this better already exists in this project.

My proposal is a mode for shrinko8 that does all of this directly: parse a cart, identify all the functions, count their tokens and characters, minify them, count those tokens and characters, insert a comment into the original cart above each function (possibly before existing comments, and overwriting a previous comment from this process), and emit the annotated original cart.

I'm willing to do most of the work to make this happen, but I find the code here a bit daunting. I've made it to `process_code`, where it calls `tokenize` then `parse`, and I can look through `Node`s under `root`, but I am not sure how to get back from a `Node` to the tokens that `count_tokens` wants, or how to turn a `Node` back into a cart after I've modified some of its grandchildren. Generally, moving back and forth between the source, tokens, and nodes, in a way that keeps them mapped to each other for modifying one based on the other, is a mystery to me so far. I would appreciate any pointers you can share before I get started on this. Even if this isn't a welcome feature for the main repo, I'd still like to pursue it as a fork or patch so I can use it myself.
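For what it's worth, a very rough sketch of how the annotation pass proposed above might fit together, built only from the pieces discussed in this thread (function nodes with `idx`/`endidx` spans into the source text). The counting and minifying callables are placeholders for whatever shrinko8 entry points turn out to be appropriate; this is not the project's actual API.

```python
def annotate_cart(source_text, function_nodes, count_tokens_fn, minify_fn):
    """Hypothetical sketch: insert a size comment above each function.

    function_nodes: the NodeType.function nodes from the parsed tree, whose
        idx/endidx give their character span within source_text.
    count_tokens_fn / minify_fn: placeholders for however tokens end up being
        counted and snippets minified (e.g. via traverse_tokens + count_tokens
        and the minifier); their real forms in shrinko8 may differ.
    """
    insertions = []
    for node in function_nodes:
        snippet = source_text[node.idx:node.endidx]
        minified = minify_fn(snippet)
        comment = ("-- %d tokens / %d chars, minified: %d tokens / %d chars\n"
                   % (count_tokens_fn(snippet), len(snippet),
                      count_tokens_fn(minified), len(minified)))
        # A real implementation would insert at the start of the function's
        # line (and handle existing annotation comments); node.idx is close
        # enough for this sketch.
        insertions.append((node.idx, comment))
    # Apply insertions from the end so earlier offsets stay valid.
    for pos, comment in sorted(insertions, reverse=True):
        source_text = source_text[:pos] + comment + source_text[pos:]
    return source_text
```

Note the caveat from the reply near the top of this thread: renaming is an inherently global operation, so minifying each function in isolation (as this sketch does) would only yield approximate minified-size numbers.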