Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax improvements #57

Open
joeyhub opened this issue Oct 1, 2015 · 7 comments
Open

Syntax improvements #57

joeyhub opened this issue Oct 1, 2015 · 7 comments

Comments

@joeyhub
Copy link

joeyhub commented Oct 1, 2015

I have a few issues with the syntax parsing in this case. It uses it's own parser so I would expect it to be better. Expressions should just work.

I have a weird little bit of code like this:

                let obj->{arr["prop"]}[arr["key"]] = [
                    "a": arr["key"], "b": []
                ];

This fails to parse on several levels but should parse.

A secondary issue is whether or not assigning the array at that property to a variable will create a cow array or array by reference. When you need to create large structures COW arrays are extremely inefficient. I would hope the framework would provide arrays by reference by default and then a copy method for when a copy is needed. I have tested and it is COW, which is a major drawback. There is no benefit to it being COW in PHP, they just made it that way in the beginning. Importing this quirk from PHP is a major design flaw in zephir that defeats its purpose in many cases. I have worked with passing PHP arrays by pointer/reference in C extensions by hand and there is no problem with it. zvals are pointers already so it's really easy:

I don't think it is a good idea to make it too safe or managed like PHP CPP because we tested PHP CPP and half the time it gave no gains or so little gains that PHP code optimisation or using another scripting language would have done the trick. Because I can't do what I am doing in PHP in zephir my PHP code is more memory efficient by a factor of 10. zephir is useless to me until this is fixed. The point of passing around arrays in an object collection is to get around the limitations of PHP so without supporting dereferencing there PHP is faster than zephir.

At the very least I should be able to tell it that I want a pointer to the array in such and such a location rather than a copy.

var also does not support complex expressions. You need...

 var x;
 let x = this->something();

var appears to only allow literal/defalt assignment.

I cannot create a dynamic class. In PHP you can do new $classname();. I would expect new (classname)() to work but it does not.

In a switch statement I cannot declare a local variable.

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

@steffengy
Copy link

`var appears to only allow literal/defalt assignment``
should be fixed in the latest master. example

  • Regarding your (nested) syntax error:
    I also think it's annoying, but it's anything but trivial to resolve, so I don't think that will be fixed soon.

I would hope the framework would provide arrays by reference by default and then a copy method for when a copy is needed

  • That's how it currently is (atleast when you replace reference by pointer). When you have a zval which isn't used anywhere (and therefore safe, refcount=1) and assign it to the property, the zval won't be copied. It's still possible that the zend engine internally (not mentioning differences between ZE2 and ZE3 here, since ZE3 copies generally more) uses an intermediate step before actually writing into the property bucket, I currently am unsure about that, anybody feel free to correct me if I missed something here.
    Assuming it's copied you'll ofcourse lose some performance (even though it's no deep-copy and only a copy of the zval structure it sums up but shouldn't be a bottleneck in general zephir use-cases).

    If you really need any tiny bit of performance, zephir is currently really the wrong tool for you.

    (The reason that arrays cannot be passed as php-reference type currently is also quite simple: References are simply not implemented yet #609 )

TLDR: Also not trivial to fix

I cannot create a dynamic class. In PHP you can do new $classname();. I would expect new (classname)() to work but it does not.

  • Yes you can: new {class}(); or create_instance(className);

`In a switch statement I cannot declare a local variable.``

  • Zephir currently does not support any local variables, they all refer back to the
    start of the function body (also not trivial to fix, #1099 )

If I didn't answer a part of your question please let me know and ofcourse
PRs regarding the mentioned issues are always welcome.

Generally consider that zephir still is alpha, so it shouldn't
be used in production for anything that depends on extreme performance
(which honestly shouldn't be using PHP in any way) or in any area
which requires total stability (I don't think (and hope) anybody would use it in for example medical equipment)

@joeyhub
Copy link
Author

joeyhub commented Oct 1, 2015

Thanks, these answers help.

depends on extreme performance

If that is not what zephir is for then its purpose is a mystery :D.

so it shouldn't be used in production

I agree. I'm using this to create an initial extension that I will then be checked and modified by hand. I'm a bit worried when I see all the zephir* functions/macros. I've tried PHP-CPP for this and phpc but with out success for various reasons. Basically, I can write PHP extensions but it is much much slower than writing in PHP especially when the PHP to be converted is particularly complex. Having something that lays down an initial codebase that can be optimised/tuned is the best option for me. However, as proof on concept first it would be nice if it could actually parse.

medical equipment

Anything PHP related with anything safety critical is a laugh.

I'm not really thinking of the standard PHP references mechanism but how the array variables are handled internally in the traditional C style (actually leveraging C pointers transparently to the higher level code). Perhaps that is more complex because it requires a fundamental rethink of the library.

At the very least if zephir could support array access in the same way as PHP that would at least keep it on par. Minor annoyances aside it looks like everything else works fine with zephir except this case which is a blocker and makes porting code no longer viable.

                // Wont parse in zephir:
                let obj->{arr["prop"]}[arr["key"]] = ["a": arr["key"], "b": []];
                // Ok in PHP:
                $obj->{$arr["prop"]}[$arr["key"]] = ["a" => $arr["key"], "b" => []];

If you point me at where that is handled in the code maybe I can look at it.

The way PHP manages arrays by default is simply flawed. They are passed by reference for read but copied on a write. This isn't about microoptimisation but about when you have a task that needs to do heavy and complex work with arrays. Those tasks are not invalid. The issue is that PHP is lopsided in it's underlying implementation. Passing arrays as COW references is a legacy hangup. COW is great when you actually want to make a copy to be updated somewhere that will be JIT copied when it reaches its destination or will cost little if it turns out down the line you didn't want to make a modified copy. I think nearly all major languages pass arrays by true reference except ones that are truly functional for concurrency (PHP is very far departed from that). This is not an issue of doing something not supposed to be done but a poor internal implementation of arrays.

Even if the framework can't be switched to not use COW by default it should still behave as PHP does. The traditional work around is the odd object wrapper. This way you can modify an array as such: $ref->arr["key"]["key"] = 123; and of course I have a C extension that can update a dynamic set of keys.

It's not favourable to use objects everywhere because objects add a layer of indirection and other weird qualities. Basically the same problems with Javascript where objects and dictionaries are one and the same. A bit worse though because PHP arrays are already trying to be several things at once without also making them objects.

Here is an example of what I would personally do in my own extensions if I needed a variable to an array to work with...

 zval *a = NULL, *_0 = NULL, *b = NULL;

 ZEPHIR_MM_GROW();
 ZEPHIR_INIT_VAR(a);
 array_init(a);

 ZEPHIR_INIT_NVAR(a);
 zephir_create_array(a, 1, 0 TSRMLS_CC);
 ZEPHIR_INIT_VAR(_0);
 ZVAL_LONG(_0, 123);
 zephir_array_fast_append(a, _0);
 >>>
 ZEPHIR_CPY_WRT(b, a);
 ===
 b = a;
 <<<
 ZEPHIR_INIT_NVAR(_0);
 ZVAL_LONG(_0, 321);
 zephir_array_update_long(&b, 0, &_0, PH_COPY | PH_SEPARATE ZEPHIR_DEBUG_PARAMS_DUMMY);

Example of speed improvement with dereferencing:

 <?php

 function a($ref) {
           $ref->arr[0] = 123;
 }

 function b($arr) {
           $arr[0] = 123;
           return $arr;
 }

 function test($a, $name)
 {
           $ref = (object)[];
           $start = microtime(true);

           for($i = 0; $i < 1000000; $i++)
                     a($ref);

           $took = (microtime(true) - $start);
           echo "A took ".$took." {$name}s\n";
           $a = $took;

           $arr = [];
           $start = microtime(true);

           for($i = 0; $i < 1000000; $i++)
                     $arr = b($arr);

           $took = (microtime(true) - $start);
           echo "B took ".$took." {$name}s\n";
           $b = $took;

           echo "A is ".($b / $a)." times faster than B.\n";
 }


 test([], "empty");
 test(range(1,10), "ten items");
 test(range(1,200), "200 items");
 test(["keya" => 10, "keyb" => ["a"=>"x","b"=>"y"],"keyc"=>5], "nested items");

A took 0.23867607116699s empty
B took 0.40566420555115s empty
A is 1.699643385144 times faster than B.
A took 0.24967288970947s ten items
B took 0.56084012985229s ten items
A is 2.2462996703603 times faster than B.
A took 0.24555611610413s twenty items
B took 0.45346784591675s twenty items
A is 1.8466974193567 times faster than B.
A took 0.31718611717224s 200 items
B took 0.5273699760437s 200 items
A is 1.6626515080334 times faster than B.
A took 0.23305010795593s nested items
B took 0.36198592185974s nested items
A is 1.5532536115718 times faster than B.

A is faster despite the extra cost of resolving the property 'arr'.

This gets a lot hairier when your structure parsing is much more complex, memory usage and when you consider the COW imposes some limitations on your data traversal pattern/program flow. The copy also will be deep when your structure has nested items that you might modify.

@steffengy
Copy link

// Wont parse in zephir: let obj->{arr["prop"]}[arr["key"]] = ["a": arr["key"], "b": []];

  • anything nested even obj->a->b won't work, yes, that's unfortunately a limitation of the current implementation (probably done to keep code generation simpler, can't say why though, was before I even knew this project)
    Zephir generally is a more simple version of PHP, which unfortunately also has a impact on flexibility.

The way PHP manages arrays by default is simply flawed.

  • The problem with that point is that's not something zephir can change and also has a impact on zephir's implementation. Zephir generally uses the same structures as PHP and therefore also tries to imitate the exact behavior, in theory - except something else makes more sense/another design decision was made and is feasible to do, which in this case is very tricky. This just affects a big part of the code, which is why normally such invasive changes only happen directly after the first design decision was made. Perhaps that is more complex because it requires a fundamental rethink of the library. is pretty close to reality, but I don't think how generally you can fix all limitations of zephir currently without a rewrite, which won't happen in any near future.
    If somebody was to change that behavior, that at this point would also break backwards compatibility.
    It might help to get some benchmarks on zephir arrays since I also think they might be slower than they could be (also a point where PHP7 is faster).

Generally also stuff like var g = []; let c = g["a"].g["a"].g["b"]; is slower than var g = []; var a, b; let a = g["a"]; let b = g["b"]; let c = a.b; because zephir unfortunately doesn't optimize stuff like that, so it also might be that your implementation isn't exactly orienting on how zephir works.

 >>>
 ZEPHIR_CPY_WRT(b, a);
 ===
 b = a;
 <<<

Have you looked at the declaration of ZEPHIR_CPY_WRT?

#define ZEPHIR_CPY_WRT(d, v) \
    Z_ADDREF_P(v); \
    if (d) { \
        if (Z_REFCOUNT_P(d) > 0) { \
            zephir_ptr_dtor(&d); \
        } \
    } else { \
        zephir_memory_observe(&d TSRMLS_CC); \
    } \
    d = v;

So I don't quite get your zephir example, It might help, if you could add an example, that compiles, which shows the slow-down in comparison to PHP, where I might be able to help you more.
Generally having a browsergame implemented in zephir, I'm able to reach load times of 5ms locally, which is nowhere what you'd be able to achieve using PHP (with the same features).

TLDR: Changing that behavior isn't feasible afaik, bad design decisions are hard to get rid of, probably won't happen soon, if somebody doesn't have ideas or a better implementation

@joeyhub
Copy link
Author

joeyhub commented Oct 1, 2015

Legacy is always a pain. As far as I understand it a recursive grammar should make that easy to handle.

Off the top of my head (huge simplification/pseudo):
returner $IDENTIFIER
returner ${returner}
returner returner->IDENTIFIER
returner returner->{returner}
returner returner(returner_list)
returner returner returner[returner_or_nothing]
returner_or_nothing
returner_or_nothing returner
returner_list returner
returner_list returner , returner_list

Perhaps it is related to PHP having less support for dereferencing previously (being limited previously) or because of the difficulty of working with that parser. When I look at the parsing it gives me a headache. I'm not sure at this stage what generates what.

I tested it and it is definitely copying, however I never checked the code. It is the PH_SEPARATE flag that is doing it on update long.

I could make it dereference the array manually then edit the code so that it treats that variable as I want it, just a basic temporary zval pointer. It is a shame it can't work straight off.

@andresgutierrez
Copy link

@joeyhub You might want to take a look at this: zephir-lang/zephir#609

@joeyhub
Copy link
Author

joeyhub commented Oct 2, 2015

I will take a look and give some comments or suggestions if I find the time.

I think this issue needs more work in the future particularly in the design. I don't think it is enough to just add support for pass by reference on top. It looks like zephir implements it's own crude COW which doesn't always work. I don't recommend implementing COW on top of things and hooking every array function. It seems to make things even more complex, for example if you use array_shift it wont COW because it hasn't hooked that function, unless there is something I am missing you'll have to coerce the array into a copy first (let a[0] = a[0];). It is quite unpredictable there.

In my first test the zephir code is up to 2 times slower than PHP even though it is doing a huge amount of data processing which means there is little overhead crossing the PHP/C boundary. I'm seeing a possible hint of a small memory leak as well. On the other hand I have at least found a way to turn a huge PHP algorithm into C code that I can manage rather than having to write it from scratch. It wont be hard to get that to go from 2 times slower to > 10 times faster at all. The stability of the code produced is a bigger concern.

I get a segfault if I have:

 private value;
 let this->value = 0;
 this->value += 1;
 this->value -= 1;

At least here we can give the heads up that using php functions that change arrays by reference will change your passed array by reference even if passed indirectly through parameters.

@joeyhub
Copy link
Author

joeyhub commented Oct 5, 2015

The good news is that when I simply remove PH_SEPARATE it appears to behave identically to the PHP equivalent. I can't see any memory leaks so the memory management handles it. It is twice as fast as PHP like this and uses half as much memory. It should be functionally the same so I am a bit surprised about the lower memory usage. Although zephir is adding a lot of overhead it must be putting some of it to good use. Now it already compares to the expected performance improvement from using hhvm or php7 (possibly javascript as well which can be up to 10 times faster but I've seen it only 2 times faster for many tasks like this) and I haven't optimised it in anyway I hadn't optimised the original PHP where I had already coerced it not to COW for that one case. I have made it do everything exactly as my PHP version does it. -O2 might have also helped.

If I carefully switch it so that it doesn't separate everywhere unless necessary I anticipate it should be possible to reach the well over > 10 times performance mark which I am aiming for. This is something I can't do in PHP without using anonymous objects all the time which is undesirable.

Raw PHP-C could be around 100 times faster in the first version but would take 50 times longer to produce with ten times as many bugs that would be very hard to detect or fix.

It takes a bit of effort but zephir pulls through. Two days is still much better than two weeks.

I think it would be worth considering to pass arrays by reference the same as Javascript. This demonstrably yields massive performance gains in scripts that need to work heavily at mutating arrays.

Although not always concise this gives most of what you would want from the pass by reference feature:

 t = a[k1][k2];

 for i, v in t
     t[i] = v * v;

Perhaps before too many people adopt the language.

@sergeyklay sergeyklay transferred this issue from zephir-lang/zephir Dec 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants