Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

There should be only one general sort function #17271

Closed
nbro opened this issue Dec 25, 2024 · 11 comments
Closed

There should be only one general sort function #17271

nbro opened this issue Dec 25, 2024 · 11 comments

Comments

@nbro
Copy link

nbro commented Dec 25, 2024

Description

I find it unnecessary to have a second sort function just to sort in descending order when we could very well just have an additional parameter that controls this (e.g. sort vs rsort).

Not only this, but I claim that there should be just one general sort function, which allows us

  • to provide a comparison function (i.e. usort, ursort, etc. are unnecessary)
  • allows us to control whether we should maintain the indices/keys or not
  • should allow us to sort by key or value (or both)
  • allows us to sort using the natural algorithm (actually, if we pass SORT_NATURAL to sort it already should be equivalent to natsort, which is thus unnecessary and imo should be deprecated)

In other words, I feel that sort could be extended to actually be able to do anything that all other sort functions do.

The only functions we might still want to keep are array_multisort() and shuffle(), but I'd also say that why shouldn't the general sort function also be able to sort multi-dimensional arrays?

By separating the functions, like we have now, we probably can gain a marginal speed-up, because we avoid some conditional branches, but I'm also not saying that we should maybe completely remove the standalone functions, maybe just eventually deprecated them. The problem is that we need to memorise a lot of functions. You could argue that, if we merge the functions, the general sort function would also become more complicated, but I still think it's better that we just check the documentation of a single function than looking for the documentation of multiple simpler sort functions.

I'd also recommend that there should be a second function, like sortn or sorted, which doesn't sort in place, similar to Python's sorted vs sort. I think something like this was already proposed or at least I think I found an RFC that talked about how the return values of these sort functions are a bit useless. I am not saying that if you sort in-place you should actually return the array (so I am not sure I agree with that RFC), because that's unnecessary, but that there should be a new sort function that sorts not-in-place and actually returns a new array without touching the original (e.g. it uses merge sort instead of quick sort).

This should not be difficult to implement because we should just need to adapt the single functions and put their code into the general one or something like that. The only concern is that we need to make sure we don't break any code, but we should be able to do that easily with reasonable default values for the new additional parameters, such that the current default sort behaviour remains the same.

@Girgias
Copy link
Member

Girgias commented Dec 25, 2024

I am struggling to see the value added here to be honest. Internally, those are already implemented as "one" underlying function.

And frankly I think it is better to have different functions rather than stuffing everything into a single function with a "what behaviour do I want" parameter. This also makes using different sort versions as higher order functions harder, as you would need to create a partially applied function (via a closure) just to get the same benefits of the current situation.

And that's without considering all the busy work this would do to people to upgrade for a marginal benefit.

In any case, if you want to pursue this, this needs discussion on the internals mailing list and an RFC.

@nbro
Copy link
Author

nbro commented Dec 25, 2024

More concisely, here are the benefits again

  • fewer functions to memorise, so newcomers or even experienced programmers don't have to deal with 13 sorting functions (which is a ridiculous number of functions), plus all other custom sorting functions that do not sort in-place, which sometimes are required, which users may need to take from somewhere or implement themselves
  • no other programming language (that I know of) provides so many sorting functions (which is the actual reasonable behaviour). For example, Python provides sort (in-place) and sorted (not-in-place), which accept a comparison callback and sorted also accepts a parameter just to reverse (which is what makes sense)
  • no need to switch functions every time just to sort in descending order

@iluuu1994
Copy link
Member

I'm not sure there's a point in discussing a theoretical scenario. These functions have been around for decades and removing them now would be a massive breaking change. Furthermore, sorting is often performed on large datasets where performance is critical, and forcing a single, simple signature is not feasible if it requires a function call for each comparison. You might make an argument for a separate, new search function, but that just makes the fragmentation problem you describe worse. https://xkcd.com/927/ If you still think that would be of benefit, please start a discussion on the internals mailing list.

@iluuu1994 iluuu1994 closed this as not planned Won't fix, can't repro, duplicate, stale Dec 27, 2024
@nbro
Copy link
Author

nbro commented Dec 27, 2024

@iluuu1994

I'm not sure there's a point in discussing a theoretical scenario.

What theoretical scenario? I opened this issue because of practical considerations. I've been working with PHP and I find it annoying and ridicolous that there are 12 sorting functions and shuffle and none of them actually can return a new sorted array without modifying the original one.

These functions have been around for decades and removing them now would be a massive breaking change

Who said we should remove the existing functions?

Furthermore, sorting is often performed on large datasets

What evidence do you have for this?

forcing a single, simple signature is not feasible if it requires a function call for each comparison

Can you elaborate? Which code scenarios would become problematic? The new sort function or the extended one would become more complicated and it should support a function call, but this is optional and there should be a reasonable default value for all parameters, so you should not need to change any of the existing functions calls. Moreover, some sort functions already accept additional parameters, so I really don't see your point.

@devnexen
Copy link
Member

Please nbro, discuss it over in the mailing list, this ticket had been closed.

@Girgias
Copy link
Member

Girgias commented Dec 27, 2024

@iluuu1994

I'm not sure there's a point in discussing a theoretical scenario.

What theoretical scenario? I opened this issue because of practical considerations. I've been working with PHP and I find it annoying and ridicolous that there are 12 sorting functions and shuffle and none of them actually can return a new sorted array without modifying the original one.

This case is being handled by an RFC I am pushing: https://wiki.php.net/rfc/array-sort-return-array
And there is no need to introduce a new functions, you would be able to just use the existing functions in a more functional way.

@nbro
Copy link
Author

nbro commented Dec 27, 2024

@Girgias Your proposal seems to be slightly different. You're proposing just to return a sorted copy of the array for the purposes of functional programming, but sorting is still performed in-place, so I suppose that also means that original array is also modified, unless you're proposing to first make copy of the array passed by reference and then sort that copy instead, which would effectively change the behaviour of these functions.

I think that, if functions sort in-place, there's no need to return a new array - I think that would just make things more conceptually complicated.

However, I think that these functions could actually be changed to allow us to sort not-in-place and, in that case, return a new sorted array, but I actually think the Python's approach to have a second sorted function would be better because we might want to use a different algorithm for sorting not-in-place, like merge sort.

For me it's so clear that the behaviour of these 12 sorting functions should be merged. How are other programming languages able to sort anything with only 1-2 sorting functions?!

@thg2k
Copy link
Contributor

thg2k commented Dec 27, 2024

@nbro What's the problem with implementing your own function in userspace? The sorting can still happen with the native function so you don't incur any significant penalty:

function copy_sort($a) {
  $copy = $a;
  sort($copy);
  return $copy;
}

Same for a function unifying all the other cases.

@nbro
Copy link
Author

nbro commented Dec 27, 2024

@thg2k PHP offers 12 sorting functions, but I still need to implement a new one just to sort not-in-place?

The whole point of this issue is to try to simplify PHP and make programming easier in PHP. Everyone knows that having 12 sorting functions is ridiculous and completely unnecessary, but some people here claim it makes sense (even though they actually know it doesn't make sense at all) and maybe in the future they will propose the RFC for this and will claim it's a good idea and its their own original idea to simplify this mess.

I've programmed in many other programming languages. PHP is the only one that I can think of that has so many sorting functions.

@iluuu1994
Copy link
Member

Who said we should remove the existing functions?

You did, right here:

There should be only one general sort function

As I mentioned, we cannot simply remove these functions, because people depend on them. Adding a new one can be discussed, but does this help your cause? Then we just have 14 instead of 12. This is also not the right place to discuss design, and given it's a controversial issue (demonstrated by the responses here) you'll need an RFC to add it.

maybe in the future they will propose the RFC for this and will claim it's a good idea and its their own original idea to simplify this mess.

Nobody is interested in stealing your credit for coming up with this idea. That's why we're asking: If you think this is a good idea, propose it on the mailing list. You'll get a feel about what people think and can then decide whether you want to propose an RFC for it (which is something everybody can do). Or maybe somebody else will see your e-mail and decide to do it.

@nbro
Copy link
Author

nbro commented Dec 27, 2024

@iluuu1994 I said there should be only one sort function but I did not say we should remove the existing ones. Don't try to twist it in your favour now. Again, you're pressing on the wrong button here by insisting that I'm saying we should remove them. I'll quote my first post to make it super clear

By separating the functions, like we have now, we probably can gain a marginal speed-up, because we avoid some conditional branches, but I'm also not saying that we should maybe completely remove the standalone functions, maybe just eventually deprecated them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants