Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fairer array sorting algorithm #1418

Merged
merged 2 commits into from
Nov 25, 2024
Merged

fairer array sorting algorithm #1418

merged 2 commits into from
Nov 25, 2024

Conversation

rBangay
Copy link
Contributor

@rBangay rBangay commented Nov 20, 2024

What does this PR change?

Replace the existing shuffle array sort function with an algorithm that fairly randomises the contents. The new approach is based on the fisher-yates algothrithm (https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle) a nice example of which you can see here: Fisher–Yates Shuffle

Current situation/background

@johnduffell rightly pointed out in a previous pr (#1415 (comment)) that the randomisation of the output array wasn't all that random;

results from his testing:

Object.entries([...Array(1000000).keys().map(() => [1,2,3,4,5,6,7,8,9,10].sort( () => .5 - Math.random() )[0])].sort().reduce((acc, num) => {const e = {...acc};e[num]=acc[num] + 1; return e;}, {1:0,2:0,3:0,4:0,5:0,6:0,7:0,8:0,9:0,10:0})).map(([num, count]) => `${num}: ${count/10000}%`).map((a) => console.log(a))
1: 19.5461%
2: 7.2563%
3: 9.6343%
4: 12.8512%
5: 8.3207%
6: 8.9094%
7: 9.9178%
8: 11.1279%
9: 6.0317%
10: 6.4046%

It shows a weighting more specifically to the first element in the array (ideally these figures ought to be as close to 10% each in order to show a fair distribution)

and with the new Fisher–Yates Shuffle algorithm we can see the results much more evenly distributed:

1: 9.9887%
2: 10.0008%
3: 9.9885%
4: 10.0155%
5: 10.0069%
6: 9.9565%
7: 9.9425%
8: 9.9936%
9: 10.0209%
10: 10.0861%

@rBangay rBangay requested a review from a team November 20, 2024 17:58
@johnduffell
Copy link
Member

nice one! good bit of algorithms. why not add a unit test to it to round it off?

let currentIndex = array.length;

// While there remain elements to shuffle...
while (currentIndex != 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as the last step, this will always swap the first element with itself. Which is not in itself a problem as it's a no-op, but it was interesting.

Copy link
Member

@johnduffell johnduffell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, big improvement 👏

@rBangay
Copy link
Contributor Author

rBangay commented Nov 21, 2024

nice one! good bit of algorithms. why not add a unit test to it to round it off?

I have added a unit test, but it was tricky to know how to properly test this. I didn't want to add a test that ran too many cycles to find whether the spread was even or not so in the end I just wrote a test that calls the function 100 times and checks that the output array order is not the same as the input array order .... the input array size was 10, I guess in theory the test could fail if by chance the shuffle resulted in an array with the elements in the same order but I fugured that was incredibly unlikely enough to make this test valid.

@johnduffell
Copy link
Member

nice one! good bit of algorithms. why not add a unit test to it to round it off?

I have added a unit test, but it was tricky to know how to properly test this. I didn't want to add a test that ran too many cycles to find whether the spread was even or not so in the end I just wrote a test that calls the function 100 times and checks that the output array order is not the same as the input array order .... the input array size was 10, I guess in theory the test could fail if by chance the shuffle resulted in an array with the elements in the same order but I fugured that was incredibly unlikely enough to make this test valid.

probably not worth going around the houses too much but if we're running it 100 times we should expect each number should be in each position at least a few times

Copy link
Member

@johnduffell johnduffell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work 👏

@rBangay rBangay merged commit 809f37e into main Nov 25, 2024
13 checks passed
@rBangay rBangay deleted the fairer-array-shuffle branch November 25, 2024 10:33
@prout-bot
Copy link
Collaborator

Seen on PROD (merged by @rBangay 12 minutes and 40 seconds ago) Please check your changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants