Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Summary - Address feedback to improve the project #48

Open
21 tasks done
jessiezhang24 opened this issue Feb 2, 2025 · 1 comment
Open
21 tasks done

Summary - Address feedback to improve the project #48

jessiezhang24 opened this issue Feb 2, 2025 · 1 comment
Milestone

Comments

@jessiezhang24
Copy link
Collaborator

jessiezhang24 commented Feb 2, 2025

Feedbacks from Milestone 1

  • missing_values_summary seems to return only count and percentage which can be easily a tuple or a Series at most.
    • updated returns to series and added more edge-case test: 905315e a039574
  • There are cases of functions that have very obscure names and/or arguments that make it difficult to understand their purpose
  • One or more functions are only partially understood from reading their documentation. Documentation of the input object is not specific enough for the user to understand what it can be used on
  • get_summary_statistics does not tell the user what actually returns. The use of etc in description is not effective.
    • added actual return content and updated docstring for get_summary_statistics: cc2dafc
  • missing_values_summary is missing majority of what it does. Examples show very little to what to expect
    • added more examples to missing_values_summary function: 9187c8d, cf1cca9

Feedback from Milestone 2

  • data_summary_statistics function is missing example
  • check_csv lacks error explanation when raised
  • check_missing_value_summary need to add more edge-case for test
    • added more edge-case for check_missing_value_summary: a039574

Feedback from Peer Review - Forgive Agbesi

  • Kindly review and correct typos. EG: pyead provided to users a practical toolkit for data preprocessing and exploration, enabling users to work more effectively with csv datasets in various projects. in the README file
  • You have only 1 badge out of the required badges
  • The documentation looks good but if you can add a link to the documentation then it will be easy to access
    • added link to documentation: 8870d3a
  • If an email can be added to the code of conduct as means of reaching the team will be great
    • added email to code of conduct: 6a8f6e3

Feedback from Peer Review - Zhiwei Zhang

  • Currently, the code checks if a file is in CSV format but doesn’t handle other potential issues, such as file access permissions or data inconsistencies. Implementing comprehensive error handling would make the package more robust.
  • Add the readthedocs url in the about section in repo to make it easier to access.
    • added link to documentation: 8870d3a
  • If the numerical results from the get_summary_statistics function could be rounded to a consistent number of decimal places, it would make the output more standardized and easier to read.
    • added default value of 3 to decimal place in get_summary_statistics(): 449c218
  • Allow users to customize key parameters (like customize how many decimals to keep in the results) through configuration files or command-line arguments. This enhances the tool’s flexibility for various use cases.
    • added decimal place as an argument to get_summary_statistics(): 449c218
  • It would be great if the function results included more explanations about the output. For example, the missing_values_summary function could add a sentence summarizing which column has how many missing values and what percentage that represents, rather than just presenting a simple table. This functionality can also be achieved by adding optional parameters to the function. Please check the comment below for this suggestions.

Feedback from Peer Review - Derek Rodgers

  • I couldn't find it stated clearly anywhere which version(s) of Python are officially supported
    • added supported Python version badge: 8870d3a
  • The summary statistics returned by get_summary_statistics() should probably have fewer significant digits (round to 2 or 3 decimal places instead)
    • added decimal place argument to get_summary_statistics(): 449c218
  • There are some minor typos in the README, e.g. "for in_depth research" (should be "in-depth"). It would be a good idea to go through all of the documentation carefully
    • fixed typos and grammar errors in README: c6dcb6d
  • There are likewise also some grammar issues. For example, "pyead provided to users" should be "pyead provides to users"
    • fixed typos and grammar errors in README: 8870d3a
@ZhengHe-007
Copy link
Collaborator

ZhengHe-007 commented Feb 3, 2025

For review "It would be great if the function results included more explanations about the output. For example, the missing_values_summary function could add a sentence summarizing which column has how many missing values and what percentage that represents, rather than just presenting a simple table. This functionality can also be achieved by adding optional parameters to the function." Thank you for the suggestion! Our goal for missing_values_summary is to provide a structured output that is easy to interpret and use programmatically. We believe that keeping this function concise allows for better integration into data pipelines where a Pandas Series format is preferable. If there's a need for a more detailed explanation, we can consider adding a separate function for that, but we want to keep missing_values_summary simple and focused.

@MCatherine1994 MCatherine1994 moved this to In Progress in group 31 pyeda project Feb 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

No branches or pull requests

2 participants