[User Story] A novice user wants to understand the precondition for using the provided AI solutions on their data #18

Open
HuangYiran opened this issue Feb 8, 2022 · 3 comments
@HuangYiran
Collaborator

Idea: two kinds of conditions should be considered:

  • task
    -- check whether it is one of the problems in the demo above,
    -- make sure that the features can be influenced

  • data
    -- has a target
    -- has enough data (data size)
    -- is brought into the right form

@HuangYiran
Collaborator Author

Methods: provide a notebook which

  • asks our customers to check the tasks that datafactory can deal with (with a link)
  • includes a data format datachecker. It takes the data and the target name as input and reports whether the given data is in the right format or, if not, what the problem is.
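
A minimal sketch of what such a datachecker entry point could look like, using only the Python standard library. The name `check_format`, its parameters, and the wording of the messages are assumptions made for illustration; nothing here is an existing datafactory API.

```python
import csv
import io


def check_format(csv_text, target, delimiter=","):
    """Return a list of human-readable problems; an empty list means no problem was found.

    Hypothetical helper: the name, signature and messages are illustrative only.
    """
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    if not rows:
        return ["The file is empty."]

    header, data = rows[0], rows[1:]
    problems = []

    # The target must exist and, by the convention proposed in this issue, be the last column.
    if target not in header:
        problems.append(f"Target column '{target}' not found in header {header}.")
    elif header[-1] != target:
        problems.append(f"Target column '{target}' should be the last column.")

    if not data:
        problems.append("The file has a header but no data rows.")

    # Every data row should have as many fields as the header.
    # Line numbers assume one record per physical line (no multi-line quoted fields).
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"Line {line_no} has {len(row)} fields, expected {len(header)}."
            )
    return problems
```

Returning a list of messages rather than raising on the first error lets the notebook show all problems at once, which fits the "clear feedback" goal.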

The datachecker should include the following functions:

  • check whether the given data is in the right form. For a relational dataset, the given data should consist of exactly one CSV file in which: (i) the last column is the target column; (ii) all other columns are features; (iii) each row represents one data point, with no relation between two points (are the rows i.i.d.? → exclude time series);
  • are feature and label in the same row?
  • check the types of the inputs (ask the customer for input when the type is ambiguous, e.g. an ID is recognized as numeric or a price is recognized as character data)
  • check the number of distinct values of each feature
  • check consistency (e.g. no text in a numerical feature; be careful with ID numbers)
  • check the correlation between features (multicollinearity check)
  • check the label distribution (balance)
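
Three of the checks above (input types, number of distinct values, label balance) could be sketched roughly as follows. All function names and the ID heuristic are assumptions for illustration, and the columns are assumed to arrive as lists of strings, as the `csv` module would deliver them.

```python
from collections import Counter


def infer_type(values):
    """Guess 'numeric', 'categorical', 'mixed' or 'empty' for a column of strings."""
    def is_number(text):
        try:
            float(text)  # note: also accepts 'nan' and 'inf'
            return True
        except ValueError:
            return False

    non_empty = [v for v in values if v.strip() != ""]
    if not non_empty:
        return "empty"
    numeric = sum(is_number(v) for v in non_empty)
    if numeric == len(non_empty):
        return "numeric"
    if numeric == 0:
        return "categorical"
    return "mixed"  # e.g. text inside an otherwise numerical feature


def summarize_column(values):
    """Return the inferred type, the number of distinct values and an ID hint."""
    kind = infer_type(values)
    n_distinct = len(set(values))
    # Assumed heuristic: an all-distinct column of whole numbers may be an ID.
    # It cannot be decided automatically, so the notebook should ask the user.
    maybe_id = (
        kind == "numeric"
        and n_distinct == len(values)
        and all(v.strip().lstrip("-").isdigit() for v in values)
    )
    return {"type": kind, "n_distinct": n_distinct, "maybe_id": maybe_id}


def label_balance(labels):
    """Return the share of each label, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.most_common()}
```

A "mixed" result corresponds to the consistency problem above, and a set `maybe_id` flag is a case where asking the customer is safer than guessing.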

@HuangYiran
Collaborator Author

Acceptance criteria:

  • fulfill the functions mentioned above, concretely:
    -- ?
  • do it automatically
  • in case of an error, give clear feedback and a suggestion

@riedel riedel changed the title As a German SME beginner, I want to know the condition of using AI, so that I have an idea about whether I can use AI in my task or not. As a novice user, I want to understand the precondition for using the provided AI solutions on my data Feb 8, 2022
@riedel riedel changed the title As a novice user, I want to understand the precondition for using the provided AI solutions on my data [User Story] A novice user wants to understand the precondition for using the provided AI solutions on their data Feb 8, 2022
@HuangYiran HuangYiran self-assigned this Feb 9, 2022
@HuangYiran
Collaborator Author

HuangYiran commented Feb 9, 2022

Collect situations:

  • webstore: volume prediction
  • machine sensor:

Don't forget the delimiter (split sign) of the data: autocheck (e.g. the file parses into one column only).
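
One way the delimiter autocheck might work, as a sketch: try a few candidate delimiters and keep the one that splits every row into the same number of fields, with more than one field per row. The candidate list and the function names are assumptions; the standard library's `csv.Sniffer` is an alternative heuristic.

```python
import csv
import io

CANDIDATES = [",", ";", "\t", "|"]  # assumed set of delimiters worth trying


def field_counts(csv_text, delimiter):
    """Number of fields per non-empty row when parsed with the given delimiter."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    return [len(row) for row in reader if row]


def guess_delimiter(csv_text):
    """Return the candidate giving a consistent field count above 1, or None.

    If several candidates qualify, the one producing the most columns wins.
    """
    best, best_width = None, 1
    for delimiter in CANDIDATES:
        counts = field_counts(csv_text, delimiter)
        if counts and len(set(counts)) == 1 and counts[0] > best_width:
            best, best_width = delimiter, counts[0]
    return best
```

A result of `None` means every candidate left the data in one column or produced ragged rows, which is the point at which the notebook would ask the user for the delimiter.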
Is an index column included? Autocheck: correlation between the first column and a range (not a good method; better to get this information from the user).
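
The correlation idea could be sketched like this: compute the Pearson correlation between the first column and `0, 1, 2, ...` and flag the column when it is close to 1. The function names and the threshold are assumptions for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long, non-empty numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: correlation undefined, treated as "no index"
    return cov / math.sqrt(var_x * var_y)


def first_column_looks_like_index(first_column, threshold=0.999):
    """Heuristic only: True if the column rises almost perfectly linearly with the row number."""
    try:
        values = [float(v) for v in first_column]
    except ValueError:
        return False  # non-numeric first column
    if len(values) < 3:
        return False  # too few rows to say anything
    return pearson(values, list(range(len(values)))) >= threshold
```

Any feature that grows linearly with the row number (for example evenly spaced measurements) is flagged as well, so the result is at best a prompt for a question to the user, not a decision.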

2 participants