4. If *Apply automatically* is ticked, changes will be communicated automatically. Alternatively, click *Apply*.
29
28
30
-
Example
31
-
-------
29
+
*Output variable name*: Set the name of the computed attribute.
30
+
3. If *Apply automatically* is ticked, changes will be communicated automatically. Alternatively, click *Apply*.
31
+
32
+
Examples
33
+
--------
32
34
33
35
We will use iris data from the [File](../data/file.md) widget for this example and connect it to **Aggregate Columns**.
34
36
35
-
Say we wish to compute a sum of *sepal_length* and *sepal_width* attributes. We select the two attributes from the list.
37
+
Say we wish to compute a sum of *sepal_length* and *sepal_width* attributes. We select the two attributes from the list and apply the sum aggregator.
38
+
39
+
We can observe the output in a [Data Table](../data/datatable.md).
40
+
41
+

42
+
43
+
It is possible to pass the list of features to Aggregate Columns via its *Features* input. Say we find a pair of informative features in a [Scatter Plot](../visualize/scatterplot.md) and we would like to compute their sum.
44
+
45
+
Besides the *Data* input, we also pass the *Features* input from the Scatter Plot to Aggregate Columns. The widget will, by default, use the input signal for variable selection. We retain the sum operation and change the name of the output variable to *petal sum*.
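The row-wise aggregation described above can be sketched in plain Python. This is only an illustration, not the widget's implementation: the `aggregate_columns` helper, the `AGGREGATORS` table and the three sample rows are hypothetical stand-ins for the iris data and the widget's settings.

```python
from statistics import mean

# Hypothetical mini-table standing in for a few iris rows.
rows = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0},
]

# Assumed set of row-wise operations; the widget may offer others.
AGGREGATORS = {"sum": sum, "mean": mean, "min": min, "max": max}


def aggregate_columns(rows, features, operation="sum", output_name="agg"):
    """Return copies of the rows with one extra, row-wise computed column."""
    func = AGGREGATORS[operation]
    return [
        {**row, output_name: func([row[name] for name in features])}
        for row in rows
    ]


# Sum the two selected features into a new column.
result = aggregate_columns(
    rows, ["sepal_length", "sepal_width"], "sum", "sepal sum"
)
for row in result:
    print(row)
```

The list of feature names plays the role of the selection in the widget, whether it is made by hand or arrives through the *Features* input.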
source/widgets/data/applydomain.md
Lines changed: 5 additions & 8 deletions
@@ -14,24 +14,21 @@ Given dataset and template transforms the dataset.
14
14
15
15
**Apply Domain** maps new data into a transformed space. For example, if we transform some data with PCA and wish to observe new data in the same space, we can use Apply Domain to map the new data into the PCA space created from the original data.
16
16
17
-

17
+
{width=70%}
18
18
19
19
The widget receives a dataset and a template dataset used to transform the dataset.
20
20
21
-
Side note
22
-
--------
23
-
24
-
Domain transformation works by using information from the template data. For example, for PCA, Components are not enough. Transformation requires information on the center of each column, variance (if the data is normalized), and if and how the data was preprocessed (continuized, imputed, etc.).
21
+
**Side note.** Domain transformation works by using information from the template data. For example, for PCA, Components are not enough. Transformation requires information on the center of each column, variance (if the data is normalized), and if and how the data was preprocessed (continuized, imputed, etc.).
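To illustrate why the template data matters, here is a minimal sketch that uses plain standardization instead of PCA. It is not Orange's implementation; `fit_template` and `apply_template` are hypothetical helpers, and the numbers are made up. The point is that the center and spread are computed from the template only and then reused unchanged on the new data.

```python
from statistics import mean, pstdev


def fit_template(template_rows):
    """Collect per-column center and spread from the template data."""
    columns = list(zip(*template_rows))
    return [(mean(col), pstdev(col)) for col in columns]


def apply_template(rows, params):
    """Transform rows with the template's parameters, not their own.

    Assumes no template column is constant (spread of 0).
    """
    return [
        [(value - center) / spread
         for value, (center, spread) in zip(row, params)]
        for row in rows
    ]


# Made-up template ("old") data and one "new" row.
template = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
new_data = [[4.0, 40.0]]

params = fit_template(template)
transformed = apply_template(new_data, params)
print(params)
print(transformed)
```

If the new data were standardized with its own mean and deviation instead, it would no longer be comparable with the transformed template data.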
25
22
26
23
Example
27
24
-------
28
25
29
-
We will use iris data from the [File](../data/file.md) widget for this example. To create two separate data sets, we will use [Select Rows](../data/selectrows.md) and set the condition to *iris is one of iris-setosa, iris-versicolor*. This will output a data set with a 100 rows, half of them belonging to iris-setosa class and the other half to iris-versicolor.
26
+
We will use *iris* data from the [File](../data/file.md) widget for this example. To create two separate data sets, we will use [Select Rows](../data/selectrows.md) and set the condition to *iris is one of iris-setosa, iris-versicolor*. This will output a data set with 100 rows, half of them belonging to the iris-setosa class and the other half to iris-versicolor.
30
27
31
-
We will transform the data with [PCA](../unsupervised/PCA.md) and select the first two components, which explain 96% of variance. Now, we would like to apply the same preprocessing on the 'new' data, that is the remaining 50 iris virginicas. Send the unused data from **Select Rows** to **Apply Domain**. Make sure to use the *Unmatched Data*output from **Select Rows** widget. Then add the *Transformed data* output from **PCA**.
28
+
We will transform the data with [PCA](../unsupervised/PCA.md) and select the first two components, which explain 96% of variance. Now, we would like to apply the same preprocessing to the 'new' data, that is, the remaining 50 iris virginicas. Send the unused data from **Select Rows** to **Apply Domain**: use the *Unmatched Data* output of **Select Rows** and connect it to **Apply Domain** as *Data*. Then connect the *Transformed Data* output of **PCA** as *Template Data*.
32
29
33
30
**Apply Domain** will apply the preprocessor to the new data and output it. To add the new data to the old data, use [Concatenate](../data/concatenate.md). Use *Transformed Data* output from **PCA** as *Primary Data* and *Transformed Data* from **Apply Domain** as *Additional Data*.
34
31
35
-
Observe the results in a [Data Table](../data/datatable.md) or in a [Scatter Plot](../visualize/scatterplot.md) to see the new data in relation to the old one.
32
+
Observe the results in a [Scatter Plot](../visualize/scatterplot.md) to see how the new data can be plotted next to the old data.
The **Continuize** widget receives a data set in the input and outputs the same data set in which some or all categorical variables are replaced with continuous ones and numeric variables are scaled.
14
+
The **Continuize** widget receives a data set in the input and outputs the same data set in which some or all categorical variables are replaced with numeric (continuous) ones and numeric variables are scaled.
15
15
16
16

17
17
18
-
1. Select a categorical attribute to define its specific treatmen, or click the "Deafult" option above to set the default treatment for all categorical attributes without specific settings.
19
-
20
-
Multiple attributes can be chosen.
21
-
18
+
1. Select a categorical attribute to define its specific treatment, or click the "Use default setting" option above to set the default treatment for all categorical attributes without specific settings.
19
+
Multiple attributes can be chosen.
22
20
2. Define the treatment of categorical variables.
23
-
24
-
Examples in this section will assume that we have a categorical attribute *status* with values *low*, *middle* and *high*, listed in that order. Options for their transformation are:
25
-
21
+
Examples in this section will assume that we have a categorical attribute *status* with values *low*, *middle* and *high*, listed in that order. Options for their transformation are:
26
22
- **Use default setting**: use the default treatment.
27
-
28
23
- **Leave categorical**: leave the attribute as it is.
29
-
30
24
- **First value as base**: an N-valued categorical variable will be transformed into N-1 numeric variables, each serving as an indicator for one of the original values except for the base value. The base value is the first value in the list. By default, the values are ordered alphabetically; their order can be changed in [Edit Domain](../data/editdomain).
31
-
32
-
In the above case, the three-valued variable *status* is transformed into two numeric variables, *status=middle* with values 0 or 1 indicating whether the original variable had value *middle* on a particular example, and similarly, *status=high*.
33
-
25
+
In the above case, the three-valued variable *status* is transformed into two numeric variables, *status=middle* with values 0 or 1 indicating whether the original variable had value *middle* on a particular example, and similarly, *status=high*. *Status=low* is indicated by both values being 0.
34
26
- **Most frequent value as base**: similar to the above, except that the most frequent value is used as a base. So, if the most frequent value in the above example is *middle*, then *middle* is considered as the base and the two newly constructed variables are *status=low* and *status=high*.
35
-
36
-
-**One-hot encoding**: this option constructs one numeric variable per each value of the original variable. In the above case, we would get variables *status=low*, *status=middle* and *status=high*.
37
-
38
-
-**Remove if more than 3 values**: removes non-binary categorical variables from the data.
39
-
27
+
- **One-hot encoding**: this option constructs one numeric variable for each value of the original variable. In the above case, we would get three variables, *status=low*, *status=middle* and *status=high*.
28
+
- **Remove if more than 2 values**: removes non-binary categorical variables from the data.
40
29
- **Remove**: removes the attribute.
41
-
42
-
-**Treat as ordinal**: converts the variable into a single numeric variable enumerating the original values. In the above case, the new variable would have the value of 0 for *low*, 1 for *middle* and 2 for *high*. Again note that the order of values can be set in [Edit Domain](../data/editdomain).
43
-
30
+
- **Treat as ordinal**: converts the variable into a single numeric variable enumerating the original values. In the above case, the new variable would have the value of 0 for *low*, 1 for *middle* and 2 for *high*. Again, note that the order of values can be set in [Edit Domain](../data/editdomain).
44
31
- **Treat as normalized ordinal**: same as above, except that values are normalized into the range 0-1. In our example, the values of the new variable would be 0, 0.5 and 1.
45
-
46
-
3. Select attributes to set individual treatments or click "Default" to set the default treatment for numeric attributes.
47
-
32
+
3. Select attributes for individual treatment or click "Default" to set the default treatment for numeric attributes.
48
33
4. Define the treatment of numeric attributes.
49
-
50
-
-**Use default setting**: use the general default.
34
+
- **Use default setting**: use the default treatment.
51
35
- **Leave as it is**: do not change anything.
52
36
- **Standardize**: subtract the mean and divide by the standard deviation (not available for sparse data).
53
37
- **Center**: subtract the mean (not available for sparse data).
54
38
- **Scale**: divide by the standard deviation.
55
39
- **Normalize to interval [-1, 1]**: linearly scale the values into interval [-1, 1] (not available for sparse data).
56
40
- **Normalize to interval [0, 1]**: linearly scale the values into interval [0, 1] (not available for sparse data).
57
-
58
-
5. If checked, the class attribute is converted in the same fashion as categorical attributes that are treated as ordinal (see above).
41
+
5. *Reset All* resets all variables to the default option.
42
+
If *Apply automatically* is ticked, changes will be communicated automatically. Alternatively, click *Apply*.
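The categorical treatments from step 2 can be sketched for the *status* example as follows. This is a plain-Python illustration under the assumption that values are ordered *low*, *middle*, *high*; the function names are hypothetical and do not correspond to the widget's code.

```python
# Values of the example attribute, in the order assumed in the text.
VALUES = ["low", "middle", "high"]


def first_value_as_base(value, values=VALUES):
    """N-1 indicator variables; the first value is the base (all zeros)."""
    return {f"status={v}": int(value == v) for v in values[1:]}


def one_hot(value, values=VALUES):
    """One indicator variable per original value."""
    return {f"status={v}": int(value == v) for v in values}


def as_ordinal(value, values=VALUES):
    """A single variable enumerating the values: 0, 1, 2, ..."""
    return values.index(value)


def as_normalized_ordinal(value, values=VALUES):
    """Like as_ordinal, but scaled into the range 0-1."""
    return values.index(value) / (len(values) - 1)


for status in VALUES:
    print(
        status,
        first_value_as_base(status),
        one_hot(status),
        as_ordinal(status),
        as_normalized_ordinal(status),
    )
```

With *Most frequent value as base*, the only difference from `first_value_as_base` would be which value is left out of the indicator variables.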
59
43
60
44
Examples
61
45
--------
62
46
63
-
First, let's see what is the output of the **Continuize** widget. We feed the original data (the *Heart disease*data set) into the [Data Table](../data/datatable) and see how they look like. Then we continuize the discrete values using various options and observe them in another [Data Table](../data/datatable).
47
+
First, let's look at the output of the **Continuize** widget. We feed the *heart_disease* data from the [File](../data/file.md) widget into a [Data Table](../data/datatable) and inspect it. Then we continuize the categorical variables using various options and observe them in another [Data Table](../data/datatable).
64
48
65
49

66
50
67
-
In the second example, we show a typical use of this widget - in order to properly plot the linear projection of the data, discrete attributes need to be converted to continuous ones and that is why we put the data through the **Continuize** widget before drawing it. Gender, for instance, is transformed into two attributes "*gender=female*" and *gender=male*.
51
+
In the second example, we show a typical use of this widget. To properly plot the linear projection of the data with the [Linear Projection](../visualize/linearprojection.md) widget, categorical attributes need to be converted to numeric ones, which is why we put the data through the **Continuize** widget before drawing it. Gender, for instance, is transformed into two attributes, *gender=female* and *gender=male*.
Create class attribute from a string or categorical attribute.
5
5
6
6
**Inputs**
7
7
@@ -11,28 +11,30 @@ Create class attribute from a string attribute.
11
11
12
12
- Data: dataset with a new class variable
13
13
14
-
**Create Class** creates a new class attribute from an existing discrete or string attribute. The widget matches the string value of the selected attribute and constructs a new user-defined value for matching instances.
14
+
**Create Class** creates a new class attribute from an existing categorical or text attribute. The widget matches the string value of the selected attribute and constructs a new user-defined value for matching instances.
15
15
16
-

16
+
{width=50%}
17
17
18
-
1. The attribute the new class is constructed from.
19
-
2. Matching:
20
-
- Name: the name of the new class value
21
-
- Substring: regex-defined substring that will match the values from the above-defined attribute
22
-
- Instances: the number of instances matching the substring
23
-
- Press '+' to add a new class value
24
-
3. Name of the new class column.
25
-
4. Match only at the beginning will begin matching from the beginning of the string. Case sensitive will match by case, too.
26
-
5. Produce a report.
27
-
6. Press *Apply* to commit the results.
18
+
1. Name of the new class column.
19
+
2. Match by Substring:
20
+
*From column*: The attribute the new class is constructed from.
21
+
*Name*: the name of the new class value
22
+
*Substring*: substring that will match the values from the above-defined attribute
23
+
*Count*: the number of instances matching the substring
24
+
Press '+' to add a new class value
25
+
3. Options:
26
+
- *Use regular expressions* to match by [regular expressions](https://en.wikipedia.org/wiki/Regular_expression).
27
+
- *Match only at the beginning* will begin matching from the beginning of the string.
28
+
- *Case sensitive* will match by case, too.
29
+
4. Press *Apply* to commit the results.
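The matching rules above can be sketched in plain Python. This is a hedged illustration, not the widget's code: `create_class` is a hypothetical helper, the car names are made up, and the rule that the first matching substring wins is an assumption.

```python
import re


def create_class(values, rules, use_regex=False, match_beginning=False,
                 case_sensitive=False, remaining="other"):
    """Assign each string the name of the first rule whose substring matches.

    `rules` is a list of (name, substring) pairs. Values that match no rule
    get the `remaining` label. First-match-wins is an assumption here.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    compiled = []
    for name, substring in rules:
        # Without the regex option, treat the substring as literal text.
        pattern = substring if use_regex else re.escape(substring)
        compiled.append((name, re.compile(pattern, flags)))

    result = []
    for value in values:
        for name, regex in compiled:
            # match() anchors at the start of the string, search() does not.
            matcher = regex.match if match_beginning else regex.search
            if matcher(value):
                result.append(name)
                break
        else:
            result.append(remaining)
    return result


# Made-up car names standing in for the car name column.
car_names = ["ford pinto", "honda civic", "fiat 128",
             "Ford torino", "toyota corolla"]
rules = [("ford", "ford"), ("honda", "honda"), ("fiat", "fiat")]

classes = create_class(car_names, rules)
print(classes)
```

Counting how often each name occurs in the result corresponds to the *Count* column, and the number of `remaining` labels to the unmatched instances.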
28
30
29
31
Example
30
32
-------
31
33
32
-
Here is a simple example with the *auto-mpg* dataset. Pass the data to **Create Class**. Select *car_name* as a column to create the new class from. Here, we wish to create new values that match the car brand. First, we type *ford* as the new value for the matching strings. Then we define the substring that will match the data instances. This means that all instances containing *ford* in their *car_name*, will now have a value *ford* in the new class column. Next, we define the same for *honda* and *fiat*. The widget will tell us how many instance are yet unmatched (remaining instances). We will name them *other*, but you can continue creating new values by adding a condition with '+'.
34
+
Here is a simple example with the *auto-mpg* dataset. Pass the data to **Create Class**. Select *car name* as a column to create the new class from. Here, we wish to create new values that match the car brand.
33
35
34
-
We named our new class column*car_brand* and we matched at the beginning of the string.
36
+
First, we type *ford* as the new value for the matching strings. Then we define the substring that will match the data instances. This means that all instances containing *ford* in their *car name* will now have a value *ford* in the new class column. We define the same for *honda* and *fiat*. The widget will tell us how many instances are yet unmatched (remaining instances). It says 326, plus the already defined 72. We will name the remaining instances *other*, but you can continue creating new values by adding a condition with '+'.
35
37
36
38

37
39
38
-
Finally, we can observe the new column in a [Data Table](../data/datatable.md) or use the value as color in the [Scatter Plot](../visualize/scatterplot.md).
40
+
Finally, we can observe the new column in a [Data Table](../data/datatable.md).
source/widgets/data/csvfileimport.md
Lines changed: 3 additions & 3 deletions
@@ -12,7 +12,7 @@ The **CSV File Import** widget reads comma-separated files and sends the dataset
12
12
13
13
*Data Frame* output can be used in the [Python Script](../data/pythonscript.md) widget by connecting it to the `in_object` input (e.g. `df = in_object`). Then it can be used as a regular DataFrame.
14
14
15
-
###Import Options
15
+
## Import Options
16
16
17
17
The import window where the user sets the import parameters. Can be re-opened by pressing *Import Options* in the widget.
18
18
@@ -41,7 +41,7 @@ Right click on the column name to set the column type. Right click on the row in
41
41
- *Ignore*: do not output the column.
42
42
4. Pressing *Reset* will return the settings to the previously set state (saved by pressing OK in the Import Options dialogue). *Restore Defaults* will set the settings to their default values. *Cancel* aborts the import, while *OK* imports the data and saves the settings.
43
43
44
-
###Widget
44
+
## Widget
45
45
46
46
The widget once the data is successfully imported.
47
47
@@ -51,7 +51,7 @@ The widget once the data is successfully imported.
51
51
2. Information on the imported data set. Reports on the number of instances (rows), variables (features or columns) and meta variables (special columns).
52
52
3. *Import Options* re-opens the import dialogue where the user can set delimiters, encodings, text fields and so on. *Cancel* aborts data import. *Reload* imports the file once again, adding to the data any changes made in the original file.
53
53
54
-
###Encoding
54
+
## Encoding
55
55
56
56
The dialogue for setting the custom encodings list shown in the Import Options - Encoding dropdown. Select *Customize Encodings List...* to change which encodings appear in the list. To save the changes, simply close the dialogue. Closing and reopening Orange (even with Reset widget settings) will not reset the list. To do this, press *Restore Defaults*. To have all the available encodings in the list, press *Select all*.