KNIME
Tableau is good for data visualization because of its simplicity, but it is not intended for data mining or predictive analytics.
KNIME, on the other hand, is a better choice for data mining tasks.
KNIME is an open source data analytics platform with a large set of building blocks and third-party tools.
From loading data to producing a final report or predicting new values with a previously built model, everything can be done within KNIME.
You can also use Tableau together with KNIME, so that both data mining and data visualization are covered.
KNIME is now available in 4 flavors:
- Desktop / Professional,
- Team Space,
- Server and
- Cluster Execution
The Desktop version is the only open source one.
A Professional subscription comes with support, and it also supports the future development of KNIME.
Let’s look at what KNIME contains.
When KNIME starts, you need to specify which workspace you want to use. A workspace is where your files are stored.
Not just files: settings and logs are kept in it as well.
Just like folders in a filesystem, a workspace contains workflow groups and workflows, which helps you organize your work.
Workflows are like execution plans: they are your programs and processes, describing the steps that should be applied to load, analyze, visualize or transform your data. The executable parts of a workflow can be edited in the workflow editor.
The basic processing units of a workflow are nodes, each of which has a number of input and/or output ports. A red light indicates that the node has to be configured before it can be executed, which you do by right-clicking it and choosing ‘Configure’.
So let’s go with a simple example first. You can log in to the KNIME public workflow server using a fixed guest account; here are the instructions.
Below is the standard preprocessing workflow (003002_StandardPreprocessing), an example workflow from the KNIME server.
Based on that workflow, I have made a simpler version to discuss some of its nodes. We have a set of customer sample data containing age, education, marital-status, and so on. We want to find out how many of the customers are young, middle-aged or old. Here is the simple workflow I made to discuss how KNIME can help us find that answer.
Let’s have a short discussion about it.
Node 1 is used to read data from an ASCII file or URL location. There are many other nodes for reading data, from reading a CSV file to connecting to a database; you can discover them yourself in KNIME.
The Row Filter node has 1 input port and 1 output port. We can define whether the node's output table includes or excludes rows by attribute value, row number or row ID, according to the matching criteria we want (for an attribute value, based on the column we choose to test). We can use pattern matching (finding similar strings) or range checking (setting a range of numbers). In this workflow, it includes only the rows with a missing value in the “native-country” column.
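To make the include/exclude idea concrete, here is a rough sketch in plain Python. It is not KNIME's API: the `row_filter` helper and the sample rows are made up for illustration, and `None` stands in for a missing value.

```python
from fnmatch import fnmatchcase

# Hypothetical sample rows; None stands for a missing value.
rows = [
    {"age": 39, "education": "Bachelors", "native-country": "United-States"},
    {"age": 50, "education": "HS-grad", "native-country": None},
    {"age": 28, "education": "Masters", "native-country": "Cuba"},
    {"age": 61, "education": "Doctorate", "native-country": None},
]


def row_filter(table, column, matches, include=True):
    """Keep (include=True) or drop (include=False) the rows whose value
    in `column` satisfies the `matches` test."""
    return [row for row in table if matches(row[column]) == include]


# Include only the rows with a missing "native-country", as in the workflow.
missing_country = row_filter(rows, "native-country", lambda v: v is None)

# Same test, but in exclude mode: drop the rows with a missing value.
known_country = row_filter(rows, "native-country", lambda v: v is None, include=False)

# Range checking on a numeric column.
age_30_to_55 = row_filter(rows, "age", lambda v: 30 <= v <= 55)

# Pattern matching on a string column (wildcard pattern).
ends_with_s = row_filter(rows, "education", lambda v: fnmatchcase(v, "*s"))
```

The same test can either keep or drop rows, which is the include/exclude switch described above.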
Then we can use the Column Filter (Node 8) to include only the columns that we want.
The output tables of Node 2 (Row Filter) and Node 8 (Column Filter) are the inputs of the Reference Column Filter (Node 9), which eliminates null values by excluding the columns that appear in the reference table. This lets us find the numeric bins of age without null values. Input port 0 (the 1st input port) takes the table from which columns are to be included or excluded, and input port 1 (the 2nd input port) takes the table whose columns are used as the reference. The output is the first table with its columns filtered.
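The two-input behaviour can be sketched in plain Python as follows. This is only an illustration under assumptions, not KNIME code: the `reference_column_filter` helper, the sample tables and the choice to compare columns by name only are all hypothetical.

```python
def reference_column_filter(table, reference, include=True):
    """Filter the columns of `table` (input port 0) by the column names
    found in `reference` (input port 1).

    include=True keeps only the columns that appear in the reference table;
    include=False drops them instead.
    """
    reference_columns = set()
    for row in reference:
        reference_columns.update(row)
    return [
        {name: value for name, value in row.items()
         if (name in reference_columns) == include}
        for row in table
    ]


# Hypothetical first input: the table to be filtered.
table = [
    {"age": 39, "education": "Bachelors", "native-country": "United-States"},
    {"age": 50, "education": "HS-grad", "native-country": None},
]

# Hypothetical second input: a table that only carries the column
# containing missing values.
reference = [{"native-country": None}]

# Exclude mode: drop every column that appears in the reference table.
without_reference_columns = reference_column_filter(table, reference, include=False)

# Include mode: keep only the columns that appear in the reference table.
only_reference_columns = reference_column_filter(table, reference, include=True)
```

In exclude mode the result keeps `age` and `education` and drops the column that held the missing values.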
Executing Node 9 (Reference Column Filter) gives the output table below:
With the table above, we select a column and define a number of intervals for it. In this example we use age to define the ranges for young, middle and old age.
Each person's age is then labelled young, middle or old according to the range defined for each bin.
We can either replace the column with the bins or append them as a new column.
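As a rough illustration of the binning step, here is a small plain-Python sketch. The cut-off ages (below 30 is young, 30 to 59 is middle, 60 and above is old) are assumptions chosen for the example, not the ranges used in the workflow, and the helper names are hypothetical.

```python
def bin_age(age):
    """Map an age to a bin label. The cut-offs are assumed for illustration."""
    if age < 30:
        return "young"
    if age < 60:
        return "middle"
    return "old"


def apply_bins(table, column, binner, new_column=None):
    """Bin the values of `column`.

    With new_column=None the original column is replaced by the bin label;
    otherwise the label is appended as a new column.
    """
    target = new_column or column
    return [{**row, target: binner(row[column])} for row in table]


# Hypothetical sample rows.
people = [{"age": 22}, {"age": 30}, {"age": 45}, {"age": 60}]

appended = apply_bins(people, "age", bin_age, new_column="age_binned")
replaced = apply_bins(people, "age", bin_age)
```

Appending keeps the original number next to its label, which is handy for checking that the ranges behave as intended.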
To visualize the number of people in each age group, I have chosen a pie chart. There are a few types of pie chart to choose from once you download the feature by installing KNIME Extensions.
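What a pie chart shows is simply the share of each bin in the total. A minimal sketch of that calculation in plain Python, using made-up bin labels rather than the real data set:

```python
from collections import Counter

# Hypothetical bin labels, one per person.
age_bins = ["young", "middle", "middle", "old", "middle", "young", "old", "middle"]

# Number of people per bin.
counts = Counter(age_bins)

# Percentage of the total per bin, i.e. the size of each pie slice.
total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} people ({share:.1f}%)")
```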
Here are two types of pie chart:
So that’s a simple introduction to using KNIME.
If you are interested, you can download it from here, choosing the right version of KNIME for your OS.