Selecting Model Data
The first step in the Akkio Flow is to connect data, since data is the fuel for any machine learning model. Akkio is a tabular AI tool, which means you’ll want historical data in a tabular format, such as a CSV file, an Excel file, a Snowflake dataset, or a Salesforce dataset. Other options include Google Sheets, Google BigQuery, and HubSpot.
In machine learning, both quality and quantity matter, so large, high-quality datasets are preferred. “Quality” means things like having few missing values, having properly formatted data, and having data that’s representative of the problem you’re trying to solve. There’s no minimum dataset size for connecting to Akkio, but ideally your dataset has at least a couple hundred rows, and preferably thousands (or millions) of rows.
Crucially, your dataset must be indicative of the problem at hand. If you want to predict churn, you’ll need a historical customer dataset with a churn column. If you want to predict employee attrition, you’ll need a dataset with an attrition column, and so on.
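If you’d like to sanity-check a CSV before connecting it, the quality points above (row count, missing values, and the presence of the column you want to predict) can be checked with a short script. The following is a minimal sketch in plain Python and is not part of Akkio; the function name `summarize_csv`, the sample data, and the column names are all hypothetical and used only for illustration.

```python
import csv
import io

def summarize_csv(text, target_column):
    """Return row count, per-column share of missing values, and whether the target column exists.

    Hypothetical helper for illustration; not an Akkio function.
    """
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Treat absent or blank cells as missing values.
            if value is None or value.strip() == "":
                missing[name] += 1
    missing_share = {
        name: (missing[name] / row_count if row_count else 0.0)
        for name in columns
    }
    return {
        "rows": row_count,
        "has_target": target_column in columns,
        "missing_share": missing_share,
    }

# Made-up sample data: a tiny customer table with a "churn" column.
sample = """customer_id,plan,monthly_spend,churn
1,basic,20,no
2,pro,,yes
3,basic,25,no
4,pro,60,
"""

report = summarize_csv(sample, target_column="churn")
print(report["rows"], report["has_target"], report["missing_share"])
```

A report like this makes it easy to spot a dataset that is too small, has a column that is mostly empty, or is missing the column you intend to predict.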
After connecting a dataset, you’ll see an overview of the data, including the name of the dataset, the number of rows and columns, current transformations, the Chat Data Prep function, and a scrollable preview of the dataset. You can also click “Replace” to swap in a different dataset, or “Download” to export the dataset.
Akkio will automatically recognize the variable types in the dataset, which can be any of the following:
- Number (Integer)
You can change a column’s variable type by clicking its existing variable type. In the example below, the variable type of the “Gender” column is correctly detected as “Category,” but you could click “Category” to change it to another variable type.
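To build intuition for why a detected type might occasionally need correcting, here is a rough sketch of how a variable type could be guessed from raw values. This is an assumption made for illustration and does not describe Akkio’s actual detection logic; the function `guess_variable_type`, the `max_categories` threshold, and the fallback label “Other” are hypothetical.

```python
def guess_variable_type(values, max_categories=10):
    """Guess a variable type from a column's raw string values.

    Hypothetical heuristic for illustration only; not Akkio's detection logic.
    """
    # Ignore blank cells so missing values do not affect the guess.
    non_empty = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "Other"

    def is_int(text):
        try:
            int(text)
            return True
        except ValueError:
            return False

    # Every remaining value parses as a whole number.
    if all(is_int(v) for v in non_empty):
        return "Number (Integer)"
    # A small set of repeated labels looks like a category.
    if len(set(non_empty)) <= max_categories:
        return "Category"
    return "Other"

print(guess_variable_type(["34", "27", "", "51"]))
print(guess_variable_type(["Female", "Male", "Female"]))
```

A heuristic like this can misfire, for example on a column of numeric codes that really represent categories, which is one reason it helps to review the detected types and change them manually where needed.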