{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Handling Categorical Data\n", "\n", "More often than not, a dataset contains both **numeric** and **categorical** columns. The supervisor divergence functions can handle both, but they need to know which columns are categorical in order to treat them correctly. This notebook shows how to specify the categorical columns when using the **supervisor** divergence package." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Dataset with Mixed Data Types" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Create a dataset\n", "To demonstrate, we will create a simple dataset with a mix of categorical and numeric columns." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n",
"|   | latitude | fruit | temp | city | longitude |\n",
"|---|---|---|---|---|---|\n",
"| 0 | 239 | apple | 104 | Filly Downs | 257 |\n",
"| 1 | 181 | apple | 11 | Coldport | 303 |\n",
"| 2 | 246 | raspberry | 99 | Filly Downs | 60 |\n",
"| 3 | 187 | raspberry | 91 | Coldport | 90 |\n",
"| 4 | 97 | raspberry | 26 | Filly Downs | 108 |\n", "