utils

column_indexes(df: DataFrame, cols: List[str])
Parameters:
  • df – The DataFrame

  • cols – A list of column names

Returns:

The positional indexes of the named columns in df
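A minimal sketch of this lookup, assuming column_indexes returns each column's integer position the way pandas' Index.get_loc does for unique column labels. column_indexes_sketch is a hypothetical name, not this module's code:

```python
from typing import List

import pandas as pd


def column_indexes_sketch(df: pd.DataFrame, cols: List[str]) -> List[int]:
    # Hypothetical re-implementation: Index.get_loc maps a (unique) column
    # label to its integer position within df.columns.
    return [df.columns.get_loc(c) for c in cols]


df = pd.DataFrame(columns=["a", "b", "c"])
positions = column_indexes_sketch(df, ["c", "a"])  # [2, 0]
```

Positional indexes like these are what `df.iloc[:, positions]` expects, which is the usual reason to convert names to positions.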

compute_divergence_crosstabs(data, datecol=None, format=None, show_progress=True, divergence=None)

Compute the divergence crosstabs for the data grouped by date.

Parameters:
  • data – The data to compute the divergences on

  • datecol – The column containing the dates. If None, the index is used, provided it is a DatetimeIndex

  • format – A function applied to the date values to format them, e.g. format_date

  • show_progress – Whether to show a progress bar

  • divergence – The divergence function to use
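The datecol rule above can be sketched as a grouping step. This assumes, without confirmation from the source, that the function splits data into one subset per date before computing the crosstab; split_by_date is a hypothetical helper, and dropping datecol from each subset is likewise an assumption:

```python
import pandas as pd


def split_by_date(data: pd.DataFrame, datecol=None):
    # Hypothetical helper: choose the grouping key following the datecol rule.
    if datecol is None:
        if not isinstance(data.index, pd.DatetimeIndex):
            raise ValueError("datecol is None and the index is not a DatetimeIndex")
        keys = data.index
    else:
        keys = data[datecol]
    subsets, dates = [], []
    # One subset per distinct date, in sorted date order.
    for date, group in data.groupby(keys, sort=True):
        dates.append(date)
        # Assumption: the date column itself is excluded from the divergence inputs.
        subsets.append(group.drop(columns=[datecol]) if datecol is not None else group)
    return subsets, dates
```

The resulting (subsets, dates) pair has the same shape as the inputs of compute_divergence_crosstabs_split.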

compute_divergence_crosstabs_split(subsets, dates, format=None, show_progress=True, divergence=None)

Compute the divergence crosstabs from data that has already been split into subsets, one per date.

Parameters:
  • subsets – The data subsets to compute the divergences on

  • dates – The list of dates, one for each subset

  • format – A function applied to the values in dates to format them, e.g. format_date

  • show_progress – Whether to show a progress bar

  • divergence – The divergence function to use
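A minimal sketch of what a divergence crosstab could look like, assuming (not confirmed by the source) that it is a square table whose entry (i, j) is divergence(subsets[i], subsets[j]), labelled by the formatted dates. crosstab_sketch and the mean_shift stand-in divergence are hypothetical; progress reporting is omitted:

```python
import numpy as np
import pandas as pd


def mean_shift(a, b) -> float:
    # Hypothetical stand-in divergence: absolute difference of the means.
    return abs(float(np.mean(a)) - float(np.mean(b)))


def crosstab_sketch(subsets, dates, divergence, format=None) -> pd.DataFrame:
    # Label rows and columns with the (optionally formatted) dates.
    labels = [format(d) if format is not None else d for d in dates]
    n = len(subsets)
    values = np.zeros((n, n))
    # Compare every subset with every other subset, including itself.
    for i in range(n):
        for j in range(n):
            values[i, j] = divergence(subsets[i], subsets[j])
    return pd.DataFrame(values, index=labels, columns=labels)


ct = crosstab_sketch([[1, 2, 3], [2, 3, 4], [5, 5, 5]],
                     ["2021-01", "2021-02", "2021-03"], mean_shift)
```

With a symmetric divergence such as mean_shift only the upper triangle needs computing; asymmetric divergences (e.g. KL) need the full loop shown here.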

parallel(func, arr: Collection, max_workers=None, show_progress: bool = False)
Run a function on a collection (list, set, etc.) of items in parallel.

NOTE: This code was adapted from the parallel function within Fastai’s Fastcore library. Key differences include returning a list with the order of the results preserved.

Parameters:
  • func – The function to run

  • arr – The collection to run on

  • max_workers – How many workers to use. If not provided, multiprocessing.cpu_count() is used

  • show_progress – Whether to show a progress bar

Returns:

A list of the results
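Order preservation matters because workers finish in arbitrary order. A minimal sketch of one way to keep input order, assuming a thread pool (the source does not say whether threads or processes are used; a ProcessPoolExecutor would need a picklable func). parallel_sketch is hypothetical and omits show_progress:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from multiprocessing import cpu_count
from typing import Any, Callable, Collection, List, Optional


def parallel_sketch(func: Callable, arr: Collection,
                    max_workers: Optional[int] = None) -> List[Any]:
    items = list(arr)
    workers = max_workers or cpu_count()
    results: List[Any] = [None] * len(items)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Remember each future's input position.
        futures = {pool.submit(func, item): i for i, item in enumerate(items)}
        # as_completed yields in completion order, so write each result
        # back into its original slot to preserve input order.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


squares = parallel_sketch(lambda x: x * x, [3, 1, 2])  # [9, 1, 4]
```

ThreadPoolExecutor.map would also return results in input order; the explicit index bookkeeping is shown because it is what a progress bar driven by as_completed requires.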

plot_divergence_crosstabs_3d(divergences)

Plot the divergences in 3D.

Parameters:
  • divergences – The list of divergences
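A minimal sketch of a 3D view of a crosstab, assuming divergences is a square matrix (list of lists) such as the output of the crosstab functions; the source does not specify the shape or the plot type. plot_crosstab_3d is hypothetical and uses matplotlib's plot_surface:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers "3d" on older matplotlib)


def plot_crosstab_3d(divergences):
    z = np.asarray(divergences, dtype=float)
    rows, cols = z.shape
    # x runs over columns (j) and y over rows (i); both grids have z's shape.
    x, y = np.meshgrid(np.arange(cols), np.arange(rows))
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(x, y, z, cmap="viridis")
    ax.set_xlabel("date index j")
    ax.set_ylabel("date index i")
    ax.set_zlabel("divergence")
    return ax
```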

split(x, train_ratio=0.5, nprng=RandomState(MT19937))
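split is undocumented; its signature suggests a random two-way split of x governed by train_ratio and a NumPy RandomState. A minimal sketch under that assumption (split_sketch is hypothetical, and the real function may split differently):

```python
from typing import Optional

import numpy as np


def split_sketch(x, train_ratio: float = 0.5,
                 nprng: Optional[np.random.RandomState] = None):
    # A None default avoids sharing one module-level RandomState across calls.
    if nprng is None:
        nprng = np.random.RandomState()
    arr = np.asarray(x)
    # Shuffle positions, then cut at train_ratio of the length.
    perm = nprng.permutation(len(arr))
    n_train = int(len(arr) * train_ratio)
    return arr[perm[:n_train]], arr[perm[n_train:]]


train, test = split_sketch(np.arange(8), 0.25, np.random.RandomState(1))
```

The default shown in the signature above is a single RandomState created when the module is imported, because Python evaluates default arguments once; successive calls therefore advance the same stream. Passing RandomState(seed) explicitly gives reproducible splits.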