dtale.ppscore package

Submodules

dtale.ppscore.calculation module

dtale.ppscore.calculation.matrix(df, output='df', sorted=False, **kwargs)

Calculate the Predictive Power Score (PPS) matrix for all pairs of columns in the dataframe.

df : pandas.DataFrame: The dataframe that contains the data.
output : str, potential values “df” or “list”: Controls the type of the output: either a pandas.DataFrame (“df”) or a list of score dicts (“list”).
sorted : bool: Whether to sort the output dataframe/list by ppscore.
kwargs : Other keyword arguments forwarded to the pps.score method, e.g. sample, cross_validation, random_seed, invalid_score, catch_errors.

Returns: pandas.DataFrame or list of dict: Either a tidy dataframe or a list of all the PPS dicts, depending on the output argument.
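As a rough illustration of what matrix computes, the sketch below scores every ordered (x, y) column pair, including each column against itself, and collects tidy rows that can optionally be sorted by ppscore. It uses plain Python lists instead of a pandas.DataFrame, and toy_score is a hypothetical stand-in for the real score function: it returns 1.0 when every x value maps to a single y value and 0.0 otherwise. The names pps_matrix_sketch and toy_score are illustrative only and are not part of dtale.

```python
from itertools import product
from typing import Callable, Dict, List

Columns = Dict[str, list]


def toy_score(columns: Columns, x: str, y: str) -> float:
    # Hypothetical stand-in for the real PPS: 1.0 if each x value maps to
    # exactly one y value (x fully determines y), otherwise 0.0.
    seen: Dict[object, set] = {}
    for x_value, y_value in zip(columns[x], columns[y]):
        seen.setdefault(x_value, set()).add(y_value)
    return 1.0 if all(len(targets) == 1 for targets in seen.values()) else 0.0


def pps_matrix_sketch(
    columns: Columns,
    score_fn: Callable[[Columns, str, str], float],
    sorted_output: bool = False,
) -> List[dict]:
    # Score every ordered (x, y) pair, including x == y, as tidy rows.
    rows = [
        {"x": x, "y": y, "ppscore": score_fn(columns, x, y)}
        for x, y in product(columns, columns)
    ]
    if sorted_output:
        # Highest scores first; Python's sort is stable, so ties keep their order.
        rows.sort(key=lambda row: row["ppscore"], reverse=True)
    return rows


data = {"id": [1, 2, 3, 4], "group": ["a", "a", "b", "b"]}
for row in pps_matrix_sketch(data, toy_score, sorted_output=True):
    print(row)
```

Passing the resulting rows to pandas.DataFrame(rows) would give a tidy frame with one row per (x, y) pair, similar in spirit to output='df'; the real function adds further fields to each score dict.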

dtale.ppscore.calculation.predictors(df, y, output='df', sorted=True, **kwargs)

Calculate the Predictive Power Score (PPS) of all the features in the dataframe against a target column.

df : pandas.DataFrame: The dataframe that contains the data.
y : str: Name of the column y, which acts as the target.
output : str, potential values “df” or “list”: Controls the type of the output: either a pandas.DataFrame (“df”) or a list of score dicts (“list”).
sorted : bool: Whether to sort the output dataframe/list by ppscore.
kwargs : Other keyword arguments forwarded to the pps.score method, e.g. sample, cross_validation, random_seed, invalid_score, catch_errors.

Returns: pandas.DataFrame or list of dict: Either a tidy dataframe or a list of all the PPS dicts, depending on the output argument.
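predictors differs from matrix mainly in fixing the target column. The minimal sketch below assumes, as in upstream ppscore, that the target column itself is skipped, and mirrors the documented default sorted=True so the strongest predictors come first. pps_predictors_sketch and toy_score are hypothetical names; toy_score is only a stand-in for the real score function.

```python
from typing import Callable, Dict, List

Columns = Dict[str, list]


def toy_score(columns: Columns, x: str, y: str) -> float:
    # Hypothetical stand-in for the real PPS: 1.0 if x fully determines y, else 0.0.
    seen: Dict[object, set] = {}
    for x_value, y_value in zip(columns[x], columns[y]):
        seen.setdefault(x_value, set()).add(y_value)
    return 1.0 if all(len(targets) == 1 for targets in seen.values()) else 0.0


def pps_predictors_sketch(
    columns: Columns,
    y: str,
    score_fn: Callable[[Columns, str, str], float],
    sorted_output: bool = True,
) -> List[dict]:
    if y not in columns:
        raise KeyError(f"target column {y!r} not found")
    # Score every other column as a feature against the target y.
    rows = [
        {"x": x, "y": y, "ppscore": score_fn(columns, x, y)}
        for x in columns
        if x != y
    ]
    if sorted_output:
        # Mirrors the documented default sorted=True: best predictors first.
        rows.sort(key=lambda row: row["ppscore"], reverse=True)
    return rows


data = {
    "noise": [0, 1, 0, 1],
    "id": [1, 2, 3, 4],
    "group": ["a", "a", "b", "b"],
}
print([row["x"] for row in pps_predictors_sketch(data, "group", toy_score)])
```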

dtale.ppscore.calculation.score(df, x, y, task='NOT_SUPPORTED_ANYMORE', sample=5000, cross_validation=4, random_seed=123, invalid_score=0, catch_errors=True)

Calculate the Predictive Power Score (PPS) for “x predicts y”. The score always ranges from 0 to 1 and is data-type agnostic.

A score of 0 means that the column x cannot predict the column y better than a naive baseline model. A score of 1 means that the column x can perfectly predict the column y given the model. A score between 0 and 1 indicates what fraction of the possible improvement over the baseline model the model actually achieved.
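The ratio described above can be made concrete. Assuming this module follows upstream ppscore, regression targets are evaluated with mean absolute error (MAE) and classification targets with F1, and the model's result is normalized against the naive baseline's result, with anything worse than the baseline clipped to 0. The two helper functions below are a hypothetical sketch of that normalization, not dtale's code; the ValueError guards for a zero-error or perfect baseline are an added assumption.

```python
def normalized_mae_score(model_mae: float, naive_mae: float) -> float:
    """Regression case: share of the baseline's error that the model removes."""
    if naive_mae <= 0:
        raise ValueError("baseline MAE must be positive")
    if model_mae > naive_mae:
        return 0.0  # worse than the naive baseline: clipped to 0
    return 1.0 - model_mae / naive_mae


def normalized_f1_score(model_f1: float, naive_f1: float) -> float:
    """Classification case: share of the gap between baseline F1 and a perfect 1.0."""
    if naive_f1 >= 1.0:
        raise ValueError("baseline F1 must be below 1.0")
    if model_f1 < naive_f1:
        return 0.0  # worse than the naive baseline: clipped to 0
    return (model_f1 - naive_f1) / (1.0 - naive_f1)


print(normalized_mae_score(model_mae=2.0, naive_mae=8.0))  # 0.75
print(normalized_f1_score(model_f1=0.75, naive_f1=0.5))    # 0.5
```

In both cases a model that only matches the baseline scores 0, and a perfect model (MAE of 0 or F1 of 1.0) scores 1.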

df : pandas.DataFrame: Dataframe that contains the columns x and y
x : str: Name of the column x which acts as the feature
y : str: Name of the column y which acts as the target
sample : int or None: Number of rows for sampling. Sampling decreases the calculation time of the PPS. If None, no sampling is performed.
cross_validation : int: Number of iterations (folds) during cross-validation. This has a practical implication: if the number is 4, for example, patterns can only be detected when the same observation occurs at least 4 times. Increasing the number also increases this required minimum. This matters because it is the limit below which sklearn throws an error and the PPS cannot be calculated.
random_seed : int or None: Random seed for the parts of the calculation that require random numbers, e.g. shuffling or sampling. If the value is set, the results are reproducible. If the value is None, a new random number is drawn at the start of each calculation.
invalid_score : any: The score that is returned when a calculation is invalid, e.g. because the data type is not supported.
catch_errors : bool: If True, all errors are caught and reported as unknown_error, which is convenient. If False, errors are raised, which is helpful for inspecting and debugging them.
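The sample and random_seed parameters work together: sampling only matters when there are more rows than sample, and a fixed seed makes the drawn subset repeatable. The sketch below mimics that on a plain Python list. The function name is hypothetical, and skipping sampling when the data already has at most sample rows is an assumption modeled on upstream ppscore rather than something stated here.

```python
import random
from typing import List, Optional, Sequence


def sample_rows_sketch(
    rows: Sequence,
    sample: Optional[int] = 5000,
    random_seed: Optional[int] = 123,
) -> List:
    # sample=None, or data no larger than the sample size: keep every row.
    if sample is None or len(rows) <= sample:
        return list(rows)
    # random.Random(None) seeds from system randomness, so each call differs;
    # a fixed integer seed makes the draw reproducible.
    rng = random.Random(random_seed)
    return rng.sample(list(rows), sample)


data = list(range(20000))
first = sample_rows_sketch(data, sample=100, random_seed=123)
second = sample_rows_sketch(data, sample=100, random_seed=123)
print(first == second)  # True
```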
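The cross_validation note is easiest to see on a categorical target: when sklearn stratifies folds by class, a class needs at least as many occurrences as there are folds to appear in every fold. A hypothetical pre-check like the one below (not a dtale function) lists the target values that fall short. Whether sklearn then warns or raises depends on the splitter and the data, so treat this as a diagnostic sketch only.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def values_below_fold_count(
    target: Iterable[Hashable], cross_validation: int = 4
) -> Dict[Hashable, int]:
    """Return target values (with their counts) that occur fewer than
    cross_validation times and therefore cannot appear in every fold."""
    counts = Counter(target)
    return {value: n for value, n in counts.items() if n < cross_validation}


target = ["a", "a", "a", "a", "b", "b"]
print(values_below_fold_count(target, cross_validation=4))  # {'b': 2}
```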

Returns: Dict: A dict that contains multiple fields about the resulting PPS. The dict enables introspection into the calculations that have been performed under the hood.
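The returned dict, together with catch_errors and invalid_score, can be pictured as a wrapper like the one below. This is a hypothetical sketch: compute stands in for the actual calculation, the field names case and is_valid_score are assumed from upstream ppscore, and it also assumes that a caught error is reported with invalid_score as its ppscore. The real dict carries more fields than shown here.

```python
from typing import Any, Callable, Dict

ScoreDict = Dict[str, Any]


def safe_score_sketch(
    compute: Callable[[str, str], ScoreDict],
    x: str,
    y: str,
    invalid_score: Any = 0,
    catch_errors: bool = True,
) -> ScoreDict:
    try:
        return compute(x, y)
    except Exception:
        if not catch_errors:
            raise  # catch_errors=False: surface the original error for debugging
        # catch_errors=True: report the failure as an invalid score instead.
        return {
            "x": x,
            "y": y,
            "ppscore": invalid_score,
            "case": "unknown_error",
            "is_valid_score": False,
        }


def failing_compute(x: str, y: str) -> ScoreDict:
    raise ValueError("unsupported data type")


print(safe_score_sketch(failing_compute, "col_a", "col_b", invalid_score=0))
```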

Module contents