dtale.ppscore package

Submodules

dtale.ppscore.calculation module

dtale.ppscore.calculation.matrix(df, output='df', sorted=False, **kwargs)[source]

Calculate the Predictive Power Score (PPS) matrix for all columns in the dataframe

df : pandas.DataFrame
The dataframe that contains the data
output: str - potential values: “df”, “list”
Control the type of the output. Either return a pandas.DataFrame (df) or a list with the score dicts
sorted: bool
Whether or not to sort the output dataframe/list by the ppscore
kwargs:
Other keyword arguments that are forwarded to the pps.score method, e.g. sample, cross_validation, random_seed, invalid_score, catch_errors
Returns: pandas.DataFrame or list of Dict
Either returns a tidy dataframe or a list of all the PPS dicts. This can be influenced by the output argument
dtale.ppscore.calculation.predictors(df, y, output='df', sorted=True, **kwargs)[source]

Calculate the Predictive Power Score (PPS) of all the features in the dataframe against a target column

df : pandas.DataFrame
The dataframe that contains the data
y : str
Name of the column y which acts as the target
output: str - potential values: “df”, “list”
Control the type of the output. Either return a pandas.DataFrame (df) or a list with the score dicts
sorted: bool
Whether or not to sort the output dataframe/list by the ppscore
kwargs:
Other keyword arguments that are forwarded to the pps.score method, e.g. sample, cross_validation, random_seed, invalid_score, catch_errors
Returns: pandas.DataFrame or list of Dict
Either returns a tidy dataframe or a list of all the PPS dicts. This can be influenced by the output argument
dtale.ppscore.calculation.score(df, x, y, task='NOT_SUPPORTED_ANYMORE', sample=5000, cross_validation=4, random_seed=123, invalid_score=0, catch_errors=True)[source]

Calculate the Predictive Power Score (PPS) for “x predicts y”. The score always ranges from 0 to 1 and is data-type agnostic.

A score of 0 means that the column x cannot predict the column y better than a naive baseline model. A score of 1 means that the column x can perfectly predict the column y given the model. A score between 0 and 1 states the ratio of how much potential predictive power the model achieved compared to the baseline model.
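
The normalisation described above can be illustrated with a small standalone sketch. This page does not name the underlying metrics, so the sketch assumes the convention of the upstream ppscore project: mean absolute error (lower is better) for numeric targets and an F1 score (higher is better) for categorical targets. The function names are hypothetical and are not part of dtale.

```python
def normalized_error_score(model_error, baseline_error):
    """Normalise an error metric (e.g. MAE) into the 0..1 range.

    0 means the model is no better than the naive baseline,
    1 means the model makes no error at all.
    """
    if baseline_error <= 0:
        # The baseline is already perfect, so there is nothing left to gain.
        return 0.0
    return max(0.0, 1.0 - model_error / baseline_error)


def normalized_gain_score(model_score, baseline_score, best_score=1.0):
    """Normalise a 'higher is better' metric (e.g. F1) into the 0..1 range."""
    if model_score <= baseline_score or baseline_score >= best_score:
        return 0.0
    return (model_score - baseline_score) / (best_score - baseline_score)


# The model's error is a quarter of the baseline's error.
print(normalized_error_score(2.0, 8.0))

# The model is worse than the baseline, so the score is floored at 0.
print(normalized_error_score(10.0, 8.0))

# The model closes half of the gap between the baseline and a perfect score.
print(normalized_gain_score(0.75, 0.5))
```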

df : pandas.DataFrame
Dataframe that contains the columns x and y
x : str
Name of the column x which acts as the feature
y : str
Name of the column y which acts as the target
sample : int or None
Number of rows for sampling. The sampling decreases the calculation time of the PPS. If None there will be no sampling.
cross_validation : int
Number of iterations during cross-validation. This has the following implication: if the number is 4, a pattern can only be detected when the same observation occurs at least 4 times. If the number is increased, the required minimum number of observations increases as well. This matters because below that limit sklearn throws an error and the PPS cannot be calculated.
random_seed : int or None
Random seed for the parts of the calculation that require random numbers, e.g. shuffling or sampling. If the value is set, the results will be reproducible. If the value is None a new random number is drawn at the start of each calculation.
invalid_score : any
The score that is returned when a calculation is invalid, e.g. because the data type was not supported.
catch_errors : bool
If True, all errors are caught and reported as unknown_error, which is convenient for everyday use. If False, errors are raised, which is helpful for inspecting and debugging them.
Returns: Dict
A dict that contains multiple fields about the resulting PPS. The dict enables introspection into the calculations that have been performed under the hood

Module contents