PyMC3 aims for intuitive and readable, yet powerful, syntax that reflects how statisticians describe models. The modeling process generally follows these five steps: encode a probability model by defining prior distributions for the unobserved parameters and a likelihood for the observed data; run inference to obtain MAP estimates, posterior samples via MCMC, or a variational approximation; diagnose convergence; evaluate the fit with posterior predictive checks; and apply the model to new data.
The resulting model can be used for inference to gain detailed insights into parameter values as well as to predict outcomes for new data points.
Logistic regression estimates a linear relationship between a set of features and a binary outcome, mediated by a sigmoid function to ensure the model produces probabilities. The frequentist approach yields point estimates for the parameters that measure the influence of each feature on the probability that a data point belongs to the positive class, with confidence intervals based on assumptions about the sampling distribution of the parameter estimates.
In contrast, Bayesian logistic regression estimates the posterior distribution over the parameters themselves. The posterior yields a Bayesian credible interval for each parameter, with the benefit of more transparency about the model's uncertainty.
import warnings
warnings.filterwarnings('ignore')
from pathlib import Path
import pickle
from collections import OrderedDict
import pandas as pd
import numpy as np
from scipy import stats
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import (roc_curve, roc_auc_score, confusion_matrix, accuracy_score, f1_score,
                             precision_recall_curve)
import theano
import pymc3 as pm
from pymc3.variational.callbacks import CheckParametersConvergence
import statsmodels.formula.api as smf
import arviz as az
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.lines as mlines
import seaborn as sns
from IPython.display import HTML
sns.set(style="white", color_codes=True)
pd.set_option('display.max_columns', None)
data_path = Path('data')
fig_path = Path('figures')
model_path = Path('models')
for p in [data_path, fig_path, model_path]:
    if not p.exists():
        p.mkdir()
The goal is to build a binary classification model that predicts whether each of the over $11,000$ customers will subscribe to a term deposit after a marketing campaign the bank has performed, based on the $16$ predictor variables. The target variable is deposit, which takes on a value of $1$ if the customer subscribed and $0$ otherwise.
This is a slightly imbalanced classification problem because more customers did not subscribe to the term deposit than did.
data = pd.read_csv('bank.csv')
data.info()
data.head()
unique = data.nunique()
unique
mask = data.nunique().values < 31
cat_features = list(unique[mask].index)
cat_features
total = len(data)
plt.figure(figsize=(7,5))
g = sns.countplot(x='deposit', data=data)
g.set_ylabel('Count', fontsize=14)
for p in g.patches:
    height = p.get_height()
    g.text(p.get_x() + p.get_width() / 2.,
           height + 1.5,
           '{:1.2f}%'.format(height / total * 100),
           ha="center", fontsize=14, fontweight='bold')
plt.margins(y=0.1)
plt.show()
plt.figure(figsize=(13, 45))
for i, feature in enumerate(cat_features, 1):
    if feature == 'deposit':  # stop before the target variable
        break
    plt.subplot(len(cat_features), 3, i)
    sns.countplot(y=feature, hue='deposit', data=data)
    plt.title(feature, fontsize=12)
    plt.legend(loc='best', prop={'size': 8})
plt.tight_layout()
plt.show()
plt.figure(figsize=(8,4))
sns.countplot(x='day', hue='deposit', data=data);
num_features = [f for f in data.columns if f not in cat_features]
num_features.remove('day')
num_features
basetable0 = pd.DataFrame(index=data.index)
basetable0['deposit'] = data['deposit']
deposit_dict = {'no': 0, 'yes': 1}
basetable0 = basetable0.replace({'deposit': deposit_dict})
for variable in num_features:
    # Discretize continuous variables (more than 31 distinct values) into quintiles
    if len(data.groupby(variable)) > 31:
        new_variable = "disc_" + variable
        basetable0[new_variable] = pd.qcut(data[variable], 5, duplicates='drop')
basetable0.head()
def create_pig_table(basetable, target, variable):
    """Compute target incidence (mean) and group size per level of `variable`."""
    groups = basetable[[target, variable]].groupby(variable)
    pig_table = groups[target].agg(Incidence=np.mean, Size=np.size).reset_index()
    return pig_table
def plot_pig(pig_table, variable):
    """Plot group size (bars) and target incidence (line) per level of `variable`."""
    plt.figure(figsize=(10, 4))
    plt.ylabel("Size", rotation=0, rotation_mode="anchor", ha="right")
    pig_table["Size"].plot(kind="bar", width=0.5, color="lightgray", edgecolor="none")
    pig_table["Incidence"].plot(secondary_y=True)
    plt.xticks(np.arange(len(pig_table)), pig_table[variable])
    plt.xlim([-0.5, len(pig_table) - 0.5])
    plt.ylabel("Incidence", rotation=0, rotation_mode="anchor", ha="left")
    plt.title(f'{variable}')
    plt.show()
variables = list(basetable0.drop('deposit', axis=1).columns)
pig_tables = {}
for variable in variables:
    pig_tables[variable] = create_pig_table(basetable0, 'deposit', variable)
for variable in variables:
    plot_pig(pig_tables[variable], variable)
job_dict = {'management': 12, 'admin.': 11, 'entrepreneur': 10, 'technician': 9, 'services': 8,
            'self-employed': 7, 'blue-collar': 6, 'retired': 5, 'housemaid': 4, 'unemployed': 3,
            'unknown': 2, 'student': 1}
data = data.replace({'job': job_dict})
marital_dict = {'married': 4, 'single': 3, 'divorced': 2, 'unknown': 1}
data = data.replace({'marital': marital_dict})
data['marital'] = data['marital'].astype(int)
edu_dict = {'tertiary': 4, 'secondary': 3, 'primary': 2, 'unknown': 1}
data = data.replace({'education': edu_dict})
default_dict = {'no': 1, 'yes': 0}
data = data.replace({'default': default_dict})
housing_dict = {'no': 1, 'yes': 0}
data = data.replace({'housing': housing_dict})
loan_dict = {'no': 1, 'yes': 0}
data = data.replace({'loan': loan_dict})
contact_dict = {'cellular': 3, 'telephone': 2, 'unknown': 1}
data = data.replace({'contact': contact_dict})
poutcome_dict = {'success': 4, 'other': 3, 'unknown': 2, 'failure': 1}
data = data.replace({'poutcome': poutcome_dict})
month_dict = {'jan': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'may': 5, 'jun': 6,
              'jul': 7, 'aug': 8, 'sep': 9, 'oct': 10, 'nov': 11, 'dec': 12}
data = data.replace({'month': month_dict})
pdays_dict = {-1: 0}
data = data.replace({'pdays': pdays_dict})
data.head()
data.dtypes
mask = data.nunique().values > 31
num_features = list(unique[mask].index)
num_features
data[num_features].hist(bins=30, figsize=(8,7))
plt.tight_layout();
from sklearn.preprocessing import MinMaxScaler
import scipy.stats as stats
colors = cm.rainbow(np.linspace(0, 1, len(num_features)))
for col, c in zip(num_features, colors):
    # Box-Cox requires strictly positive inputs, so shift/rescale if the minimum is <= 0
    minmax = MinMaxScaler(feature_range=(0.0001, max(data[col].values)))
    if data[col].describe().iloc[3] <= 0.0:  # .iloc[3] is the minimum
        z = minmax.fit_transform(data[col].values.reshape(-1, 1))
        z = np.squeeze(z)
        print('BC MinMax Scaled')
    else:
        z = data[col]
    z0, _ = stats.boxcox(z, lmbda=None)   # lambda estimated by MLE
    z1 = stats.boxcox(z, lmbda=-1)        # reciprocal
    z2 = stats.boxcox(z, lmbda=-0.5)      # reciprocal square root
    z3 = stats.boxcox(z, lmbda=0.0)       # log
    z4 = stats.boxcox(z, lmbda=0.5)       # square root
    z5 = stats.boxcox(z, lmbda=2)         # square
    z6 = stats.boxcox(z, lmbda=-2)        # reciprocal square
    z7 = np.cbrt(data[col])               # cube root
    trans = [data[col], z0, z1, z2, z3, z4, z5, z6, z7]
    names = ['Original', 'Boxcox', 'Recip', 'Recip_Sqrt', 'Log', 'Sqrt', 'Square', 'Recip_Square', 'Cube_rt']
    fig = plt.figure(figsize=(12, 6))
    for i, t, n in zip(range(1, 10), trans, names):
        ax = fig.add_subplot(3, 3, i)
        plt.hist(t, bins=30, color=c, alpha=0.6)
        plt.title(col + '_' + n)
    plt.tight_layout()
    plt.show()
data.balance = np.cbrt(data.balance)
age, _ = stats.boxcox(data.age, lmbda=None)
data.age = age
duration, _ = stats.boxcox(data.duration, lmbda=None)
data.duration = duration
from sklearn.preprocessing import scale
cols = ['duration', 'balance', 'age']
data.loc[:, cols] = scale(data.loc[:, cols])
nums = data[num_features].copy()  # copy to avoid SettingWithCopyWarning
nums['deposit'] = data.deposit
sns.pairplot(nums, hue='deposit', diag_kind='hist');
deposit_dict = {'no': 0, 'yes': 1}
data = data.replace({'deposit': deposit_dict})
data.head()
plt.figure(figsize=(13, 13))
corr = data.corr()
mask = np.tri(*corr.shape).T
sns.heatmap(corr.abs(), mask=mask, annot=True, annot_kws={"size": 9})
b, t = plt.ylim()  # workaround for a matplotlib version that crops the top/bottom heatmap rows
b += 0.5
t -= 0.5
plt.ylim(b, t)
plt.show()
n_fts = len(data.columns)
colors = cm.rainbow(np.linspace(0, 1, n_fts))
data.drop('deposit', axis=1).corrwith(data.deposit).sort_values(ascending=True).plot(
    kind='barh', color=colors, figsize=(12, 6))
plt.title('Correlation to Target (deposit)')
plt.show()
print('\n',data.drop('deposit',axis=1).corrwith(data.deposit).sort_values(ascending=False))
Example logistic regression equation:
$$y = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$$
where $y$ is the predicted output, $e$ is the base of the natural logarithm (the np.exp() function), $\beta_0$ is the bias or intercept term, and $\beta_1$ is the coefficient for the single input value $x$. Each column in the input data has an associated $\beta$ coefficient (a constant real value) that must be learned from the training data.
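To make the equation concrete, here is a minimal numpy sketch; the parameter values $\beta_0 = -1.5$ and $\beta_1 = 2$ are hypothetical, chosen purely for illustration:
import numpy as np
# Evaluate the logistic regression equation for hypothetical parameters
beta_0, beta_1 = -1.5, 2.0
x = np.linspace(-3, 3, 7)
y = np.exp(beta_0 + beta_1 * x) / (1 + np.exp(beta_0 + beta_1 * x))
# Equivalent, numerically more stable form:
y_stable = 1 / (1 + np.exp(-(beta_0 + beta_1 * x)))
print(np.round(y, 3))  # all values lie strictly between 0 and 1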
We begin with the simplest possible logistic model, using just one independent variable: duration.
y_simple = data['deposit']
x_n = 'duration'
x_c = data[x_n].values
with pm.Model() as model_simple:
    # Random variables for the coefficients, with
    # uninformative priors for each parameter
    α = pm.Normal('α', mu=0, sd=10)
    β = pm.Normal('β', mu=0, sd=10)
    # Transform the random variables into a vector of probabilities
    μ = α + pm.math.dot(x_c, β)
    θ = pm.Deterministic('θ', pm.math.sigmoid(μ))
    # The decision boundary: the value of duration where θ = 0.5
    db = pm.Deterministic('db', -α / β)
    # Bernoulli random vector with the probability of success
    # given by the sigmoid function and the actual data as observed
    y = pm.Bernoulli(name='logit', p=θ, observed=y_simple)
    trace_simple = pm.sample(1000, tune=1000, chains=4, init='adapt_diag', cores=5)
model_simple.model
The command pm.model_to_graphviz(model_simple) produces the plate notation displayed below. It shows the unobserved parameters as light circles and the observed elements as dark circles. The rectangle indicates the number of repetitions of the observed model element implied by the data included in the model definition.
pm.model_to_graphviz(model_simple)
theta = trace_simple['θ'].mean(axis=0)
idx = np.argsort(x_c)
plt.plot(x_c[idx], theta[idx], color='red', lw=3)
plt.vlines(trace_simple['db'].mean(), 0, 1, color='k', ls='--')
plt.scatter(x_c, np.random.normal(y_simple, 0.02), marker='.', alpha=0.5,
            color=[f'C{x}' for x in y_simple])
plt.xlabel(x_n)
plt.ylabel('θ', rotation=0)
locs, _ = plt.xticks()
plt.xticks(locs, np.round(locs + x_c.mean(), 1));
The above plot shows non-deposit versus deposit $(y = 0, y = 1)$. The S-shaped (red) line is the mean value of $\theta$, the sigmoid function, which can be interpreted as the probability of a deposit given the value of duration (the last contact duration). The decision boundary is represented as a (black) vertical line: it is the value of duration where $\theta = 0.5$, that is, where $\alpha + \beta x = 0$, giving $x = -\alpha/\beta$. Values of duration to the left of the boundary correspond to $y = 0$ (no deposit), and values to the right to $y = 1$ (deposit).
az.summary(trace_simple, var_names=['α', 'β', 'db'])
mean: Trace mean
sd: Trace standard deviation
mcse: Markov Chain Standard Error statistic
ess: Estimate of the effective sample size
r_hat: The R-hat diagnostic tests for lack of convergence by comparing the variance between multiple chains to the variance within each chain. If convergence has been achieved, the between-chain and within-chain variances should be identical. To be most effective in detecting evidence for nonconvergence, each chain should have been initialized to starting values that are dispersed relative to the target distribution.
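These diagnostics can also be computed individually with ArviZ; a minimal sketch, assuming trace_simple converts directly, as it does for az.summary above:
# Per-parameter convergence diagnostics (r_hat near 1 and a large
# effective sample size indicate good convergence):
print(az.rhat(trace_simple, var_names=['α', 'β']))
print(az.ess(trace_simple, var_names=['α', 'β']))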
PPCs are very useful for examining how well a model fits the data. They do so by generating data from the model, using parameter draws from the posterior. We use the pm.sample_ppc function for this purpose and obtain $n$ samples for each observation. Note that for this manually specified model the outcome variable is named logit; the GLM module used later automatically names the outcome $y$:
ppc = pm.sample_ppc(trace_simple, model=model_simple, samples=500)
preds = np.rint(ppc['logit'].mean(axis=0)).astype('int')
y_score = np.mean(ppc['logit'], axis=0)
print('Accuracy of the simplest model:', accuracy_score(data['deposit'], preds))
print('AUC score of the simplest model:', roc_auc_score(data['deposit'], y_score))
simple_model = 'deposit ~ duration + pdays + previous'
full_model = 'deposit ~ age + job + marital + default + day + month + duration + pdays + previous + education + balance + loan + poutcome + campaign + housing + contact'
A probabilistic program consists of observed and unobserved random variables (RVs). We define the observed RVs via likelihood distributions and unobserved RVs via prior distributions. PyMC3 includes numerous probability distributions for this purpose.
The PyMC3 library makes it very straightforward to perform approximate Bayesian inference for logistic regression. Logistic regression models the probability that individual $i$ subscribes to a deposit based on $k$ features:
$$ p(y_i = 1 \mid \beta) = \sigma (\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}) $$
where $\sigma$ is the logistic function:
$$ \sigma(t) = \frac{1}{1 + e^{-t}} $$
We will use the context manager with to define a manual_logistic_model
that we can refer to later as a probabilistic model:
with pm.Model() as manual_logistic_model:
    # Random variables for the coefficients, with
    # uninformative priors for each parameter
    intercept = pm.Normal('intercept', 0, sd=100)
    beta_1 = pm.Normal('beta_1', 0, sd=100)
    beta_2 = pm.Normal('beta_2', 0, sd=100)
    beta_3 = pm.Normal('beta_3', 0, sd=100)
    # Transform the random variables into a vector of probabilities p(y_i=1)
    # according to the logistic regression model specification
    likelihood = pm.invlogit(intercept +
                             beta_1 * data.duration +
                             beta_2 * data.pdays +
                             beta_3 * data.previous)
    # Bernoulli random vector with probability of success
    # given by the sigmoid function and the actual data as observed
    pm.Bernoulli(name='logit', p=likelihood, observed=data.deposit)
manual_logistic_model.model
pm.model_to_graphviz(manual_logistic_model)
Maximum a posteriori probability (MAP) estimation leverages that the evidence is a constant factor that scales the posterior to meet the requirements for a probability distribution. Since the evidence does not depend on $θ$, the posterior distribution is proportional to the product of the likelihood and the prior. Hence, MAP estimation chooses the value of $θ$ that maximizes the posterior given the observed data and the prior belief, that is, the mode of the posterior.
$$ \theta_{MAP} = \operatorname*{arg\,max}_\theta P(X \mid \theta)\, P(\theta) $$
We obtain point MAP estimates for the three parameters by calling pm.find_MAP() within the context of the model we just defined:
with manual_logistic_model:
    # Compute the maximum a posteriori estimate
    # for the logistic regression weights
    manual_map_estimate = pm.find_MAP()

def print_map(result):
    # np.asscalar is deprecated in recent numpy; .item() extracts the scalar
    return pd.Series({k: np.asarray(v).item() for k, v in result.items()})

print_map(manual_map_estimate)
PyMC3 includes numerous common models so that we can usually leave the manual specification for custom applications.
The following code defines the same logistic regression as a member of the Generalized Linear Models (GLM) family, using the formula format inspired by the statistical language R and ported to Python by the patsy library:
with pm.Model() as logistic_model:
    pm.glm.GLM.from_formula(simple_model,
                            data,
                            family=pm.glm.families.Binomial())
pm.model_to_graphviz(logistic_model)
PyMC3 solves the optimization problem of finding the posterior point with the highest density using the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm, but offers several alternatives provided by the scipy library. The result is virtually identical to the corresponding statsmodels estimate:
model = smf.logit(formula=simple_model, data=data[['deposit', 'duration', 'pdays', 'previous']])
result = model.fit()
print(result.summary())
print_map(manual_map_estimate)
result.params
Markov chains are stochastic models that describe sequences of possible events. Each event comes from a set of outcomes, and the probability of each outcome depends only on the current state, according to a fixed set of transition probabilities. An important feature of Markov chains is that they are memoryless: everything you could possibly need to predict the next event is contained in the current state, and no new information comes from knowing the history of events.
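A minimal simulation of a two-state Markov chain makes the memoryless property concrete; the transition matrix below is hypothetical, chosen only for illustration:
import numpy as np
# Each row gives the probabilities of the next state given the current state;
# the next state is drawn using only the current state (memorylessness).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
rng = np.random.default_rng(42)
state, states = 0, []
for _ in range(10_000):
    state = rng.choice(2, p=P[state])
    states.append(state)
print(np.bincount(states) / len(states))  # approximates the stationary distribution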
Monte Carlo methods rely on repeated random sampling to approximate results that may be deterministic, but that does not permit an analytic, exact solution.
Many algorithms apply the Monte Carlo method to a Markov chain and generally proceed as follows: start at the current position; propose a new position; accept or reject the proposal based on the relative posterior density of the two positions; move to the proposed position if accepted, otherwise stay; and repeat.
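The following is a minimal random-walk Metropolis sketch for a one-dimensional standard normal target; it only illustrates the accept/reject recipe above and is not how PyMC3's samplers are implemented:
import numpy as np
rng = np.random.default_rng(0)

def log_target(theta):
    # Log density of a standard normal, up to an additive constant
    return -0.5 * theta ** 2

theta, samples = 0.0, []
for _ in range(5_000):
    proposal = theta + rng.normal(scale=1.0)  # propose a new position
    # Accept with probability min(1, target(proposal) / target(theta))
    if np.log(rng.uniform()) < log_target(proposal) - log_target(theta):
        theta = proposal  # accept and move; otherwise stay put
    samples.append(theta)
print(np.mean(samples), np.std(samples))  # should be close to 0 and 1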
We will use a slightly more complicated model to illustrate Markov chain Monte Carlo inference:
full_model = 'deposit ~ age + job + marital + default + day + month + duration + pdays + previous + education + balance + loan + poutcome + campaign + housing + contact'
with pm.Model() as logistic_model:
    pm.glm.GLM.from_formula(formula=full_model,
                            data=data,
                            family=pm.glm.families.Binomial())
pm.model_to_graphviz(logistic_model)
By default, PyMC3 automatically selects the most efficient sampler and initializes the sampling process for efficient convergence. For a continuous model, PyMC3 chooses the NUTS sampler. It also runs variational inference via ADVI to find good starting parameters for the sampler. One among several alternatives is to use the MAP estimate.
Hamiltonian Monte Carlo (HMC) is a hybrid method that leverages the first-order derivative information of the gradient of the likelihood to propose new states for exploration and overcome some of the challenges of MCMC. In addition, it incorporates momentum to efficiently jump around the posterior. As a result, it converges faster to a high-dimensional target distribution than simpler random-walk Metropolis or Gibbs sampling.
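To make the mechanics concrete, here is a minimal sketch of the leapfrog integration step at the heart of HMC. It is illustrative only: grad_log_post is a placeholder for the gradient of the log posterior, and PyMC3's NUTS tunes the step size and trajectory length automatically:
import numpy as np

def leapfrog(theta, momentum, grad_log_post, step_size, n_steps):
    # Simulate Hamiltonian dynamics to propose a distant, high-probability state
    momentum = momentum + 0.5 * step_size * grad_log_post(theta)  # half momentum step
    for _ in range(n_steps - 1):
        theta = theta + step_size * momentum                      # full position step
        momentum = momentum + step_size * grad_log_post(theta)    # full momentum step
    theta = theta + step_size * momentum
    momentum = momentum + 0.5 * step_size * grad_log_post(theta)  # final half momentum step
    return theta, momentum

# Example with a standard normal posterior, whose log-density gradient is -theta:
theta, momentum = leapfrog(np.array(1.0), np.array(0.5), lambda t: -t, 0.1, 20)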
To see what the convergence looks like, we first draw $1,000$ samples after tuning the sampler for $1,000$ iterations; the tuning samples are discarded. The sampling process can be parallelized across multiple chains using the cores argument (except when using a GPU).
with logistic_model:
    trace = pm.sample(tune=1000, draws=1000, chains=4, init='adapt_diag', cores=5)
pm.plot_trace(trace);
One of the major benefits of Bayesian data analysis is that we can be explicit about our uncertainty. Maximum likelihood returns a number, but how certain can we be that we found the right number? Instead, Bayesian inference returns a distribution over parameter values.
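For example, we can compute the posterior probability that the duration coefficient is positive directly from the trace we just sampled:
# Fraction of posterior samples in which the duration coefficient is positive
p_positive = (trace['duration'] > 0).mean()
print(f'P(beta_duration > 0 | data) = {p_positive:.3f}')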
temp = pd.DataFrame(dict(job=trace['job'], edu=trace['education']))
kdeplot = sns.jointplot(data=temp, x='edu', y='job', kind="kde", cmap='jet', cbar=True,
                        n_levels=100, marginal_kws={'color': 'blue'}).annotate(stats.pearsonr)
kdeplot.fig.set_size_inches(9, 7)
plt.subplots_adjust(left=0.1, right=0.8, top=0.9, bottom=0.1)
pos_joint_ax = kdeplot.ax_joint.get_position()
pos_marg_x_ax = kdeplot.ax_marg_x.get_position()
kdeplot.ax_joint.set_position([pos_joint_ax.x0, pos_joint_ax.y0, pos_marg_x_ax.width, pos_joint_ax.height])
kdeplot.fig.axes[-1].set_position([.83, pos_joint_ax.y0, .07, pos_joint_ax.height])
plt.show()
Here we fit a logistic regression model using MLE via the statsmodels.formula library and extract the p-values and coefficients. We will also calculate the odds ratios of the coefficients to try to interpret the model's feature importances.
model = smf.logit(formula=full_model, data=data)
result = model.fit()
An odds ratio (OR) is a measure of association between a certain property $A$ and a second property $B$ in a population. Specifically, it tells you how the presence or absence of property $A$ has an effect on the presence or absence of property $B$. The OR is also used to figure out if a particular exposure (like eating processed meat) is a risk factor for a particular outcome (such as colon cancer), and to compare the various risk factors for that outcome. As long as you have two properties you think are linked, you can calculate the odds.
Given a $2 \times 2$ contingency table with counts $a$ (both $A$ and $B$), $b$ ($A$ only), $c$ ($B$ only), and $d$ (neither), you have two choices for the formula: $$\frac{(a \div c)}{(b \div d)}$$
or, equivalently: $$\frac{(a \times d)}{(b \times c)}$$
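As a quick worked example, with counts that are entirely made up for illustration:
import numpy as np
# Hypothetical 2x2 contingency table:
#                 B present   B absent
# A present         a=20        b=80
# A absent          c=10        d=90
a, b, c, d = 20, 80, 10, 90
odds_ratio = (a / c) / (b / d)
assert np.isclose(odds_ratio, (a * d) / (b * c))  # the two formulas agree
print(odds_ratio)  # 2.25: the odds of B are 2.25x higher when A is present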
We can obtain the odds ratios of the model coefficients by exponentiating them.
log_df = pd.DataFrame(dict(coefs=result.params[1:], p_vals=result.pvalues[1:]))
log_df['odds'] = np.exp(log_df['coefs'])
log_df['Pct_odds_increase'] = (log_df['odds'] - 1) * 100
log_df = log_df.sort_values('odds', ascending=False)
mask = log_df.p_vals <= 0.05
log_df[mask]
Each curve shows how the probability of subscribing to a term deposit changes with duration, given that the customer owns a house.
The red curves represent success, the green unknown, and the blue failure poutcomes (the outcome of the previous marketing campaign). For all three poutcome levels, the probability of subscribing to a term deposit increases with duration (the last contact duration, transformed and standardized so that $0$ corresponds to its mean).
We plot $100$ different curves for each level of poutcome; each curve is a draw from our posterior distribution.
def lm_full(trace, duration, poutcome, housing):
    """Posterior predictive P(deposit = yes) as a function of three features."""
    shape = np.broadcast(duration, poutcome, housing).shape
    x_norm = np.asarray([np.broadcast_to(x, shape) for x in [duration, poutcome, housing]])
    return 1 / (1 + np.exp(-(trace['Intercept'] +
                             trace['duration'] * x_norm[0] +
                             trace['poutcome'] * x_norm[1] +
                             trace['housing'] * x_norm[2])))
lm = lambda x, samples: lm_full(samples, x, 1, 1)   # poutcome = 1 (failure)
lm2 = lambda x, samples: lm_full(samples, x, 2, 1)  # poutcome = 2 (unknown)
lm3 = lambda x, samples: lm_full(samples, x, 4, 1)  # poutcome = 4 (success)
min_duration = min(data.duration)
max_duration = max(data.duration)
# Plot the posterior predictive distributions of P(deposit = yes) vs. duration
pm.plot_posterior_predictive_glm(trace, eval=np.linspace(min_duration, max_duration, 1000),
                                 lm=lm, samples=100, color="blue", alpha=.7)
pm.plot_posterior_predictive_glm(trace, eval=np.linspace(min_duration, max_duration, 1000),
                                 lm=lm2, samples=100, color="green", alpha=.7)
pm.plot_posterior_predictive_glm(trace, eval=np.linspace(min_duration, max_duration, 1000),
                                 lm=lm3, samples=100, color="red", alpha=.7)
blue_line = mlines.Line2D([], [], color='b', label='failure')
green_line = mlines.Line2D([], [], color='g', label='unknown')
red_line = mlines.Line2D([], [], color='r', label='success')
plt.legend(handles=[blue_line, green_line, red_line], loc='lower right')
plt.ylabel("P(deposit = yes)")
plt.xlabel("duration")
plt.show()
pm.trace_to_dataframe(trace).info()
with open(model_path / 'logistic_model_mh.pkl', 'wb') as buff:
    pickle.dump({'model': logistic_model, 'trace': trace}, buff)
pm.summary(trace)
draws = 100
trace_df = pm.trace_to_dataframe(trace).assign(
    chain=lambda x: x.index // draws)
trace_df.info()
with open(model_path / 'logistic_model_nuts.pkl', 'wb') as buff:
    pickle.dump({'model': logistic_model, 'trace': trace}, buff)
with open(model_path / 'logistic_model_nuts.pkl', 'rb') as buff:
    data0 = pickle.load(buff)
logistic_model, trace_NUTS = data0['model'], data0['trace']
draws = 10000
df = pm.trace_to_dataframe(trace_NUTS).iloc[200:].reset_index(
    drop=True).assign(chain=lambda x: x.index // draws)
trace_df = pd.concat([trace_df.assign(samples=100),
                      df.assign(samples=10000)])
trace_df.info()
trace_df_long = pd.melt(trace_df, id_vars=['samples', 'chain'])
trace_df_long.info()
We can visualize the samples over time and their distributions to check the quality of the results. The following charts show the posterior distributions after an initial $100$ and an additional $10,000$ samples, respectively, and illustrate how convergence implies that multiple chains identify the same distribution. The pm.plot_trace() function shows the evolution of the samples as well.
g = sns.FacetGrid(trace_df_long, col='variable', row='samples',
                  hue='chain', sharex='col', sharey=False)
g = g.map(sns.distplot, 'value', hist=False, rug=False)
b = trace['duration']
lb, ub = np.percentile(b, 2.5), np.percentile(b, 97.5)
lb, ub = np.exp(lb), np.exp(ub)
print(f'P({lb:.3f} < Odds Ratio < {ub:.3f}) = 0.95')
We can interpret this along the following lines: "With probability $0.95$, the odds ratio lies between $4.985$ and $5.725$, so the duration effect is present: a person with a longer contact duration has at least $4.985$ times higher odds of subscribing to a term deposit than a person with a shorter duration, holding all other variables constant."
We can compute the credible intervals, the Bayesian counterpart of confidence intervals, as percentiles of the trace. The resulting boundaries reflect our confidence about the range of the parameter value for a given probability threshold, as opposed to the number of times the parameter will be within this range for a large number of trials.
fig, ax = plt.subplots(figsize=(8, 4))
sns.distplot(np.exp(b), axlabel='Odds Ratio', ax=ax)
ax.set_title(f'Credible Interval: P({lb:.3f} < Odds Ratio < {ub:.3f}) = 0.95')
ax.axvspan(lb, ub, alpha=0.25, color='gray');
Variational Inference (VI) is a machine learning method that approximates probability densities through optimization. In the Bayesian context, it approximates the posterior distribution by selecting, from a family of simpler candidate distributions, the member closest to the target as measured by the Kullback-Leibler divergence.
Compared to MCMC, Variational Bayes tends to converge faster and scales to large data better. While MCMC approximates the posterior with samples from the chain that will eventually converge arbitrarily close to the target, variational algorithms approximate the posterior with the result of the optimization, which is not guaranteed to coincide with the target.
Variational Inference is better suited for large datasets and for quickly exploring many models. In contrast, MCMC delivers more accurate results on smaller datasets or when time and computational resources pose fewer constraints.
The interface for variational inference is very similar to the MCMC implementation. We simply use the fit() function instead of sample(), with the option to include an early-stopping CheckParametersConvergence callback that halts the optimization once the distribution-fitting process has converged to a given tolerance:
with logistic_model:
    callback = CheckParametersConvergence(diff='absolute')
    approx = pm.fit(n=100000, callbacks=[callback])
with open(model_path / 'logistic_model_advi.pkl', 'wb') as buff:
    pickle.dump({'model': logistic_model, 'approx': approx}, buff)
We can draw samples from the approximated distribution to obtain a trace object as above for the MCMC sampler:
trace_advi = approx.sample(2000)
# pm.summary(trace_advi).to_csv(model_path / 'trace_advi.csv')
az.plot_pair(trace_NUTS, figsize=(10, 10), kind='kde', divergences=True);
Let us visualize the covariance structure of the ADVI model:
az.plot_pair(trace_advi, figsize=(10, 10), kind='kde', divergences=True);
As expected, ADVI does not capture the interactions between variables because of the mean-field approximation, and so it underestimates the overall variance.
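One way to retain those correlations, at a higher computational cost, is full-rank ADVI, which fits a multivariate Gaussian with a full covariance matrix; a minimal sketch using pm.fit's method argument:
with logistic_model:
    # Full-rank ADVI captures posterior correlations that mean-field ignores
    approx_fr = pm.fit(n=100000, method='fullrank_advi',
                       callbacks=[CheckParametersConvergence(diff='absolute')])
trace_advi_fr = approx_fr.sample(2000)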
Bayesian model diagnostics includes validating that the sampling process has converged and consistently samples from high-probability areas of the posterior, and confirming that the model represents the data well.
For high-dimensional models with many variables, it becomes cumbersome to inspect numerous traces. When using NUTS, the energy plot helps to assess problems of convergence. It summarizes how efficiently the random process explores the posterior by showing the marginal energy distribution and the energy transition distribution, which should be well matched, as in the example below.
pm.energyplot(trace_NUTS);
A forest plot, also known as a blobbogram, is a graphical display of estimated results from a number of scientific studies addressing the same question, along with the overall results. It was developed for use in medical research as a means of graphically representing a meta-analysis of the results of randomized controlled trials.
az.plot_forest([trace_advi, trace_NUTS], model_names=['trace_advi', 'trace_NUTS']);
pm.plot_posterior(trace_NUTS);
WAIC (Watanabe 2010) is a fully Bayesian criterion for estimating out-of-sample expectation. It uses the computed log pointwise posterior predictive density (LPPD) and corrects for the effective number of parameters to adjust for overfitting.
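For a single model, WAIC can be computed directly from a trace; a minimal sketch for the full model fitted above:
# WAIC for the full model on its own; pm.compare below applies the
# same computation across several candidate models:
waic = pm.waic(trace_NUTS, logistic_model)
print(waic)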
A natural question is what effect campaign has on the model, and why one would use $\text{campaign}^2$ rather than campaign. We'll run the model with a few changes to see what effect increasing polynomial complexity has on this model in terms of WAIC.
def run_models(df, upper_order=5):
    """Fit the base model plus increasing powers of campaign and collect traces."""
    models, traces = OrderedDict(), OrderedDict()
    for k in range(1, upper_order + 1):
        nm = 'k{}'.format(k)
        fml = create_poly_modelspec(k)
        with pm.Model() as models[nm]:
            print('\nRunning: {}'.format(nm))
            pm.glm.GLM.from_formula(fml, df,
                                    family=pm.glm.families.Binomial())
            traces[nm] = pm.sample(1000, tune=1000, chains=4, init='adapt_diag', cores=5)
    return models, traces

def create_poly_modelspec(k=1):
    """Build the formula string, appending np.power(campaign, j) terms for j = 2..k."""
    return ('deposit ~ age + job + marital + default + day + month + duration + pdays + previous + '
            'education + balance + loan + poutcome + campaign + housing + contact ' +
            ' '.join(['+ np.power(campaign,{})'.format(j) for j in range(2, k + 1)])).strip()
models_lin, traces_lin = run_models(data, 4)
model_trace_dict = dict()
ks = ['k1', 'k2', 'k3', 'k4']
for nm in ks:
    models_lin[nm].name = nm
    model_trace_dict.update({models_lin[nm]: traces_lin[nm]})
dfwaic = pm.compare(model_trace_dict, ic='WAIC')
dfwaic = dfwaic.reset_index(drop=True)
dfwaic['names'] = ks
dfwaic = dfwaic.sort_values('rank')
dfwaic.index = dfwaic.names
dfwaic
Next, we use the pm.compareplot function, which takes the output of pm.compare and produces a summary plot.
pm.compareplot(dfwaic);
This confirms that the original model is better than the models with increasing polynomial complexity.
PPCs are very useful for examining how well a model fits the data. They do so by generating data from the model, using parameter draws from the posterior. We use the pm.sample_ppc function for this purpose and obtain $n$ samples for each observation (the GLM module automatically names the outcome $y$):
ppc = pm.sample_ppc(trace_NUTS, samples=500, model=logistic_model)
ppc['y'].shape
y_score = np.mean(ppc['y'], axis=0)
pred_scores = dict(y_true=data.deposit, y_score=y_score)
preds = np.rint(ppc['y'].mean(axis=0)).astype('int')
print('Accuracy of the full model: ', accuracy_score(data.deposit, preds))
print('AUC of the full model: ', roc_auc_score(**pred_scores))
The following prediction workflow follows the PyMC3 docs.
Predictions use theano’s shared variables to replace the training data with test data before running posterior predictive checks. To facilitate visualization, we create the train and test datasets, and convert the former to a shared variable. Note that we need to use numpy arrays and provide a list of column labels:
X = data.drop('deposit', axis=1)
y = data.deposit
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
labels = X_train.columns
X_shared = theano.shared(X_train.values)
with pm.Model() as logistic_model_pred:
    pm.glm.GLM(x=X_shared, labels=labels,
               y=y_train, family=pm.glm.families.Binomial())
with logistic_model_pred:
    pred_trace = pm.sample(draws=1000, tune=1000, chains=4, cores=5, init='adapt_diag')
We then run the sampler as before, and apply the pm.sample_ppc
function to the resulting trace after replacing the train with test data:
X_shared.set_value(X_test)
ppc = pm.sample_ppc(pred_trace,
                    model=logistic_model_pred,
                    samples=5000)
y_score = np.mean(ppc['y'], axis=0)
preds = np.rint(ppc['y'].mean(axis=0)).astype('int')
print('Accuracy on test data: ', accuracy_score(y_test, preds))
print('AUC on test data: ', roc_auc_score(y_score=y_score, y_true=y_test))
from plot_utils import plot_thresholds, comp_conf_matrix
plot_thresholds(X_test, y_test, y_score, name='Bayesian Logistic Regression',
                fig_x=13, fig_y=4, sizes=[0.55, -95, -45])
g_ts_pred = np.where(y_score > 0.44, 1, 0)
f1_ts_pred = np.where(y_score > 0.40, 1, 0)
prl_ts_pred = np.where(y_score > 0.50, 1, 0)
f1_ts_cm = np.around(confusion_matrix(y_test, f1_ts_pred, normalize='true'), decimals=3)
g_ts_cm = np.around(confusion_matrix(y_test, g_ts_pred, normalize='true'), decimals=3)
prl_ts_cm = np.around(confusion_matrix(y_test, prl_ts_pred, normalize='true'), decimals=3)
no_ts_cm = np.around(confusion_matrix(y_test, preds, normalize='true'), decimals=3)
comp_conf_matrix(mtx1=no_ts_cm, mtx2=f1_ts_cm, labels=['deposit', 'no deposit'],
                 titles=['No Threshold', 'F1 Thresholded'], sizes=[10, 4])
comp_conf_matrix(mtx1=prl_ts_cm, mtx2=g_ts_cm, labels=['deposit', 'no deposit'],
                 titles=['PRL Thresholded', 'G Thresholded'], sizes=[10, 4])