Service Levels in transportation through an A/B lens

R and pandas library in Python

Diversifying sales and client communication into a wide array of chanels is a useful way to maximize profit through the improvement of Service Levels; «SL» is a rational and quantitative element which evaluates the peformance delivered to the customer base. In the incumbent endeavour, one is presented with a dataset rooted in a transportation company´s customer-order entrances; such entrances or chanels will be divided into three: Website message board, executives Emails, and a web whatsapp chat.

The dataset collection will follow a quite conventional sales report layout: a customer name, Tax id, number of order, date of registration, the executive assigned to the service request amongst other variables irrelevant to the scope of this examination. The data comprised stems from the period of July 2015 to June 2016. The key variable will be the «valor venta» column, which is spanish for sales value. Negative values will mean that the particular order was not fulfilled, the quotation was logged into the system with all the required parameters by one of the three chanels, but due to snags in the response procedure, the customer balked from the order; therefore generating an «opportunity» sale. The invoice value will differ greatly from one order to another, all freight services will have the port of Manzanillo, Mexico as its departure point and destinations will be strewn across the entire Mexican Republic,as for the cargo description, it could be a twenty-foot container, forty or LCL goods, hence the ample cost variation. As stipulated earlier, the desired outcome will be the comparison parameter named Service Level between the three entry chanels of the customer order. Sales variation depending on the first point of contact between the potential customer and the organization: «A» ,for website message board; B, for executives emails, lastly «C» for whatsapp chat run by a manager adjutant.

The service level will be nothing more than a quotient between the actual orders delivered and the total amount of orders entered into the system; simple addition of the opportunity sales and those already fullfilled.

Importing libraries

import pandas as pd
import datetime as dt
import numpy as np
With this chunk of code one  obtains the preliminary look of the dataset sales_dataset=pd.read_csv(r'C:\Users\Alonso\Downloads\VENTAS CLIENTES.csv') 
Image 1: General glance of the de initial dataset, personal data has been concealed

For the sake of fairness, one must be sure that the order input are equaly distributed amongst the three chanels . The ninth column from left to right- «tipocanal» spanish for chanel type- labels the entrance into the system («A», «B», and «C»).

chanel_types_prop=sales_dataset_date['tipocanal'].value_counts(normalize=True)
Image 2: Clearly the order distribution is evenly distributed into the three entry points

Perusing the first image, one sees that the eight column «valor_venta» or sales value has commas and the the $ mexican peso symbol. Both characters should be removed and transform into a float variable with a new column named ‘invoice_value’.

sales_dataset_date['invoice_value']=sales_dataset_date['valor_venta'].replace('[\$,]','', regex=True).astype(float)

Next step will be to see the values behaviour across time with R Studio. This task will only require three columns of the prior dataset: «Date», «invoice_value» and «chanel type. It will be later transformed into a csv file.

sales_dataset_date_yr=sales_dataset_date[['fecha','invoice_value','tipocanal']]
sales_dataset_date_yr.to_csv(r"C:\Users\Alonso\Desktop\doe3.csv")

In R three libraries will be downloaded: lubridate, for date format transformation; dplyr, for manipulating the three imported columns, and ggplot2 for creating plots.

library(lubridate)
library(dplyr)
library(ggplot2)
doe3$month<-format(as.Date(doe3$fecha, format="%Y-%m-%d","%m"))
doe3$month<-month(doe3$fecha)
doe3$year_month<-format(as.Date(doe3$month),format="%Y-%m")
Image3: Output for the data transformation in R denominated doe3. It includes actual sales and opportunity sales.

By applying the dplyr library, an average of actual services delivered and lost ones will be created in a monthly basis.

#Positive values 
time_series_yr_ch<-doe3%>%
  group_by(year_month,tipocanal)%>%
  summarize(avg=mean(invoice_value))

#Negative values
time_series_yr_ch_loss<-doe3%>%
  group_by(year_month,tipocanal)%>%
  filter(invoice_value< 0)%>%
  summarize(aver=mean(invoice_value))

Time Series Plots

time_series_yr_ch%>% ggplot()+ geom_line(aes(year_month,avg,  colour=tipocanal,group=tipocanal,size=2))+theme(legend.position = "bottom", legend.key.size = unit(2,'cm'),legend.key.height = unit(2,'cm'),legend.key.width = unit(2,'cm'))
Image 4: A time series average of the three input chanels for delivered freights
time_series_yr_ch_loss%>% ggplot()+ geom_line(aes(year_month,aver,  colour=tipocanal,group=tipocanal,size=2))+theme(legend.position = "bottom", legend.key.size = unit(2,'cm'),legend.key.height = unit(2,'cm'),legend.key.width = unit(2,'cm'))
Image 5: Time series average for lost sales. Clearly february of 2016 there was a dramatic drop in the three chanels. With A and B leading the losses

In order to create a semblance of the required Service Level indicator an export of the R dataset will take place back into python through a csv.file

write.csv(b_chanel_sales,"C:\\Users\\Alonso\\Desktop\\b_chanel_sales.csv",row.names=FALSE)
write.csv(b_chanel_loss,"C:\\Users\\Alonso\\Desktop\\b_chanel_loss.csv",row.names=FALSE)
#One reintroduces the R datasets into python
b_chanel_sales_dataset=pd.read_csv(r'C:\Users\Alonso\Desktop\b_chanel_sales.csv')
b_chanel_sales_loss=pd.read_csv(r'C:\Users\Alonso\Desktop\b_chanel_loss.csv')
#Transform the opportunity values to positive ones by multiplying it times -1
b_chanel_sales_dataset['opp_sale']=b_chanel_sales_loss['negavg']*-1

With these to variables one can proceed to compute the service level, for instance in «B» chanel in a monthly basis.

# The Service Level quotient
b_chanel_sales_dataset['service_level']= b_chanel_sales_dataset['avg']/(b_chanel_sales_dataset['opp_sale']+b_chanel_sales_dataset['avg'])
Image 6: Through the previous code, one calculates the Service Level indicator. As seen in the opportunity sales average graph, February 2016 has one the lowest Service Levels.

In order to attain greater statistical formality, one should compare the three chanels in a same parametric basis. When reviewing the theoretical background, we utilize the Central Limit Theorem, which states that regardless of the distribution of a given dataset; if one extracts random units, proceeds to computes their means, repeats this process for a few iterations, such outputs will build a new dataset;which will follow a normal distribution. The following code example will demonstrate that.

# To remove emty values, we apply the following function:

sales_dataset_nna=sales_dataset_date.dropna()

The next step will be to create 2000 means from 5 random positive invoice values from each chanel, and insert them into an empty list. These values will generate the numerator for the Service Level quotient, and the same process will be carried out for the negative values; which added to the positive ones will equate the denominator.


sample_means_c=[]
for i in range(2000):
              sample_positive_c=sales_dataset_nna[(sales_dataset_nna['tipocanal'] == ' C') & (sales_dataset_nna['invoice_value']>0)]['invoice_value'].sample(5, replace=True) 
             sample_mean=np.mean(sample_positive_c)
             sample_means_c.append(sample_mean)

The same for negative values.

sample_means_c_neg=[]
for i in range(2000):
    sample_negative_c=sales_dataset_nna[(sales_dataset_nna['tipocanal'] == ' C') & (sales_dataset_nna['invoice_value']<0)]['invoice_value'].sample(5, replace=True) 
    sample_negative_c=sample_negative_c*-1
    sample_mean=np.mean(sample_negative_c)
    sample_means_c_neg.append(sample_mean)
service_level_numerator_c=sample_means_c
service_level_denominator_c=np.add(sample_means_c,sample_means_c_neg)
service_level_c=service_level_numerator_c/service_level_denominator_c

Departing from the previous service level vector for each chanel, a recreation of the sampling process will take place by the application of the random function created for both the numerator and the denominator; this time applied to the Service Level quotient for the «C» chanel.

means_service_level_c=[]
for i in range(1000):
    c_service_prop_array=np.random.choice(service_level_c, size=5, replace=True)
    c_service_level_mean=np.mean(c_service_prop_array)
    means_service_level_c.append(c_service_level_mean)

To visualize the distribution of the previous means array, «means_service_level_c», one ought to plot a histogram in order to prove the apparent normality.

Image 7: The distribution for the means service level seems quite normal, as it is quite symmetric around the mean. The same criteria applies to chanes «A» and «B».

Displaying the Service Levels of the three chanels as Normal Plots

Arriving to the final stage of the research, one should plot three curves from the three different mean service levels using the standard deviation and the mean as defining parameters of the normal distribution.

As independent or x variable, we create an array from the lowest possible service level to the highest up to 95% with a length of 1000, proceed with the computation of the means and standard deviation for each chanel; labeling them as a_bar, b_bar, and c_bar respectively.

x=np.linspace(0.15,0.95,1000)
a_bar=np.mean(means_service_level_a)
b_bar=np.mean(means_service_level_b)
c_bar=np.mean(means_service_level_c)

Now, compute the probability density function with these four variables plus the «sd» or standard deviation.

norm_a=norm.pdf(x,a_bar,np.std(means_service_level_a))
norm_b=norm.pdf(x,b_bar,np.std(means_service_level_b))
norm_c=norm.pdf(x,c_bar,np.std(means_service_level_c))

With the probability density function and feature parameters, one can draw the three curves centered around their means named as «a_bar», «b_bar», and «c_bar» plotted also as dotted lines.

Image 8: The visual differences are quite notorious amongst the three order entry chanels
import matplotlib.pyplot as plt
sns.lineplot(x,norm_a,label='chanel a')
sns.lineplot(x,norm_b, color='red',label='chanel b')
sns.lineplot(x,norm_c, color='orange',label='chanel c')
plt.axvline(x=a_bar,color='fuchsia',linestyle='--')
plt.axvline(x=b_bar,color='darkgreen',linestyle='--')
plt.axvline(x=c_bar, color='indigo',linestyle='--')

plt.xlabel('Service Levels')
plt.show()

It is quite conspicuous that «A», or the website message board has the highest fulfillment rate and greater follow up of customer orders as it travels along the company´s system.

For the analysis to be framed in a more quantitative scope, z-score formula will be used to determine the spread between the three chanels . The «C» option, Whatssapp chat, will be the pivot point due to its further left value. How many «C chanel» standard deviations are the other two mean values swayed to the right ?

Image 9: Z score standard formula for measuring the spread of the three chanels means
z_a_chanel_score=(a_bar-c_bar)/np.std(means_service_level_c)
z_b_chanel_score=(b_bar-c_bar)/np.std(means_service_level_c)

Computing this last chunk of code, one gets the distance of the «A» and «B» with respect to C. The former is equal to 3.17 sd and the latter is equal to 0.63. One can clearly state that those customer requests entering the system through the website message board will have a far greater chance of becoming income stream than those by email or whatsapp chat, both having practically the same service level (0.49 and 0.56).

A Service Level framework of thought is an optimal template for outlining business practices and align them with management goals, due to its panoramic view and its simplicity in terms of computation it is a fruitful primer for the inception of a Strategic Planing mindset. A/B tests are magnificent opportunities to design experiments that will enable the researcher to explore new alternatives that will yield greater returns or device improvements in the existing ones. The implementation of these tests will grant management with the necessary performance indicators to edify that goal oriented drive.


Deja un comentario