The Skeptic Dataist

Categoría: Sin categoría

Service Levels in transportation through an A/B lens

julio 24th, 2023
R and pandas library in Python

Diversifying sales and client communication into a wide array of chanels is a useful way to maximize profit through the improvement of Service Levels; «SL» is a rational and quantitative element which evaluates the peformance delivered to the customer base. In the incumbent endeavour, one is presented with a dataset rooted in a transportation company´s customer-order entrances; such entrances or chanels will be divided into three: Website message board, executives Emails, and a web whatsapp chat.

The dataset collection will follow a quite conventional sales report layout: a customer name, Tax id, number of order, date of registration, the executive assigned to the service request amongst other variables irrelevant to the scope of this examination. The data comprised stems from the period of July 2015 to June 2016. The key variable will be the «valor venta» column, which is spanish for sales value. Negative values will mean that the particular order was not fulfilled, the quotation was logged into the system with all the required parameters by one of the three chanels, but due to snags in the response procedure, the customer balked from the order; therefore generating an «opportunity» sale. The invoice value will differ greatly from one order to another, all freight services will have the port of Manzanillo, Mexico as its departure point and destinations will be strewn across the entire Mexican Republic,as for the cargo description, it could be a twenty-foot container, forty or LCL goods, hence the ample cost variation. As stipulated earlier, the desired outcome will be the comparison parameter named Service Level between the three entry chanels of the customer order. Sales variation depending on the first point of contact between the potential customer and the organization: «A» ,for website message board; B, for executives emails, lastly «C» for whatsapp chat run by a manager adjutant.

The service level will be nothing more than a quotient between the actual orders delivered and the total amount of orders entered into the system; simple addition of the opportunity sales and those already fullfilled.

Importing libraries
```
import pandas as pd
import datetime as dt
import numpy as np
```
```
With this chunk of code one  obtains the preliminary look of the dataset sales_dataset=pd.read_csv(r'C:\Users\Alonso\Downloads\VENTAS CLIENTES.csv') 
```
Image 1: General glance of the de initial dataset, personal data has been concealed

For the sake of fairness, one must be sure that the order input are equaly distributed amongst the three chanels . The ninth column from left to right- «tipocanal» spanish for chanel type- labels the entrance into the system («A», «B», and «C»).
```
chanel_types_prop=sales_dataset_date['tipocanal'].value_counts(normalize=True)
```
Image 2: Clearly the order distribution is evenly distributed into the three entry points

Perusing the first image, one sees that the eight column «valor_venta» or sales value has commas and the the $ mexican peso symbol. Both characters should be removed and transform into a float variable with a new column named ‘invoice_value’.
```
sales_dataset_date['invoice_value']=sales_dataset_date['valor_venta'].replace('[\$,]','', regex=True).astype(float)
```
Next step will be to see the values behaviour across time with R Studio. This task will only require three columns of the prior dataset: «Date», «invoice_value» and «chanel type. It will be later transformed into a csv file.
```
sales_dataset_date_yr=sales_dataset_date[['fecha','invoice_value','tipocanal']]
```
```
sales_dataset_date_yr.to_csv(r"C:\Users\Alonso\Desktop\doe3.csv")
```
In R three libraries will be downloaded: lubridate, for date format transformation; dplyr, for manipulating the three imported columns, and ggplot2 for creating plots.
```
library(lubridate)
library(dplyr)
library(ggplot2)
doe3$month<-format(as.Date(doe3$fecha, format="%Y-%m-%d","%m"))
doe3$month<-month(doe3$fecha)
doe3$year_month<-format(as.Date(doe3$month),format="%Y-%m")
```
Image3: Output for the data transformation in R denominated doe3. It includes actual sales and opportunity sales.

By applying the dplyr library, an average of actual services delivered and lost ones will be created in a monthly basis.
```
#Positive values 
time_series_yr_ch<-doe3%>%
  group_by(year_month,tipocanal)%>%
  summarize(avg=mean(invoice_value))
```
```
#Negative values
time_series_yr_ch_loss<-doe3%>%
  group_by(year_month,tipocanal)%>%
  filter(invoice_value< 0)%>%
  summarize(aver=mean(invoice_value))
```
Time Series Plots
```
time_series_yr_ch%>% ggplot()+ geom_line(aes(year_month,avg,  colour=tipocanal,group=tipocanal,size=2))+theme(legend.position = "bottom", legend.key.size = unit(2,'cm'),legend.key.height = unit(2,'cm'),legend.key.width = unit(2,'cm'))
```
Image 4: A time series average of the three input chanels for delivered freights
```
time_series_yr_ch_loss%>% ggplot()+ geom_line(aes(year_month,aver,  colour=tipocanal,group=tipocanal,size=2))+theme(legend.position = "bottom", legend.key.size = unit(2,'cm'),legend.key.height = unit(2,'cm'),legend.key.width = unit(2,'cm'))
```
Image 5: Time series average for lost sales. Clearly february of 2016 there was a dramatic drop in the three chanels. With A and B leading the losses

In order to create a semblance of the required Service Level indicator an export of the R dataset will take place back into python through a csv.file
```
write.csv(b_chanel_sales,"C:\\Users\\Alonso\\Desktop\\b_chanel_sales.csv",row.names=FALSE)
write.csv(b_chanel_loss,"C:\\Users\\Alonso\\Desktop\\b_chanel_loss.csv",row.names=FALSE)
```
```
#One reintroduces the R datasets into python
b_chanel_sales_dataset=pd.read_csv(r'C:\Users\Alonso\Desktop\b_chanel_sales.csv')
b_chanel_sales_loss=pd.read_csv(r'C:\Users\Alonso\Desktop\b_chanel_loss.csv')
```
```
#Transform the opportunity values to positive ones by multiplying it times -1
b_chanel_sales_dataset['opp_sale']=b_chanel_sales_loss['negavg']*-1
```
With these to variables one can proceed to compute the service level, for instance in «B» chanel in a monthly basis.
```
# The Service Level quotient
b_chanel_sales_dataset['service_level']= b_chanel_sales_dataset['avg']/(b_chanel_sales_dataset['opp_sale']+b_chanel_sales_dataset['avg'])
```
Image 6: Through the previous code, one calculates the Service Level indicator. As seen in the opportunity sales average graph, February 2016 has one the lowest Service Levels.

In order to attain greater statistical formality, one should compare the three chanels in a same parametric basis. When reviewing the theoretical background, we utilize the Central Limit Theorem, which states that regardless of the distribution of a given dataset; if one extracts random units, proceeds to computes their means, repeats this process for a few iterations, such outputs will build a new dataset;which will follow a normal distribution. The following code example will demonstrate that.
```
# To remove emty values, we apply the following function:

sales_dataset_nna=sales_dataset_date.dropna()
```
The next step will be to create 2000 means from 5 random positive invoice values from each chanel, and insert them into an empty list. These values will generate the numerator for the Service Level quotient, and the same process will be carried out for the negative values; which added to the positive ones will equate the denominator.
```
sample_means_c=[]
for i in range(2000):
              sample_positive_c=sales_dataset_nna[(sales_dataset_nna['tipocanal'] == ' C') & (sales_dataset_nna['invoice_value']>0)]['invoice_value'].sample(5, replace=True) 
             sample_mean=np.mean(sample_positive_c)
             sample_means_c.append(sample_mean)
```
The same for negative values.
```
sample_means_c_neg=[]
for i in range(2000):
    sample_negative_c=sales_dataset_nna[(sales_dataset_nna['tipocanal'] == ' C') & (sales_dataset_nna['invoice_value']<0)]['invoice_value'].sample(5, replace=True) 
    sample_negative_c=sample_negative_c*-1
    sample_mean=np.mean(sample_negative_c)
    sample_means_c_neg.append(sample_mean)
```
```
service_level_numerator_c=sample_means_c
```
```
service_level_denominator_c=np.add(sample_means_c,sample_means_c_neg)
```
```
service_level_c=service_level_numerator_c/service_level_denominator_c
```
Departing from the previous service level vector for each chanel, a recreation of the sampling process will take place by the application of the random function created for both the numerator and the denominator; this time applied to the Service Level quotient for the «C» chanel.
```
means_service_level_c=[]
for i in range(1000):
    c_service_prop_array=np.random.choice(service_level_c, size=5, replace=True)
    c_service_level_mean=np.mean(c_service_prop_array)
    means_service_level_c.append(c_service_level_mean)
```
To visualize the distribution of the previous means array, «means_service_level_c», one ought to plot a histogram in order to prove the apparent normality.

Image 7: The distribution for the means service level seems quite normal, as it is quite symmetric around the mean. The same criteria applies to chanes «A» and «B».

Displaying the Service Levels of the three chanels as Normal Plots

Arriving to the final stage of the research, one should plot three curves from the three different mean service levels using the standard deviation and the mean as defining parameters of the normal distribution.

As independent or x variable, we create an array from the lowest possible service level to the highest up to 95% with a length of 1000, proceed with the computation of the means and standard deviation for each chanel; labeling them as a_bar, b_bar, and c_bar respectively.
```
x=np.linspace(0.15,0.95,1000)
a_bar=np.mean(means_service_level_a)
b_bar=np.mean(means_service_level_b)
c_bar=np.mean(means_service_level_c)
```
Now, compute the probability density function with these four variables plus the «sd» or standard deviation.
```
norm_a=norm.pdf(x,a_bar,np.std(means_service_level_a))
norm_b=norm.pdf(x,b_bar,np.std(means_service_level_b))
norm_c=norm.pdf(x,c_bar,np.std(means_service_level_c))
```
With the probability density function and feature parameters, one can draw the three curves centered around their means named as «a_bar», «b_bar», and «c_bar» plotted also as dotted lines.

Image 8: The visual differences are quite notorious amongst the three order entry chanels
```
import matplotlib.pyplot as plt
sns.lineplot(x,norm_a,label='chanel a')
sns.lineplot(x,norm_b, color='red',label='chanel b')
sns.lineplot(x,norm_c, color='orange',label='chanel c')
plt.axvline(x=a_bar,color='fuchsia',linestyle='--')
plt.axvline(x=b_bar,color='darkgreen',linestyle='--')
plt.axvline(x=c_bar, color='indigo',linestyle='--')

plt.xlabel('Service Levels')
plt.show()
```
It is quite conspicuous that «A», or the website message board has the highest fulfillment rate and greater follow up of customer orders as it travels along the company´s system.

For the analysis to be framed in a more quantitative scope, z-score formula will be used to determine the spread between the three chanels . The «C» option, Whatssapp chat, will be the pivot point due to its further left value. How many «C chanel» standard deviations are the other two mean values swayed to the right ?

Image 9: Z score standard formula for measuring the spread of the three chanels means
```
z_a_chanel_score=(a_bar-c_bar)/np.std(means_service_level_c)
z_b_chanel_score=(b_bar-c_bar)/np.std(means_service_level_c)
```
Computing this last chunk of code, one gets the distance of the «A» and «B» with respect to C. The former is equal to 3.17 sd and the latter is equal to 0.63. One can clearly state that those customer requests entering the system through the website message board will have a far greater chance of becoming income stream than those by email or whatsapp chat, both having practically the same service level (0.49 and 0.56).

A Service Level framework of thought is an optimal template for outlining business practices and align them with management goals, due to its panoramic view and its simplicity in terms of computation it is a fruitful primer for the inception of a Strategic Planing mindset. A/B tests are magnificent opportunities to design experiments that will enable the researcher to explore new alternatives that will yield greater returns or device improvements in the existing ones. The implementation of these tests will grant management with the necessary performance indicators to edify that goal oriented drive.
Credit risk assessment with Transition matrices and Markov Chains

mayo 7th, 2023
The defraying of logistical expenditures can be a challenge, specially, in face of short term cash obligations. Companies can resort to factoring mechanisms that will grant them credit for their monthly import operations; for such service providers keeping historical records is of paramount relevance.Ergo, one can manage risk involved in money lending. The dataset containing the credit customer data spans over a 4 month period, the amount of money lent varies from customer to customer.

The purpose of this entry is to apply Markov chains knowledge into the debtors behaviour . All debtors are divided into four groups: AAA, when the debt is liquidated within 30 days after its emission; AA, when the debt is liquidated within 60 days; B, when the total amount is paid within 90 days; Bad debt, subdivided itself into two: C, when the accounts are transfered, throughout a 30 day period, to a collection agency which will add a surcharge to each unit of debt retrieved of $ 800,00; D, when the debt is not paid with the agency assistance, and more aggressive action will need to take place. The raw dataset will contain information from the debtors for the first trimester of 2021. Each group or state will be characterised for its fluidity with the notorious exception of Bad debt (C and D) which are absorbent states; customers can move through the remaining states depending on the punctuality of payments along the different emissions. Due to the categorical and binary nature of the debt status (paid or unpaid), one ought to employ zeroes and ones to label the four milestone-payment dates along the data. Once the customer-debtor has paid his dues, his credit line is renewed by the following month. One, thus, must find four different cash streams with their respective four month margin.

Applying the python pandas library, one can start making sense of the untidy dataset in form of an Microsoft Excel file. One shall begin with getting the number of entitities in the dataset.
```
 pandas as pd
import numpy as np
pd.read_csv(r'C:\Users\Alonso\Documents\given credit 2021.csv')
tot_debtot_debtorsors=credit_dataset["num"].nunique()
6409
```
Another step for outlining the data will be to calculate the total capital at disposal.
```
credit_dataset["credit_amount"].sum()
```
As previously mencioned, the determination of payment punctuality by every date milestone is labeled with «0» and «1». So, by adding each column one can get the distribution of customers in different dates.

AAA_debtors=credit_dataset[«first_date»].sum()

Fortunately, more than fifty percent of debts in the first month are duly paid in the subsequent month
```
credit_dataset_df=pd.DataFrame(credit_dataset)
```
```
AA_debtors=credit_dataset_df["second_date"].sum()
AA_debtors
```
B_debtors=credit_dataset[«third_date»].sum()
B_debtors

Bad Debt estimation

Because of the dual nature of Bad debt, Pivot tables will be used to separate the two types, those who pay once the Collection agency makes contact,»C», and the ones who even after that fall into arrears with «0» in the fourth date.
```
credit_dataset.pivot_table(values="credit_utilized",columns="fourth_date",aggfunc="count")
```
A Markov Chain is a type of stochastic process in which a variable varies across time into a finite set of states; this variation´s core tenet is that future states depend solely on the current state. The input element for the markov chain is the Transition matrix; such array will provide us with the probability to which customers can move from one credit status to another; this movements will be sourced from payment punctuality during the following 4 months.

Using pivot tables one can estimate the punctuality ratio based upon previously calculated states using the following expression arranged in a matrix fashion:

Using the following code:
```
transition_matrix=np.matrix([[credit_dataset["first_date_second_emission"].sum()/AAA_debtors,credit_dataset["second_date_second_emission"].sum()/AAA_debtors,credit_dataset["third_date_second_emission"].sum()/AAA_debtors,credit_dataset["fourth_date_second_emission"].sum()/AAA_debtors,credit_dataset[credit_dataset["fourth_date_second_emission"]==0]["num"].count()/AAA_debtors],[credit_dataset["first_date_third_emission"].sum()/AA_debtors,credit_dataset["second_date_third_emission"].sum()/AA_debtors,credit_dataset["third_date_third_emission"].sum()/AA_debtors,credit_dataset["fourth_date_third_emission"].sum()/AA_debtors,credit_dataset[credit_dataset["fourth_date_third_emission"]==0]["num"].count()/AA_debtors],[credit_dataset["first_date_fourth_emission"].sum()/B_debtors,credit_dataset["second_date_fourth_emission"].sum()/B_debtors,credit_dataset["third_date_fourth_emission"].sum()/B_debtors,credit_dataset["fourth_date_fourth_emission"].sum()/B_debtors,credit_dataset[credit_dataset["fourth_date_fourth_emission"]==0]["num"].count()/B_debtors],[0,0,0,1,0],[0,0,0,0,1]])
```
With the following output:

Debt status AAA AA B C D
AAA 0.64 0.21 0.08 0.04 0.02
AA 0.47 0.26 0.15 0.05 0.06
B 0.48 0.33 0.08 0.05 0.04
C 0 0 0 1 0
D 0 0 0 0 1
Transition matrix or the probability of movement among debt status in a trimester span. Also, stressing the initial model parameters, one can see that C and D are inalterable due to the «bad debt» label.

Relevant Questions

With the prior accoutrements one can begin to make predictions and draw relevant conclusions for the decision takers.

1.What is the probability that an excellent debtor will fall into bad debt after one trimester? By a simple look at the transition matrix, one sees that around 6% of them will fall into arrears.

2.What is the probability that a debtor with initial AA will end up in B status at the end of the year?

According to the succint corollary of the Kolmogorov-Chapman equations, one can impel forward any transition probabilities to future periods by just powering the transition matrix to that required period.
```
transition_matrix**3
```
A fraction over 8% will fall into arrears within 60 and 90 days.

3. What is the estimated total cost of bad debt recollection at the end of
the year considering the 800,00 fare?

All needed to do in this point, is to add all of the fourth dates within
different emissions, which will amount to a total of $ 1.060,000

As a concluding note, one ought to make sense of that last figure in tandem with the cash availability in terms of the amount repaid, amount falling into arrears, and the amount in stock.

Status: Beginning in month 0, A is the amount of money repaid; B is the amount unpaid, and C is the stock capital that has not been lent
Freights in a sea fraught with frightening uncertainty

noviembre 27th, 2022
The following exercise involves information regarding Pacific sea freight operations of a mexican enterprise from the manufacturing sector. Due to confidenciality purposes, the dataset was construed through meager and rudimentary means; it was created by reviewing and manually capturing in a spreadsheet file the physical documentation from every purchase order (OC) carried out from November 2020 to October 2021, such as, invoices, packing lists, and Bills of Landing. Every record is a specific OC marked by its own denomination, Supplier, product name, description, Volume (cbm), total weight, Incoterm, invoice-value, Container number, dates (purchase, export, anchorage and plant arrival), whether its a FCL (20 or 40 footer) or less, amongst others.

The main drive for the building of this database was to keep track of every Asian imports docked in the port of Manzanillo, Mexico, and the subsequent analysis was elicited out of the chagrin stoked by all the pandemic turmoil in FBX01 index (Asia to North American West Coast) and its aftermath. The sudden elasticity of prices bore conspicuously upon the Company´s margins. Borrowing a loose 6 sigma template (Define, Measure, Analyze, Improve and Control), this case study will try to parametrize the stark fare fluctuations and rises through the employment of public data.

Firstly, we will sieve the messy csv-file through the R and pandas python library in a Jupiter environment with the purpose of determining the distribution of the goods, origin and a reasonable grasp for the entire collection.
```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
po_dataset=pd.read_csv(r'C:\Users\Alonso\Desktop\compendium.csv')
```
#Now that we are able to read the file, we arrange the data by export date in an ascending fashion.
```
po_dataset_bydate=po_dataset[["OC","Fecha de embarque"]]
po_dataset_bydate=po_dataset.sort_values("Fecha de embarque",ascending=True) 
```
# We need to know how much money amounts for the total dataset.
```
total_value=np.sum(po_dataset_bydate["Importe"])
value_tot_fcl=0
```
#A function will be required to read the entire file to divide the cargo between FCL and LCL, then divide its invoice percentagewise. Peering the file one notices that the main discriminator between both categories will be the instantiation of a lower or upper case «x»;therefore, two simple for loops will detect such character and add it to two different lists.
```
total_value=np.sum(po_dataset_bydate["Importe"])
value_tot_fcl=0
x=0
i=0
po_dataset["Descripcion carga"].apply(str)
po_dataset.set_index("Descripcion carga")
cargo_type=[]
fcl_dataframe_list=[]
lcl_dataframe_list=[]
for x in range(len(po_dataset)):
     a=po_dataset["Descripcion carga"][x]
     for i in range(len(a)):
```
```
    if(a[i]=="x" or a[i]=="X"):
        value_tot_fcl=po_dataset["Importe"][x]+value_tot_fcl
        fcl_dataframe_list.append(po_dataset["OC"][x]) 
        break
    else:
        lcl_dataframe_list.append(po_dataset["OC"][x])
fcl_dataframe=po_dataset[po_dataset["OC"].isin(fcl_dataframe_list)]
lcl_dataframe=po_dataset[po_dataset["OC"].isin(fcl_dataframe_list)]
value_tot_lcl=total_value-value_tot_fcl
data_prop={'fcl':[value_tot_fcl/total_value100], 'lcl':[value_tot_lcl/total_value100]}
percentage_dataframe_cargo=pd.DataFrame(data_prop)

full_perc=value_tot_fcl/total_value100 less_perc=value_tot_lcl/total_value100
```
#The output will be the following: 82% of the cargo is containerized amounting for $15,146,273.

#Graphic visualization
```
import matplotlib.pyplot as plt
plot_data=[value_tot_fcl,value_tot_lcl]
etiquettes=["FCL","LCL"]
plt.title("Type of cargo")
plt.pie(plot_data,labels=etiquettes,autopct='%1.1f%%', shadow=True,startangle=90)
plt.axis("equal")
plt.show()
```
#Because of the preponderance of Containerized cargo, all of the LCL goods will be ruled out of the study. Next, we will need to divide the data into a dataframe for all the full container dataframe.
```
def cargo_division(po_dataset):
        
        value_tot_fcl=0 
        total_value=np.sum(po_dataset_bydate["Importe"])
        fcl_dataframe_series=[]
        for x in range(len(po_dataset)):
            a=po_dataset_bydate["Descripcion carga"][x] 
            for i in range(len(a)):
        
                if(a[i]=="x" or a[i]=="X"):
                    value_tot_fcl=po_dataset_bydate["Importe"][x]+value_tot_fcl
                    fcl_dataframe_series.append(po_dataset_bydate["OC"][x]) 
                    break
                
                    
     
        po_dataset_bydate.set_index("OC")         
        fcl_condition=po_dataset_bydate["OC"].isin(fcl_dataframe_series)
        #lcl_condition=po_dataset_bydate["OC"].isin(lcl_dataframe_list)
        fcl_dataframe=po_dataset_bydate[fcl_condition]
        #fcl_dataframe["Fecha de embarque"]=pd.to_datetime(fcl_dataframe["Fecha de embarque"], format="%d/%m/%Y")
        #lcl_dataframe=po_dataset_bydate[lcl_condition]
        
        value_tot_lcl=total_value-value_tot_fcl 
        data_prop={'fcl':[value_tot_fcl/total_value*100],
        'lcl':[value_tot_lcl/total_value*100]}
        fcl_dataframe["Descripcion carga"].apply(str)
        
        
     
        return fcl_dataframe
       
full_container=cargo_division(po_dataset) 
```
#Output of the full container dataframe

#Graph of cargo: There is no much difference between both container types.
```
import matplotlib.pyplot as plt
plot_data=[value_tot_fcl,value_tot_lcl]
etiquettes=["FCL","LCL"]
plt.title("Type of cargo")
plt.pie(plot_data,labels=etiquettes,autopct='%1.1f%%', shadow=True,startangle=90)
plt.axis("equal")
plt.show()
```
Distribution by ports

All of the providers are located either in India or in China. The proceeding layer of analysis will be to review the percentage distribution of invoice-values within the departure ports.

#We create a function to extract all sales done through the FOB regime
```
FOBS=twenty_forty(fcl_dataframe)

def fob_incoterm(FOBS):
    FOB_dataframe=FOBS[(FOBS["Incoterm"]=="FOB")]
    return FOB_dataframe
      
fob_incoterm(FOBS)  
```
#Percentage distribution of the ports
```
fob_ports=fob_incoterm(FOBS)
port_counts=fob_ports["Puerto de Partida"].value_counts(normalize=True)
print(port_counts)
```
From the previous image, one can visualize that four ports account 85% of the total inventory. The next stage is the creation of a dataframe that will serve as a time series with two main parameters: total freight cost, and the number of containers on a monthly-basis.

#Transform fob_dataframe with the following parameters: for the indexation of the data series, one applies the «year-month» series; the count of units chartered per period, the number of single containers required; finally total freight expenditure .
```
fob_dataframe=fob_incoterm(FOBS)
fob_dataframe["year-month"]=fob_dataframe["Fecha de embarque"].dt.to_period('M')

container_counts=fob_dataframe.groupby(["year-month"])["No. De Contenedor"].count()
container_conts=fob_dataframe.groupby(["year-month"])["No. De Contenedor"].nunique()
container_conts_month_dataframe=pd.DataFrame(container_conts)
tot_freight_month=fob_dataframe.groupby(["year-month"])["Importe Flete"].sum()
total_freight_month_dataframe=pd.DataFrame(tot_freight_month)
```
#We write the function into a csv file.
```
print(container_quantity_totfreight_month_dataframe)
container_quantity_totfreight_month_dataframe.to_csv(r'C:\Users\Alonso\Desktop\container_quantity_totfreight_month_dataframe.to_csv')
```
```
import matplotlib.pyplot as plt
average_freight_series_data=container_quantity_totfreight_month_dataframe["container_month_cost_average"]
average_freight_series_data.plot(x="year-month",y="container_cost_average",kind="line",marker="o")
plt.ylabel("United States Dollars")
plt.xlabel("Month")
plt.title("Average freight Evolution")
plt.show()
```
#Freight evolution per port.
```
container_tot_month.to_csv(r'C:\Users\Alonso\Desktop\ports.csv')
ports_plot_dataframe=pd.read_csv((r'C:\Users\Alonso\Desktop\ports.csv'))

ports_plot_dataframe.set_index('year-month',inplace=True)
ports=['Ningbo','SHANGHAI','Qingdao','NHAVASHEVA, India']
ports_plot_dataframe_limited=ports_plot_dataframe[ports_plot_dataframe['Puerto de Partida'].isin(ports)]
ports_plot_dataframe_limited.groupby('Puerto de Partida')['average_cost_freight'].plot(legend=True,subplots=False,marker='o',figsize=(20,20))
```
From the previous plot, one can appreciate how freight prices peaked around April and May 2021, for the particular west indian port of Nhava Sheva, as a probable direct consequence of the Evergiven Incident that took place at the end of March.

Departing from the aforementioned broad and explanatory strokes, one ends up with several misgivings stemming out of the ricocheting prices without prior notice. Retaking the initial proposition, the conundrum will be framed within the Six Sigma methodology; beginning with a problem statement: In less than 10 months, we have seen the doubling of the Pacific rate, which will percolate into the tax and duties paid at mexican territory, and eventually chafe companies margin.

The pretension will be to find a macroeconomic factor that will endow analysts a reasonable forecast for further periods; such factor, will be the issuing of currency. Following a conservative lecture on monetary theory, one learns that an increase in money supply will crank up inflation across all goods and services. Given the North American location of the port of Manzanillo, one can assert that the USA exerts single handedly the most overwhelming influence upon the FBX01 index; thus, one can surmise that its monetary policy will be the most crucial fare driver within the scope of this case-study, also, its a well espoused fact that during the pandemic, the government added 3 trillion dollars into circulation; this influence will be tested in the form of money creation during the data set timeline; this data will be extracted from the fred.stouisefed.org website.

Measure: Linear regression model

For the second stage of the Six Sigma praxis, a linear equation will be created; utilizing the average container fare per month as a response variable and the monetary mass as predictor or independent variable. The model mechanics will be to compare the current average container price with the monetary mass of the previous month. In order to have more manageable information and less difference in orders of magnitude, the monetary mass will be measured in a per-capita basis using a total population of 329500000.

#Data transformation for the model
```
usd_circulation_data=pd.read_csv(r'C:\Users\Alonso\Desktop\usd_incirculation.csv')
from matplotlib.legend_handler import HandlerLine2D
us_population=329500000
usd_circulation_dataframe=pd.DataFrame(usd_circulation_data)
usd_circulation_dataframe["monetary_mass_per_citizen"]=usd_circulation_dataframe["currency_emission"]/us_population
usd_per_capita=usd_circulation_data.monetary_mass_per_citizen
month=usd_circulation_dataframe.month
print(usd_circulation_dataframe[["month","currency_emission","monetary_mass_per_citizen"]])
monetary_mass_plot=usd_circulation_dataframe.plot(x="month",y="monetary_mass_per_citizen",kind="line",marker="o")
plt.ylabel("USD")
plt.xlabel("month")
plt.title("Monetary mass per person")
plt.legend(handler_map={monetary_mass_plot:HandlerLine2D(numpoints=13)})
plt.show()
```
#Visual correlation of the two variables
```
freight_predictor=np.array([average_freight_series_data])
independent_variable=np.array([usd_circulation_dataframe["monetary_mass_per_citizen"][0:12]])
plt.scatter(independent_variable,freight_predictor)
plt.ylabel("Price per container(thousand dollars)")
plt.xlabel("money supply per capita")
plt.title("Scatter Plot:Container-Freight price/monetary mass supply")
plt.show()
```
Model in Rstudio
```
x_data<-x_variables$monetary_mass_per_citizen
y_data<-y_variables$container_month_cost_average
```
```
linearmodel<-lm(formula=y_data~x_data)
print(linearmodel)
```
With the previous p-value, we can reject the Null hypothesis and conclude that the printing of money does have an impact upon freight prices.

Analyze

In this part, the model´s statistical rigour and validity will be put to a test by evaluating the output parameters. Initially, we have the Adjusted R-squared value: .9008, this tells us that the monetary input per capita explains 90% of the freight fare variation, so we can conclude that the model is quite optimal.

Testing the assumptions of the model :

Normality of the residuales

From the previous mean of the residuals and histrogram, one can concludes that it follows a normal distribution, as most values cluster around the mean.

Homocestaticity

At this point, one tests whether the residuals and the fitted values have the same variance.
```
var.test(linearmodel$residuals,linearmodel$fitted.values)
```
Given this last output, one can conclude that the model is not rigorous enough by having a confidence interval different than one.

Improve

Proceeding with the monetary reading referred to earlier, after a period of bustling consumption like the experienced during the en masse printing of money of the last months ; subsequently applied policy is the escalation of interest rates in order to incentivize savings. The inextribable links between United States and Mexico alluded in the FBX01 index are also shared with the interest rate positions. On this improve section, the purported goal ought to be the acquisition of a treasury bond product that will have provided the company with a hedging slack against stark price shocks. To assess the effectivity of this stratagem, two vectors will be created; the first one involving the total expenditure per month incurred; second the interest rates for every month in the previous vector (researched from the internet). All of this expenditures will be taken to a October 2020 in a present value basis, which is exemplified by the following image.

Create an R vector utilizing the csv file, «container_quantity_totfreight_month»
```
vector_container_cost<-c(container_quantity_totfreight_month_dataframe$Importe.Flete)
print(vector_container_cost)
```
Create discount factor:
```
i<-c(0,.0448,.0448,.0452,.0428,.0428,.0428,.0452,.0451,.0475,.0475,.04997)

discount_vector<-(1+i)^-(1:length(i)-1)
discount_vector
```
Define Present value for October 2020
```
PV_oct_20<-sum(discount_vector*vector_container_cost)
PV_oct_20
```
We get a total logistical cost at an October 2020 value of 1664775.00 USD.

Financial Product

As stated in the Improvement core, one intends to buy treasury that could be monthly compounded from October 2021 to October 2022. The amount invested should be defined from the results of the analysis. As an initial step, one gets the total amount of container employed in that year.
```
sum(container_quantity_totfreight_month_dataframe$No..De.Contenedor)
```
The October present value will be divided by this last factor as a straightforward container average value, which results in 7465.35 USD

By investing the equivalent of 4 containers every month and allow it to compound with the interest rate vector, just as the cash flow diagram above.
```
future_value_oct_22<-function(amount,i){
  p_0<-0
  for(i in (i)){
    payment_period<-p_0+amount
    value_investment<-payment_period*((1+i)^1)
    p_0<-value_investment
  
    
  }
  return(value_investment)
}
```
Finally, this last value should be discounted to the October 2020 base line; once that both parameters have the same time frame, we can determine the percentage of effectivenes reached with this simple hedging mechanic.
```
present_value_saved_oct_2020<-function(future_value_from_product,i_1,i_2){
  total_interest_vector<-append(rev(i_1),rev(i_2),after=length(i_1))
  fv<-future_value_from_product
  for(i in (total_interest_vector)){
    p_value_oct_20<-(fv)*(1+i)^-1
    fv<-p_value_oct_20
    
  }
  return(p_value_oct_20)
}

present_value_saved_oct_2020(future_value_oct_22(7465.359*4,i_2021),i_2021,i_2020)
```
From an overall logistical cost of $ 1664775, we were able to hedge 9.23% applying this treasure bond policy within a single year.

Conclusion

Radical and unpredictable freight ascensions are ineluctable, as their impact upon utilitarian indicators; yet one must strive to glean useful tools that will grant analysts with windows for visualizing these menacing tides. Albeit the statistical invalidity of the model, the macroeconomic variable of choice, «monetary mass per capita», does have a strong positive correlation with our objective variable. For the final Control stage, one sugests a periodical review of the currency emission, which will provide useful foresight and its consequential effect on interest rates will parsimoniously muffle these margin strikes for the immediate year.