<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
 xmlns:content="http://purl.org/rss/1.0/modules/content/"
 xmlns:wfw="http://wellformedweb.org/CommentAPI/"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:atom="http://www.w3.org/2005/Atom"
 xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
 xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
 xmlns:georss="http://www.georss.org/georss"
 xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
 xmlns:media="http://search.yahoo.com/mrss/">
<channel>
<atom:link href="https://indymnv.xyz/posts/tech/rss.xml" rel="self" type="application/rss+xml" />
<title>Indy Navarro post about tech</title>
<link>https://indymnv.xyz/</link>
<description><![CDATA[Technical writing]]></description>
<language>en</language>
<lastBuildDate>Mon, 01 Jun 2026 01:23:01 +0900</lastBuildDate>
<generator>Emacs 30.1 org-publish-rss.el 0.8</generator>
<item>
<title>Using evotrees.jl for time series prediction</title>
<link>https://indymnv.xyz/posts/tech/20230302-evotrees-time-series-julia.html</link>
<pubDate>Thu, 02 Mar 2023 00:00:00 +0900</pubDate>
<guid>https://indymnv.xyz/posts/tech/20230302-evotrees-time-series-julia.html</guid>
<description>
<![CDATA[<div id="outline-container-introduction" class="outline-2">
<h2 id="introduction"><span class="section-number-2">1.</span> Introduction</h2>
<div class="outline-text-2" id="text-introduction">
<p>
In this post, I want to show an analysis of a time series that I've been
working on. Usually, when dealing with time series, it is not so common
to use machine learning algorithms (without at least trying more
traditional models like the ARIMA family), but I still wanted to test
how well a GBM model fits for these kinds of problems that are so
popular.
</p>

<blockquote>
<p>
<b><i>NOTE:</i></b> I don't recommend starting with models of this type for time
series problems. There are simpler models to understand that are less
computationally expensive.
</p>
</blockquote>
</div>
</div>
<div id="outline-container-dataset-preparation" class="outline-2">
<h2 id="dataset-preparation"><span class="section-number-2">2.</span> Dataset Preparation</h2>
<div class="outline-text-2" id="text-dataset-preparation">
</div>
<div id="outline-container-data-extraction" class="outline-3">
<h3 id="data-extraction"><span class="section-number-3">2.1.</span> Data Extraction</h3>
<div class="outline-text-3" id="text-data-extraction">
<p>
You can find the repository
<a href="https://github.com/indymnv/Household-Electric-Power-Consumption">here</a>.
I prototyped the code shown in this post in <code>notebooks/tutorial.jl</code>.
</p>

<p>
Now we start by making the corresponding imports.
</p>

<div class="org-src-container">
<pre class="src src-julia">using DataFrames
using Plots
using MLJ
using EvoTrees
using UrlDownload
using ZipFile
using HTTP
using CSV
using Dates
using Statistics
using MLJClusteringInterface
using Clustering
using FreqTables
using StatsPlots
using RollingFunctions
using StatsBase
using ShiftedArrays
</pre>
</div>

<p>
This section loads quite a few libraries, and I must admit it took me
some time to get comfortable with each one. To start, we can read the
dataframe directly from its repository.
</p>

<div class="org-src-container">
<pre class="src src-julia">data_url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00235/household_power_consumption.zip"
f = download(data_url)
z = ZipFile.Reader(f)
z_by_filename = Dict( f.name =&gt; f for f in z.files)
data = CSV.read(z_by_filename["household_power_consumption.txt"], DataFrame,)
</pre>
</div>

<p>
The dataframe looks more or less like this:
</p>

<div class="org-src-container">
<pre class="src src-julia">     Row │ Date        Time      Global_active_power  Global_reactive_power  Voltage  Global_intensity  Sub_metering_1  Sub_metering_2  Sub_metering_3
         │ String15    Time      String7              String7                String7  String7           String7         String7         Float64?
─────────┼─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
       1 │ 16/12/2006  17:24:00  4.216                0.418                  234.840  18.400            0.000           1.000                     17.0
       2 │ 16/12/2006  17:25:00  5.360                0.436                  233.630  23.000            0.000           1.000                     16.0
       3 │ 16/12/2006  17:26:00  5.374                0.498                  233.290  23.000            0.000           2.000                     17.0
       4 │ 16/12/2006  17:27:00  5.388                0.502                  233.740  23.000            0.000           1.000                     17.0
       5 │ 16/12/2006  17:28:00  3.666                0.528                  235.680  15.800            0.000           1.000                     17.0
       6 │ 16/12/2006  17:29:00  3.520                0.522                  235.020  15.000            0.000           2.000                     17.0
       7 │ 16/12/2006  17:30:00  3.702                0.520                  235.090  15.800            0.000           1.000                     17.0
       8 │ 16/12/2006  17:31:00  3.700                0.520                  235.220  15.800            0.000           1.000                     17.0
       9 │ 16/12/2006  17:32:00  3.668                0.510                  233.990  15.800            0.000           1.000                     17.0
      10 │ 16/12/2006  17:33:00  3.662                0.510                  233.860  15.800            0.000           2.000                     16.0
      11 │ 16/12/2006  17:34:00  4.448                0.498                  232.860  19.600            0.000           1.000                     17.0
      12 │ 16/12/2006  17:35:00  5.412                0.470                  232.780  23.200            0.000           1.000                     17.0
      13 │ 16/12/2006  17:36:00  5.224                0.478                  232.990  22.400            0.000           1.000                     16.0
      14 │ 16/12/2006  17:37:00  5.268                0.398                  232.910  22.600            0.000           2.000                     17.0
    ⋮    │     ⋮          ⋮               ⋮                     ⋮               ⋮            ⋮                ⋮               ⋮               ⋮
 2075246 │ 26/11/2010  20:49:00  0.948                0.000                  238.160  4.000             0.000           1.000                      0.0
 2075247 │ 26/11/2010  20:50:00  1.198                0.128                  238.110  5.000             0.000           1.000                      0.0
 2075248 │ 26/11/2010  20:51:00  1.024                0.106                  238.840  4.200             0.000           1.000                      0.0
 2075249 │ 26/11/2010  20:52:00  0.946                0.000                  239.050  4.000             0.000           0.000                      0.0
 2075250 │ 26/11/2010  20:53:00  0.944                0.000                  238.720  4.000             0.000           0.000                      0.0
 2075251 │ 26/11/2010  20:54:00  0.946                0.000                  239.310  4.000             0.000           0.000                      0.0
 2075252 │ 26/11/2010  20:55:00  0.946                0.000                  239.740  4.000             0.000           0.000                      0.0
 2075253 │ 26/11/2010  20:56:00  0.942                0.000                  239.410  4.000             0.000           0.000                      0.0
 2075254 │ 26/11/2010  20:57:00  0.946                0.000                  240.330  4.000             0.000           0.000                      0.0
 2075255 │ 26/11/2010  20:58:00  0.946                0.000                  240.430  4.000             0.000           0.000                      0.0
 2075256 │ 26/11/2010  20:59:00  0.944                0.000                  240.000  4.000             0.000           0.000                      0.0
 2075257 │ 26/11/2010  21:00:00  0.938                0.000                  239.820  3.800             0.000           0.000                      0.0
 2075258 │ 26/11/2010  21:01:00  0.934                0.000                  239.700  3.800             0.000           0.000                      0.0
 2075259 │ 26/11/2010  21:02:00  0.932                0.000                  239.550  3.800             0.000           0.000                      0.0
                                                                                                                                   2075231 rows omitted
</pre>
</div>
</div>
</div>
<div id="outline-container-dataset-cleaning" class="outline-3">
<h3 id="dataset-cleaning"><span class="section-number-3">2.2.</span> Dataset Cleaning</h3>
<div class="outline-text-3" id="text-dataset-cleaning">
<p>
As can be seen, the dataset is quite large. We can take the opportunity
to create new variables that may carry relevant information.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Combine the date and time columns into a single DateTime variable
date_time = [DateTime(d, t) for (d,t) in zip(data[!,1], data[!,2])]

data[!,:date_time] = date_time

#Create variable for date
data[!,:year] = Dates.value.(Year.(data[!,1]))
data[!,:month] = Dates.value.(Month.(data[!,1]))
data[!,:day] = Dates.value.(Day.(data[!,1]))

#Create variable for time
data[!, :hour] = Dates.value.(Hour.(data[!,2]))
data[!, :minute] = Dates.value.(Minute.(data[!,2]))

#Create variable for weekends
data[!, :dayofweek] = [dayofweek(date) for date in data.Date]
data[!, :weekend] = [day in [6, 7] for day in data.dayofweek]
</pre>
</div>

<p>
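In addition, we notice that the measurement variables are in String
format, so we convert them to the appropriate types. Note that the
<code>Date</code> column has to be converted to a <code>Date</code> before the
date-based variables above can be computed, so in practice this step
should run first.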
</p>

<div class="org-src-container">
<pre class="src src-julia">for i in 3:8
    data[!,i] = parse.(Float64, data[!,i])
end
data[!,1] = replace.(data[!,1], "/" =&gt; "-")
data[!,1] = Date.(data[!,1], "d-m-y")
</pre>
</div>
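
<p>
As a small, self-contained sketch of the same idea using only the
standard <code>Dates</code> library: a <code>DateFormat</code> can describe the
day/month/year layout directly, without replacing the separators first.
The sample strings and the names <code>raw_date</code>, <code>raw_time</code>,
<code>stamp</code>, and <code>features</code> are hypothetical and only mirror the
layout of the raw columns.
</p>

<div class="org-src-container">
<pre class="src src-julia">using Dates

# Hypothetical sample values in the same layout as the raw Date and Time columns
raw_date = "16/12/2006"
raw_time = "17:24:00"

# A DateFormat can describe the day/month/year layout directly
d = Date(raw_date, dateformat"d/m/y")
t = Time(raw_time, dateformat"H:M:S")
stamp = DateTime(d, t)

# Calendar features comparable to the ones derived in this section
features = (
    year = year(stamp),
    month = month(stamp),
    day = day(stamp),
    hour = hour(stamp),
    minute = minute(stamp),
    dayofweek = dayofweek(stamp),       # 1 = Monday, ..., 7 = Sunday
    weekend = dayofweek(stamp) in (6, 7),
)
println(features)
</pre>
</div>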
</div>
</div>
</div>
<div id="outline-container-preliminary-visualizations" class="outline-2">
<h2 id="preliminary-visualizations"><span class="section-number-2">3.</span> Preliminary Visualizations</h2>
<div class="outline-text-2" id="text-preliminary-visualizations">
<p>
A classic way to plot all the variables is with the following code:
</p>

<div class="org-src-container">
<pre class="src src-julia">plot([plot(data[1:50000, :date_time],data[1:50000,col]; label = col, xrot=30) for col in ["Global_active_power",  "Global_reactive_power", "Global_intensity", "Voltage", "Sub_metering_1",  "Sub_metering_2", "Sub_metering_3"]]...)
</pre>
</div>


<div id="orge056ff9" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///003_evotrees/lineplot_all.png" alt="lineplot_all.png" />
</p>
<p><span class="figure-number">Figure 1: </span>Line Plot All</p>
</div>

<p>
Note that we only plot the first 50,000 data points to avoid
overloading the graphs with information. In the same way, we can
create histograms.
</p>

<div class="org-src-container">
<pre class="src src-julia">plot([histogram(data[1:50000, col],label = col, bins = 20 ) for col in ["Global_active_power",  "Global_reactive_power", "Global_intensity", "Voltage", "Sub_metering_1",  "Sub_metering_2", "Sub_metering_3"]]...)
</pre>
</div>


<div id="org447ee8c" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///003_evotrees/all_hist.svg" alt="all_hist.svg" class="org-svg" />
</p>
<p><span class="figure-number">Figure 2: </span>Histogram All</p>
</div>

<p>
For now, we can see that the global variables behave like white noise,
and so does Voltage, although Voltage is the only variable whose
distribution resembles a normal distribution. The sub-metering
variables, in turn, reflect the use of household appliances.
</p>
</div>
</div>
<div id="outline-container-a-brief-clustering-with-kmeans" class="outline-2">
<h2 id="a-brief-clustering-with-kmeans"><span class="section-number-2">4.</span> A brief clustering with kmeans</h2>
<div class="outline-text-2" id="text-a-brief-clustering-with-kmeans">
<p>
In this section, we are interested in building a clustering model on the
time series. The purpose is simply to evaluate behavior patterns over
time. One hypothesis is that we will see irregular patterns, since
greater consumption is expected at specific times of the day or
seasons.
</p>

<p>
Something I was unaware of is that time series clustering is possible
with k-means. Strictly speaking, though, a time series should not be
treated like ordinary tabular data: variants of these algorithms that
account for the temporal ordering of neighboring observations are more
appropriate. Since this project is just a toy and the technique is used
only for EDA, we will stick with the classical algorithm.
</p>

<p>
If you want to know more about this topic, you can read this
<a href="https://towardsdatascience.com/time-series-clustering-deriving-trends-and-archetypes-from-sequential-data-bb87783312b4">article</a>.
</p>

<p>
Continuing with the problem, we can cluster by applying the following
code.
</p>

<div class="org-src-container">
<pre class="src src-julia">X = data[!, 3:9]
transformer_instance = Standardizer()
transformer_model = machine(transformer_instance, X)
fit!(transformer_model)
X = MLJ.transform(transformer_model, X);
KMeans= @load KMeans pkg=Clustering
kmeans = KMeans(k=3)

mach = machine(kmeans, X) |&gt; fit!

# cluster X into 3 clusters using K-means
Xsmall = MLJ.transform(mach);
selectrows(Xsmall, 1:4) |&gt; pretty
yhat = MLJ.predict(mach)
data[!,:cluster] = yhat
</pre>
</div>
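
<p>
The standardization step matters because k-means relies on distances, so
a variable measured on a large scale (such as voltage) would otherwise
dominate the others. As a rough sketch of what standardizing a column
amounts to, assuming a plain z-score; the helper <code>standardize</code> and
the sample values are hypothetical and are not MLJ's implementation:
</p>

<div class="org-src-container">
<pre class="src src-julia">using Statistics

# Hypothetical readings on very different scales (not taken from the dataset)
voltage = [234.8, 233.6, 235.7, 240.4, 239.8]
intensity = [18.4, 23.0, 15.8, 4.0, 3.8]

# Z-score: subtract the mean and divide by the standard deviation
standardize(x) = (x .- mean(x)) ./ std(x)

voltage_z = standardize(voltage)
intensity_z = standardize(intensity)

# After the transformation both columns are centered on 0 with unit spread
println(round.(voltage_z; digits = 2))
println(round.(intensity_z; digits = 2))
</pre>
</div>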

<p>
In this case, we have 3 clusters whose sizes are as follows.
</p>

<pre class="example" id="orgdd0eba5">
 Row │ cluster           nrow
     │ CategoricalValue  Int64
─────┼───────────────────────────
   1 │ 1                  741077
   2 │ 2                 1257309
   3 │ 3                   50894
</pre>

<p>
And if we try to plot the clusters, we would have the following.
</p>

<div class="org-src-container">
<pre class="src src-julia">plot([scatter(data[1:20000, :date_time],data[1:20000,col]; group=data[1:20000,:].cluster, size=(1200, 1000), title = col, xrot=30) for col in ["Global_active_power",  "Global_reactive_power", "Global_intensity", "Voltage", "Sub_metering_1",  "Sub_metering_2", "Sub_metering_3"]]...)
</pre>
</div>


<div id="orgc4de59a" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///003_evotrees/cluster2.png" alt="cluster2.png" />
</p>
<p><span class="figure-number">Figure 3: </span>scatter plot cluster</p>
</div>

<p>
It looks a bit confusing, although if we look at the voltage variable,
we can already make out a certain trend. For now, let's look at
boxplots of the main variables grouped by cluster.
</p>

<div class="org-src-container">
<pre class="src src-julia">b1 =@df data boxplot(string.(:cluster), :Global_active_power, fillalpha=0.75, linewidth=2, title ="Global active power")
b2 =@df data boxplot(string.(:cluster), :Global_reactive_power, fillalpha=0.75, linewidth=2, title = "Global reactive power")
b3 = @df data boxplot(string.(:cluster), :Global_intensity, fillalpha=0.75, linewidth=2, title ="Global intensity")
b4 = @df data boxplot(string.(:cluster), :Voltage, fillalpha=0.75, linewidth=2, title = "Voltage")


plot(b1, b2, b3, b4 ,layout=(2,2), legend=false)
</pre>
</div>


<div id="orgf4722f3" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///003_evotrees/boxplotcluster.png" alt="boxplotcluster.png" />
</p>
<p><span class="figure-number">Figure 4: </span>boxplot cluster</p>
</div>

<p>
The truth is that we notice only slight differences between the
clusters. Each category shows certain consumption patterns, but for some
variables these do not necessarily lead us to any conclusion. However,
as mentioned at the beginning, the idea of clustering was to study
consumption patterns during time intervals, so we add the following.
</p>

<div class="org-src-container">
<pre class="src src-julia">h1 =heatmap(freqtable(data,:cluster,:dayofweek)./freqtable(data,:cluster), title = "day of week")
h2 =heatmap(freqtable(data,:cluster,:hour)./freqtable(data,:cluster), title = "hour")
h3 = heatmap(freqtable(data,:cluster,:month)./freqtable(data,:cluster), title = "month")
h4 = heatmap(freqtable(data,:cluster,:day)./freqtable(data,:cluster), title = "day")

plot(h1, h2, h3, h4 ,layout=(2,2), legend=false)
</pre>
</div>


<div id="org91aa2d3" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///003_evotrees/diagram_time.svg" alt="diagram_time.svg" class="org-svg" />
</p>
<p><span class="figure-number">Figure 5: </span>heatmap cluster by time</p>
</div>

<p>
It might be a bit confusing initially, so let me walk through an
example. Take cluster 2, which corresponds to the lowest global
intensity. If we go to the heatmap of hours, we see that this pattern is
most present at night, which corresponds to the hours when we are
usually sleeping. I hope this makes more sense.
</p>
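
<p>
To make the heatmap values more concrete, here is a minimal sketch of
the underlying calculation in plain Julia: count each (cluster, hour)
pair, then divide every row by that cluster's total so each row sums
to 1. The labels and hours below are hypothetical, and the names
<code>counts</code> and <code>shares</code> are only for illustration.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Hypothetical cluster labels and hours for eight observations
clusters = [1, 1, 2, 2, 2, 2, 3, 1]
hours = [18, 19, 2, 3, 2, 4, 20, 18]

cluster_ids = sort(unique(clusters))
hour_ids = sort(unique(hours))

# Count how often each (cluster, hour) pair occurs
counts = zeros(Int, length(cluster_ids), length(hour_ids))
for (c, h) in zip(clusters, hours)
    i = findfirst(==(c), cluster_ids)
    j = findfirst(==(h), hour_ids)
    counts[i, j] += 1
end

# Divide each row by its total so every cluster's row sums to 1
shares = counts ./ sum(counts; dims = 2)
println(shares)
</pre>
</div>

<p>
Read this way, a bright cell means that a large share of that cluster's
observations fall in that hour, regardless of how big the cluster is.
</p>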

<p>
This gives us a slight hint that time-based features might be useful,
so we'll use this information for feature engineering later. Now let's
start with the next phase.
</p>
</div>
</div>
<div id="outline-container-using-evotrees-for-prediction." class="outline-2">
<h2 id="using-evotrees-for-prediction."><span class="section-number-2">5.</span> Using EvoTrees for prediction.</h2>
<div class="outline-text-2" id="text-using-evotrees-for-prediction.">
<p>
For now, we want to predict voltage. I'm not an expert in the field of
electricity and consumption, but for a simple exercise, we will use the
MLJ library (for Python users, it is roughly equivalent to Scikit-Learn).
Given the amount of data and the algorithm we are going to use, training
with cross-validation is not practical because it would take too much
time, so we will only use a train/test split as a strategy.
</p>

<p>
Let's generate a lag feature and then split the data in the following way:
</p>

<div class="org-src-container">
<pre class="src src-julia">data[!, :lag_30] = Array(ShiftedArray(data.Voltage, 30))
replace!(data.lag_30, missing =&gt; 0);
</pre>
</div>
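
<p>
As a rough sketch of what the lag amounts to, the shift can be written
out in plain Julia. The function <code>lag_series</code> and the sample values
are hypothetical and are not part of ShiftedArrays; the sketch also shows
one possible alternative to filling the gaps with 0.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Shift a series back by k steps; the first k entries have no history
function lag_series(x::AbstractVector, k::Integer)
    out = Vector{Union{Missing, eltype(x)}}(missing, length(x))
    for i in (k + 1):length(x)
        out[i] = x[i - k]
    end
    return out
end

voltage = [234.8, 233.6, 233.3, 233.7, 235.7]   # hypothetical values
lagged = lag_series(voltage, 2)
println(lagged)

# One alternative to filling with 0: fill with the first observed value
filled = coalesce.(lagged, first(voltage))
println(filled)
</pre>
</div>

<p>
Since the voltage readings sit around 230 to 240, a fill value of 0 is
far outside the normal range. It only affects the first 30 rows here,
but filling with an observed value or dropping those rows may be worth
considering.
</p>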

<p>
To assign the training and test sets, we split by date as follows.
</p>

<div class="org-src-container">
<pre class="src src-julia">train = copy(filter(x -&gt; x.Date &lt; Date(2010,10,01), data))
test = copy(filter(x -&gt; x.Date &gt;= Date(2010,10,01), data))
</pre>
</div>

<p>
Then, we remove some variables that we won't use to train the model, and
we save our voltage variable.
</p>

<div class="org-src-container">
<pre class="src src-julia">select!(train, Not([:Date, :Time, :date_time, :cluster, ]))
select!(test, Not([:Date, :Time, :date_time, :cluster, ]))
y_train = copy(train[!,:Voltage])
y_test = copy(test[!,:Voltage])
</pre>
</div>

<p>
Now we are going to apply a cyclical encoder. We have several new
variables related to time (month, day, hour, among others), and they are
more helpful if we expose their cyclical character, which is why we use
a trigonometric transformation.
</p>

<div class="org-src-container">
<pre class="src src-julia">function cyclical_encoder(df::DataFrame, columns::Union{Array, Symbol}, max_val::Union{Array, Int} )
    for (column, max) in zip(columns, max_val)

        df[:, Symbol(string(column) * "_sin")] = sin.(2*pi*df[:, column]/max)
        df[:, Symbol(string(column) * "_cos")] = cos.(2*pi*df[:, column]/max)
    end
    return df
end
</pre>
</div>

<p>
Next, we can apply this new function to our dataset.
</p>

<div class="org-src-container">
<pre class="src src-julia">columns_selected = [:day, :year, :month, :hour, :minute, :dayofweek]
max_val = [31, 2010, 12, 23, 59, 7]
train_cyclical = cyclical_encoder(train, columns_selected, max_val)
test_cyclical = cyclical_encoder(test, columns_selected, max_val)
</pre>
</div>
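
<p>
One detail worth checking is the choice of period. The sketch below uses
a hypothetical helper, <code>encode</code>, to compare two periods for the
hour variable. With a period of 23, hour 23 completes a full turn and
lands on the same point as hour 0, whereas a period of 24 (the number of
distinct hours) keeps them apart. The same reasoning would suggest 60
for minutes. This is only an illustration, not the configuration used
for the results in this post.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Map a value onto the unit circle for a cycle of the given length
encode(value, period) = (sin(2pi * value / period), cos(2pi * value / period))

# With a period of 23, hour 23 completes a full turn and lands on hour 0
h0_p23 = encode(0, 23)
h23_p23 = encode(23, 23)
println(h0_p23, " vs ", h23_p23)

# With a period of 24 (the number of distinct hours), they stay apart
h0_p24 = encode(0, 24)
h23_p24 = encode(23, 24)
println(h0_p24, " vs ", h23_p24)
</pre>
</div>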

<p>
With the features in place, let's train the model.
</p>

<div class="org-src-container">
<pre class="src src-julia">EvoTreeRegressor = @load EvoTreeRegressor pkg=EvoTrees verbosity=0
etr_start = EvoTreeRegressor(max_depth =15)

machreg = machine(etr_start, train_cyclical[!,14:end], y_train);
fit!(machreg);


pred_etr_train = MLJ.predict(machreg, train_cyclical[!,14:end]);
rms_score_train = rms(pred_etr_train, y_train)
println("The rms in train is $rms_score_train")

pred_etr = MLJ.predict(machreg, test_cyclical[!,14:end]);
rms_score = rms(pred_etr, y_test)
println("The rms in test is $rms_score")
</pre>
</div>
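
<p>
For reference, the <code>rms</code> measure is, as far as I understand it, the
root mean squared error. A minimal sketch of that calculation, with a
hypothetical helper <code>rmse</code> and made-up values:
</p>

<div class="org-src-container">
<pre class="src src-julia">using Statistics

# Root mean squared error between predictions and observed values
rmse(yhat, y) = sqrt(mean((yhat .- y) .^ 2))

y_true = [240.0, 238.0, 241.0, 239.0]   # hypothetical voltages
y_pred = [238.0, 239.0, 244.0, 239.0]   # hypothetical predictions

println(rmse(y_pred, y_true))
</pre>
</div>

<p>
The score is in the same units as the target, so here it can be read
directly as a typical error in volts.
</p>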

<p>
This is our result:
</p>

<ul class="org-ul">
<li>The rms in train is 2.5364451392238085</li>
<li>The rms in test is 3.438565163838837</li>
</ul>

<p>
Next, we plot the residuals left by our model. Here we can detect some
signs of overfitting, since the model scores noticeably better on the
training dataset than on the test dataset. The plots also show that the
predictions are biased: the model is not able to recognize trends.
</p>


<div id="org5b053cb" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///003_evotrees/predictions.svg" alt="predictions.svg" class="org-svg" />
</p>
<p><span class="figure-number">Figure 6: </span>prediction</p>
</div>

<p>
Finally, we can see how the predictions compare to the test data.
</p>


<div id="orgf0d961b" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///003_evotrees/pred_vs_real.png" alt="pred_vs_real.png" />
</p>
<p><span class="figure-number">Figure 7: </span>pred-vs-real</p>
</div>

<p>
As noted earlier, the model was not able to capture the magnitude of the
voltage on the test set, even though the variable is fairly stable over
time. I trained the model with different parameters, but ultimately none
of the options showed a significant improvement.
</p>
</div>
</div>
<div id="outline-container-conclusions" class="outline-2">
<h2 id="conclusions"><span class="section-number-2">6.</span> Conclusions</h2>
<div class="outline-text-2" id="text-conclusions">
<p>
With this small exercise, we only wanted to show that GBM, while a
powerful tool that is popular in places like Kaggle, requires a certain
level of expertise in both the model and the use case to achieve good
performance. A naive approach may not produce results that satisfy
users. Doing better requires, among other things:
</p>

<ol class="org-ol">
<li>Understanding how to perform feature engineering for a time series,
such as obtaining the decomposition of the time series. This can help
capture trends that cannot always be obtained solely with the time
horizon.</li>
<li>Applying smoothing strategies like moving averages could help
recognize the underlying pattern, but then you will need to estimate
that moving average into the future.</li>
</ol>
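
<p>
As a minimal sketch of the moving-average idea from the list above,
assuming a trailing window so that each value only uses past
observations; the function <code>trailing_mean</code> and the sample series are
hypothetical:
</p>

<div class="org-src-container">
<pre class="src src-julia">using Statistics

# Trailing moving average: each point averages itself and the previous
# (window - 1) observations, so no future values leak into the feature
function trailing_mean(x::AbstractVector, window::Integer)
    return [mean(x[max(1, i - window + 1):i]) for i in eachindex(x)]
end

series = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical values
println(trailing_mean(series, 3))
</pre>
</div>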

<p>
Overall, time series analysis requires a deep understanding of the data,
proper preprocessing techniques, feature engineering, and selecting
appropriate models that can capture the specific patterns and dynamics
of the data.
</p>
</div>
</div>
]]>
</description></item>
<item>
<title>World Happiness Report - EDA and clustering with Julia</title>
<link>https://indymnv.xyz/posts/tech/20221123-world-happiness-report.html</link>
<pubDate>Wed, 23 Nov 2022 00:00:00 +0900</pubDate>
<guid>https://indymnv.xyz/posts/tech/20221123-world-happiness-report.html</guid>
<description>
<![CDATA[<p>
The purpose of this post is to show Julia as a language for data
analysis and Machine Learning. Sadly, Kaggle does not support Julia
kernels (hopefully, they will add them in the future). Therefore, I
wanted to take advantage of this space to show a reimplementation of
Python/R notebooks in Julia. In this context, I took data on happiness
in countries in 2021 and some of the factors considered in this
exciting survey.
</p>

<ul class="org-ul">
<li>You can get the dataset in
<a href="https://www.kaggle.com/datasets/ajaypalsinghlo/world-happiness-report-2021?datasetId=1222432&amp;sortBy=voteCount">Kaggle</a></li>
<li>The full code is in my
<a href="https://github.com/indymnv/happiness-score-with-julia">Github</a></li>
</ul>
<div id="outline-container-packages-used" class="outline-2">
<h2 id="packages-used"><span class="section-number-2">1.</span> Packages used</h2>
<div class="outline-text-2" id="text-packages-used">
<p>
I'm using Julia version <code>1.8.0</code> in this project, and the library
versions are listed in the Project.toml. Some installed packages ended
up unused in this analysis, but these are the important ones:
</p>

<div class="org-src-container">
<pre class="src src-julia">using DataFrames
using DataFramesMeta
using CSV
using Plots
using StatsPlots
using Statistics
using HypothesisTests
Plots.theme(:ggplot2)
</pre>
</div>

<p>
Let's start by reading the file.
</p>

<div class="org-src-container">
<pre class="src src-julia">df_2021 = DataFrame(CSV.File("./data/2021.csv", normalizenames=true))
</pre>
</div>

<p>
You can see the dataset in the REPL.
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; df_2021 = DataFrame(CSV.File("./data/2021.csv", normalizenames=true))
149×20 DataFrame
 Row │ Country_name    Regional_indicator            Ladder_score  Standard_error_of_ladder_score  upperwhi ⋯
     │ String31        String                        Float64       Float64                         Float64  ⋯
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Finland         Western Europe                       7.842                           0.032         7 ⋯
   2 │ Denmark         Western Europe                       7.62                            0.035         7
   3 │ Switzerland     Western Europe                       7.571                           0.036         7
   4 │ Iceland         Western Europe                       7.554                           0.059         7
   5 │ Netherlands     Western Europe                       7.464                           0.027         7 ⋯
   6 │ Norway          Western Europe                       7.392                           0.035         7
   7 │ Sweden          Western Europe                       7.363                           0.036         7
   8 │ Luxembourg      Western Europe                       7.324                           0.037         7
   9 │ New Zealand     North America and ANZ                7.277                           0.04          7 ⋯
  10 │ Austria         Western Europe                       7.268                           0.036         7
  11 │ Australia       North America and ANZ                7.183                           0.041         7
  12 │ Israel          Middle East and North Africa         7.157                           0.034         7
  13 │ Germany         Western Europe                       7.155                           0.04          7 ⋯
  14 │ Canada          North America and ANZ                7.103                           0.042         7
  ⋮  │       ⋮                      ⋮                     ⋮                      ⋮                      ⋮   ⋱
 136 │ Togo            Sub-Saharan Africa                   4.107                           0.077         4
 137 │ Zambia          Sub-Saharan Africa                   4.073                           0.069         4
 138 │ Sierra Leone    Sub-Saharan Africa                   3.849                           0.077         4 ⋯
 139 │ India           South Asia                           3.819                           0.026         3
 140 │ Burundi         Sub-Saharan Africa                   3.775                           0.107         3
 141 │ Yemen           Middle East and North Africa         3.658                           0.07          3
 142 │ Tanzania        Sub-Saharan Africa                   3.623                           0.071         3 ⋯
 143 │ Haiti           Latin America and Caribbean          3.615                           0.173         3
 144 │ Malawi          Sub-Saharan Africa                   3.6                             0.092         3
 145 │ Lesotho         Sub-Saharan Africa                   3.512                           0.12          3
 146 │ Botswana        Sub-Saharan Africa                   3.467                           0.074         3 ⋯
 147 │ Rwanda          Sub-Saharan Africa                   3.415                           0.068         3
 148 │ Zimbabwe        Sub-Saharan Africa                   3.145                           0.058         3
 149 │ Afghanistan     South Asia                           2.523                           0.038         2
</pre>
</div>

<p>
To see the column names, simply use
</p>

<div class="org-src-container">
<pre class="src src-julia">names(df_2021)
</pre>
</div>

<p>
which returns a vector with all the column names:
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; names(df_2021)
20-element Vector{String}:
 "Country_name"
 "Regional_indicator"
 "Ladder_score"
 "Standard_error_of_ladder_score"
 "upperwhisker"
 "lowerwhisker"
 "Logged_GDP_per_capita"
 "Social_support"
 "Healthy_life_expectancy"
 "Freedom_to_make_life_choices"
 "Generosity"
 "Perceptions_of_corruption"
 "Ladder_score_in_Dystopia"
 "Explained_by_Log_GDP_per_capita"
 "Explained_by_Social_support"
 "Explained_by_Healthy_life_expectancy"
 "Explained_by_Freedom_to_make_life_choices"
 "Explained_by_Generosity"
 "Explained_by_Perceptions_of_corruption"
 "Dystopia_residual"
</pre>
</div>

<p>
The features of this dataset are as follows:
</p>

<ul class="org-ul">
<li><code>Country_name</code>: Name of the country.</li>
<li><code>Regional_indicator</code>: The region to which the country belongs.</li>
<li><code>Ladder_score</code>: The English wording of the question is "Please imagine a
ladder, with steps numbered from 0 at the bottom to 10 at the top. The
top of the ladder represents the best possible life for you and the
bottom of the ladder represents the worst possible life for you. On
which step of the ladder would you say you personally feel you stand
at this time?" This metric represents the average of the responses by
country.</li>
<li><code>Standard_error_of_ladder_score</code>: The standard
error of the <code>Ladder_score</code> metric.</li>
<li><code>upperwhisker</code>: Refers to the upper part of a box plot of the
<code>Ladder_score</code> metric.</li>
<li><code>lowerwhisker</code>: Refers to the lower part of a box plot of the
<code>Ladder_score</code> metric.</li>
<li><code>Logged_GDP_per_capita</code>: Logarithm of GDP per capita.</li>
<li><code>Social_support</code>: Average of the responses to the question "If you were in
trouble, do you have relatives or friends you can count on to help you
whenever you need them, or not?" The response is 0 for "no" and 1 for
"yes".</li>
<li><code>Healthy_life_expectancy</code>: Average healthy lifespan by country, based on
information extracted from the World Health Organization's (WHO) Global Health
Observatory data repository.</li>
<li><code>Freedom_to_make_life_choices</code>: National average of responses to the GWP
question "Are you satisfied or dissatisfied with your freedom to
choose what you do with your life?"</li>
<li><code>Generosity</code>: The residual of regressing the national average of responses
to the GWP question "Have you donated money to a charity in the past
month?" on GDP per capita.</li>
<li><code>Perceptions_of_corruption</code>: Based on the questions "Is corruption widespread
throughout the government or not?" and "Is corruption widespread within
businesses or not?" The overall perception is just the average of the two
0-or-1 responses.</li>
<li>The variable "Dystopia" and the explained variables that come from a
built-in regression model are not taken into consideration for this
project.</li>
</ul>

<p>
To see what a regional indicator is, we can list the regions that
countries are grouped into.
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; unique(df_2021.Regional_indicator)
10-element Vector{String}:
 "Western Europe"
 "North America and ANZ"
 "Middle East and North Africa"
 "Latin America and Caribbean"
 "Central and Eastern Europe"
 "East Asia"
 "Southeast Asia"
 "Commonwealth of Independent States"
 "Sub-Saharan Africa"
 "South Asia"
</pre>
</div>

<p>
Let's do a simple operation with the dataframe: count the number of
countries in each region and sort the result.
</p>

<div class="org-src-container">
<pre class="src src-julia">sort(
    combine(groupby(df_2021, :Regional_indicator), nrow), 
    :nrow
)
</pre>
</div>

<p>
This gives the following output:
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; sort(
           combine(groupby(df_2021, :Regional_indicator), nrow),
           :nrow
       )
10×2 DataFrame
 Row │ Regional_indicator                 nrow
     │ String                             Int64
─────┼──────────────────────────────────────────
   1 │ North America and ANZ                  4
   2 │ East Asia                              6
   3 │ South Asia                             7
   4 │ Southeast Asia                         9
   5 │ Commonwealth of Independent Stat…     12
   6 │ Middle East and North Africa          17
   7 │ Central and Eastern Europe            17
   8 │ Latin America and Caribbean           20
   9 │ Western Europe                        21
  10 │ Sub-Saharan Africa                    36
</pre>
</div>

<p>
With this, we can see that Sub-Saharan Africa has the most countries
(36), while North America and ANZ has the fewest (4).
</p>

<p>
Now, let's try to slice our data. We will create a data frame called
<code>float_df</code> that contains only the <code>Float64</code> variables but excludes the
"explained_" variables. This new dataframe will help us with some
operations later.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Keep only the Float64 columns
float_df = select(df_2021, names(df_2021, Float64))

# Drop the "Explained" columns
float_df = select(float_df, Not(r"Explained"))
</pre>
</div>

<p>
Let's make our first plot.
</p>

<div class="org-src-container">
<pre class="src src-julia">scatter(
    df_2021.Social_support,
    df_2021.Ladder_score,
    size = (1000,800),
    label="country",
    xaxis = "Social Support",
    yaxis = "Ladder Score",
    title = "Relation between Social Support and Happiness Index Score by country"
)
</pre>
</div>


<div id="org9fb2d05" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/scatterplot.png" alt="scatterplot.png" />
</p>
<p><span class="figure-number">Figure 1: </span>scatterplot with ladder score and social support</p>
</div>

<p>
If we want to view all the float variables as a grid of histograms, we
can use StatsPlots:
</p>

<div class="org-src-container">
<pre class="src src-julia">N = ncol(float_df)
numerical_cols = Symbol.(names(float_df,Real))
@df float_df Plots.histogram(cols();
                             layout=N,
                             size=(1400,800),
                             title=permutedims(numerical_cols),
                             label = false)
</pre>
</div>


<div id="org76af1fd" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/histogram.png" alt="histogram.png" />
</p>
<p><span class="figure-number">Figure 2: </span>Histogram of all variables</p>
</div>

<p>
And if we want to compare them with boxplots:
</p>

<div class="org-src-container">
<pre class="src src-julia">@df float_df boxplot(cols(), 
                     fillalpha=0.75, 
                     linewidth=2,
                     title = "Comparing distribution for all variables in dataset",
                     legend = :topleft)
</pre>
</div>


<div id="org0507ed6" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/boxplot_all.png" alt="boxplot_all.png" />
</p>
<p><span class="figure-number">Figure 3: </span>Boxplot all variables</p>
</div>

<p>
Without going into too much detail, the Ladder Score is the result of
the survey on the degree of happiness in each country (our dependent
variable). The Explained variables are derived while building the Ladder
Score; for this reason, we removed them from the dataframe and keep only
the raw data.
</p>

<p>
What are the top 5 and bottom 5 countries?
</p>

<div class="org-src-container">
<pre class="src src-julia"># Top 5 and bottom 5 countries by ladder score
sort!(df_2021, :Ladder_score, rev=true)
plot(
    bar(
        first(df_2021.Country_name, 5 ),
        first(df_2021.Ladder_score, 5 ),
        color= "green",
        title = "Top 5 countries by Happiness score",
        legend = false,
    ),
    bar(
        last(df_2021.Country_name, 5 ),
        last(df_2021.Ladder_score, 5 ),
        color ="red",
        title = "Bottom 5 countries by Happiness score",
        legend = false,
    ),
    size = (1000, 800),
    yaxis = "Happiness Score",
)
</pre>
</div>


<div id="orga0727c4" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/rank.png" alt="rank.png" />
</p>
<p><span class="figure-number">Figure 4: </span>top5 and bottom 5</p>
</div>

<p>
And here is the classic correlation heatmap, built with the following function.
</p>

<div class="org-src-container">
<pre class="src src-julia">function heatmap_cor(df)
    cm = cor(Matrix(df))
    cols = Symbol.(names(df))

    (n,m) = size(cm)
    display(
    heatmap(cm, 
        fc = cgrad([:white,:dodgerblue4]),
        xticks = (1:m,cols),
        xrot= 90,
        size= (800, 800),
        yticks = (1:m,cols),
        yflip=true))
    display(
    annotate!([(j, i, text(round(cm[i,j],digits=3),
                       8,"Computer Modern",:black))
           for i in 1:n for j in 1:m])
    )
end
</pre>
</div>


<div id="org15c1ec4" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/heatmap.png" alt="heatmap.png" />
</p>
<p><span class="figure-number">Figure 5: </span>heatmap</p>
</div>

<p>
And now, we can build a function where we can get the mean ladder score
by regional indicator and compare it with the distribution of all
countries.
</p>

<div class="org-src-container">
<pre class="src src-julia">function distribution_plot(df)
    display(
        @df df density(:Ladder_score,
        legend = :topleft, size=(1000,800) , 
        fill=(0, .3,:yellow),
        label="Distribution" ,
        xaxis="Happiness Index Score", 
        yaxis ="Density", 
        title ="Comparison Happiness Index Score by Region 2021") 
    )
    display(
        plot!([mean(df.Ladder_score)],
        seriestype="vline",
        line = (:dash), 
        lw = 3,
        label="Mean")
    )
    # One vertical line per region, placed at that region's mean score
    for element in unique(df.Regional_indicator)
        display(
            plot!(
            [mean(filter(row-&gt;row["Regional_indicator"]==element, df).Ladder_score)],
            seriestype="vline",
            lw = 3,
            label="$element") 
        )
    end
end
</pre>
</div>


<div id="org1759a1b" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/index_region.png" alt="index_region.png" />
</p>
<p><span class="figure-number">Figure 6: </span>distribution region</p>
</div>

<p>
Suppose we want to try the same idea but with countries. In that case,
we can take advantage of multiple dispatch and create a function that
receives a list of countries and creates a variation of the distribution
with countries.
</p>

<div class="org-src-container">
<pre class="src src-julia">function distribution_plot(df, var_filter, list_elements)
    display(
        @df df density(:Ladder_score,
        legend = :topleft, size=(1000,800) , 
        fill=(0, .3,:yellow),
        label="Distribution" ,
        xaxis="Happiness Index Score", 
        yaxis ="Density", 
        title ="Happiness index score compare by countries 2021") 
    )
    display(
        plot!([mean(df.Ladder_score)],
        seriestype="vline",
        line = (:dash), 
        lw = 3,
        label="Mean")
    )
    # One vertical line per requested element, placed at its mean score
    for element in list_elements
        display(
            plot!(
            [mean(filter(row-&gt;row[var_filter]==element, df).Ladder_score)],
            seriestype="vline",
            lw = 3,
            label="$element") 
        )
    end
end
</pre>
</div>
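
<p>
The two definitions above coexist as separate methods of the same
function; Julia picks one based on the number and types of the
arguments. A tiny standalone illustration of the idea follows, using a
hypothetical function <code>describe</code> that is not part of the analysis:
</p>

<div class="org-src-container">
<pre class="src src-julia"># Two methods of one (hypothetical) function, selected by argument count
describe(xs) = "all $(length(xs)) values"
describe(xs, keep) = "only $(length(intersect(xs, keep))) selected values"

println(describe([1, 2, 3]))          # calls the one-argument method
println(describe([1, 2, 3], [2, 3]))  # calls the two-argument method
println(length(methods(describe)))    # number of methods defined
</pre>
</div>

<p>
In the same way, the three-argument <code>distribution_plot</code> method is the one
called whenever a column name and a list of values are supplied.
</p>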

<p>
Let's test our new function, comparing three countries.
</p>

<div class="org-src-container">
<pre class="src src-julia">distribution_plot(df_2021, "Country_name", ["Chile",
                                            "United States",
                                            "Japan",
                                           ])
</pre>
</div>


<div id="org4cd79d7" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/happiness_countries.png" alt="happiness_countries.png" />
</p>
<p><span class="figure-number">Figure 7: </span>distribution countries</p>
</div>

<p>
Here we can see how the USA has the highest score, followed by Chile and
Japan.
</p>

<p>
To end the first part, let's apply a statistical test. We will use an
equal-variance t-test (<code>EqualVarianceTTest</code> from HypothesisTests.jl) to
compare the mean ladder score of different regions. The function is as
follows.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Perform a simple test to compare two groups.
# This function performs a two-sample t-test of the null hypothesis that
# both groups come from distributions with equal means and variances
# against the alternative hypothesis that the distributions have
# different means but equal variances.
function t_test_sample(df, var, x, y)
    xs = filter(row -&gt; row[var] == x, df).Ladder_score
    ys = filter(row -&gt; row[var] == y, df).Ladder_score
    EqualVarianceTTest(xs, ys)
end
</pre>
</div>

<p>
We will have this output if we compare Western Europe and North America
and ANZ.
</p>

<div class="org-src-container">
<pre class="src src-julia">t_test_sample(df_2021, "Regional_indicator", "Western Europe", "North America and ANZ")
</pre>
</div>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; t_test_sample(df_2021, "Regional_indicator", "Western Europe", "North America and ANZ")
Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -0.213595
    95% confidence interval: (-0.9068, 0.4796)

Test summary:
    outcome with 95% confidence: fail to reject h_0
    two-sided p-value:           0.5301

Details:
    number of observations:   [21,4]
    t-statistic:              -0.6374218416101513
    degrees of freedom:       23
    empirical standard error: 0.3350924366753546
</pre>
</div>

<p>
We don't have enough evidence to reject the null hypothesis that these
two regions have the same mean ladder score (p = 0.53). On the other
hand, if we compare South Asia with Western Europe, we see this:
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; t_test_sample(df_2021, "Regional_indicator", "South Asia", "Western Europe")
Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -2.47305
    95% confidence interval: (-3.144, -1.802)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           &lt;1e-07

Details:
    number of observations:   [7,21]
    t-statistic:              -7.576776118465833
    degrees of freedom:       26
    empirical standard error: 0.32639840222022687
</pre>
</div>

<p>
In this case, we can reject the null hypothesis: the mean ladder scores
of the two regions differ significantly.
</p>
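
<p>
To make the numbers in these reports less opaque, here is a minimal
sketch of how the equal-variance t statistic can be computed by hand
using only the <code>Statistics</code> standard library. The function name
<code>pooled_t</code> and the toy vectors are illustrative assumptions; they are
not part of HypothesisTests.jl or of the happiness dataset.
</p>

<div class="org-src-container">
<pre class="src src-julia">using Statistics

# Two-sample t statistic assuming equal variances (pooled variance).
# Returns the statistic, the degrees of freedom and the standard error.
function pooled_t(x, y)
    nx, ny = length(x), length(y)
    dof = nx + ny - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((nx - 1) * var(x) + (ny - 1) * var(y)) / dof
    se = sqrt(sp2 * (1 / nx + 1 / ny))
    t = (mean(x) - mean(y)) / se
    return (t = t, dof = dof, se = se)
end

# Toy samples, not taken from the happiness data
a = [7.1, 6.9, 7.4, 6.8]
b = [5.0, 4.6, 5.3]
result = pooled_t(a, b)
println(result)
</pre>
</div>

<p>
With 21 and 4 observations, as in the first report above, the degrees of
freedom come out as 21 + 4 - 2 = 23.
</p>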
</div>
</div>
<div id="outline-container-clustering" class="outline-2">
<h2 id="clustering"><span class="section-number-2">2.</span> Clustering</h2>
<div class="outline-text-2" id="text-clustering">
<p>
Now we will cluster the countries using the popular k-means algorithm. My
first option was to use
<a href="https://github.com/JuliaStats/Clustering.jl">Clustering.jl</a>. However,
to determine the ideal number of clusters with the elbow method, we need
the WCSS (within-cluster sum of squares) for each candidate number of
clusters, so I used the
<a href="https://github.com/cstjean/ScikitLearn.jl">ScikitLearn.jl</a> wrapper
instead. See also this
<a href="https://github.com/JuliaStats/Clustering.jl/issues/239">issue</a>.
Let's continue with the last part, starting by loading some libraries.
</p>

<div class="org-src-container">
<pre class="src src-julia">using Random
using ScikitLearn
using PyCall

@sk_import preprocessing: StandardScaler
@sk_import cluster: KMeans
</pre>
</div>

<p>
Let's remove from <code>float_df</code> all the variables related to
Ladder_score and keep only the variables considered in the survey.
</p>

<div class="org-src-container">
<pre class="src src-julia">select!(float_df, Not([:Standard_error_of_ladder_score, 
                           :Ladder_score, 
                           :Ladder_score_in_Dystopia, 
                           :Dystopia_residual]))
</pre>
</div>

<p>
To train our model, we first need to standardize the data. Then we fit
k-means for 1 to 10 clusters and collect the WCSS of each fit in a list.
The function is as follows:
</p>

<div class="org-src-container">
<pre class="src src-julia">function kmeans_train(df)
    X = fit_transform!(StandardScaler(), Matrix(df))

    wcss = []
    for n in 1:10

        Random.seed!(123)
        cluster =KMeans(n_clusters=n,
                        init = "k-means++",
                        max_iter = 20,
                        n_init = 10,
                        random_state = 0)
        cluster.fit(X)
        push!(wcss, cluster.inertia_)
    end
    return wcss
end
</pre>
</div>
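
<p>
To clarify what <code>inertia_</code> measures: the WCSS is the sum of squared
distances from each standardized observation to the centroid of its
assigned cluster. A rough pure-Julia sketch of both steps follows. The
helper names <code>standardize</code> and <code>wcss_of</code> and the toy matrix are
assumptions made for illustration, and scikit-learn's own implementation
may differ in its details.
</p>

<div class="org-src-container">
<pre class="src src-julia">using Statistics

# Z-score each column (population standard deviation, as StandardScaler uses)
function standardize(X)
    mu = mean(X, dims = 1)
    sigma = std(X, dims = 1, corrected = false)
    return (X .- mu) ./ sigma
end

# Sum of squared distances from each row to the centroid of its cluster
function wcss_of(X, labels)
    total = 0.0
    for k in unique(labels)
        members = X[labels .== k, :]
        centroid = mean(members, dims = 1)
        total += sum(abs2, members .- centroid)
    end
    return total
end

# Toy data: two obvious groups of points
X = [1.0 2.0; 1.2 1.8; 8.0 9.0; 8.2 9.1]
Z = standardize(X)
println(wcss_of(Z, [1, 1, 2, 2]))  # two tight clusters
println(wcss_of(Z, [1, 1, 1, 1]))  # everything in a single cluster
</pre>
</div>

<p>
Splitting the toy points into their two natural groups gives a much
smaller WCSS than a single cluster. The elbow method looks for the number
of clusters after which this decrease levels off.
</p>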

<p>
Let's invoke the function and plot the wcss.
</p>

<div class="org-src-container">
<pre class="src src-julia">wcss = kmeans_train(float_df)

plot(wcss, title = "WCSS by number of clusters",
     xaxis = "Number of clusters",
     yaxis = "WCSS")
</pre>
</div>


<div id="org56e0f93" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/elbow.png" alt="elbow.png" />
</p>
<p><span class="figure-number">Figure 8: </span>Elbow Method</p>
</div>

<p>
In this case, I decided to go for three clusters. We can make use of
multiple dispatch again, adding a method that takes <code>n</code>, a fixed number
of clusters.
</p>

<div class="org-src-container">
<pre class="src src-julia">function kmeans_train(df, n)
    X = fit_transform!(StandardScaler(), Matrix(df))

    Random.seed!(123)
    cluster =KMeans(n_clusters=n,
                    init = "k-means++",
                    max_iter = 20,
                    n_init = 10,
                    random_state = 0)
    cluster.fit(X)
    return cluster
end

cluster= kmeans_train(float_df, 3)
</pre>
</div>
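
<p>
The plotting code that follows reads a <code>cluster</code> column from a data frame
called <code>df</code>. One way this could be set up is sketched below, assuming
that <code>df</code> is a copy of <code>df_2021</code> and that the 0-based labels stored by
scikit-learn in <code>cluster.labels_</code> are shifted to start at 1. The toy
data frame and the <code>labels</code> vector are hypothetical stand-ins so that the
snippet runs on its own.
</p>

<div class="org-src-container">
<pre class="src src-julia">using DataFrames

# Hypothetical stand-ins for df_2021 and cluster.labels_
toy = DataFrame(Country_name = ["A", "B", "C", "D"],
                Ladder_score = [7.2, 5.1, 4.3, 6.8])
labels = [0, 1, 2, 0]          # scikit-learn labels start at 0

df = copy(toy)
df.cluster = labels .+ 1       # shift to 1-based cluster ids
println(combine(groupby(df, :cluster), nrow))
</pre>
</div>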

<p>
If we take the first plot we did at the beginning of the post, but now
we add the cluster labels, we have this plot.
</p>

<div class="org-src-container">
<pre class="src src-julia">
scatter(filter(row -&gt;row.cluster ==1,df).Social_support, filter(row -&gt;row.cluster ==1,df).Ladder_score, title = "Distribution of Happiness Score by Cluster", xaxis = "Social Support", yaxis="Ladder Score", label = "Cluster 1", legend = :topleft)
scatter!(filter(row -&gt;row.cluster ==3,df).Social_support, filter(row -&gt;row.cluster ==3,df).Ladder_score,  label = "Cluster 2")
scatter!(filter(row -&gt;row.cluster ==2,df).Social_support, filter(row -&gt;row.cluster ==2,df).Ladder_score,  label = "Cluster 3")

</pre>
</div>


<div id="orgd02bb9a" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/cluster-scatter.png" alt="cluster-scatter.png" />
</p>
<p><span class="figure-number">Figure 9: </span>Scatter with cluster</p>
</div>

<p>
Here are the countries in each of the 3 clusters:
</p>

<p>
<b>Cluster 1:</b> Australia, Austria, Canada, Denmark, Estonia, Finland,
France, Germany, Hong Kong S.A.R. of China, Iceland, Ireland,
Luxembourg, Malta, Netherlands, New Zealand, Norway, Singapore, Sweden,
Switzerland, United Arab Emirates, United Kingdom, United States,
Uzbekistan.
</p>

<p>
<b>Cluster 2:</b> Albania, Argentina, Armenia, Azerbaijan, Bahrain, Belarus,
Belgium, Bolivia, Bosnia and Herzegovina, Brazil, Bulgaria, Chile,
China, Colombia, Costa Rica, Croatia, Cyprus, Czech Republic, Dominican
Republic, Ecuador, El Salvador, Greece, Guatemala, Honduras, Hungary,
Israel, Italy, Jamaica, Japan, Kazakhstan, Kosovo, Kuwait, Kyrgyzstan,
Latvia, Libya, Lithuania, Malaysia, Maldives, Mauritius, Mexico,
Moldova, Mongolia, Montenegro, Nicaragua, North Cyprus, North Macedonia,
Panama, Paraguay, Peru, Philippines, Poland, Portugal, Romania, Russia,
Saudi Arabia, Serbia, Slovakia, Slovenia, South Korea, Spain, Taiwan
Province of China, Tajikistan, Thailand, Turkey, Turkmenistan, Ukraine,
Uruguay, Venezuela, Vietnam.
</p>

<p>
<b>Cluster 3:</b> Afghanistan, Algeria, Bangladesh, Benin, Botswana, Burkina
Faso, Burundi, Cambodia, Cameroon, Chad, Comoros, Congo (Brazzaville),
Egypt, Ethiopia, Gabon, Gambia, Georgia, Ghana, Guinea, Haiti, India,
Indonesia, Iran, Iraq, Ivory Coast, Jordan, Kenya, Laos, Lebanon,
Lesotho, Liberia, Madagascar, Malawi, Mali, Mauritania, Morocco,
Mozambique, Myanmar, Namibia, Nepal, Niger, Nigeria, Pakistan,
Palestinian Territories, Rwanda, Senegal, Sierra Leone, South Africa,
Sri Lanka, Swaziland, Tanzania, Togo, Tunisia, Uganda, Yemen, Zambia,
Zimbabwe.
</p>

<div class="org-src-container">
<pre class="src src-julia">histogram(filter(row -&gt;row.cluster ==1,df).Ladder_score, label = "cluster 1", title = "Distribution of Happiness Score by Cluster", xaxis = "Ladder Score", yaxis="n° countries")
histogram!(filter(row -&gt;row.cluster ==3,df).Ladder_score, label = "cluster 2")
histogram!(filter(row -&gt;row.cluster ==2,df).Ladder_score, label = "cluster 3")
</pre>
</div>


<div id="orga00c75c" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/distribution.png" alt="distribution.png" />
</p>
<p><span class="figure-number">Figure 10: </span>histogram happiness cluster</p>
</div>

<p>
Finally, we can compare how each variable is distributed across the clusters.
</p>

<div class="org-src-container">
<pre class="src src-julia">@df float_df Plots.density(cols();
                             layout=N,
                             size=(1600,1200),
                             title=permutedims(numerical_cols),
                             group = df.cluster,
                             label = false)
</pre>
</div>


<div id="org924f11d" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///005_happines/distr_cluster_vars.png" alt="distr_cluster_vars.png" />
</p>
<p><span class="figure-number">Figure 11: </span>Distribution by variables with cluster</p>
</div>
</div>
</div>
<div id="outline-container-conclusions" class="outline-2">
<h2 id="conclusions"><span class="section-number-2">3.</span> Conclusions</h2>
<div class="outline-text-2" id="text-conclusions">
<p>
From my experience using Python for about two years in data analysis and
recently dabbling with Julia, I can say that the ecosystem generally
seems quite mature for this purpose. I had some questions that the
community immediately answered on Julia Discourse. More content like
this is needed so that the data science community can more widely adopt
this technology.
</p>
</div>
</div>
]]>
</description></item>
<item>
<title>Creating your own blog with Julia and Franklin</title>
<link>https://indymnv.xyz/posts/tech/20230816-blog-julia-and-franklin.html</link>
<pubDate>Wed, 16 Aug 2023 00:00:00 +0900</pubDate>
<guid>https://indymnv.xyz/posts/tech/20230816-blog-julia-and-franklin.html</guid>
<description>
<![CDATA[<p>
In this post, we are going to discuss how to build your own blog with
Julia and <a href="https://franklinjl.org/">Franklin.jl</a>, a popular static
site generator among Julia users who create their own blogs or even
build websites for tutorials. I hope that if you are reading this entry
and you don't have your own space, it can motivate you to build your own
website.
</p>
<div id="outline-container-some-reasons-to-create-your-own-blog" class="outline-2">
<h2 id="some-reasons-to-create-your-own-blog"><span class="section-number-2">1.</span> Some Reasons to Create Your Own Blog</h2>
<div class="outline-text-2" id="text-some-reasons-to-create-your-own-blog">
<p>
Blogs may sound old-fashioned, something created by people who are still
living in the 90s, typing with passion about the political system while
listening to Soundgarden in the background and drinking some kind of
cheap beer&#x2026; or by programmers. And since you are reading this,
you're probably at least the latter, so you should consider
that having a blog is a nice way to:
</p>

<ul class="org-ul">
<li>Track your progress in your field</li>
<li>Generate content that can be useful for somebody else</li>
<li>Help the open-source community with diffusion, tutorials, etc.</li>
<li>Create your own space and adapt it to your needs</li>
<li>Build your personal brand and help you to find a job</li>
</ul>

<p>
But why Franklin? Franklin is one of the most popular libraries for this
purpose in Julia. It integrates seamlessly with running Julia scripts, so
you can use Julia for demonstrations in your blog; this could be harder
with other static site generators. If you only want to create basic
entries with some code and images, Franklin.jl might not be that
different from Hugo or Jekyll.
</p>
</div>
</div>
<div id="outline-container-installation" class="outline-2">
<h2 id="installation"><span class="section-number-2">2.</span> Installation</h2>
<div class="outline-text-2" id="text-installation">
<p>
The first step is to create a folder where you will save your project.
Once you are ready, open the Julia REPL in the location where the folder
should be. Then type <code>]</code> to enter the package manager mode and
type:
</p>

<div class="org-src-container">
<pre class="src src-julia">(@v1.9) pkg&gt; add Franklin
</pre>
</div>

<p>
Then return to the Julia REPL (press backspace to leave the package manager) and import the library:
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; using Franklin
</pre>
</div>

<p>
Remember to make sure you have successfully installed the Franklin
library before trying to import it.
</p>
</div>
</div>
<div id="outline-container-first-steps" class="outline-2">
<h2 id="first-steps"><span class="section-number-2">3.</span> First Steps</h2>
<div class="outline-text-2" id="text-first-steps">
<p>
To create your website, you can choose one of the
<a href="https://tlienart.github.io/FranklinTemplates.jl/">templates</a>
available. In my case, I just used the basic one, but if you have a
different preference, feel free to go ahead; they all follow similar
structures. You can also import another template that you like more and
adapt it to your website. Please read the documentation for instructions
on how to do this.
</p>
</div>
<div id="outline-container-selecting-a-template" class="outline-3">
<h3 id="selecting-a-template"><span class="section-number-3">3.1.</span> Selecting a template</h3>
<div class="outline-text-3" id="text-selecting-a-template">
<p>
Once you have decided on a template, type the following instruction in
the REPL:
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; newsite("myBlog", template="basic") #you can choose another name and template
</pre>
</div>

<p>
This will create a folder with various directories and elements. It will
also activate the environment inside the project. So, if you check the
active project by typing <code>]</code>, the prompt should display the name of your project.
</p>

<div class="org-src-container">
<pre class="src src-sh">.
&#9500;&#9472;&#9472; 404.md            <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Page for error 404
</span>&#9500;&#9472;&#9472; Manifest.toml     <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">The typical toml files for Julia development project
</span>&#9500;&#9472;&#9472; Project.toml
&#9500;&#9472;&#9472; __site            <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Generate your full website.
</span>&#9500;&#9472;&#9472; _assets           <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">You can add pictures and images here
</span>&#9500;&#9472;&#9472; _css              <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">All related to styling your website
</span>&#9500;&#9472;&#9472; _layout           <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">All related to the structure of your website
</span>&#9500;&#9472;&#9472; _libs             <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Libraries used by the website, such as KaTeX, the search bar, etc.
</span>&#9500;&#9472;&#9472; _rss              <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">A couple of files related to the RSS feed
</span>&#9500;&#9472;&#9472; config.md         <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Set Global variables for your website
</span>&#9500;&#9472;&#9472; index.md          <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Main landing page
</span>&#9500;&#9472;&#9472; pages.md          <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">All your pages / you can create your folder or organize in different way
</span>&#9492;&#9472;&#9472; utils.jl          <span style="font-weight: bold; font-style: italic;"># </span><span style="font-weight: bold; font-style: italic;">Julia File for setting some configurations</span>
</pre>
</div>

<p>
Finally, type:
</p>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; serve()
</pre>
</div>

<p>
It should open your website locally in the browser, and it should look
exactly the same as the template website you chose.
</p>


<div id="org3399848" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///006_build_blog/white_blog.png" alt="white_blog.png" />
</p>
<p><span class="figure-number">Figure 1: </span>starting template</p>
</div>

<p>
From this point, it's time to delete some files and content. You might
also want to add some pages for your projects, about, contact, etc. This
is up to you, but for now, we are going to keep just 2 pages: one for
the main "about" page and another to host all your posts.
</p>
</div>
</div>
<div id="outline-container-cleaning-the-template" class="outline-3">
<h3 id="cleaning-the-template"><span class="section-number-3">3.2.</span> Cleaning the template</h3>
<div class="outline-text-3" id="text-cleaning-the-template">
<p>
Now, go to the "index.md" page and delete all its content. This page
will become your main page, and you can mix HTML and Markdown in this
file to add whatever you want to it.
</p>

<div class="org-src-container">
<pre class="src src-md">
# Welcome to my blog
## I am using Franklin

~~~
    &lt;img src="/assets/rndimg.jpg" height="300" class="main-picture" &gt;
    &lt;p&gt;
    &lt;p&gt;
~~~


This is an introductory message 
</pre>
</div>

<p>
You might have noticed that in our main page, there are four links to
different pages. You can choose to keep those links or delete them all.
However, for the purpose of creating a blog section, let's use one of
those links. To do that, follow these steps:
</p>

<ol class="org-ol">
<li>Go to the "header.html" file located in the "_layout" folder.</li>
<li>Modify the code in the "header.html" file to something like this:</li>
</ol>

<div class="org-src-container">
<pre class="src src-html">&lt;<span style="font-weight: bold;">header</span>&gt;
&lt;<span style="font-weight: bold;">div</span> <span style="font-weight: bold; font-style: italic;">class</span>=<span style="font-style: italic;">"blog-name"</span>&gt;&lt;<span style="font-weight: bold;">a</span> <span style="font-weight: bold; font-style: italic;">href</span>=<span style="font-style: italic;">"/"</span>&gt;&lt;/<span style="font-weight: bold;">a</span>&gt;Amazing Blog&lt;/<span style="font-weight: bold;">div</span>&gt;
&lt;<span style="font-weight: bold;">nav</span>&gt;
  &lt;<span style="font-weight: bold;">ul</span>&gt;
    &lt;<span style="font-weight: bold;">li</span>&gt;&lt;<span style="font-weight: bold;">a</span> <span style="font-weight: bold; font-style: italic;">href</span>=<span style="font-style: italic;">"/"</span>&gt;Home&lt;/<span style="font-weight: bold;">a</span>&gt;&lt;/<span style="font-weight: bold;">li</span>&gt;
    &lt;<span style="font-weight: bold;">li</span>&gt;&lt;<span style="font-weight: bold;">a</span> <span style="font-weight: bold; font-style: italic;">href</span>=<span style="font-style: italic;">"/menu1/"</span>&gt;Blog&lt;/<span style="font-weight: bold;">a</span>&gt;&lt;/<span style="font-weight: bold;">li</span>&gt;
  &lt;/<span style="font-weight: bold;">ul</span>&gt;
  &lt;<span style="font-weight: bold;">img</span> <span style="font-weight: bold; font-style: italic;">src</span>=<span style="font-style: italic;">"/assets/hamburger.svg"</span> <span style="font-weight: bold; font-style: italic;">id</span>=<span style="font-style: italic;">"menu-icon"</span>&gt;
&lt;/<span style="font-weight: bold;">nav</span>&gt;
&lt;/<span style="font-weight: bold;">header</span>&gt;
</pre>
</div>

<p>
If you're looking to change the background color to something more
interesting than white, now is the time to showcase your frontend
skills. Follow these steps:
</p>

<ol class="org-ol">
<li>Navigate to the "franklin.css" file in the "_css" folder.</li>
<li>In the first block of code, add the background color that you prefer.
For instance:</li>
</ol>

<div class="org-src-container">
<pre class="src src-css">
<span style="font-weight: bold;">:root</span> {
  <span style="font-weight: bold; font-style: italic;">--block-background</span>: <span style="color: #000000; background-color: #efefef;">hsl(0, 0%, 94%)</span>;
  <span style="font-weight: bold; font-style: italic;">--output-background</span>: <span style="color: #000000; background-color: #f9f9f9;">hsl(0, 0%, 98%)</span>;
  <span style="font-weight: bold; font-style: italic;">--small</span>: 14px;
  <span style="font-weight: bold; font-style: italic;">--normal</span>: 19px;
  <span style="font-weight: bold; font-style: italic;">--text-color</span>: hsl(0, 0%, 20%);
  <span style="font-weight: bold;">background-color</span>: <span style="color: #000000; background-color: #00ffff;">aqua</span>;
}
</pre>
</div>

<p>
Finally, after making these modifications, the result should look
something like this:
</p>


<div id="org6332bb7" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///006_build_blog/frontend_master.png" alt="frontend_master.png" />
</p>
<p><span class="figure-number">Figure 2: </span>frontend</p>
</div>
</div>
</div>
<div id="outline-container-creating-your-first-post" class="outline-3">
<h3 id="creating-your-first-post"><span class="section-number-3">3.3.</span> Creating your first post</h3>
<div class="outline-text-3" id="text-creating-your-first-post">
<p>
Now, if you're ready to start writing, set up a "posts" folder for your
articles: create a new folder named "posts" in the same root directory
as your other folders. Keep the following in mind:
</p>

<ul class="org-ul">
<li>Inside the "posts" folder, you can add all your articles. You have the
flexibility to use both Markdown files and HTML files for your
articles.</li>
<li>If you're doing literate programming with tools like Pluto or Jupyter,
you can export your notebooks to HTML format and place them in the
"posts" folder. This way, anyone can easily view your data science
projects.</li>
</ul>

<p>
For now, let's add a file called <code>test1.md</code> inside the <code>posts</code> folder
with some text:
</p>

<div class="org-src-container">
<pre class="src src-md"># This is a title in my first post

So I can write anything

## Here is an introduction

We are going to write some code:

using LinearAlgebra
a = [1, 2, 3, 3, 4, 5, 2, 2]
@show dot(a, a)
println(dot(a, a))
</pre>
</div>

<p>
Then, go to the <code>menu1.md</code> file, erase the remaining content, and create
a Markdown link to the new post, which is served at <code>/posts/test1/</code>.
</p>


<p>
If you save it, and navigate to <code>http://localhost:8000/posts/test1/</code>,
you should see your post displayed clearly. This page will include your
"about" section and the space to write your blog content.
Congratulations! You now have a basic understanding of how Franklin
works and can make any further edits or modifications you desire.
</p>

<p>
If you wish to further style your website, please go ahead and customize
it to your heart's content.
</p>
</div>
</div>
</div>
<div id="outline-container-deployment" class="outline-2">
<h2 id="deployment"><span class="section-number-2">4.</span> Deployment</h2>
<div class="outline-text-2" id="text-deployment">
<p>
Now it's time to host your website somewhere. One of the most
straightforward options is using GitHub. Here's how you can do it:
</p>

<ol class="org-ol">
<li><b>Create a Repository</b>: Go to your GitHub account and create an empty
repository. When entering the name of your project, you have two
paths to choose from:

<ol class="org-ol">
<li>If this is a personal or organization website, the name of your
project should be something like <code>username.github.io</code>.</li>

<li>You can create your own custom name for your project, like
<code>myblog</code>.</li>
</ol></li>
</ol>

<p>
If you're unsure which option to choose, I recommend going with option
(a) because it's more straightforward. If you choose option (b), you'll
need to define a <code>prepath</code> variable in your <code>config.md</code> with the name of
that project. For instance: <code>@def prepath = "myblog"</code>.
</p>

<ol class="org-ol">
<li value="2"><b>Upload Your Project</b>: Now upload your project to GitHub,
following the instructions in your repository.</li>

<li><b>Configure GitHub Pages</b>: Once you've pushed your project, go to the
<code>Settings</code> tab in your repository. Then navigate to <code>GitHub Pages</code>.
In the Source dropdown, select <code>gh-pages</code>. If you see a message
indicating success, your project is now live.</li>

<li><b>Check Your Website</b>: You can now open your web browser and enter the
link of your project, which would be <code>username.github.io</code>. If you can
see your website, congratulations! Your blog is now live on the
internet.</li>
</ol>

<p>
By following these steps, you've successfully hosted your
Franklin-generated website on GitHub Pages. It's now accessible to
anyone with the link, and you can share your content with the world.
</p>
</div>
<div id="outline-container-hosting-in-a-different-domain-optional" class="outline-3">
<h3 id="hosting-in-a-different-domain-optional"><span class="section-number-3">4.1.</span> Hosting in a different domain (optional)</h3>
<div class="outline-text-3" id="text-hosting-in-a-different-domain-optional">
<p>
If you're hesitant to share your GitHub username due to its lengthy or
unconventional extension, or if you prefer a more professional-looking
link, you might want to consider an alternative domain, such as .com or
.dev. You can purchase a domain and link it to your website. For
example, you can use services like Google Domains to find and purchase a
domain that suits your preference.
</p>

<p>
Once you've found and acquired the domain you like, you can proceed to
link it to your website. To do this, you need to configure the DNS
settings. You can find detailed explanations about custom domains and
GitHub Pages in the
<a href="https://docs.github.com/en/pages/configuring-a-custom-domain-for-your-github-pages-site/about-custom-domains-and-github-pages">documentation</a>.
In a nutshell, follow these steps:
</p>

<ol class="org-ol">
<li>Go to Google Domains, select your domain, and navigate to the DNS
section.</li>
<li>Configure the DNS records, as shown below:</li>
</ol>


<div id="orge44792f" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///006_build_blog/dns_setup.png" alt="dns_setup.png" />
</p>
<p><span class="figure-number">Figure 3: </span>dns_setup</p>
</div>

<ol class="org-ol">
<li value="3">After correctly setting up the DNS records, go to your GitHub
project repository's settings, then navigate to Pages and enter your
custom domain:</li>
</ol>


<div id="orgf39f259" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///006_build_blog/custom_domain.png" alt="custom_domain.png" />
</p>
<p><span class="figure-number">Figure 4: </span>custom_domain</p>
</div>

<ol class="org-ol">
<li value="4">If everything is set up correctly, GitHub will confirm the
configuration. In a few minutes, your website should become
accessible via your new custom domain.</li>
</ol>

<p>
By following these steps, you'll be able to link a custom domain to your
Franklin-generated website, providing a more personalized and
professional web presence.
</p>
</div>
</div>
</div>
<div id="outline-container-rss-and-tags" class="outline-2">
<h2 id="rss-and-tags"><span class="section-number-2">5.</span> RSS and Tags</h2>
<div class="outline-text-2" id="text-rss-and-tags">
<p>
Now that your website is up and running, setting up an RSS feed is
important for people who want to stay updated on your new articles
without having to visit your website daily. Tools like Newsboat or
Inoreader help users keep track of updates from various websites, making
an RSS feed a valuable addition to your blog.
</p>

<p>
Thankfully, Franklin makes setting up an RSS feed quite simple. All you
need to do is go to each page in your "posts" folder and add a small
block of metadata between <code>+++</code> delimiters, like this:
</p>

<div class="org-src-container">
<pre class="src src-md">+++
tags = ["Julia", "Writing"] 

rss_title = "Creating your own blog with Julia and Franklin"
rss_description = "Describing the steps to create your own blog, so you can stop posting your code on Instagram"
rss_pubdate = Date(2023, 8, 10) 
+++
</pre>
</div>

<p>
The RSS fields you add are what readers such as Newsboat extract. From
these applications, a subscriber can read the title, a brief
description, the publication date, and the full content if it's
available. Additionally, you'll notice a "tags" field. This is also
important because it allows users to filter by topic. For example, if
you write posts on topics ranging from Julia programming to analysis of
Shakira's new songs, users can select the topics they're specifically
interested in.
</p>

<p>
To share your blog's RSS feed, you'll need a URL like
<code>https://www.yourdomain.com/feed.xml</code>. Make sure to prominently display
this URL on your website so that readers can easily find and subscribe
to your feed.
</p>
</div>
<div id="outline-container-host-your-feed-to-juliabloggers-optional" class="outline-3">
<h3 id="host-your-feed-to-juliabloggers-optional"><span class="section-number-3">5.1.</span> Host your Feed to JuliaBloggers (optional)</h3>
<div class="outline-text-3" id="text-host-your-feed-to-juliabloggers-optional">
<p>
Lastly, if you're considering writing about Julia and want to contribute
to the community, don't hesitate to share your work. Whether it's a
calculator project, a website, a 2D game, or a cutting-edge machine
learning algorithm, your contributions will help the Julia community
grow and provide valuable insights for others to learn from.
</p>

<p>
Visit the
<a href="https://www.juliabloggers.com/julia-bloggers-submit-feed/">JuliaBloggers
Website</a> and add your information. In the "Feed URL" field, you can use
a URL like:
</p>

<ul class="org-ul">
<li><code>http://indymnv.dev/tag/julia/feed/</code></li>
</ul>

<p>
Once you've submitted this information, every time you publish a new
post on your website, the community will be able to see it. If you want
to test this process first, you can use an RSS reader like Newsboat or
Inoreader to ensure that your updates are being picked up as expected.
</p>
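
<p>
If you prefer to check a feed from a script rather than a reader, the
sketch below parses an RSS document with Python's standard library and
lists each item's title and description. The sample feed is built in
memory with made-up content, and the helper names <code>build_sample_feed</code>
and <code>list_items</code> are assumptions for illustration only; a feed generated
by Franklin will carry more fields than these two.
</p>

<div class="org-src-container">
<pre class="src src-python">import xml.etree.ElementTree as ET


def build_sample_feed():
    """Build a tiny RSS 2.0 document in memory (made-up content)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "My blog"
    posts = [
        ("First post", "A short description"),
        ("Second post", "Another description"),
    ]
    for title, description in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "description").text = description
    return ET.tostring(rss, encoding="unicode")


def list_items(feed_xml):
    """Return (title, description) pairs for every item in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("description"))
        for item in root.iter("item")
    ]


feed_xml = build_sample_feed()
for title, description in list_items(feed_xml):
    print(f"{title}: {description}")
</pre>
</div>

<p>
With a real feed, you would first download the XML (for example with
<code>urllib.request.urlopen</code>) and pass its text to <code>list_items</code>.
</p>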
</div>
</div>
</div>
<div id="outline-container-conclusions" class="outline-2">
<h2 id="conclusions"><span class="section-number-2">6.</span> Conclusions</h2>
<div class="outline-text-2" id="text-conclusions">
<p>
I hope you enjoyed reading this article. If you haven't yet created your
own website, I hope it serves as motivation to get started, whether you
choose to use Franklin or another static site generator. Having your own
online space to write about your interests and dive as deep as you like
is a rewarding endeavor. Don't hesitate to embark on this journey and
create a platform that showcases your passion and expertise. Happy
blogging!
</p>
</div>
</div>
<div id="outline-container-acknowledgment" class="outline-2">
<h2 id="acknowledgment"><span class="section-number-2">7.</span> Acknowledgment</h2>
<div class="outline-text-2" id="text-acknowledgment">
<p>
I also want to thank Thibaut Lienart, who is the main developer of
Franklin. His work has been incredibly beneficial for the community.
</p>
</div>
</div>
]]>
</description></item>
<item>
<title>How to scrape data with Python using selenium and Pandas</title>
<link>https://indymnv.xyz/posts/tech/20221215-how-to-scrape-data-python-selenium.html</link>
<pubDate>Thu, 15 Dec 2022 00:00:00 +0900</pubDate>
<guid>https://indymnv.xyz/posts/tech/20221215-how-to-scrape-data-python-selenium.html</guid>
<description>
<![CDATA[<div id="outline-container-org768b31e" class="outline-2">
<h2 id="org768b31e"><span class="section-number-2">1.</span> Introduction</h2>
<div class="outline-text-2" id="text-1">
<p>
In this tutorial, I explain how to scrape a platform that requires dynamic interaction with the web application. This is useful when the data sits behind different links or forms within the platform and you need to drive the front-end components to reach it.
</p>

<p>
Two libraries are essential here. The first is selenium, a framework available for multiple languages that automates and controls the browser. The second is Pandas, a data manipulation library that lets us read HTML data tables directly.
</p>

<p>
The Beautiful Soup library is often used to extract HTML elements from the web, but as we will see, it is not necessary in this case.
</p>

<p>
For this example, I am going to use the <a href="http://aplicativos.odepa.cl/recepcion-industria-lactea.do">Chilean dairy production platform</a>, which publishes information on the production of dairy products from different factories nationwide.
</p>
</div>
</div>
<div id="outline-container-orgc149fba" class="outline-2">
<h2 id="orgc149fba"><span class="section-number-2">2.</span> Requirements</h2>
<div class="outline-text-2" id="text-2">
<p>
To start, you must have Python installed; in my case, I am using version 3.9. You also need a browser (Mozilla Firefox or Chrome) installed. In this project I will use Chrome, but the code should be similar for other browsers. Then, to work with selenium, you have to download the <a href="https://chromedriver.chromium.org/downloads">executable</a> that corresponds to your browser and its version.
</p>

<p>
If you use pip you can install it using:
</p>

<div class="org-src-container">
<pre class="src src-python">pip install -U selenium
</pre>
</div>

<p>
Then import the libraries:
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold;">from</span> selenium <span style="font-weight: bold;">import</span> webdriver
<span style="font-weight: bold;">import</span> pandas <span style="font-weight: bold;">as</span> pd
<span style="font-weight: bold;">import</span> lxml
<span style="font-weight: bold;">from</span> selenium.webdriver.support.ui <span style="font-weight: bold;">import</span> Select
<span style="font-weight: bold;">import</span> sys
<span style="font-weight: bold;">import</span> time
</pre>
</div>

<p>
Once you have imported the corresponding libraries, we will run a first test with the chromedriver executable (the one you downloaded from the link above). For simplicity, I recommend keeping it in the same directory as this scraper.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold; font-style: italic;">driver</span> = webdriver.Chrome(<span style="font-style: italic;">'/Your/path/to/the/project/chromedriver'</span>)
driver.get(<span style="font-style: italic;">"http://aplicativos.odepa.cl/recepcion-industria-lactea.do"</span>)
</pre>
</div>

<p>
This should open the web page in the Chrome browser. The driver variable, which we created from the chromedriver, controls the state of our browser. We can now add this snippet:
</p>

<div class="org-src-container">
<pre class="src src-nil">time.sleep(5)
driver.quit()
</pre>
</div>

<p>
With this, we add a pause of 5 seconds, and with driver.quit() we close the browser. The reason for adding waiting times is that, while we operate the browser, a slow internet connection or latency of the web platform can delay the page, so we have to wait for the elements we need to become available.
</p>
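
<p>
A fixed pause either wastes time or is too short, so a common alternative is to poll until a condition holds. The sketch below shows the idea in plain Python; wait_until is a hypothetical helper written for this example, not part of selenium, and the "element" is simulated with a counter. Selenium ships its own version of this pattern in WebDriverWait.
</p>

<div class="org-src-container">
<pre class="src src-python">import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Poll `condition` until it returns something truthy, or give up.

    Hypothetical helper for illustration; not a selenium function.
    """
    attempts = int(timeout / poll) + 1
    for _ in range(attempts):
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError(f"condition not met within {timeout} seconds")


# Stand-in for a page element that only becomes available on the third check.
checks = {"count": 0}


def element_is_ready():
    checks["count"] += 1
    return checks["count"] == 3


ready = wait_until(element_is_ready, timeout=2.0, poll=0.01)
print(ready, checks["count"])
</pre>
</div>

<p>
The call returns as soon as the condition is met, so the script only waits as long as the page actually needs.
</p>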

<p>
It is time to see how we can start interacting with the web page elements. For example, to click on a certain component, right-click on it in the web page, choose Inspect, and find out how the element is identified: by id, name, XPath, etc. I often use the XPath, which you can copy and paste into your code. Note that the find_element_by_* helpers used below were removed in Selenium 4.3; on newer versions, the equivalent call is driver.find_element(By.ID, 'tipoConsulta2'), with By imported from selenium.webdriver.common.by.
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">Select elements
</span>driver.find_element_by_id(<span style="font-style: italic;">'tipoConsulta2'</span>).click()
driver.find_element_by_id(<span style="font-style: italic;">'filterByRegionOrPlanta2'</span>).click()
driver.find_element_by_id(<span style="font-style: italic;">'filterByRegionOrPlanta2'</span>).click()

<span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">Extract the list of years
</span>driver.find_element_by_xpath(<span style="font-style: italic;">'//*[@id="divFechaDetalleMensual"]/img'</span>).click()
driver.find_element_by_xpath(<span style="font-style: italic;">'//*[@id="ui-datepicker-div"]/div[1]/div/select'</span>).click()
<span style="font-weight: bold; font-style: italic;">years</span> = driver.find_elements_by_tag_name(<span style="font-style: italic;">"option"</span>)

</pre>
</div>

<p>
Here we opened the web page and made the selections and filters needed to access the data. We end up with a list called years, holding the option elements for all the years available in this web application.
</p>

<p>
With this, we can get the values of those elements using the following code:
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold; font-style: italic;">list_years</span> = []
<span style="font-weight: bold;">for</span> year <span style="font-weight: bold;">in</span> years:
    list_years.append(year.get_attribute(<span style="font-style: italic;">'value'</span>))

<span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">here I added a filter by year which is optional (you can delete it)
</span><span style="font-weight: bold; font-style: italic;">list_years</span> = [element <span style="font-weight: bold;">for</span> element <span style="font-weight: bold;">in</span> list_years <span style="font-weight: bold;">if</span> element != <span style="font-style: italic;">''</span> <span style="font-weight: bold;">and</span> <span style="font-weight: bold;">int</span>(element)&gt; 2000]
</pre>
</div>

<p>
Now we have the list of all the years to iterate over. Then, if we want to get the plants, we can use the following:
</p>

<div class="org-src-container">
<pre class="src src-nil">  #Extract all the factory names:
plantasposibles=driver.find_element_by_id('planta')
plantasposibles=plantasposibles.find_elements_by_tag_name("option")
valoresplantas=[]
nombresplantas=[]

for option in plantasposibles:
    valoresplantas.append(option.get_attribute("value"))
    nombresplantas.append(option.get_attribute("text"))

</pre>
</div>

<p>
We locate the dropdown that holds the list of available plants, take its option elements, and build the lists of plant values and names. This will allow us to perform the following iteration:
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold; font-style: italic;">tabla</span>=pd.DataFrame() <span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">Here we create the dataframe
</span>
driver.find_element_by_xpath(<span style="font-style: italic;">'//*[@id="divFechaDetalleMensual"]/img'</span>).click()

<span style="font-weight: bold;">for</span> lastyear <span style="font-weight: bold;">in</span> list_years:
    <span style="font-weight: bold;">for</span> i <span style="font-weight: bold;">in</span> <span style="font-weight: bold;">range</span>(1,<span style="font-weight: bold;">len</span>(valoresplantas)):
        ...

</pre>
</div>

<p>
Next, we need to set the options and generate the report with the data. From there, we read and extract the data, and this is where Pandas shines. Going back to the double loop above, this is what should go inside it:
</p>

<div class="org-src-container">
<pre class="src src-python">  <span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">Select options
</span>driver.execute_script(<span style="font-style: italic;">"document.getElementById('planta').value="</span>+ valoresplantas[i])
driver.find_element_by_xpath(<span style="font-style: italic;">"//*[@id='divFechaDetalleMensual']/img"</span>).click()
time.sleep(1)
<span style="font-weight: bold; font-style: italic;">select</span>=Select(driver.find_element_by_xpath(<span style="font-style: italic;">"//*[@id='ui-datepicker-div']/div[1]/div/select"</span>))
select.select_by_visible_text(<span style="font-weight: bold;">str</span>(lastyear))        
driver.find_element_by_xpath(<span style="font-style: italic;">"//*[@id='ui-datepicker-div']/div[2]/button"</span>).click()
driver.find_element_by_id(<span style="font-style: italic;">'fechaDetalleMensual'</span>).send_keys(lastyear)
<span style="font-weight: bold; font-style: italic;">timeout</span>=15
driver.find_element_by_id(<span style="font-style: italic;">'btnVerInforme'</span>).click()
<span style="font-weight: bold; font-style: italic;">timeout</span>=20

<span style="font-weight: bold; font-style: italic;">############################## </span><span style="font-weight: bold; font-style: italic;">PANDAS #######################################
</span>
<span style="font-weight: bold; font-style: italic;">prueba_html</span>=driver.page_source
<span style="font-weight: bold; font-style: italic;">df</span> = pd.read_html(prueba_html, flavor=<span style="font-style: italic;">'html5lib'</span>)[0]
<span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.columns[14:397],axis=1)
<span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.index[0:8],axis=0)
<span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.index[1],axis=0)
<span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.index[8:9],axis=0)
<span style="font-weight: bold; font-style: italic;">df</span>[<span style="font-style: italic;">'Year'</span>]=lastyear
<span style="font-weight: bold; font-style: italic;">df</span>[<span style="font-style: italic;">'Factory_Name'</span>]=nombresplantas[i]
<span style="font-weight: bold; font-style: italic;">tabla</span>=pd.concat([tabla,df])

</pre>
</div>

<p>
In case it fails, which is typical when working with selenium, try/except blocks are the best way to handle exceptions. How to use them depends a lot on the case and the nature of the project, but here I simply reload the page, redo the initial selections, and let the loop continue. To summarize this point, the double loop would look like this:
</p>

<div class="org-src-container">
<pre class="src src-python"><span style="font-weight: bold;">for</span> lastyear <span style="font-weight: bold;">in</span> list_years:
  <span style="font-weight: bold;">for</span> i <span style="font-weight: bold;">in</span> <span style="font-weight: bold;">range</span>(1,<span style="font-weight: bold;">len</span>(valoresplantas)):
      <span style="font-weight: bold;">try</span>: 
          driver.execute_script(<span style="font-style: italic;">"document.getElementById('planta').value="</span>+ valoresplantas[i])
          driver.find_element_by_xpath(<span style="font-style: italic;">"//*[@id='divFechaDetalleMensual']/img"</span>).click()
          time.sleep(1)
          <span style="font-weight: bold; font-style: italic;">select</span>=Select(driver.find_element_by_xpath(<span style="font-style: italic;">"//*[@id='ui-datepicker-div']/div[1]/div/select"</span>))
          select.select_by_visible_text(<span style="font-weight: bold;">str</span>(lastyear))        
          driver.find_element_by_xpath(<span style="font-style: italic;">"//*[@id='ui-datepicker-div']/div[2]/button"</span>).click()
          driver.find_element_by_id(<span style="font-style: italic;">'fechaDetalleMensual'</span>).send_keys(lastyear)
          <span style="font-weight: bold; font-style: italic;">timeout</span>=15
          driver.find_element_by_id(<span style="font-style: italic;">'btnVerInforme'</span>).click()
          <span style="font-weight: bold; font-style: italic;">timeout</span>=20


          <span style="font-weight: bold; font-style: italic;">prueba_html</span>=driver.page_source
          <span style="font-weight: bold; font-style: italic;">df</span> = pd.read_html(prueba_html, flavor=<span style="font-style: italic;">'html5lib'</span>)[0]
          <span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.columns[14:397],axis=1)
          <span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.index[0:8],axis=0)
          <span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.index[1],axis=0)
          <span style="font-weight: bold; font-style: italic;">df</span>=df.drop(df.index[8:9],axis=0)
          <span style="font-weight: bold; font-style: italic;">df</span>[<span style="font-style: italic;">'Year'</span>]=lastyear
          <span style="font-weight: bold; font-style: italic;">df</span>[<span style="font-style: italic;">'Factory_Name'</span>]=nombresplantas[i]
          <span style="font-weight: bold; font-style: italic;">tabla</span>=pd.concat([tabla,df])

      <span style="font-weight: bold;">except</span> Exception:

          <span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">If fail close and open up the window again
</span>          <span style="font-weight: bold; font-style: italic;">#</span><span style="font-weight: bold; font-style: italic;">driver.quit()
</span>          time.sleep(5)
          driver.get(<span style="font-style: italic;">"http://aplicativos.odepa.cl/recepcion-industria-lactea.do"</span>)
          time.sleep(5)
          driver.find_element_by_id(<span style="font-style: italic;">'tipoConsulta2'</span>).click()
          driver.find_element_by_id(<span style="font-style: italic;">'filterByRegionOrPlanta2'</span>).click()
          driver.find_element_by_id(<span style="font-style: italic;">'filterByRegionOrPlanta2'</span>).click()

</pre>
</div>
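
<p>
One thing to note about the loop above: when a combination fails, the except branch reloads the page and the loop moves on, so that year and plant pair is skipped. A minimal sketch of retrying the same step instead is shown below. The helper run_with_recovery and the simulated flaky_download and reopen_page functions are assumptions made for this example; in the scraper, the task would be the body of the try block and the recovery would be the page reload.
</p>

<div class="org-src-container">
<pre class="src src-python">def run_with_recovery(task, recover, attempts=3):
    """Run `task`; on failure call `recover` and try again, up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return task()
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            recover()
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


# Simulated step that fails twice before succeeding.
calls = {"task": 0, "recover": 0}


def flaky_download():
    calls["task"] += 1
    if calls["task"] != 3:
        raise ValueError("page not ready")
    return "table for plant 1"


def reopen_page():
    calls["recover"] += 1


result = run_with_recovery(flaky_download, reopen_page)
print(result, calls)
</pre>
</div>

<p>
Capping the number of attempts keeps a permanently broken combination from blocking the rest of the run.
</p>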
</div>
</div>
<div id="outline-container-orge587962" class="outline-2">
<h2 id="orge587962"><span class="section-number-2">3.</span> Final steps with Pandas</h2>
<div class="outline-text-2" id="text-3">
<p>
With this, the process is almost finished; the only thing left is to integrate the final data with some extra elements and save the dataframe. Since the months are in columns, we will build a single column that contains them, so that each value is paired with its period.
</p>

<div class="org-src-container">
<pre class="src src-python">
<span style="font-weight: bold; font-style: italic;">tabla</span>=tabla[[<span style="font-style: italic;">'Year'</span>, <span style="font-style: italic;">'Factory_Name'</span>, <span style="font-style: italic;">'Product'</span>, <span style="font-style: italic;">'Unit'</span>,<span style="font-style: italic;">'Jan'</span>,<span style="font-style: italic;">'Feb'</span>,<span style="font-style: italic;">'Mar'</span>,<span style="font-style: italic;">'Apr'</span>,<span style="font-style: italic;">'May'</span>,<span style="font-style: italic;">'Jun'</span>,<span style="font-style: italic;">'Jul'</span>,<span style="font-style: italic;">'Aug'</span>,<span style="font-style: italic;">'Sep'</span>,<span style="font-style: italic;">'Oct'</span>,<span style="font-style: italic;">'Nov'</span>,<span style="font-style: italic;">'Dec'</span>]]

<span style="font-weight: bold; font-style: italic;">lista</span>=<span style="font-weight: bold;">range</span>(<span style="font-weight: bold;">len</span>(tabla.index))
tabla.<span style="font-weight: bold; font-style: italic;">index</span>=lista

<span style="font-weight: bold; font-style: italic;">tablafinal</span>=pd.DataFrame()
<span style="font-weight: bold; font-style: italic;">tablaparcial</span>=tabla.drop(tabla.columns[4:],axis=1)

<span style="font-weight: bold;">for</span> month <span style="font-weight: bold;">in</span> tabla.columns[4:<span style="font-weight: bold;">len</span>(tabla.columns)]:

    <span style="font-weight: bold; font-style: italic;">tablaparcial</span>[<span style="font-style: italic;">'Month'</span>]=month
    <span style="font-weight: bold; font-style: italic;">tablaparcial</span>[<span style="font-style: italic;">'Quantity'</span>]=tabla[month]
    <span style="font-weight: bold; font-style: italic;">tablafinal</span>=pd.concat([tablafinal,tablaparcial])

tablafinal.to_csv(<span style="font-style: italic;">"data.csv"</span>, index = <span style="font-weight: bold; text-decoration: underline;">False</span>)

</pre>
</div>
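
<p>
To make the reshaping step concrete, here is the same wide-to-long idea on plain dictionaries, without pandas. The function wide_to_long and the sample rows are made up for this example. In pandas itself, DataFrame.melt performs this kind of reshaping in a single call.
</p>

<div class="org-src-container">
<pre class="src src-python">def wide_to_long(rows, id_columns, value_columns):
    """Turn rows with one column per month into one row per month."""
    long_rows = []
    for column in value_columns:  # month by month, like the loop above
        for row in rows:
            record = {name: row[name] for name in id_columns}
            record["Month"] = column
            record["Quantity"] = row[column]
            long_rows.append(record)
    return long_rows


# Made-up sample with only two months to keep the output short.
sample = [
    {"Year": "2021", "Factory_Name": "Plant A", "Product": "Milk",
     "Unit": "L", "Jan": 10, "Feb": 12},
    {"Year": "2021", "Factory_Name": "Plant B", "Product": "Milk",
     "Unit": "L", "Jan": 7, "Feb": 9},
]

long_rows = wide_to_long(
    sample, ["Year", "Factory_Name", "Product", "Unit"], ["Jan", "Feb"]
)
for record in long_rows:
    print(record)
</pre>
</div>

<p>
Two input rows with two month columns become four output rows, each with a Month and a Quantity field.
</p>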

<p>
Finally, with the extraction and this simple preprocessing done, the data is ready for analysis or for saving to a database.
</p>
</div>
</div>
<div id="outline-container-org9a588c0" class="outline-2">
<h2 id="org9a588c0"><span class="section-number-2">4.</span> Conclusions</h2>
<div class="outline-text-2" id="text-4">
<p>
In this project, we showed how to perform scraping using selenium and pandas. The pandas tools for reading tables from HTML simplify the extraction considerably. Selenium is also an excellent tool for automating and testing web pages, so I recommend it when designing web apps, for example to check for failures in the results or scenarios where there may be bugs.
</p>
</div>
</div>
]]>
</description></item>
<item>
<title>Notes about Functional Programming with Julia</title>
<link>https://indymnv.xyz/posts/tech/20240810-functional-julia.html</link>
<pubDate>Sat, 10 Aug 2024 00:00:00 +0900</pubDate>
<guid>https://indymnv.xyz/posts/tech/20240810-functional-julia.html</guid>
<description>
<![CDATA[<p>
I am writing here some general ideas about functional programming taken
from sources like boot.dev. Many of these sources were written in
Python, and in most cases I just rewrote the examples in Julia. Because
Julia is a language more suitable for FP, I considered it a good
long-run exercise to translate the concepts that I am learning about
this paradigm.
</p>
<div id="outline-container-what-is-functional-programming" class="outline-2">
<h2 id="what-is-functional-programming"><span class="section-number-2">1.</span> What is Functional Programming</h2>
<div class="outline-text-2" id="text-what-is-functional-programming">
<ul class="org-ul">
<li>Compose functions instead of mutating state.</li>
<li>Describe what you want to happen rather than how it should happen.</li>
</ul>
</div>
<div id="outline-container-inmutability" class="outline-3">
<h3 id="inmutability"><span class="section-number-3">1.1.</span> Immutability</h3>
<div class="outline-text-3" id="text-inmutability">
<p>
Once a value is created it can't be changed, which can make programs
easier to debug.
</p>
</div>
</div>
<div id="outline-container-declarative" class="outline-3">
<h3 id="declarative"><span class="section-number-3">1.2.</span> Declarative</h3>
<div class="outline-text-3" id="text-declarative">
<p>
Functional programming aims to be declarative rather than imperative.
</p>
</div>
</div>
<div id="outline-container-math-style" class="outline-3">
<h3 id="math-style"><span class="section-number-3">1.3.</span> Math Style</h3>
<div class="outline-text-3" id="text-math-style">
<p>
Imperative style:
</p>

<div class="org-src-container">
<pre class="src src-julia">function get_average(nums)
    total = 0
    for num in nums
        total += num
    end
    return total / length(nums)
end
</pre>
</div>

<p>
Functional style:
</p>

<div class="org-src-container">
<pre class="src src-julia">function get_average(nums)
    return sum(nums) / length(nums)
end
</pre>
</div>

<p>
In general, to make code a bit more functional in style, we should avoid
loops and avoid mutating variables.
</p>

<blockquote>
<p>
Classes encourage you to think about the world as a hierarchical
collection of objects. Objects bundle behavior, data, and state together
in a way that draws boundaries between instances of things, like chess
pieces on a board.
</p>
</blockquote>

<blockquote>
<p>
Functions encourage you to think about the world as a series of data
transformations. Functions take data as input and return a transformed
output. For example, a function might take the entire state of a chess
board and a move as inputs, and return the new state of the board as
output.
</p>
</blockquote>

<p>
OOP is not exactly the opposite of FP, but of its four pillars
(abstraction, encapsulation, inheritance and polymorphism), inheritance
is the one that can produce changes in classes, and so it breaks the
rule of immutability in FP.
</p>
</div>
</div>
<div id="outline-container-functions-are-first-class" class="outline-3">
<h3 id="functions-are-first-class"><span class="section-number-3">1.4.</span> Functions are First Class</h3>
<div class="outline-text-3" id="text-functions-are-first-class">
<p>
We can treat functions as values
</p>

<div class="org-src-container">
<pre class="src src-julia">function add(x,y)
    return x+ y
end

addition = add

println(addition(2, 7))

# prints 9
</pre>
</div>
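
<p>
Because functions are values, they can also be stored in a collection.
This is only a sketch: the <code>operations</code> dictionary and
<code>apply_operation</code> are hypothetical names for the example.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Functions stored in a Dict and looked up by name
operations = Dict("add" =&gt; +, "multiply" =&gt; *)

# Fetch the function from the Dict and call it like any other function
apply_operation(name, x, y) = operations[name](x, y)

println(apply_operation("add", 2, 7))
# prints 9

println(apply_operation("multiply", 2, 7))
# prints 14
</pre>
</div>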
</div>
<div id="outline-container-anonymous-functions" class="outline-4">
<h4 id="anonymous-functions"><span class="section-number-4">1.4.1.</span> Anonymous Functions</h4>
<div class="outline-text-4" id="text-anonymous-functions">
<p>
Anonymous functions are simply functions that don't have a name, similar
to lambda functions in Python.
</p>

<div class="org-src-container">
<pre class="src src-julia">using DataFrames  # assumes df is a DataFrame with a column named `column`

function filter_var(df, value)
    # filter! modifies df in place; filter (no !) returns a new data frame
    return filter!(row -&gt; row.column != value, df)
end
</pre>
</div>

<p>
In this example, <code>row -&gt; row.column != value</code> is an anonymous function.
</p>
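
<p>
For reference, here are a few common ways to write an anonymous function
in Julia, using only functions from Base. The variable name
<code>square</code> is just for illustration.
</p>

<div class="org-src-container">
<pre class="src src-julia"># An anonymous function assigned to a variable
square = x -&gt; x * x
square(4)
# 16

# Passed directly to map, without ever getting a name
map(x -&gt; x + 1, [1, 2, 3])
# [2, 3, 4]

# do-block syntax: the block is an anonymous function
# that is passed as the first argument of map
map([1, 2, 3]) do x
    x * 10
end
# [10, 20, 30]
</pre>
</div>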
</div>
</div>
<div id="outline-container-higher-order-functions" class="outline-4">
<h4 id="higher-order-functions"><span class="section-number-4">1.4.2.</span> Higher Order Functions</h4>
<div class="outline-text-4" id="text-higher-order-functions">
<p>
If the programming language treats functions like any other variable,
that is, <b>functions are first class</b>, then we can pass functions as
arguments to other functions.
</p>

<div class="org-src-container">
<pre class="src src-julia">function square(x)
    return x * x
end

function my_map(func, arg_list)
    result = []
    for i in arg_list
        push!(result, func(i))
    end
    return result
end

squares = my_map(square, [1, 2, 3, 4, 5])
println(squares)
# Any[1, 4, 9, 16, 25]  (result = [] creates a Vector{Any})
</pre>
</div>

<p>
In this example, <code>my_map()</code> is a higher-order function.
</p>
</div>
<ol class="org-ol">
<li><a id="map-filter-and-reduce"></a>Map, Filter and Reduce<br />
<div class="outline-text-5" id="text-map-filter-and-reduce">
<p>
Map, filter and reduce are three typical examples of higher-order
functions that are quite useful. A map function takes a function and an
iterable (an object capable of returning its members one at a time) and
applies the function to every element of the iterable.
</p>

<div class="org-src-container">
<pre class="src src-julia">function say_hello(name)
    return "Hello " * name
end

list_names = ["Chris", "Hector", "Benito"]

map(say_hello, list_names)
# ["Hello Chris", "Hello Hector", "Hello Benito"]
</pre>
</div>

<p>
Filter was already shown in an earlier example. It takes a function and
an iterable, and returns another iterable that is a subset of the
original.
</p>
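
<p>
A minimal sketch of filter on a plain array, where <code>is_even</code>
is a helper defined just for this example:
</p>

<div class="org-src-container">
<pre class="src src-julia">is_even(n) = n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]

# keep only the elements for which is_even returns true
evens = filter(is_even, numbers)
# [2, 4, 6]

# filter (without !) returns a new array, so numbers is unchanged
numbers
# [1, 2, 3, 4, 5, 6]
</pre>
</div>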

<p>
Finally, the reduce function takes the same arguments, but it reduces
everything to a single value, as in the following example.
</p>

<div class="org-src-container">
<pre class="src src-julia">function add(sum_so_far, x)
    prinln("sum_so_far: $sum_so_far, x: $x")
    return sum_so_far + x
end

numbers = [1, 2, 3, 4]
sum = reduce(add, numbers)

# sum_so_far: 1, x: 2
# sum_so_far: 3, x: 3
# sum_so_far: 6, x: 4
# 10

println(sum)

# 10
</pre>
</div>

<p>
These higher-order functions allow us to write code without loops in
some cases, avoiding stateful iteration and mutation of variables.
</p>
</div>
</li>
</ol>
</div>
</div>
<div id="outline-container-pure-functions" class="outline-3">
<h3 id="pure-functions"><span class="section-number-3">1.5.</span> Pure Functions</h3>
<div class="outline-text-3" id="text-pure-functions">
<p>
A pure function has to satisfy two properties:
</p>

<ul class="org-ul">
<li>They always return the same value given the same arguments.</li>
<li>Running them causes no side effects</li>
</ul>


<div id="org2b8ff51" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///functional_julia/pure_function.png" alt="pure_function.png" />
</p>
<p><span class="figure-number">Figure 1: </span>pure</p>
</div>

<div class="org-src-container">
<pre class="src src-julia">function findMax(nums)
    max_val = -Inf
    for num in nums
        if max_val &lt; num
            max_val = num
        end
    end
    return max_val
end
</pre>
</div>

<p>
Let's compare with this other case
</p>

<div class="org-src-container">
<pre class="src src-julia"># instead of returning a value
# this function modifies a global variable
global_max = -Inf

function findMax(nums)
    global global_max
    for num in nums
        if global_max &lt; num
            global_max = num
        end
    end
end
</pre>
</div>

<p>
In the first case we have a function that clearly defines an input and
returns an output. In the second case the function modifies a global
variable (breaking the rule of immutability) and does not return
anything useful, yet our global variable has changed. In summary, pure
functions:
</p>

<ul class="org-ul">
<li>Return the same result when given the same input, so they are
deterministic (no randomness or hidden state is involved in computing
the result). A related term is
<a href="https://www.baeldung.com/cs/referential-transparency#referential-transparency">referentially
transparent</a>: a call can be replaced by its result without changing
the program.</li>
<li>Do not change the external state of the program. For example, they do
not change any variables outside of their scope.</li>
<li>Do not perform any I/O operations such as printing, fetching data over
HTTP or reading files.</li>
</ul>
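
<p>
To make the three points concrete, here is a small sketch with one pure
function and two impure ones. All the function names are invented for
the example.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Pure: the result depends only on the arguments
function add_tax(price, rate)
    return price * (1 + rate)
end

# Impure: the result depends on a random number, so it is not deterministic
function add_random_discount(price)
    return price * (1 - rand() / 10)
end

# Impure: it prints (I/O), even though the returned value is deterministic
function add_tax_verbose(price, rate)
    println("adding tax to $price")
    return price * (1 + rate)
end

add_tax(100, 0.5) == add_tax(100, 0.5)
# true
</pre>
</div>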
</div>
</div>
<div id="outline-container-reference-and-value" class="outline-3">
<h3 id="reference-and-value"><span class="section-number-3">1.6.</span> Reference and Value</h3>
<div class="outline-text-3" id="text-reference-and-value">
<p>
Some functions receive their arguments by reference: the function has
access to the original value and can mutate it, as you can see when
appending values to a list. On the other hand, a function that receives
its arguments by value gets a copy of the original and does not attempt
to change the original (immutability). In Julia you can use
<code>deepcopy(var)</code> to create such copies.
</p>
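
<p>
The difference is easy to see with an array. In this sketch,
<code>add_item!</code> and <code>add_item</code> are hypothetical
helpers; the <code>!</code> suffix is the usual Julia convention for
functions that mutate their arguments.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Mutates the caller's array
function add_item!(items, item)
    push!(items, item)
    return items
end

# Works on a copy, so the caller's array stays untouched
function add_item(items, item)
    new_items = deepcopy(items)
    push!(new_items, item)
    return new_items
end

cart = ["apple"]
new_cart = add_item(cart, "pear")
# cart is still ["apple"], new_cart is ["apple", "pear"]

add_item!(cart, "kiwi")
# now cart itself is ["apple", "kiwi"]
</pre>
</div>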
</div>
<div id="outline-container-pass-by-reference-impurity" class="outline-4">
<h4 id="pass-by-reference-impurity"><span class="section-number-4">1.6.1.</span> Pass by Reference Impurity</h4>
<div class="outline-text-4" id="text-pass-by-reference-impurity">
<p>
To avoid side effects, we can create copies of the variables inside the
function, so that nothing outside its scope is changed (this includes
the inputs of the function).
</p>

<div class="org-src-container">
<pre class="src src-julia">function remove_format(default_formats, old_format)
    new_formats = deepcopy(default_formats)
    new_formats[old_format] = false
    return new_formats
end
</pre>
</div>

<p>
This way we avoid mutating any input or global variable, which makes the
function easier to debug and test.
</p>
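
<p>
A quick usage check, assuming <code>default_formats</code> is a
<code>Dict</code> from format name to an enabled flag (the post doesn't
say what it is, so this input is my guess):
</p>

<div class="org-src-container">
<pre class="src src-julia">function remove_format(default_formats, old_format)
    new_formats = deepcopy(default_formats)
    new_formats[old_format] = false
    return new_formats
end

# Hypothetical input: format name =&gt; enabled flag
default_formats = Dict("docx" =&gt; true, "pdf" =&gt; true)
new_formats = remove_format(default_formats, "pdf")

default_formats["pdf"]
# true (the input was not mutated)

new_formats["pdf"]
# false
</pre>
</div>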
</div>
</div>
</div>
<div id="outline-container-input-and-output" class="outline-3">
<h3 id="input-and-output"><span class="section-number-3">1.7.</span> Input and Output</h3>
<div class="outline-text-3" id="text-input-and-output">
<p>
I/O operations make a function impure, but they are necessary (otherwise
our program would be completely useless), so try to use them only when
it is really <b>necessary</b>.
</p>
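
<p>
One common way to do this is to keep the logic in a pure function and
push the I/O to a thin outer function. This is just a sketch, and
<code>summarize</code> and <code>report</code> are names I made up:
</p>

<div class="org-src-container">
<pre class="src src-julia"># Pure core: no printing, no files, easy to test
function summarize(nums)
    return "count=$(length(nums)), mean=$(sum(nums) / length(nums))"
end

# Impure shell: the only place that performs I/O
function report(nums)
    println(summarize(nums))
end

report([1, 2, 3])
# prints count=3, mean=2.0
</pre>
</div>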
</div>
</div>
<div id="outline-container-no-op" class="outline-3">
<h3 id="no-op"><span class="section-number-3">1.8.</span> NO-OP</h3>
<div class="outline-text-3" id="text-no-op">
<p>
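Functions that do <i>nothing</i>, or better said don't return anything,
are probably impure functions: if they are useful at all, it must be
through a side effect.
</p>

<div class="org-src-container">
<pre class="src src-julia">function square(x)
    x * x           # computed and then thrown away
    return nothing  # without this line Julia would return x * x
end
</pre>
</div>

<p>
That function doesn't do anything useful, but there are also functions
whose only purpose is to perform some side effect: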
</p>

<div class="org-src-container">
<pre class="src src-julia">y = 5
function add_to_y(x)
    global y
    y += x
end

add_to_y(3)
# y = 8
</pre>
</div>

<blockquote>
<p>
Even the print() function technically has an impure side effect
</p>
</blockquote>
</div>
</div>
<div id="outline-container-memoization" class="outline-3">
<h3 id="memoization"><span class="section-number-3">1.9.</span> Memoization</h3>
<div class="outline-text-3" id="text-memoization">
<p>
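Memoization means storing a copy of the result of a computation so that
we don't have to compute it again in the future. It is a trade-off
between memory and speed, and it is only safe with pure functions,
because a cached result is only valid if the function always returns the
same value for the same arguments.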
</p>

<div class="org-src-container">
<pre class="src src-julia">const fibmem = Dict{Int,Int}()
function fib(n)
    get!(fibmem, n) do
        n &lt; 3 ? 1 : fib(n-1) + fib(n-2)
    end
end
</pre>
</div>
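
<p>
The same idea can be written as a higher-order function that wraps any
single-argument function. Note that <code>memoize</code> here is a
hypothetical helper, not something from Base, and
<code>slow_square</code> counts its calls only so we can see the cache
working.
</p>

<div class="org-src-container">
<pre class="src src-julia"># memoize is a hypothetical helper, not part of Base
function memoize(f)
    cache = Dict{Any,Any}()
    return function (x)
        # compute f(x) only if x is not in the cache yet
        get!(cache, x) do
            f(x)
        end
    end
end

# The call counter is a deliberate side effect, used only for the demo
calls = Ref(0)
slow_square(x) = (calls[] += 1; x * x)

fast_square = memoize(slow_square)

fast_square(4)
# 16
fast_square(4)
# 16 (taken from the cache)
calls[]
# 1
</pre>
</div>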
</div>
</div>
</div>
<div id="outline-container-recursion" class="outline-2">
<h2 id="recursion"><span class="section-number-2">2.</span> Recursion</h2>
<div class="outline-text-2" id="text-recursion">
<p>
A recursive function is a function defined in terms of itself, for
example the classic factorial. This kind of function is quite useful for
walking a <i>tree structure</i> of unknown depth.
</p>

<div class="org-src-container">
<pre class="src src-julia">function factorial_rec(x)
    if x == 0 
        return 1
    else
        return x * factorial_rec(x - 1)
    end
end

julia&gt; factorial_rec(0)
1

julia&gt; factorial_rec(3)
6
</pre>
</div>

<p>
Recursive functions have some dangerous edge cases that deserve
attention:
</p>

<ol class="org-ol">
<li>They require a base case to avoid infinite recursion.</li>
<li>Each function call requires a bit of memory, so deep tree
structures can cause a <b>stack overflow</b> that will crash your program.</li>
<li>In some languages recursion is slow, like Python, where it is even
slower than loops. In languages that support it,
<a href="https://exploringjs.com/es6/ch_tail-calls.html">tail call
optimization</a> can deal with that.</li>
</ol>
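
<p>
As an example of the tree case, this sketch adds up all the numbers in
an arbitrarily nested array. The function <code>nested_sum</code> is
invented for illustration, and it assumes every leaf is a number.
</p>

<div class="org-src-container">
<pre class="src src-julia">function nested_sum(x)
    if x isa Number
        return x    # base case: a leaf
    else
        # recursive case: apply nested_sum to each child and add the results
        return mapreduce(nested_sum, +, x; init = 0)
    end
end

nested_sum([1, [2, 3], [[4], 5]])
# 15
</pre>
</div>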
</div>
<div id="outline-container-function-transformations" class="outline-3">
<h3 id="function-transformations"><span class="section-number-3">2.1.</span> Function Transformations</h3>
<div class="outline-text-3" id="text-function-transformations">
<p>
Function transformations are a specific type of higher-order function:
they receive functions as input and return functions as output, which is
especially handy for code reuse.
</p>

<div class="org-src-container">
<pre class="src src-julia">function multiply(x, y)
    return x * y
end

function add(x, y)
    return x + y
end

# self_math is a higher order function
# input: a function that takes two arguments and returns a value
# output: a new function that takes one argument and returns a value
function self_math(math_func)
    function inner_func(x)
        return math_func(x, x)
    end
    return inner_func
end

square_func = self_math(multiply)
double_func = self_math(add)

println(square_func(5))
# prints 25

println(double_func(5))
# prints 10
</pre>
</div>
</div>
</div>
<div id="outline-container-closures" class="outline-3">
<h3 id="closures"><span class="section-number-3">2.2.</span> Closures</h3>
<div class="outline-text-3" id="text-closures">
<p>
A closure is a function that references variables from outside its own
function body. The function definition and its environment are bundled
together into a single entity, so a closure can read, and even change,
values defined outside its body.
</p>


<div id="org164ae89" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///functional_julia/closure.png" alt="closure.png" />
</p>
<p><span class="figure-number">Figure 2: </span>closure</p>
</div>

<div class="org-src-container">
<pre class="src src-julia">julia&gt; function make_adder(amount)
           function add(x)
               return x + amount
           end
       end;

julia&gt; add_one = make_adder(1);

julia&gt; add_two = make_adder(2);

julia&gt; 10 |&gt; add_one
11

julia&gt; 10 |&gt; add_two
12
</pre>
</div>

<blockquote>
<p>
In Julia, variables captured by a closure can cause
<a href="https://docs.julialang.org/en/v1/manual/performance-tips/#man-performance-captured-1">type
instability</a>, and there are some discussions about avoiding closures
when performance is required. However, that doesn't mean that closures
should be avoided completely. There is a long discussion
<a href="https://discourse.julialang.org/t/should-closures-be-avoided/96073/8">here</a>
and more interesting content
<a href="https://m3g.github.io/JuliaNotes.jl/stable/anonymous/">here</a>.
</p>
</blockquote>

<p>
Naturally, if a function can change a non-local variable then it is not
a pure function. In many cases closures are not pure functions, because
they can mutate state outside of their own scope and so have side
effects.
</p>
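
<p>
A counter is the typical example of an impure closure. In this sketch
(the names are hypothetical) the inner function mutates a variable that
lives in the enclosing function, so the same call returns a different
value each time.
</p>

<div class="org-src-container">
<pre class="src src-julia">function make_counter()
    total_calls = 0
    function increment()
        total_calls += 1    # mutates the captured variable
        return total_calls
    end
    return increment
end

counter = make_counter()

counter()
# 1
counter()
# 2
</pre>
</div>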

<blockquote>
<p>
Note that some languages, like Python, also have the concept of
decorators, which are just syntactic sugar for higher-order functions.
</p>
</blockquote>
</div>
</div>
<div id="outline-container-currying" class="outline-3">
<h3 id="currying"><span class="section-number-3">2.3.</span> Currying</h3>
<div class="outline-text-3" id="text-currying">
<p>
Function currying is a specific kind of function transformation where we
translate a single function that accepts multiple arguments into
multiple functions that each accept a single argument.
</p>


<div id="org227b53a" class="figure">
<p><img src="https://indymnv.xyz/posts/tech/file:///functional_julia/currying.jpeg" alt="currying.jpeg" />
</p>
<p><span class="figure-number">Figure 3: </span>currying</p>
</div>

<p>
This is a normal function without currying
</p>

<div class="org-src-container">
<pre class="src src-julia">function sum(a,b)
    return a+b
end
</pre>
</div>

<p>
With currying
</p>

<div class="org-src-container">
<pre class="src src-julia">function sum(a)
    function inner_sum(b)
        return a + b
    end
    return inner_sum
end
</pre>
</div>

<p>
With this option we can now return a function as a value (<code>inner_sum</code>) and
<b>change its signature</b> to make it conform to a specific parameter.
</p>
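
<p>
Here is how a curried function can be called, and why it is handy
together with map. I use the made-up name <code>curried_add</code>
instead of <code>sum</code> so the example does not clash with
<code>Base.sum</code>.
</p>

<div class="org-src-container">
<pre class="src src-julia"># Takes a and returns a new function that is still waiting for b
function curried_add(a)
    return b -&gt; a + b
end

# Call it one argument at a time
curried_add(1)(2)
# 3

# Fix the first argument once and reuse the resulting function
add_ten = curried_add(10)
map(add_ten, [1, 2, 3])
# [11, 12, 13]
</pre>
</div>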
</div>
</div>
<div id="outline-container-wrapping-up" class="outline-3">
<h3 id="wrapping-up"><span class="section-number-3">2.4.</span> Wrapping up</h3>
<div class="outline-text-3" id="text-wrapping-up">
<p>
These are just the basic ideas of functional programming. There are more
concepts to deal with, but at least this is a starting point for people
like me who are not CS people&#x2026;
</p>
</div>
</div>
</div>
]]>
</description></item>
</channel>
</rss>
