One-Hot Encoding using:
Categorical data is used to group information with similar characteristics, while numerical data expresses information in the form of numbers.
We will also refer to a cheat sheet that…
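Below is a minimal sketch of one-hot encoding with pandas' get_dummies. It turns each category into its own 0/1 column. The small DataFrame and its column names ("color", "size") are made up for illustration, and the post's own method may differ.

```python
import pandas as pd

# Made-up example data: one categorical column and one numerical column
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green"],
    "size": [1, 2, 3, 4],
})

# One-hot encode only the categorical column; dtype=int gives 0/1 instead of booleans
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```

The numerical column passes through unchanged. Each distinct value of "color" becomes a column named color_<value>, and those columns appear in sorted order.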
A Beginner's Guide to Implementing Feature Selection in Python Using Filter Methods: A To-the-Point Guide Covering All Filter Methods | Easy Implementation of Concepts and Code
Feature selection, also known as variable/predictor selection, attribute selection, or variable subset selection, is the process of selecting a subset of relevant features for use in machine learning model construction.
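Filter methods score each feature independently of any model and keep the best-scoring ones. The sketch below uses plain pandas to show two common filters: a variance threshold and ranking by absolute correlation with the target. It does not cover every filter method from the article. The function names and toy data are hypothetical.

```python
import pandas as pd

def variance_filter(X: pd.DataFrame, threshold: float = 0.0) -> list:
    # Keep columns whose (population) variance exceeds the threshold;
    # a constant column has variance 0 and is dropped
    variances = X.var(ddof=0)
    return list(variances[variances > threshold].index)

def correlation_filter(X: pd.DataFrame, y: pd.Series, k: int) -> list:
    # Rank features by absolute Pearson correlation with the target and keep the top k
    scores = X.corrwith(y).abs()
    return list(scores.sort_values(ascending=False).index[:k])

# Toy data: "signal" tracks the target, "noise" does not, "constant" never changes
X = pd.DataFrame({
    "signal": [1, 2, 3, 4, 5],
    "noise": [2, 1, 2, 1, 2],
    "constant": [7, 7, 7, 7, 7],
})
y = pd.Series([2, 4, 6, 8, 10])

kept = variance_filter(X)
selected = correlation_filter(X[kept], y, k=1)
```

The variance filter runs first so that the correlation step never sees a constant column, whose correlation would be undefined (NaN).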
Image Ref: Unsplash
Time series data, also referred to as time-stamped data, is a sequence of data points indexed in time order. Time-stamped data is data collected at different points in time. These data points typically consist of successive measurements made from the same source over a time interval and are used to track change over time.
When analyzing time series data, we often need to aggregate it into fixed intervals such as a day, a week, or a month.
We will solve this using only two pandas APIs, i.e., resample() and groupby().
The resample() function is used to resample time-series…
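Here is a minimal sketch of both approaches aggregating hourly readings into daily totals. The data is synthetic. The example assumes a DatetimeIndex, and pd.Grouper is used so that groupby() can bucket by time frequency.

```python
import pandas as pd

# Synthetic hourly readings over three days (illustrative data only)
idx = pd.date_range("2021-01-01", periods=72, freq=pd.offsets.Hour())
df = pd.DataFrame({"value": range(72)}, index=idx)

# resample(): bucket the DatetimeIndex into calendar days and sum each bucket
daily_sum = df.resample("D")["value"].sum()

# groupby() + pd.Grouper: the same daily bucketing, which is handy when you
# also want to group by other columns in the same call
daily_sum_grouped = df.groupby(pd.Grouper(freq="D"))["value"].sum()

print(daily_sum)
```

Swap "D" for "W" to get weekly buckets. Note that monthly aliases changed between pandas versions ("M" versus "ME"), so check the alias against your installed version.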
Project Gutenberg is a volunteer effort to digitize and archive cultural works, to “encourage the creation and distribution of eBooks”. It was founded in 1971 by American writer Michael S. Hart and is the oldest digital library. Most of the items in its collection are the full texts of public domain books.
It is a repository of over 60,000 books.
Link to the project: https://www.gutenberg.org/
Patterns within the written text are not the same across all authors or languages. …
In this blog, we will look at the types of mini-reports and EDA generated by Pandas Profiling, how to analyze data with them, and how to save the report as HTML and in other formats so that it can be presented instantly and used to drive data analysis.
Pandas Profiling is an open-source package built on top of pandas that lets you do exploratory data analysis (EDA) on your dataset. The built-in pandas df.describe() function does basic EDA; pandas_profiling goes further and generates a complete report from a DataFrame with df.profile_report().
Pandas Profiling is an incredible open-source tool that every data scientist…
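As a hedged sketch, the code below compares the basic df.describe() summary with a full profile report saved to HTML. It assumes the package is installed. Older releases are imported as pandas_profiling, and newer releases are published as ydata_profiling. The function names and sample data are illustrative.

```python
import pandas as pd

def basic_eda(df: pd.DataFrame) -> pd.DataFrame:
    # pandas' built-in summary: count, mean, std, min, quartiles, max of numeric columns
    return df.describe()

def save_profile_report(df: pd.DataFrame, html_path: str) -> None:
    # Imported lazily so the rest of the script runs even without the package.
    # Newer releases ship as ydata_profiling, older ones as pandas_profiling.
    try:
        from ydata_profiling import ProfileReport
    except ImportError:
        from pandas_profiling import ProfileReport
    profile = ProfileReport(df, title="EDA Report")
    profile.to_file(html_path)  # writes a self-contained HTML report

# Made-up sample data
df = pd.DataFrame({"age": [23, 35, 41, 29], "city": ["A", "B", "A", "C"]})
summary = basic_eda(df)
# save_profile_report(df, "report.html")  # uncomment once the package is installed
```

df.describe() covers only the numeric "age" column here. The profile report also covers categorical columns, missing values, and correlations.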
Our task is to create an animated bar chart race (BCR) of country-wise COVID-19 cases for the period from February 2020 to April 2021.
Unlike other tutorials that rely on a preloaded BCR dataset, we will process and clean our own dataset for the bar chart race.
Our problem statement is based on COVID-19 case records from around the world.
“Hope is being able to see that there is light despite all of the darkness.” — Desmond Tutu
You can find the raw data here: https://github.com/shelvi31/Animated-Bar-Graph/blob/main/worldometer_coronavirus_daily_data.csv
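Bar chart race tools commonly expect "wide" data, with one row per date and one column per country. The sketch below shows one way to reshape long daily records into that layout with pandas. The column names "date", "country", and "cumulative_total_cases" are assumptions, so check them against the actual CSV header before reusing this. The tiny sample rows are made up.

```python
import pandas as pd

# Made-up long-format rows: one row per (date, country) pair.
# Column names are assumptions, not taken from the real file.
raw = pd.DataFrame({
    "date": ["2020-02-15", "2020-02-15", "2020-02-16", "2020-02-16"],
    "country": ["A", "B", "A", "B"],
    "cumulative_total_cases": [10, 5, 12, None],
})

def to_race_format(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    df["date"] = pd.to_datetime(df["date"])
    # Long -> wide: dates become the index, countries become columns
    wide = df.pivot(index="date", columns="country", values="cumulative_total_cases")
    # Cumulative counts never decrease, so carry the last known value forward,
    # then treat dates before a country's first report as zero
    return wide.ffill().fillna(0)

wide = to_race_format(raw)
print(wide)
```

Forward-filling is a cleaning choice that suits cumulative counts. It would be wrong for daily new-case columns.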
This is a to-the-point tutorial on how to install the ELK (Elasticsearch, Logstash, Kibana) stack with Docker. If you follow the step-by-step guide, you should have no trouble following along.
Before moving ahead, I assume you have downloaded Docker Desktop. If you haven't, here is the link: https://www.docker.com/products/docker-desktop
Step 1: Create a directory (folder) named ELK.
Step 2: Open the ELK directory in your IDE (VS Code in my case).
Step 3: Create a file named docker-compose.yml in this directory.
Step 4: Copy-paste the following commands: (it mentions some…
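If you prefer to script the setup, the hypothetical helper below covers Steps 1 and 3 with pathlib. It creates the ELK folder and an empty docker-compose.yml. It deliberately leaves the file empty, because its contents come from Step 4.

```python
from pathlib import Path

def scaffold_elk_project(base_dir: str = ".") -> Path:
    """Hypothetical helper for Steps 1 and 3: an ELK folder holding an empty docker-compose.yml."""
    project = Path(base_dir) / "ELK"
    project.mkdir(parents=True, exist_ok=True)  # Step 1; no error if the folder already exists
    compose_file = project / "docker-compose.yml"
    compose_file.touch(exist_ok=True)           # Step 3; the contents are added in Step 4
    return compose_file
```

Calling scaffold_elk_project() from your working directory returns the path of the new file, which you can then open in your IDE as in Step 2.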
Concise Notes for Skewness and Kurtosis
We generally use moments in statistics, machine learning, mathematics, and other fields to describe the characteristics of a distribution.
Let's say the variable of interest is X. Its moments are then expected values of powers of X, for example E(X), E(X²), E(X³), E(X⁴), and so on. Beyond the first, moments are usually taken about the mean, i.e., E[(X − μ)ᵏ].
Figure 1: Moments in Statistics.
1) First moment: measure of central location (mean).
2) Second central moment: measure of dispersion/spread (variance).
3) Third standardized moment: measure of asymmetry.
4) Fourth standardized moment: measure of tailedness (how much weight sits in the tails/outliers).
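For reference, these are the standard textbook definitions. The second moment is taken about the mean μ. The third and fourth are also taken about μ and then divided by powers of σ, which makes them unit-free. The article's own figure may present them differently.

```latex
\begin{aligned}
\mu &= E[X] && \text{(mean)}\\
\sigma^2 &= E\big[(X-\mu)^2\big] && \text{(variance)}\\
\text{Skewness} &= \frac{E\big[(X-\mu)^3\big]}{\sigma^3}\\
\text{Kurtosis} &= \frac{E\big[(X-\mu)^4\big]}{\sigma^4} && \text{(equals 3 for a normal distribution; excess kurtosis subtracts 3)}
\end{aligned}
```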
We are already familiar with the first moment (mean) and the second moment (variance).
The third moment is called skewness, and the…
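A minimal NumPy sketch computes all four quantities for a sample. It uses population (ddof=0) estimators, so the results should agree with scipy.stats.skew(x) and scipy.stats.kurtosis(x, fisher=False) under their default bias=True setting. The function name and sample values are illustrative.

```python
import numpy as np

def moments(x):
    x = np.asarray(x, dtype=float)
    mu = x.mean()                      # first moment: mean
    dev = x - mu
    var = np.mean(dev ** 2)            # second central moment: population variance (ddof=0)
    sigma = np.sqrt(var)
    skew = np.mean(dev ** 3) / sigma ** 3  # third standardized moment: skewness
    kurt = np.mean(dev ** 4) / sigma ** 4  # fourth standardized moment: kurtosis (normal = 3)
    return mu, var, skew, kurt

# A right-skewed sample: the single large value 10 pulls the tail to the right
mu, var, skew, kurt = moments([1, 2, 3, 4, 10])
print(mu, var, skew, kurt)
```

For this sample the skewness is positive (about 1.14), which matches the long right tail. A symmetric sample such as [1, 2, 3] gives a skewness of 0.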
A step-by-step guide to getting started with Seaborn!
If matplotlib “tries to make easy things easy and hard things possible”, seaborn tries to make a well-defined set of hard things easy too.
One of Seaborn's greatest strengths is the diversity of its plotting functions. It allows us to make complicated plots in a single line of code!
In this tutorial, we will be using three libraries to get the job done: Matplotlib, Seaborn, and Pandas. If you are a complete beginner to Python, I suggest first getting a little familiar with Matplotlib and Pandas.
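As a small illustration of the "single line" claim, the sketch below draws a scatter plot colored by a categorical column with one call to sns.scatterplot. Seaborn adds the axis labels and legend automatically. The DataFrame is made up, and the Agg backend is chosen only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs on headless machines
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data for illustration
df = pd.DataFrame({
    "total_bill": [10.5, 20.1, 15.3, 30.0, 25.4, 12.8],
    "tip": [1.5, 3.0, 2.0, 5.0, 4.0, 2.2],
    "day": ["Thu", "Fri", "Sat", "Sun", "Sat", "Thu"],
})

# The one-liner: x/y from columns, color (hue) from a categorical column
ax = sns.scatterplot(data=df, x="total_bill", y="tip", hue="day")
ax.set_title("Tip vs. total bill")
# plt.show() in an interactive session, or plt.savefig("scatter.png") to save the figure
```

Doing the same in plain Matplotlib would need a loop over the categories plus manual labels and legend entries.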
If you follow along with this…
I am a Data Scientist Associate, though my interests and learning are not limited to that! :)