In this article we have already covered the basic concept of descriptive statistics, which is one of the best ways to get a comprehensive picture of your dataset. It uses single values to give us a good understanding of our data, and these values can be classified into four main categories according to the information they give us. Now we will move a step ahead and find a single value that helps us identify the spread of our data.

The most commonly used summary statistics can be classified by purpose into the following categories:



Before preparing a frequency distribution, it is necessary to collect data of the required nature from various sources. The collected data is always in raw form and needs to be arranged properly before the required results can be inferred. Preparing a frequency distribution therefore involves two steps: the classification and the tabulation of data. Both are explained in detail below.


“The process of arranging data into classes or categories according to some common characteristics present in the data is called classification”
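As a rough illustration of this definition, the sketch below sorts a small list of raw values into equal-width classes and counts how many fall into each one. The exam marks, the class width of 10, and the function name `frequency_distribution` are made up for illustration and are not taken from the original article.

```python
from collections import Counter

def frequency_distribution(values, class_width, start):
    """Count how many values fall into each class [lower, lower + class_width)."""
    counts = Counter()
    for v in values:
        # Index of the class that contains v, counting from `start`.
        index = (v - start) // class_width
        lower = start + index * class_width
        counts[(lower, lower + class_width)] += 1
    # Return the classes sorted by their lower bound.
    return dict(sorted(counts.items()))

# Hypothetical raw data: exam marks of ten students.
marks = [52, 67, 71, 58, 90, 75, 63, 88, 79, 55]
table = frequency_distribution(marks, class_width=10, start=50)
for (lower, upper), count in table.items():
    print(f"{lower}-{upper - 1}: {count}")
```

Each key is a half-open class interval, so a mark of 60 would be counted in the 60-69 class rather than in 50-59.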

Collected data are usually available in a form which…

What Are Variables?

In statistics, a variable has two defining characteristics:

  • A variable is an attribute that describes a person, place, thing, or idea.
  • The value of the variable can “vary” from one entity to another.

Variables in statistics can describe either quantities or qualities.

Quantitative variable

For instance, the Height variable describes how tall an object is. Generally, a variable that describes how much there is of something describes a quantity, and for this reason it’s called a quantitative variable. Usually, quantitative variables describe a quantity using real numbers, but there are also cases when words are used instead. …

In this article I will complete what I started in the previous article.

In part one I covered the following validation methods:

  • Re-Substitution
  • Train/test split
  • k-Fold Cross-Validation
  • Leave-one-out Cross-Validation
  • Leave-one-group-out Cross-Validation

So let’s continue what we started by covering the following validation methods:

  • Random Subsampling
  • Bootstrapping
  • Nested Cross-Validation
  • Time-series Cross-Validation
  • Stratified Cross-Validation
  • Wilcoxon signed-rank test
  • McNemar’s test
  • 5x2CV paired t-test
  • 5x2CV combined F test

1. Random Subsampling

In this technique, multiple sets of data are randomly chosen from the dataset and combined to form a test dataset. The remaining data forms the training dataset. The following diagram represents the…
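The split described above can be sketched in a few lines of plain Python. This is a minimal sketch, not the author's implementation: the function name `random_subsampling_splits`, the 30% test fraction, the number of repeats, and the seed are all assumptions chosen for illustration.

```python
import random

def random_subsampling_splits(n_samples, test_fraction, n_repeats, seed=0):
    """Yield (train_indices, test_indices) pairs from repeated random splits."""
    rng = random.Random(seed)
    n_test = max(1, int(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        # Randomly choose the test indices; everything else is training data.
        test = sorted(rng.sample(indices, n_test))
        test_set = set(test)
        train = [i for i in indices if i not in test_set]
        yield train, test

splits = list(random_subsampling_splits(n_samples=10, test_fraction=0.3, n_repeats=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Because every repeat draws its test set independently, the same sample may land in the test set more than once across repeats, and some samples may never be tested.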


In machine learning, model validation is the process in which a trained model is evaluated on a testing data set. The testing data set is a separate portion of the same data set from which the training set is derived. Its main purpose is to test how well the trained model generalizes. Model validation is carried out after model training. Together, model training and model validation aim to find an optimal model with the best performance.
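A minimal sketch of this train-then-validate workflow follows. The toy dataset, the threshold "model", and the rule of holding out every fourth sample are assumptions made for illustration. A real project would typically use a library model and a random split.

```python
def fit_threshold(train):
    """Fit a toy 1-D classifier: a threshold halfway between the class means."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def accuracy(threshold, data):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for x, y in data if (1 if x >= threshold else 0) == y)
    return correct / len(data)

# Hypothetical dataset: the label is 1 when the feature is at least 10.
dataset = [(x, 1 if x >= 10 else 0) for x in range(20)]

# Hold out every fourth sample for testing; train on the rest.
test_set = [s for i, s in enumerate(dataset) if i % 4 == 0]
train_set = [s for i, s in enumerate(dataset) if i % 4 != 0]

threshold = fit_threshold(train_set)
test_accuracy = accuracy(threshold, test_set)
print(f"threshold={threshold:.2f}, test accuracy={test_accuracy:.2f}")
```

The model never sees the held-out samples during fitting, so the test accuracy is an estimate of how it behaves on unseen data from the same source.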


Starbucks has provided a dataset that emulates the behavior of customers using the Starbucks rewards mobile app. The Starbucks app provides a way to advertise and share offers with customers, who can also use it to pay in stores. Starbucks sends different types of advertisements and offers to its customers once every few days. A customer might get one of the following:

  • Informational offer (i.e., mere advertisement)
  • Discount offer
  • Buy one get one free (BOGO) offer

Discount and BOGO offers come with a challenge: the customer must make a minimum purchase before they can redeem the offer…

Seattle at night (image: Wallpaper Access)


Since 2008, guests and hosts have used Airbnb to travel in a more unique, personalized way. As part of the Airbnb Inside initiative, this dataset describes the listing activity of home stays in Seattle, WA.

For the first project of the Udacity Data Scientist Nanodegree, we use Seattle Airbnb data to analyze the following three business questions:

Seasonal pattern

  1. What is the seasonal pattern of Airbnb in Seattle?
  2. What are the most popular Airbnb houses in Seattle?

Pricing Trends

  1. How does pricing increase or decrease by season and what is the peak season in Seattle?
  2. How does pricing increase or decrease by neighborhood and which…

