Statistics Concepts and Excel Commands for Machine Learning

Arthur Samuel says,

Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed.

To talk about (machine) learning, we have to talk about data. After all, if a machine is going to learn, what does it learn from? The answer, of course, is data from different sources. Today's topic is not machine learning itself, but it is very close to it: I am going to go over the basic statistics concepts used in machine learning. Let's go to our statistics cheat sheet:

In statistics, data can be described in two ways:

  • Types of Data
  • Level of Measurement

In statistics, we have two main types of data:

  • Categorical
  • Numerical

Numerical data come in two types:

  • Discrete: for example, an SAT score or the number of children you want to have.
  • Continuous: for example, weight or height.

 

 

Now let's look at some popular concepts in statistics:

Mean: The mean is the simple average of a dataset. Keep in mind that the mean is easily affected by outliers.

Equation:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

where n is the number of observations and x_i is the i-th value.

In Excel, the mean is calculated with = AVERAGE()
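To see how strongly a single outlier can move the mean, here is a minimal sketch in Python. Python is not part of this cheat sheet (which uses Excel), and the sample numbers and variable names are made up purely for illustration.

```python
from statistics import mean

# Hypothetical sample: five typical values.
typical = [10, 12, 11, 13, 12]
# The same values plus one extreme outlier.
with_outlier = typical + [100]

typical_mean = mean(typical)
outlier_mean = mean(with_outlier)

print(typical_mean)
print(outlier_mean)
```

In this made-up sample, the single value 100 pulls the mean from 11.6 up to about 26.3, which is higher than every one of the original five values.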

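Median: The median is the midpoint of the ordered dataset. In an ordered dataset of n values, the median is the number at position

Equation:

$$\frac{n + 1}{2}$$

When n is even, this position falls between two values, and the median is their average.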

In Excel,  = MEDIAN()
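As a rough illustration of the position rule (n + 1) / 2, here is a small Python sketch using the standard library's `statistics.median`. The lists and variable names are hypothetical, chosen only to show an odd-sized and an even-sized dataset.

```python
from statistics import median

odd_count = [7, 1, 5, 3, 9]   # sorted: 1, 3, 5, 7, 9
even_count = [7, 1, 5, 3]     # sorted: 1, 3, 5, 7

# n = 5: position (5 + 1) / 2 = 3, so the third sorted value.
median_odd = median(odd_count)
# n = 4: position (4 + 1) / 2 = 2.5, so the average of the 2nd and 3rd sorted values.
median_even = median(even_count)

# One extreme value among otherwise similar numbers.
median_with_outlier = median([10, 12, 11, 13, 12, 100])

print(median_odd, median_even, median_with_outlier)
```

In the last list the median stays at 12 even though one value is 100, which is why the median is often preferred when outliers are present.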

Mode: The mode is calculated simply by finding the value with the highest frequency.

= MODE.SNGL() – Returns one mode

= MODE.MULT() – Returns an array of the modes. Use it when the data has more than one mode.
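The difference between asking for one mode and asking for all modes can be sketched in Python with `statistics.mode` and `statistics.multimode` (the latter needs Python 3.8 or newer). These are only loosely analogous to the two Excel functions, and the sample lists are invented for illustration.

```python
from statistics import mode, multimode

one_mode = [1, 2, 2, 3, 4]        # 2 appears most often
two_modes = [1, 2, 2, 3, 3, 4]    # 2 and 3 are tied

single = mode(one_mode)            # the single most frequent value
all_modes = multimode(two_modes)   # every value tied for the highest frequency

print(single)
print(all_modes)
```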

Skewness: Skewness is a measure of asymmetry that indicates whether the observations in a dataset are concentrated on one side.

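In Excel, = SKEW() calculates the sample skewness as

$$\text{skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$$

where s is the sample standard deviation. A positive result points to a longer right tail, and a negative result to a longer left tail.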
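Python's standard library has no skewness function, so the sketch below hand-codes the adjusted sample skewness, n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations, which is the formula Excel documents for SKEW. The function name `sample_skewness` and the test lists are my own assumptions for illustration.

```python
from math import sqrt

def sample_skewness(values):
    """Adjusted sample skewness (hypothetical helper, not a library function)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("all values are identical")
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed

right_tail = sample_skewness([1, 2, 3, 4, 10])       # long tail to the right
left_tail = sample_skewness([-10, -4, -3, -2, -1])   # mirror image of the above
symmetric = sample_skewness([1, 2, 3, 4, 5])         # evenly spread

print(right_tail, left_tail, symmetric)
```

The sign is the part to read first: positive for the right-tailed list, negative for its mirror image, and essentially zero for the symmetric one.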

Variance & Standard Deviation: Variance and standard deviation (SD) measure the dispersion of a set of data points around their mean value.

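Population variance and SD:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2, \qquad \sigma = \sqrt{\sigma^2}$$

Sample variance and SD:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2, \qquad s = \sqrt{s^2}$$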

(A population is the collection of all items of interest to a specific study, and a sample is a subset of that population.) In Excel,

Sample Variance = VAR.S()

Population Variance = VAR.P()

Sample SD = STDEV.S()

Population SD = STDEV.P()
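The sample and population versions differ only in the divisor (n - 1 versus n). A minimal Python sketch with the standard `statistics` module makes the gap visible; the data list is invented for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data: eight values with a mean of 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sample_var = variance(data)        # divides by n - 1
population_var = pvariance(data)   # divides by n
sample_sd = stdev(data)            # square root of the sample variance
population_sd = pstdev(data)       # square root of the population variance

print(sample_var, population_var)
print(sample_sd, population_sd)
```

For this list the population variance is exactly 4 and the population SD is 2, while the sample versions come out slightly larger because of the smaller divisor.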

Covariance: Covariance is a measure of the joint variability of two variables.

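  • A positive covariance means that the two variables tend to move together
  • A covariance of 0 means that the two variables have no linear relationship (independent variables have a covariance of 0, but a covariance of 0 does not by itself guarantee independence)
  • A negative covariance means that the two variables tend to move in opposite directions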

In Excel,

Sample covariance = COVARIANCE.S()

Population covariance = COVARIANCE.P()
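Here is a hand-written Python sketch of covariance, so that it does not depend on any particular Python version. The helper name `covariance`, its `sample` flag and the data are assumptions for illustration: the sample version divides by n - 1 and the population version by n.

```python
def covariance(xs, ys, sample=True):
    """Covariance of two equal-length lists (hypothetical helper)."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two pairs of values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

x = [1, 2, 3, 4, 5]
moves_together = covariance(x, [2, 4, 6, 8, 10])    # y rises with x
moves_opposite = covariance(x, [10, 8, 6, 4, 2])    # y falls as x rises
population_cov = covariance(x, [2, 4, 6, 8, 10], sample=False)

# y = x squared on a symmetric range: y depends entirely on x,
# yet the linear co-movement cancels out.
curved = covariance([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(moves_together, moves_opposite, population_cov, curved)
```

The last case is worth a second look: the covariance comes out as 0 even though y is completely determined by x, so a zero value is best read as "no linear relationship".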

Correlation: Correlation is covariance rescaled to lie between -1 and 1. It measures the strength and direction of the linear relationship between two variables.

  • A correlation of 1, known as perfect positive correlation, means that one variable is perfectly explained by the other.
  • A correlation of 0 means that the variables have no linear relationship.
  • A correlation of -1, known as perfect negative correlation, means that one variable is perfectly explained by the other, but they move in opposite directions.

$$r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y}$$

where s_x and s_y are the standard deviations of the two variables.

In Excel, it is calculated by

= CORREL()
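The three reference cases can be checked with a small hand-written Pearson correlation in Python. The function name `pearson_r` and the data are my own assumptions for illustration; this is a sketch of the usual formula, not the code behind any spreadsheet function.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # The n (or n - 1) divisors of covariance and SD cancel out, so sums suffice.
    denominator = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sum(a * b for a, b in zip(dx, dy)) / denominator

x = [1, 2, 3, 4, 5]
perfect_positive = pearson_r(x, [2, 4, 6, 8, 10])
perfect_negative = pearson_r(x, [10, 8, 6, 4, 2])
partial = pearson_r(x, [2, 1, 4, 3, 5])   # mostly rising, with some noise

print(perfect_positive, perfect_negative, partial)
```

Unlike covariance, the result does not depend on the units of the data, which is what makes values such as 0.8 comparable across datasets.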

 

Written By,
Raju Ahmed Rony
Machine Learning Enthusiast
Manager, Idea & Innovation, Youth Carnival
Dept. of Software Engineering
Daffodil International University
LinkedIn: https://www.linkedin.com/in/rarony/
