Statistics and Econometrics

1-1

Statistics and Econometrics

By: Prof. Tri Widodo, Ph.D

1-2

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting numerical data to assist in making more effective decisions.

[Cartoon caption: "First PhD Class: Let's cross the street and do your economics!"]

What is Meant by Statistics?

1-3

Statistical techniques are used extensively by marketing, accounting, quality control, consumers, professional sports people, hospital administrators, educators, politicians, physicians, and many others.

Who Uses Statistics?

1-4

Descriptive Statistics: Methods of organizing, summarizing, and presenting data in an informative way.

EXAMPLE: Election result

Types of Statistics

1-5

Inferential Statistics: A decision, estimate, prediction, or generalization about a population, based on a sample.

A Population is a collection of all possible individuals, objects, or measurements of interest.

A Sample is a portion, or part, of the population of interest.

Types of Statistics

1-6

Example: "Sensational research finding on virginity: 93% of female university students in Yogya are no longer virgins"

[Diagram labels: Population, Sampling, Sample; Parameter, Statistics; Inferential Statistics]

Hypothesis Testing

H0: ……; H1: ……

1-7

DATA

Qualitative or attribute (type of car owned)

Quantitative or numerical

discrete (number of children)

continuous (time taken for an exam)

Summary of Types of Variables

1-8

Describing Data: Frequency Distributions and Graphic Presentation

Organize data into a frequency distribution.

Portray a frequency distribution in a histogram, frequency polygon, and cumulative frequency polygon.

Present data using such graphic techniques as line charts, bar charts, and pie charts.

1-9

The three commonly used graphic forms are Histograms, Frequency Polygons, and a Cumulative Frequency distribution.

A Histogram is a graph in which the class midpoints or limits are marked on the horizontal axis and the class frequencies on the vertical axis. The class frequencies are represented by the heights of the bars and the bars are drawn adjacent to each other.

Graphic Presentation of a Frequency Distribution
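
As a rough illustration of the step behind a histogram, the sketch below groups raw values into equal-width classes and counts the class frequencies. The scores, the class width of 10, and the helper name `frequency_distribution` are assumptions made up for this example; they do not come from the slides.

```python
from collections import Counter

def frequency_distribution(values, lower, width):
    """Count the values in each class [lower + k*width, lower + (k+1)*width)."""
    counts = Counter((v - lower) // width for v in values)
    table = []
    for k in range(max(counts) + 1):
        start = lower + k * width
        table.append((start, start + width, counts.get(k, 0)))
    return table

# Hypothetical exam scores, for illustration only.
scores = [52, 57, 61, 64, 68, 71, 73, 75, 78, 82, 85, 91]
table = frequency_distribution(scores, lower=50, width=10)

for start, end, freq in table:
    # Text histogram: bar length equals the class frequency.
    print(f"{start}-{end - 1}: {'*' * freq}")
```

Each printed row plays the role of one histogram bar: the class limits sit on the horizontal axis and the number of stars stands in for the bar height.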

1-10

Describing Data: Numerical Measures

Compute and interpret the range, the mean deviation, the variance, and the standard deviation of ungrouped data.

Explain the characteristics, uses, advantages, and disadvantages of each measure of dispersion.

1-11

The Arithmetic Mean is the most widely used measure of location and shows the central value of the data.

It is calculated by summing the values and dividing by the number of values.

[Image caption: "Average Joe"]

The major characteristics of the mean are:

It requires the interval scale.
All values are used.
It is unique.
The sum of the deviations from the mean is 0.

Characteristics of the Mean
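
A small sketch of the calculation, which also checks the property that the deviations from the mean sum to zero. The data values and the function name `arithmetic_mean` are illustrative assumptions.

```python
def arithmetic_mean(values):
    """Sum the values and divide by the number of values."""
    return sum(values) / len(values)

# Hypothetical data, chosen only for illustration.
data = [3, 8, 4, 5]
mean = arithmetic_mean(data)
deviations = [x - mean for x in data]

print(mean)             # 5.0
print(sum(deviations))  # 0.0
```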

1-12

The Median is the midpoint of the values after they have been ordered from the smallest to the largest. There are as many values above the median as below it in the data array.

For an even set of values, the median will be the arithmetic average of the two middle numbers. It is found at the (n+1)/2 ranked position, which for an even n falls halfway between the two middle observations.

The Median
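
The rule can be sketched as follows, assuming a plain list of numbers. The helper name `median` and the two small data sets are made up for illustration.

```python
def median(values):
    """Midpoint of the ordered values; averages the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # rank (n + 1) / 2, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5]      # ordered: 1, 5, 7; rank (3 + 1) / 2 = 2
even = [7, 1, 5, 3]  # ordered: 1, 3, 5, 7; rank (4 + 1) / 2 = 2.5

print(median(odd))   # 5
print(median(even))  # 4.0
```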

1-13

The ModeMode is another measure of location and represents the value of the observation that appears most frequently.

Data can have more than one mode. If it has two modes, it is referred to as bimodal, three modes, trimodal, and the like.
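
A short sketch that returns every mode, so bimodal and trimodal data are handled the same way. The function name `modes` and the sample lists are illustrative assumptions.

```python
from collections import Counter

def modes(values):
    """Return every value that appears with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 3, 5, 7]))     # [3]
print(modes([1, 1, 2, 4, 4, 6]))  # [1, 4], a bimodal data set
```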

1-14

Dispersion refers to the spread or variability in the data.

Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.

Range = Largest value - Smallest value

Measures of Dispersion
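
A minimal sketch of the two simplest measures. The slide gives only the range formula, so the mean deviation here uses the usual textbook definition (the average absolute distance from the arithmetic mean), and that definition is an assumption about what the course intends. The data and function names are made up for illustration.

```python
def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

def mean_deviation(values):
    """Average absolute distance of the values from their arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

data = [2, 4, 6, 8, 10]  # hypothetical data

print(value_range(data))     # 8
print(mean_deviation(data))  # 2.4
```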

1-15

Sample variance (s²)

$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

Sample standard deviation (s)

$$s = \sqrt{s^2}$$

Sample variance and standard deviation
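
The formulas can be checked with a few lines of Python. The hand-written `sample_variance` is an illustrative helper and the data are hypothetical. The standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and are used here only as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)

data = [2, 4, 6, 8, 10]  # hypothetical data
s2 = sample_variance(data)
s = math.sqrt(s2)

print(s2)                        # 10.0
print(statistics.variance(data)) # same quantity from the standard library
```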

1-16

Chapter Four

Describing Data: Displaying and Exploring Data

Develop and interpret a stem-and-leaf display.

Develop and interpret a dot plot.

Compute and interpret quartiles, deciles, and percentiles.

Construct and interpret box plots.

Compute and understand the coefficient of variation and the coefficient of skewness.

Draw and interpret a scatter diagram.

Set up and interpret a contingency table.

1-17

Dot plots:

Report the details of each observation
Are useful for comparing two or more data sets

Dot Plot

1-18

Stem-and-leaf display: A statistical technique for displaying a set of data. Each numerical value is divided into two parts: the leading digits become the stem and the trailing digits the leaf.

Note: an advantage of the stem-and-leaf display over a frequency distribution is we do not lose the identity of each observation.

Stem-and-leaf Displays
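
A rough sketch of building such a display, assuming non-negative integers and taking the final digit as the leaf. The function name `stem_and_leaf` and the data are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (final digit)."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

data = [96, 93, 88, 117, 127, 95, 113, 96, 108, 94]  # hypothetical data

for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```

Every original observation can be read back from a stem and one of its leaves, which is the advantage over a frequency distribution noted above.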

1-19

A box plot is a graphical display, based on quartiles, that helps to picture a set of data.

Five pieces of data are needed to construct a box plot: the Minimum Value, the First Quartile, the Median, the Third Quartile, and the Maximum Value.

Box Plots
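
The five values can be computed as in the sketch below. It relies on `statistics.quantiles` with its default "exclusive" method, which locates quartiles at rank p(n + 1). Whether that matches the quartile convention used in the course is an assumption, since quartile definitions vary between textbooks and software. The data and the helper name are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return min(values), q1, q2, q3, max(values)

data = [13, 13, 14, 15, 17, 18, 20, 22, 25, 27, 31]  # hypothetical data
print(five_number_summary(data))
```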

1-20

[Figure: box plot marking Min, Q1, Median, Q3, and Max on a horizontal axis running from 12 to 32]

1-21

Skewness is the measurement of the lack of symmetry of the distribution.

The coefficient of skewness can range from -3.00 up to 3.00 when using the following formula:

$$sk = \frac{3(\bar{X} - \text{Median})}{s}$$

A value of 0 indicates a symmetric distribution.

Some software packages use a different formula which results in a wider range for the coefficient.
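
A brief sketch of the coefficient sk = 3(mean - median)/s, taking s to be the sample standard deviation. The two data sets and the function name `pearson_skewness` are illustrative assumptions.

```python
import statistics

def pearson_skewness(values):
    """sk = 3 * (mean - median) / s, with s the sample standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)
    return 3 * (mean - median) / s

symmetric = [1, 2, 3, 4, 5]      # mean equals median
right_skewed = [1, 2, 3, 4, 15]  # one large value pulls the mean above the median

print(pearson_skewness(symmetric))     # 0.0
print(pearson_skewness(right_skewed))  # a positive value
```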

1-22

Scatter diagram: A technique used to show the relationship between variables.

Variables must be at least interval scaled.

Relationship can be positive (direct) or negative (inverse).

Example: The twelve days of stock prices and the overall market index on each day are given as follows:

Scatter diagram

1-23

Estimation and Confidence Intervals

Construct a confidence interval for the population proportion.

1-24

A point estimate is a single value (statistic) used to estimate a population value (parameter).

An Interval Estimate states the range within which a population parameter probably lies.

A confidence interval is a range of values within which the population parameter is expected to occur.

The two confidence intervals that are used extensively are the 95% and the 99%.

Point and Interval Estimates

1-25

If the population standard deviation is unknown, the underlying population is approximately normal, and the sample size is less than 30 we use the t distribution.

$$\bar{X} \pm t \frac{s}{\sqrt{n}}$$

The value of t for a given confidence level depends upon its degrees of freedom.

Point and Interval Estimates
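
A minimal sketch of the small-sample interval. Python's standard library has no t distribution, so the critical value is passed in as a parameter. The value 2.262 is the standard table entry for a two-sided 95% interval with 9 degrees of freedom. The sample and the function name `t_interval` are assumptions made up for illustration.

```python
import math
import statistics

def t_interval(sample, t_critical):
    """Return (lower, upper) for mean ± t * s / sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)
    margin = t_critical * statistics.stdev(sample) / math.sqrt(n)
    return mean - margin, mean + margin

sample = [48, 52, 49, 51, 50, 47, 53, 50, 49, 51]  # hypothetical data, n = 10
T_95_DF9 = 2.262  # t table: 95% confidence, n - 1 = 9 degrees of freedom

lower, upper = t_interval(sample, T_95_DF9)
print(round(lower, 2), round(upper, 2))
```

With a different sample size the degrees of freedom change, so a different table value would have to be supplied.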

1-26

Confidence interval for the mean

$$\bar{X} \pm z \frac{s}{\sqrt{n}}$$

95% CI for the population mean

$$\bar{X} \pm 1.96 \frac{s}{\sqrt{n}}$$

99% CI for the population mean

$$\bar{X} \pm 2.58 \frac{s}{\sqrt{n}}$$

Constructing General Confidence Intervals for µ
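
The z values 1.96 and 2.58 can be recovered from the standard normal distribution, as the sketch below shows with `statistics.NormalDist`. The summary statistics (mean 24, s = 4, n = 64) and the function name `z_interval` are hypothetical and serve only to illustrate the formula.

```python
import math
from statistics import NormalDist

def z_interval(mean, s, n, confidence):
    """Return (lower, upper) for mean ± z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%, 2.58 for 99%
    margin = z * s / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical summary statistics.
lo95, hi95 = z_interval(mean=24.0, s=4.0, n=64, confidence=0.95)
lo99, hi99 = z_interval(mean=24.0, s=4.0, n=64, confidence=0.99)

print(round(lo95, 2), round(hi95, 2))
print(round(lo99, 2), round(hi99, 2))
```

The 99% interval is wider than the 95% interval because more confidence requires a larger multiple of the standard error.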

1-27

Econometrics

[Diagram labels: Statistics, Economics, Mathematics]

1-28

Linear Regression and Correlation

Draw a scatter diagram.
Understand and interpret the terms dependent variable and independent variable.
Calculate and interpret the coefficient of correlation, the coefficient of determination, and the standard error of estimate.
Conduct a test of hypothesis to determine if the population coefficient of correlation is different from zero.
Calculate the least squares regression line and interpret the slope and intercept values.
Construct and interpret a confidence interval and prediction interval for the dependent variable.
Set up and interpret an ANOVA table.

1-29

Correlation Analysis is a group of statistical techniques to measure the association between two variables.

A Scatter Diagram is a chart that portrays the relationship between two variables.

[Figure: scatter diagram "Advertising Minutes and $ Sales"; horizontal axis: Advertising Minutes (70 to 190); vertical axis: Sales ($ thousands) (0 to 30)]

The Dependent Variable is the variable being predicted or estimated.

The Independent Variable provides the basis for estimation. It is the predictor variable.

Correlation Analysis

1-30

The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.

Also called Pearson's r and Pearson's product moment correlation coefficient.
It requires interval or ratio-scaled data.
It can range from -1.00 to 1.00.
Values of -1.00 or 1.00 indicate perfect and strong correlation.
Values close to 0.0 indicate weak correlation.
Negative values indicate an inverse relationship and positive values indicate a direct relationship.

The Coefficient of Correlation, r
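
The slides do not show a computing formula for r, so the sketch below uses the standard product-moment definition: the sum of cross-deviations divided by the square root of the product of the two sums of squared deviations. The data and the function name `pearson_r` are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0, a perfect direct relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0, a perfect inverse relationship
```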

1-31

[Figure: scatter diagram of Y against X, both axes running from 0 to 10]

Perfect Negative Correlation

1-32

[Figure: scatter diagram of Y against X, both axes running from 0 to 10]

Perfect Positive Correlation

1-33

[Figure: scatter diagram of Y against X, both axes running from 0 to 10]

Zero Correlation

1-34

[Figure: scatter diagram of Y against X, both axes running from 0 to 10]

Strong Positive Correlation

1-35

The coefficient of determination (r²) is the proportion of the total variation in the dependent variable (Y) that is explained or accounted for by the variation in the independent variable (X).

It is the square of the coefficient of correlation.
It ranges from 0 to 1.
It does not give any information on the direction of the relationship between the variables.

Coefficient of Determination
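
To illustrate why r² carries no information about direction, the sketch below compares a direct and an inverse relationship built from the same hypothetical values. The helper `pearson_r` and the data are assumptions for this example only.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_direct = [2, 3, 5, 4, 6]
y_inverse = [6, 4, 5, 3, 2]  # the same values in reverse order

r_direct = pearson_r(x, y_direct)
r_inverse = pearson_r(x, y_inverse)

print(r_direct, r_inverse)            # opposite signs
print(r_direct ** 2, r_inverse ** 2)  # the same proportion of explained variation
```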

1-36

In Regression Analysis we use the independent variable (X) to estimate the dependent variable (Y).

The relationship between the variables is linear.
Both variables must be at least interval scale.
The least squares criterion is used to determine the equation. That is, the term Σ(Y - Y')² is minimized.

Regression Analysis

1-37

The regression equation is Y' = a + bX

where

Y' is the average predicted value of Y for any X.
a is the Y-intercept. It is the estimated Y value when X = 0.
b is the slope of the line, or the average change in Y' for each change of one unit in X.

The least squares principle is used to obtain a and b.

Regression Analysis
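
The slides do not give computing formulas for a and b, so this sketch uses the standard closed-form least squares solution: b is the sum of cross-deviations divided by the sum of squared X deviations, and a is the mean of Y minus b times the mean of X. The data and the function name `least_squares` are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line Y' = a + bX that minimizes the sum of (Y - Y')**2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: average change in Y' per unit of X
    a = mean_y - b * mean_x  # intercept: estimated Y when X = 0
    return a, b

x = [1, 2, 3, 4, 5]  # hypothetical data
y = [2, 3, 5, 4, 6]
a, b = least_squares(x, y)

predicted = a + b * 6  # estimated Y for a new X of 6
print(a, b, predicted)
```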

1-38

Thank you
