# [OPE-L:5614] ope-l: RE: In defence of correlation

andrew kliman (Andrew_Kliman@CLASSIC.MSN.COM)
Thu, 16 Oct 97 16:06:30 UT

A brief reply to Allin's PIAF
----------
From: owner-ope-l@galaxy.csuchico.edu on behalf of Allin Cottrell
Sent: Wednesday, October 15, 1997 4:55 PM
To: ope-l@galaxy.csuchico.edu
Subject: [OPE-L:5614] In defence of correlation

I thank Allin for the mathematical deductions, which nailed down our
intuitions on the matter of the aggregate sectoral price-value correlations.

Allin:
"(b) Given b, the correlation coefficient will be larger, the
greater is the dispersion of sector sizes (i.e. the greater
is x for given k)... up to a point ....

"Point (b) is, I think, what troubles Alan. It means that r(XV,V) does not
_just_ reflect the dispersion of price-to-value ratios; it is 'contaminated'
by the dispersion of sector sizes."

I'll let Alan speak for himself, but this is exactly what troubles me.
Allin's explanation of the implications of this problem is quite good:

"imagine, if you will, two economies A and B: each has N sectors, and the
distribution of the price-to-value ratios for these sectors is identical in A
and B. ... [but] in A the sectors are all of similar size; in B there is a
wide dispersion of sector sizes. As a matter of arithmetic, the sectoral
price-value correlation will be greater for B. Alan holds that this is
spurious: it would be quite wrong to infer that B is a 'better' case in point
for the labour theory of value."
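
To make the arithmetic in Allin's example concrete, here is a minimal Python sketch. All of the numbers are hypothetical, chosen only for illustration: both economies share the same list of price-to-value ratios, and they differ only in how widely the sectoral values are dispersed. The names `ratios`, `values_a`, and `values_b` are my own, not Allin's notation; the quantity computed is the correlation of sectoral price (ratio times value) with sectoral value.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
n_sectors = 200

# Identical price-to-value ratios in both economies (hypothetical range).
ratios = [rng.uniform(0.7, 1.3) for _ in range(n_sectors)]

# Economy A: sectors of similar size. Economy B: widely dispersed sizes.
values_a = [rng.uniform(90, 110) for _ in range(n_sectors)]
values_b = [rng.lognormvariate(4.6, 1.5) for _ in range(n_sectors)]

# Sectoral aggregate price = price-to-value ratio * sectoral aggregate value.
prices_a = [r * v for r, v in zip(ratios, values_a)]
prices_b = [r * v for r, v in zip(ratios, values_b)]

r_a = pearson(prices_a, values_a)
r_b = pearson(prices_b, values_b)
print(f"economy A: r = {r_a:.3f}")
print(f"economy B: r = {r_b:.3f}")
```

Under these assumptions the coefficient for B comes out well above the one for A, although the price-to-value ratios are the same list in both cases.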

Allin then objects that "If _sectoral values and prices_ were distributed
independently, then one would expect to see the greater dispersion of sector
sizes in B reflected in a greater dispersion of sectoral price-to-value
ratios. That one does _not_ see this suggests the presence of stronger forces
limiting price-value dispersion in B."

This is an important point, IMO, because it reveals the **implicit null
hypothesis** that C-C test by means of their correlations: sectoral aggregate
values and prices are distributed independently of one another, i.e., **the
correlation coefficient is zero**. This is indeed why the results of such
studies appear superficially to be striking and even breathtaking: one judges
the degree of relationship by measuring NOT ONLY how close it is to 1, but
ALSO how far it is from zero. Or more precisely, one measures the
coefficient's relative distance from both zero and 1. Thus, we say a
correlation of .05 is low and a correlation of .95 is high, because the first
is close to zero and far from 1 and the second is close to 1 and far from
zero.

In other words, as correlation coefficients are normally interpreted, it is
crucial that a coefficient of zero signify something meaningful. Allin tells
us precisely what a zero correlation would mean in this case: sectoral
aggregate values and prices would be distributed independently of one another.

The problem is that this is not a *reasonable* null hypothesis. On the
contrary, I expect that almost all measures of industry size will be
positively correlated with one another -- and that the correlations will be
far from zero. If we take aggregate sectoral price as our measure of industry
size, then I expect that other measures of industry size -- aggregate sectoral
value, but also employment, energy usage, "capital" stock, total assets,
advertising expenditures, total profit, etc. -- will be correlated positively
with it. The mere existence of a positive correlation between aggregate
sectoral price and aggregate sectoral value no more constitutes evidence in
favor of the so-called "labor theory of value" than the existence of a
positive correlation between aggregate sectoral price and advertising
expenditures constitutes evidence in favor of the theory that value (i.e.,
price) is determined in the market.

I think Allin will actually agree with this. So the question turns from the
mere *existence* of a positive correlation to the *size* of the coefficient.
What this implies, however, is that the correlation coefficient is already a
misleading statistic, since, as I have noted, the normal interpretation of
correlation depends crucially upon a zero coefficient denoting something
meaningful.

Moreover, the issue of size returns us to the question I've already raised a
few times in this discussion, namely, what is a *reasonable* benchmark number
that would signify refutation of the hypothesis that relative values determine
relative prices? Whatever measure one uses, shift-shares, MADs, MAWDs,
correlations, etc., this issue arises. I have proposed using the number
obtained from the "naive hypothesis" that surplus-value is randomly
distributed as the benchmark figure. Thus, for instance, if the aggregate
sector price-value correlation coefficient is actually .98, and if its
expected value, were the naive hypothesis true, would be .97, then .97 takes the
place of a correlation of zero as the *reasonable* lower bound. So instead of
saying the correlation of .98 is high because it is so much closer to 1
than to 0, we would say it is not so high because it is somewhat closer to .97
than to 1. Furthermore, I have recommended as a test of statistical
significance that one measure and report the probability that a random
distribution of surplus-value would yield a figure (correlation coefficient,
in this case) that exceeds the actual one -- i.e., what is the probability
that the correlation coefficient would be greater than .98, were the naive
hypothesis correct.
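
One way the benchmark and the significance test could be computed is by simulation. The following Python sketch is only an illustration under stated assumptions, not a definitive procedure: the sector data are invented, and "randomly distributed" is modelled here by splitting total surplus-value according to normalised exponential weights, which is just one of several possible readings. The function names `naive_prices` and `naive_benchmark` are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def naive_prices(costs, total_surplus, rng):
    """Each sector recoups its costs plus a random cut of total surplus-value.

    ASSUMPTION: the random cut is drawn from exponential weights that are
    normalised so the cuts add up to total surplus-value.
    """
    weights = [rng.expovariate(1.0) for _ in costs]
    scale = total_surplus / sum(weights)
    return [c + w * scale for c, w in zip(costs, weights)]


def naive_benchmark(costs, surplus, observed_r, draws=2000, seed=0):
    """Return (expected r under the naive hypothesis, Pr[naive r > observed r])."""
    rng = random.Random(seed)
    values = [c + s for c, s in zip(costs, surplus)]
    total_surplus = sum(surplus)
    sims = [
        pearson(naive_prices(costs, total_surplus, rng), values)
        for _ in range(draws)
    ]
    expected_r = sum(sims) / draws
    p_exceed = sum(r > observed_r for r in sims) / draws
    return expected_r, p_exceed


# Hypothetical sector data: dispersed costs, surplus-value equal to 25% of costs.
data_rng = random.Random(1)
costs = [data_rng.lognormvariate(4.0, 1.0) for _ in range(40)]
surplus = [0.25 * c for c in costs]

expected_r, p_exceed = naive_benchmark(costs, surplus, observed_r=0.98)
print(f"expected r under naive hypothesis: {expected_r:.3f}")
print(f"Pr(naive r > 0.98): {p_exceed:.3f}")
```

The first number plays the role of the *reasonable* lower bound; the second is the probability that a random distribution of surplus-value would yield a coefficient exceeding the observed one. With real data, `costs`, `surplus`, and `observed_r` would of course come from the input-output tables rather than from a random generator.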

The point is that, in the absence of such hypothesis testing, *none* of the
measures of price-value deviation can answer the questions they pose. Even in
the case of the shift-share index, in which the upper and lower bounds do have
a transparent economic meaning (0 means there's no redistribution of
surplus-value; 1 means that all surplus-value is received elsewhere than it
was generated), we need to know not how far the index number is from 1, but
how far it is from the *reasonable* upper bound (perhaps .33), i.e., the value
the index would have were the naive hypothesis true.

I realize I still owe Allin a response to his post on the shift-share index.
I won't have the time in the next week to give a rigorous response, so let me
just make a few points. He questions whether there is a difference between
the expectations generated by the naive hypothesis and those generated by his
(probabilistic) "labor theory of value." That he isn't sure strikes me as a
mark against his theory. If it is unable to generate hypotheses that allow
one to discriminate between it and the naive hypothesis, then the two are
identical for practical purposes, and C-C's claims reduce to the obvious point
that firms recoup their costs and get some random cut of surplus-value.

However, C-C have also claimed that "values" are good predictors of prices.
As generally understood, this implies two things.

First, that they are "unbiased" predictors of prices: a sector's mean price
will equal its value. The evidence we have indicates that this isn't true.
Enough evidence -- some coming from C-C themselves -- has been reported on
this list to allow us to conclude that prices deviate systematically from
values due to rent but also due to the fact that sectors with larger
capital/labor ratios tend to have higher price/value ratios. Paul Cockshott,
for instance, reported that prices on average lie midway between values and
production prices. So values are a biased predictor of market prices. (BTW,
in answer to one question of Allin's: if the sectoral (Cu+V)'s are equal,
where Cu is used-up constant capital, *and* if the ratios of Cu to total C are
also equal, then the expected mean prices according to the naive hypothesis
equal production prices, not values. I.e., on average, each sector would get
the same amount of profit were the naive hypothesis true. If their capital
investments are also equal, then prices equal production prices on average.)

However, there's a second issue, to my mind a more important one: variance.
For a predictor to be good, the variance of the observations around it (if it
is reasonably close to the mean of the observations) must be reasonably small.
(It is one thing if prices are less than 10% higher or lower than values 90%
of the time; it is another thing if prices are less than 10% higher or lower
than values only 20% of the time.) The purpose of my naive hypothesis is
precisely to assess whether this is the case.
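
The statistic in the parenthesis is easy to state exactly. A minimal Python sketch, with a hypothetical helper `share_within` and invented figures for five sectors:

```python
def share_within(prices, values, tolerance=0.10):
    """Fraction of sectors whose price is less than `tolerance` above or below value."""
    hits = sum(abs(p / v - 1.0) < tolerance for p, v in zip(prices, values))
    return hits / len(prices)


# Invented figures: three of the five sectors fall within 10% of value.
values = [100, 100, 100, 100, 100]
prices = [95, 108, 130, 70, 101]

share = share_within(prices, values)
print(share)  # 0.6
```

The same function applied to prices simulated under the naive hypothesis would give the figure against which the observed share has to be judged.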

Thus, while I understand and accept Allin's point that his theory is
probabilistic, I would suggest that the claim that values are good predictors
of prices signifies that the variance of prices around values is considerably
smaller than would be the case were surplus-value to be distributed randomly.
If C-C do not mean to imply this, I would suggest that they not call values
good predictors of prices, or at least let us all know that they are using the
term in a sense other than that which we take for granted. If, however, they
do indeed mean to imply that the variance of prices around values is small, I
think they -- and others working in this field, too, of course -- need to
provide us with reasonable information about the dispersion. We need to know
whether (a) the variance of prices is sizably less than would be expected to
result from a random distribution of surplus-value and (b) the sample size is
large enough to reject the hypothesis of random distribution (the naive
hypothesis).

Andrew Kliman