This package performs revision analysis and offers both detailed and summary output, including the generation of a visual report. It is composed of a selection of parametric tests which enable users to detect potential bias (both mean and regression bias) and other sources of inefficiency in preliminary estimates. By inefficiency in preliminary estimates, we mean that further revisions are predictable in some way.
This package uses the efficient libraries of JDemetra+ v3. It was built mostly on the basis of Eurostat’s technical reports by D. Ares and L. Pitton (2013).
The next section helps with the installation of the package. The third section describes how to use the tool and gives some important details on the main functions. In particular, it is important to mention beforehand that your input data must first be set in a specific format, described in the sub-section on the input data. The fourth section is theoretical and describes each test performed by the main function revision_analysis() (they can also be performed individually) and how to interpret them. Finally, you will find all the reference papers in the last section.
This package relies on the Java libraries of JDemetra+ v3 and on the package rjd3toolkit of rjdverse. Prior to installation, you must ensure that you have a Java version >= 17.0 on your computer. If you need to use a portable version of Java to meet this requirement, you can follow the instructions in the installation manual of RJDemetra.
In addition to a Java version >= 17.0, you must have a recent version of the R packages rJava (>= 1.0.6) and RProtoBuf (>= 0.4.17), which you can download from CRAN. Depending on your current version of R, you might also need to install another version of Rtools (>= 3.6).
This package also depends on the package rjd3toolkit, which you must install from GitHub prior to rjd3revisions.
remotes::install_github("rjdverse/rjd3toolkit@*release")
remotes::install_github("rjdverse/rjd3revisions@*release", build_vignettes = TRUE)
Note that depending on the R packages that are already installed on your computer, you might also be asked to install or re-install a few other packages from CRAN.
Finally, this package also suggests the R packages formattable and kableExtra (and readxl if you import your input data from Excel), all downloadable from CRAN. You are invited to install them for an enhanced formatting of the output (i.e., meaningful colors).
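They can all be installed from CRAN in one call:

install.packages(c("formattable", "kableExtra", "readxl"))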
Your input data must be in a specific format: long, vertical or horizontal, as shown in the tables below. Regarding the dates, the format shown in the examples below is acceptable, as are the other common date formats, for both revision dates and time periods. Note that missing values can either be denoted as NA (as in the examples below) or simply omitted from the input, at the convenience of the user.

Long format:
rev_date | time_period | obs_values |
---|---|---|
2022-07-31 | 2022Q1 | 0.8 |
2022-07-31 | 2022Q2 | 0.2 |
2022-07-31 | 2022Q3 | NA |
2022-07-31 | 2022Q4 | NA |
2022-08-31 | 2022Q1 | 0.8 |
… | … | … |

Vertical format:

Period | 2023/03/31 | 2023/04/30 | 2023/05/31 |
---|---|---|---|
2022M01 | 15.2 | 15.1 | 15.0 |
2022M02 | 15.0 | 14.9 | 14.9 |
… | … | … | … |
2023M01 | 13.0 | 13.1 | 13.2 |
2023M02 | 12.1 | 12.1 | |
2023M03 | 12.3 | | |

Horizontal format:

Period | 2022M01 | 2022M02 | … | 2023M01 | 2023M02 | 2023M03 |
---|---|---|---|---|---|---|
2023/03/31 | 15.2 | 15.0 | … | 13.0 | | |
2023/04/30 | 15.1 | 14.9 | … | 13.1 | 12.1 | |
2023/05/31 | 15.0 | 14.9 | … | 13.2 | 12.1 | 12.3 |
Depending on the location of your input data, you can use create_vintages_from_xlsx() or create_vintages_from_csv(), or the more generic function create_vintages(), to create the vintages (see section vintages & revisions) later on. If you plan to use the generic function, you’ll first need to put your input data in a data.frame in R, as in the examples below.
# Long format
long_view <- data.frame(
rev_date = rep(x = c("2022-07-31", "2022-08-31", "2022-09-30", "2022-10-31",
"2022-11-30", "2022-12-31", "2023-01-31", "2023-02-28"),
each = 4L),
time_period = rep(x = c("2022Q1", "2022Q2", "2022Q3", "2022Q4"), times = 8L),
obs_values = c(
.8, .2, NA, NA, .8, .1, NA, NA,
.7, .1, NA, NA, .7, .2, .5, NA,
.7, .2, .5, NA, .7, .3, .7, NA,
.7, .2, .7, .4, .7, .3, .7, .3
)
)
print(long_view)
# Horizontal format
horizontal_view <- matrix(
    data = c(
        .8, .8, .7, .7, .7, .7, .7, .7,  # 2022Q1
        .2, .1, .1, .2, .2, .3, .2, .3,  # 2022Q2
        NA, NA, NA, .5, .5, .7, .7, .7,  # 2022Q3
        NA, NA, NA, NA, NA, NA, .4, .3   # 2022Q4
    ),
    ncol = 4
)
colnames(horizontal_view) <- c("2022Q1", "2022Q2", "2022Q3", "2022Q4")
rownames(horizontal_view) <- c("2022-07-31", "2022-08-31", "2022-09-30", "2022-10-31",
"2022-11-30", "2022-12-31", "2023-01-31", "2023-02-28")
print(horizontal_view)
# Vertical format
vertical_view <- matrix(
    data = c(
        .8, .2, NA, NA,  # 2022-07-31
        .8, .1, NA, NA,  # 2022-08-31
        .7, .1, NA, NA,  # 2022-09-30
        .7, .2, .5, NA,  # 2022-10-31
        .7, .2, .5, NA,  # 2022-11-30
        .7, .3, .7, NA,  # 2022-12-31
        .7, .2, .7, .4,  # 2023-01-31
        .7, .3, .7, .3   # 2023-02-28
    ),
    nrow = 4
)
rownames(vertical_view) <- c("2022Q1", "2022Q2", "2022Q3", "2022Q4")
colnames(vertical_view) <- c("2022-07-31", "2022-08-31", "2022-09-30", "2022-10-31",
"2022-11-30", "2022-12-31", "2023-01-31", "2023-02-28")
print(vertical_view)
Once your input data are in the right format, you create the vintages:
vintages <- create_vintages(long_view, type = "long", periodicity = 4L)
# vintages <- create_vintages_from_xlsx(
# file = "myinput.xlsx",
# type = "long",
# periodicity = 4,
# "Sheet1"
# )
# vintages <- create_vintages_from_csv(
# file = "myinput.csv",
# periodicity = 4,
# sep = "\t",
# date_format = "%Y-%m-%d"
# )
print(vintages) # extract of the vintages according to the different views
summary(vintages) # metadata about vintages
You can then, optionally, inspect the revisions before performing the revision analysis:
revisions <- get_revisions(vintages, gap = 1)
plot(revisions)
print(revisions) # extract of the revisions according to the different views
summary(revisions) # metadata about revisions
Finally, to create the report and get a summary of the results, you can use the function revision_analysis():
rslt <- revision_analysis(vintages, gap = 1, view = "diagonal", n.releases = 3)
summary(rslt)
print(rslt)
To export the report in PDF or HTML format, you can use the function render_report():
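For example (a minimal sketch; the argument names shown here are assumptions, check ?render_report for the exact signature):

# render_report(
#     rslt,
#     output_file = "my_report",       # assumed argument name
#     output_format = "html_document"  # or "pdf_document" (assumed)
# )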
Once your input data are in the right format, you must first create an object of class rjd3rev_vintages before you can run your revision analysis. The function create_vintages() (or, alternatively, create_vintages_from_xlsx() or create_vintages_from_csv()) will create this object from the input data and display the vintages considering three different data structures or views: vertical, horizontal and diagonal.

Vertical view:
Period | 2023/03/31 | 2023/04/30 | 2023/05/31 |
---|---|---|---|
2022M01 | 15.2 | 15.1 | 15.0 |
2022M02 | 15.0 | 14.9 | 14.9 |
… | … | … | … |
2023M01 | 13.0 | 13.1 | 13.2 |
2023M02 | 12.1 | 12.1 | |
2023M03 | 12.3 | | |

Horizontal view:

Period | 2022M01 | 2022M02 | … | 2023M01 | 2023M02 | 2023M03 |
---|---|---|---|---|---|---|
2023/03/31 | 15.2 | 15.0 | … | 13.0 | | |
2023/04/30 | 15.1 | 14.9 | … | 13.1 | 12.1 | |
2023/05/31 | 15.0 | 14.9 | … | 13.2 | 12.1 | 12.3 |

Diagonal view:

Period | Release1 | Release2 | Release3 |
---|---|---|---|
2023M01 | 13.0 | 13.1 | 13.2 |
2023M02 | 12.1 | 12.1 | |
2023M03 | 12.3 | | |
Note that, among the arguments of the function create_vintages(), the argument vintage_selection allows you to limit the range of revision dates to consider if needed. See ?create_vintages for more details.
Revisions are easily calculated from vintages by choosing the gap to consider. The function get_revisions() will display the revisions according to each view. It is just an informative function that does not need to be run prior to the revision analysis.
The function revision_analysis() is the main function of the package. It provides some descriptive statistics and performs a battery of parametric tests which enable users to detect potential bias (both mean and regression bias) and sources of inefficiency in preliminary estimates. We conclude that preliminary estimates are inefficient when revisions are predictable in some way. The parametric tests are divided into 5 categories: relevancy (check whether preliminary estimates are worthwhile at all), bias, efficiency, orthogonality (correlation at higher lags), and signal vs noise. This function is robust: if, for some reason, a test fails to process, it is simply skipped and a warning with the possible cause of failure is sent to the users. The other tests are then performed as usual.
For some of the parametric tests, a prior transformation of the vintage data may be important to avoid misleading results. By default, the decision to difference the vintage data is made automatically based on unit root and cointegration tests. More specifically, we use the augmented Dickey-Fuller (ADF) test to check for the presence of a unit root and, for cointegration, we perform an ADF test on the residuals of an OLS regression between the two vintages. The results of those tests are also made available in the output of the function (section ‘varbased’). By contrast, the choice of a log-transformation is left to the discretion of the users, based on their knowledge of the series and the diagnostics of the various tests. By default, no log-transformation is considered.
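As an illustration of this logic (and not of the package’s internal implementation, which relies on the JDemetra+ libraries), such checks could be sketched with the CRAN package tseries on two placeholder vintages:

library(tseries)

# Placeholder data standing in for two vintages of the same series
set.seed(123)
p <- cumsum(rnorm(50))        # preliminary vintage (random walk, hence a unit root)
l <- p + rnorm(50, sd = 0.1)  # later vintage, cointegrated with p by construction

# Unit root: ADF test on each vintage
adf.test(p)
adf.test(l)

# Cointegration: ADF test on the residuals of an OLS regression
# of one vintage on the other
adf.test(residuals(lm(l ~ p)))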
As part of the arguments of the revision_analysis() function, you can choose the view and the gap to consider, restrict the number of releases under investigation when the diagonal view is selected, and/or change the default settings for the prior transformation of the data (including the possibility of adding a prior log-transformation of your data).
The function render_report(), applied to the output of revision_analysis(), will generate an enhanced HTML or PDF report including both a formatted summary of the results and full explanations about each test (which are also included in the vignette below). The formatted summary of the results displays the p-value of each test (except for Theil’s tests, where the value of the statistic is provided) and its interpretation. An assessment of ‘good’, ‘uncertain’, ‘bad’ or ‘severe’ is associated with each test, following the usual statistical interpretation of p-values and the orientation of the tests. This allows a quick visual interpretation of the results and is similar to what is done in the GUI of JDemetra+.
In addition to the function revision_analysis(), users can also perform the tests individually if they want to. To list all the functions available in the package (and therefore find the functions corresponding to the individual tests), you can do the following:
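# List all exported functions of the package
library(rjd3revisions)
ls("package:rjd3revisions")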
Use help("function_name") or ?function_name for more information and examples on the various functions.
The detailed results of each test are part of the output returned by the function revision_analysis(). Alternatively, the functions associated with the individual tests will give you the same results specific to each test.
In addition to the visual report that you can get by applying the function render_report() to the output of the function revision_analysis(), you can also apply the usual summary() and print() functions to this output. The function summary(), in particular, will print only the formatted table of the report with the main results. The print() function will provide the same information unformatted, together with some additional details. Finally, the plot() function applied to the output of the function get_revisions() will provide you with a visual of the revisions over time.
Users can change the default values of the thresholds considered in the function revision_analysis() (and displayed by the functions summary.rjd3rev_rslts() and render_report()) that are used to make a quality assessment of the test results. Changing the default threshold values can be done for each test through the global options, which can be set via options() and queried via getOption(). Note that the default thresholds considered for the residuals diagnostics can also be changed if necessary.
Here is how to customize a threshold. Threshold values should be defined as an ascending numeric vector. We start from -Inf, and each element of the vector should be understood as the upper or lower bound (depending on the null hypothesis) of the corresponding assessment. Furthermore, the assessment ‘good’ is always the one left implicit: it is not included in the vector and, depending on the test, it will be interpreted adequately. Finally, only the assessments ‘good’ (implicitly), ‘uncertain’, ‘bad’ and ‘severe’ are allowed, but they do not all have to be used if it is not necessary. Here is an example of how to modify the threshold values for some of the tests. The names of all the test thresholds that can be modified are listed in the table below.
options(
augmented_t_threshold = c(severe = 0.005, bad = 0.05, uncertain = 0.1),
t_threshold = c(bad = 0.05, uncertain = 0.1),
theil_u2_threshold = c(uncertain = .5, bad = .75, severe = 1)
)
rslt2 <- revision_analysis(vintages, gap = 1, view = "diagonal", n.releases = 3)
summary(rslt2)
For the augmented t-test, defining the threshold values in this way means that the results will be assessed as severe if the p-value is below 0.005, as bad if the p-value is between 0.005 and 0.05, as uncertain if it is between 0.05 and 0.1, and as good if it is higher than 0.1. As far as the Theil U2 test is concerned, given the definition of the test (see section Theil’s Inequality Coefficient), the results will be assessed as good if the value of the statistic is lower than 0.5, uncertain between 0.5 and 0.75, bad between 0.75 and 1, and severe if it is higher than 1.
Here are all the possible options the user can modify, together with their description and their default value.
Name | Description | Default value |
---|---|---|
theil_u1_threshold | Threshold values for Theil’s Inequality Coefficient U1 | c(uncertain = .8, bad = .9, severe = .99) |
theil_u2_threshold | Threshold values for Theil’s Inequality Coefficient U2 | c(uncertain = .8, bad = .9, severe = 1) |
t_threshold | Threshold values for T-test | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
augmented_t_threshold | Threshold values for Augmented T-test | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
slope_and_drift_threshold | Threshold values for Slope and drift test | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
eff1_threshold | Threshold values for Efficiency test (test 1) | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
eff2_threshold | Threshold values for Efficiency test (test 2) | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
orth1_threshold | Threshold values for Orthogonality test (test 1) | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
orth2_threshold | Threshold values for Orthogonality test (test 2) | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
autocorr_threshold | Threshold values for Orthogonality test (test 3 on autocorrelation) | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
seas_threshold | Threshold values for Orthogonality test (test 4 on seasonality) | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
signal_noise1_threshold | Threshold values for Signal vs Noise test (test 1) | c(severe = 0.001, bad = 0.01, uncertain = 0.05) |
signal_noise2_threshold | Threshold values for Signal vs Noise test (test 2) | c(uncertain = 0.05) |
jb_res_threshold | Normality test: Jarque-Bera | c(bad = 0.01, uncertain = 0.1) |
bp_res_threshold | Homoskedasticity test: Breusch-Pagan | c(bad = 0.01, uncertain = 0.1) |
white_res_threshold | Homoskedasticity test: White | c(bad = 0.01, uncertain = 0.1) |
arch_res_threshold | Homoskedasticity test: ARCH | c(bad = 0.01, uncertain = 0.1) |
Finally, the functions set_thresholds_to_default() and set_all_thresholds_to_default() can be used to reset the test thresholds to their default values.
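For example:

# Reset all test thresholds to their default values
set_all_thresholds_to_default()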
Recall that the purposes of the parametric tests described below are:
- relevancy: check whether the preliminary estimates are worthwhile at all (Theil’s inequality coefficient);
- bias: detect potential mean and regression bias in the preliminary estimates;
- efficiency: check whether the preliminary estimates are efficient forecasts of the later estimates;
- orthogonality: check for correlation with previous revisions (at higher lags), autocorrelation and seasonality;
- signal vs noise: determine whether revisions should be classified as ‘news’ or ‘noise’.
In the context of revision analysis, Theil’s inequality coefficient, also known as Theil’s U, provides a measure of the accuracy of a set of preliminary estimates (P) compared to a later version (L). There exist a few definitions of Theil’s statistic, leading to different interpretations of the results. In this package, two definitions are considered. The first statistic, U1, is given by $$ U_1=\frac{\sqrt{\frac{1}{n}\sum^n_{t=1}(L_t-P_t)^2}}{\sqrt{\frac{1}{n}\sum^n_{t=1}L_t^2}+\sqrt{\frac{1}{n}\sum^n_{t=1}P_t^2}} $$ U1 is bounded between 0 and 1. The closer the value of U1 is to zero, the more accurate the preliminary estimates. However, this classic definition of Theil’s statistic suffers from a number of limitations. In particular, a set of near-zero preliminary estimates would always give a value of the U1 statistic close to 1, even when they are close to the later estimates.
The second statistic, U2, is given by $$ U_2=\frac{\sqrt{\sum^n_{t=1}\left(\frac{P_{t+1}-L_{t+1}}{L_t}\right)^2}}{\sqrt{\sum^n_{t=1}\left(\frac{L_{t+1}-L_t}{L_t}\right)^2}} $$ The interpretation of U2 differs from that of U1. The value of 1 is no longer the upper bound of the statistic but a threshold above (below) which the preliminary estimates are less (more) accurate than the naïve random-walk forecast that repeats the last observed value (i.e. taking Pt+1 = Lt). Whenever it can be calculated (Lt ≠ 0 ∀t), the U2 statistic is the preferred option to evaluate the relevancy of the preliminary estimates.
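As an illustration of these two formulas (and not of the package’s internal implementation), U1 and U2 could be computed as follows for two aligned numeric vectors p (preliminary estimates) and l (later estimates):

theil_u1 <- function(p, l) {
    sqrt(mean((l - p)^2)) / (sqrt(mean(l^2)) + sqrt(mean(p^2)))
}

theil_u2 <- function(p, l) {
    # Ratios are taken relative to L_t, comparing the preliminary estimates
    # with the naive random-walk forecast P[t+1] = L[t]
    n <- length(l)
    num <- sum(((p[-1] - l[-1]) / l[-n])^2)
    den <- sum(((l[-1] - l[-n]) / l[-n])^2)
    sqrt(num / den)
}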
A bias in preliminary estimates may indicate inaccurate initial data or inefficient compilation methods. However, we must be cautious if a bias is shown to be statistically significant and we intend to correct it. Biases may change over time, so we might correct for errors that no longer apply. Over a long interval, changes in methodology or definitions may also occur, so that there are valid reasons to expect a non-zero mean revision.
To test whether the mean revision R̄ is statistically different from zero for a sample of size n, we employ a conventional t-statistic $$ t=\frac{\overline{R}}{\sqrt{s^2/n}} $$ where s² is the sample variance of the revisions. If the null hypothesis that the bias is equal to zero is rejected, this gives an indication that a bias exists in the earlier estimates.
The t-test is equivalent to fitting a linear regression of the revisions on a constant only (i.e. the mean). Assumptions such as the Gaussianity of the revisions are implied. One can relax the assumption of no autocorrelation by adding it into the model. Hence we have
Rt = μ + εt, t = 1, ..., n
where the errors are assumed to be serially correlated according to an AR(1) model, that is, εt = γεt − 1 + ut, with |γ| < 1 and ut ∼ iid. The autocorrelation in the error terms reduces the effective number of independent observations (and the degrees of freedom) to $n\frac{(1-\gamma)}{(1+\gamma)}$ and thus the variance of the mean should be adjusted upward accordingly.
Hence, the Augmented t-test is calculated as $$ t_{adj}=\frac{\overline{R}}{\sqrt{\frac{s^2(1+\hat{\gamma})}{n(1-\hat{\gamma})}}} $$ where $$ \hat{\gamma}=\frac{\sum^{n-1}_{t=1}(R_t-\overline{R})(R_{t + 1}-\overline{R})}{\sum^n_{t=1}(R_t-\overline{R})^2} $$
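A direct transcription of these two formulas for a numeric vector of revisions r might look as follows (illustrative sketch only):

augmented_t <- function(r) {
    n <- length(r)
    rbar <- mean(r)
    # Estimate of the AR(1) coefficient (gamma hat)
    gamma <- sum((r[-n] - rbar) * (r[-1] - rbar)) / sum((r - rbar)^2)
    # Adjusted t-statistic with upward-corrected variance of the mean
    rbar / sqrt(var(r) * (1 + gamma) / (n * (1 - gamma)))
}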
We assume a linear regression model of a later vintage (L) on a preliminary vintage (P) and estimate the intercept (drift) β0 and slope coefficient β1 by OLS. The model is
Lt = β0 + β1Pt + εt, t = 1, ..., n
While the (augmented) t-test on revisions only gives information about mean bias, this regression enables us to assess both mean and regression bias. A regression bias would occur, for example, if preliminary estimates tend to be too low when later estimates are relatively high, and too high when later estimates are relatively low. In this case, this may result in a positive value of the intercept and a value of β1 < 1. To evaluate whether a mean or regression bias is present, we employ conventional t-tests on the parameters with the null hypotheses β0 = 0 and β1 = 1.
Recall that OLS regressions always come along with rather strict assumptions. Users should check the diagnostics and draw the necessary conclusions from them.
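As an illustration, the slope and drift test can be sketched in base R, assuming p and l are numeric vectors holding a preliminary and a later vintage aligned by period (this is not the package’s implementation):

# Placeholder vintages (replace with your own aligned series)
set.seed(7)
p <- rnorm(30, mean = 10)
l <- p + rnorm(30, sd = 0.2)

fit <- lm(l ~ p)
est <- summary(fit)$coefficients

# t-test of H0: beta0 = 0 (mean bias)
t_beta0 <- est["(Intercept)", "t value"]

# t-test of H0: beta1 = 1 (regression bias); note the shifted null
t_beta1 <- (est["p", "Estimate"] - 1) / est["p", "Std. Error"]
p_beta1 <- 2 * pt(abs(t_beta1), df = fit$df.residual, lower.tail = FALSE)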
Efficiency tests evaluate whether preliminary estimates are ‘efficient’ forecasts of the later estimates. If all the information had been used efficiently at the time of the preliminary estimate, revisions should not be predictable and should therefore neither be correlated with the preliminary estimates nor display any relationship from one vintage to another. This section focuses on these two points. Predictability of revisions is tested even further in the Orthogonality and Signal vs Noise sections.
We assume a linear regression model of the revisions (R) on a preliminary vintage (P) and estimate the intercept β0 and slope coefficient β1 by OLS. The model is
Rt = β0 + β1Pt + εt, t = 1, ..., n
If revisions are affected by preliminary estimates, it means that the latter are not efficient and could be improved. We employ conventional t-tests on the parameters with the null hypotheses β0 = 0 and β1 = 0. Diagnostics on the residuals should be verified.
We assume a linear regression model of the revisions between later vintages (Rv) on the revisions between the previous vintages (Rv − 1) and estimate the intercept β0 and slope coefficient β1 by OLS. The model is
Rv, t = β0 + β1Rv − 1, t + εt, t = 1, ..., n
If later revisions are predictable from previous revisions, it means that preliminary estimates are not efficient and could be improved. We employ conventional t-tests on the parameters with the null hypotheses β0 = 0 and β1 = 0. Diagnostics on the residuals should be verified.
Orthogonality tests evaluate whether revisions from older vintages affect later revisions. This section also includes autocorrelation and seasonality tests for a given set of revisions. If there is significant correlation in the revisions for subsequent periods, this may indicate some degree of predictability in the revision process.
We assume a linear regression model of the revisions from later vintages (Rv) on the revisions from p previous vintages (Rv − 1, ..., Rv − p) and estimate the intercept β0 and the slope coefficients β1, ..., βp by OLS. The model is
$$ R_{v,t}=\beta_0+\sum^p_{i=1}\beta_{i}R_{v-i,t}+\varepsilon_t, ~~t=1,...,n $$
If later revisions are predictable from previous revisions, it means that preliminary estimates are not efficient and could be improved. We employ a conventional t-test on the intercept parameter with the null hypothesis β0 = 0 and an F-test to check the null hypothesis that β1 = β2 = ... = βp = 0. Diagnostics on the residuals should be verified.
We assume a linear regression model of the revisions from later vintages (Rv) on the revisions from a specific previous vintage (Rv − k) and estimate the intercept β0 and slope coefficient β1 by OLS. The model is
Rv, t = β0 + β1Rv − k, t + εt, t = 1, ..., n
If later revisions are predictable from previous revisions, it means that preliminary estimates are not efficient and could be improved. We employ conventional t-tests on the parameters with the null hypotheses β0 = 0 and β1 = 0. Diagnostics on the residuals should be verified.
To test whether autocorrelation is present in revisions, we employ the Ljung-Box test. The Ljung-Box test considers jointly a group of autocorrelation coefficients up to a certain lag k and is therefore sometimes referred to as a portmanteau test. For the purpose of revision analysis, as we expect a relatively small number of observations, we decided to restrict the number of lags considered to k = 2. This also allows users to distinguish it from autocorrelation at seasonal lags (see the seasonality tests below).
The null hypothesis is no autocorrelation. The test statistic is given by $$ Q=n(n+2)\sum^k_{i=1}\frac{\hat{\rho}_i^2}{n-i} $$ where n is the sample size, ρ̂i is the sample autocorrelation at lag i, and k = 2 is the number of lags being tested. Under the null hypothesis, Q ∼ χ²(k).
If Q is statistically different from zero, the revision process may be locally biased in the sense that later revisions are related to the previous ones.
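For illustration, a Ljung-Box test restricted to the first two lags is directly available in base R (here on a placeholder vector of revisions r):

r <- rnorm(40)  # placeholder revisions
Box.test(r, lag = 2, type = "Ljung-Box")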
To test whether seasonality is present in revisions, we employ two tests: the parametric QS test and the non-parametric Friedman test. Note that seasonality tests are always performed on first-differenced series to avoid misleading results.
The QS test is a variant of the Ljung-Box test computed on seasonal lags, where we only consider positive autocorrelations: $$ QS=n(n+2)\sum^k_{i=1}\frac{\left[\max(0,\hat{\gamma}_{i\cdot l})\right]^2}{n-i\cdot l} $$ where k = 2, so that only the first and second seasonal lags are considered. The test thus checks the correlation between the actual observation and the observations lagged by one and two years. Note that l = 12 when dealing with monthly observations, so we consider the autocovariances γ̂12 and γ̂24 alone; in turn, l = 4 in the case of quarterly data.
Under the null hypothesis of no autocorrelation at seasonal lags, QS follows a modified χ²(k) distribution: the elimination of negative correlations indeed calls for a modified χ² distribution, which was obtained using simulation techniques.
The Friedman test requires no distributional assumptions; it uses the rankings of the observations. It is constructed as follows. Consider first the matrix of data {xij}n×k with n rows (the blocks, i.e. the number of years in the sample) and k columns (the treatments, i.e. either 12 months or 4 quarters, depending on the frequency of the data). The data matrix is replaced by a new matrix {rij}n×k, where the entry rij is the rank of xij within block i. The test statistic is given by
$$ Q=\frac{n\sum^k_{j=1}(\tilde{r}_{\cdot j}-\tilde{r})^2}{\frac{1}{n(k-1)}\sum^n_{i=1}\sum^k_{j=1}(r_{ij}-\tilde{r})^2} $$ The statistic compares the variance of the average ranking across treatments j with the total variance of the rankings. Under the null hypothesis of no (stable) seasonality, Q ∼ χ²(k − 1).
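Base R provides a direct counterpart; a minimal sketch on a placeholder matrix of (first-differenced) monthly revisions, with years in rows (blocks) and months in columns (treatments):

set.seed(1)
x <- matrix(rnorm(5 * 12), nrow = 5, ncol = 12)  # 5 years x 12 months
friedman.test(x)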
As for the non-seasonal autocorrelation tests at lower lags, if QS and Q are significantly different from zero, the revision process may be locally biased in the sense that later revisions are related to the previous ones.
Regression techniques can also be used to determine whether revisions should be classified as ‘news’ or ‘noise’. These notions are also closely related to the concept of efficiency developed earlier.
If the correlation between revisions and preliminary estimates is significantly different from zero, this implies that not all the information available at the time of the preliminary estimates was fully utilized. In that sense, we would conclude that the preliminary estimates could have been better and that revisions embody noise. The model to test whether revisions are ‘noise’ is similar to the first model established earlier to test efficiency:
Rt = β0 + β1Pt + εt, t = 1, ..., n
We employ an F-test to test jointly that β0 = 0 and β1 = 0. If the null hypothesis is rejected, this suggests that revisions are likely to include noise. Diagnostics on the residuals should be verified.
On the other hand, revisions should be correlated with the later estimates. If this is the case, it means that the information which became available between the compilation of the preliminary and the later estimates is captured in the estimation process of the later estimates. In that sense, the revision process is warranted, and we conclude that revisions embody news. The model to test whether revisions are ‘news’ is
Rt = β0 + β1Lt + εt, t = 1, ..., n
We employ an F-test to test jointly that β0 = 0 and β1 = 0. If the null hypothesis is rejected, this is a good thing, as it suggests that revisions incorporate news. Note that even if we reject the null hypothesis and conclude that revisions incorporate news, it does not necessarily mean that revisions are efficient, as they might still be predictable by other variables not included in the estimation process.
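Both joint nulls can be checked with a standard nested-model comparison in base R; a minimal sketch, assuming r, p and l are aligned vectors of revisions, preliminary estimates and later estimates (placeholder data below):

set.seed(42)
p <- rnorm(30)                # preliminary estimates (placeholder)
l <- p + rnorm(30, sd = 0.2)  # later estimates (placeholder)
r <- l - p                    # revisions

noise_fit <- lm(r ~ p)        # "noise" model: revisions on preliminary estimates
news_fit <- lm(r ~ l)         # "news" model: revisions on later estimates
null_fit <- lm(r ~ 0)         # restricted model: beta0 = beta1 = 0
anova(null_fit, noise_fit)    # joint F-test for the noise model
anova(null_fit, news_fit)     # joint F-test for the news model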