# In the spotlight

## Panel data: Complexity vs Economic Modeling Opportunities

### A new PhD thesis on “mostly panel econometrics” from the Department of Economics at Lund University (Stauskas, 2022) tackles several important issues in the analysis of panel data, a rich form of data widely used in academic research and, more broadly, in analytical work. In particular, the research in this thesis offers novel methods for turning some of the challenges associated with the complexity of panel data into a powerful resource.

**By Ovidijus Stauskas**

### What are Panel Data?

Researchers in both academia and industry often work with panel data. These data are rich and informative but also complex, because they provide information on a pool of individuals over a (potentially long) span of time. In other words, they can be seen as a collection of individual time series. The OECD, Eurostat and the World Bank are just a few popular sources of panel data. The individuals may be countries, firms, households, etc. Given this, we can expect intricate interdependencies among the panel units and correlations within each unit over time.

Moreover, researchers who analyze panel data are very likely to face so-called unobserved heterogeneity, a feature that has become inseparable from discussions of panel data. For example, we can observe many macroeconomic indicators for a country. Suppose we would like to understand how changes in the labor force affect long-term economic growth, on average, in OECD countries. However, some country-specific characteristics (also known as individual fixed effects), such as cultural aspects or technological levels, are not fully observed and are hard to quantify. If these unobservables are correlated with the observed indicators (here, changes in the labor force), then estimating even the simplest econometric models becomes challenging. Fortunately, special econometric techniques allow us to take the individual fixed effects into account. Therefore, this seeming challenge of panel data can be turned into a significant advantage if the right tools are in place.
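
A minimal simulation sketch of this idea, assuming a toy data-generating process of our own (one regressor and a fixed effect `alpha_i` that also shifts the regressor, so the two are correlated): pooled OLS, which leaves `alpha_i` in the error term, is biased, while the classic within (fixed effects) estimator, which subtracts each unit's time average, removes `alpha_i` and recovers the slope. The within estimator is one standard way of handling fixed effects and is used here purely for illustration; all function names are hypothetical.

```python
import random


def simulate_panel(n_units=200, n_periods=10, beta=0.5, seed=1):
    """Toy DGP: y_it = beta * x_it + alpha_i + e_it, where the unobserved
    fixed effect alpha_i also shifts x_it (so the two are correlated)."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha = rng.gauss(0, 1)
        xs = [alpha + rng.gauss(0, 1) for _ in range(n_periods)]
        ys = [beta * x + alpha + rng.gauss(0, 1) for x in xs]
        panel.append((xs, ys))
    return panel


def ols_slope(pairs):
    """OLS slope of y on x (with an intercept) from a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx


def pooled_ols(panel):
    """Ignore the panel structure and stack all observations."""
    return ols_slope([(x, y) for xs, ys in panel for x, y in zip(xs, ys)])


def within_estimator(panel):
    """Demean x and y unit by unit; this removes the time-invariant alpha_i."""
    demeaned = []
    for xs, ys in panel:
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        demeaned += [(x - mx, y - my) for x, y in zip(xs, ys)]
    return ols_slope(demeaned)


panel = simulate_panel()
print(f"pooled OLS: {pooled_ols(panel):.2f}")  # biased: alpha_i is left in the error
print(f"within:     {within_estimator(panel):.2f}")  # should be close to beta = 0.5
```

In this particular design the pooled slope converges to 1.0 rather than 0.5, because the covariance between `x_it` and `alpha_i` is added to the true effect.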

### Introducing Factor Models

Individual fixed effects, however, are still a very simplistic example. The key assumption is that they do not change over time, which is clearly not very realistic. Stauskas (2022) analyzes estimation methods under a much more general specification of unobserved heterogeneity, known as common factors. For instance, a sample of countries can be affected by global common shocks (the factors), which evolve over time and to which each country responds to a different degree. This means that the countries are strongly dependent on each other (strong cross-sectional dependence), which invalidates standard panel econometric techniques.
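
For concreteness, a stylised single-equation version of such a common factor model can be written as follows; the notation is a standard choice of ours and is not taken from the thesis. Here $f_t$ collects the unobserved common factors (the global shocks), while $\lambda_i$ and $\Gamma_i$ are unit-specific loadings that measure how strongly unit $i$ responds to them. Because the factors drive the regressors $x_{it}$ as well as $y_{it}$, simply ignoring them biases the estimate of $\beta$.

```latex
y_{it} = \beta' x_{it} + \lambda_i' f_t + \varepsilon_{it}, \qquad
x_{it} = \Gamma_i' f_t + v_{it}, \qquad
i = 1, \dots, N, \quad t = 1, \dots, T.
```

The individual fixed effect is the special case of a single factor that is constant over time, $f_t = 1$, so that $\lambda_i' f_t = \lambda_i$ no longer depends on $t$.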

Pesaran (2006) proposed the Common Correlated Effects (CCE) estimator to deal with such situations. The catch is that, in order to control for the factors, researchers first need to estimate them. Pesaran (2006) suggested using (weighted) cross-sectional averages of the observed variables, that is, of the dependent and explanatory variables, as proxies for the factors. The intuition is elegant: averages summarize the information that is common to the individuals in the sample. De Vos and Stauskas (2021) analyze the CCE methodology when one estimates too many factors or, in other words, employs too many averages as factor proxies. The brief lesson is this: too many factors mean too much redundant information, which ‘contaminates’ statistical inference with extra noise. To tackle this, one needs to resort to the bootstrap technique for panel data popularized by Kapetanios (2008). In particular, we rigorously adapted his methodology to factor models and the CCE technique.
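
The logic of CCE, together with a bootstrap step, can be illustrated with a small pure-Python sketch. Everything here is an assumption made for illustration: a toy model with a single factor, randomly drawn loadings, the hypothetical function names, and the resampling scheme (whole units are drawn with replacement and the averages are recomputed in each draw, in the spirit of the cross-sectional bootstrap of Kapetanios (2008)). It is not the code or the exact procedure of De Vos and Stauskas (2021). With one factor but two averages (of `y` and of `x`), the toy model is itself a case of using more averages than factors.

```python
import random
import statistics


def simulate_factor_panel(n_units=50, n_periods=30, beta=1.0, seed=7):
    """Toy DGP with one unobserved common factor f_t (illustrative only):
    x_it = gamma_i * f_t + v_it,  y_it = beta * x_it + lambda_i * f_t + e_it."""
    rng = random.Random(seed)
    f = [rng.gauss(0, 1) for _ in range(n_periods)]
    panel = []
    for _ in range(n_units):
        gamma, lam = rng.gauss(1, 1), rng.gauss(1, 1)
        xs = [gamma * ft + rng.gauss(0, 1) for ft in f]
        ys = [beta * x + lam * ft + rng.gauss(0, 1) for x, ft in zip(xs, f)]
        panel.append((xs, ys))
    return panel


def solve(a, b):
    """Solve the small linear system a z = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            ratio = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= ratio * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def residualize(z, h):
    """Residuals from an OLS regression of the series z on the columns of h."""
    k = len(h[0])
    hth = [[sum(row[i] * row[j] for row in h) for j in range(k)] for i in range(k)]
    htz = [sum(row[i] * zt for row, zt in zip(h, z)) for i in range(k)]
    coef = solve(hth, htz)
    return [zt - sum(c * v for c, v in zip(coef, row)) for zt, row in zip(z, h)]


def cce_pooled(panel):
    """CCE pooled estimator: proxy the factor with cross-sectional averages of
    y and x, partial them out of each unit's series, then pool the slope."""
    ybar = [statistics.fmean(col) for col in zip(*(ys for _, ys in panel))]
    xbar = [statistics.fmean(col) for col in zip(*(xs for xs, _ in panel))]
    h = [[1.0, yb, xb] for yb, xb in zip(ybar, xbar)]  # intercept + averages
    num = den = 0.0
    for xs, ys in panel:
        xr, yr = residualize(xs, h), residualize(ys, h)
        num += sum(a * b for a, b in zip(xr, yr))
        den += sum(a * a for a in xr)
    return num / den


def bootstrap_se(panel, estimator, reps=99, seed=0):
    """Cross-sectional bootstrap: resample whole units (entire time series) with
    replacement, re-estimate (averages are recomputed), return the std. dev."""
    rng = random.Random(seed)
    draws = [estimator([rng.choice(panel) for _ in panel]) for _ in range(reps)]
    return statistics.stdev(draws)


panel = simulate_factor_panel()
print(f"CCE pooled: {cce_pooled(panel):.3f} "
      f"(bootstrap s.e. {bootstrap_se(panel, cce_pooled):.3f})")
```

Resampling entire units keeps each unit's time-series dependence intact, which is why the draws are made over individuals rather than over single observations.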

Stauskas (2021) explores whether the CCE estimator can be adapted to more realistic (macro)economic scenarios. The prevalent assumption in the CCE literature is that the factors are mean-reverting. This means that the global shocks affecting the sample do not trend over time, and their mean and variance never change. Stauskas (2021) analyzes whether the estimator still works if the factors are exploding or imploding, which would correspond to financial crises or pandemic-like shocks that are clearly not mean-reverting, at least in the short run. It turns out that the methodology is robust to such irregularities, which is reassuring news for empirical researchers.
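
A small simulation, under assumptions of our own rather than the design in Stauskas (2021), hints at why such robustness is plausible: the gap between a cross-sectional average and the factor it proxies is just the averaged idiosyncratic noise, whatever the factor's own dynamics. The sketch lets an AR(1) factor be mean-reverting (`rho = 0.5`), a random walk (`rho = 1.0`) or mildly explosive (`rho = 1.03`), and applies a deliberately simplified CCE-type estimator that partials out only the average of `x` (Pesaran's estimator also uses the average of `y`). The function names are hypothetical.

```python
import random
import statistics


def ar1_factor(n_periods, rho, rng):
    """f_t = rho * f_{t-1} + u_t: mean-reverting if |rho| < 1,
    a random walk if rho = 1 and explosive if rho > 1."""
    f, prev = [], 0.0
    for _ in range(n_periods):
        prev = rho * prev + rng.gauss(0, 1)
        f.append(prev)
    return f


def partial_out(z, w):
    """Residuals of z after OLS on an intercept and a single proxy w."""
    mz, mw = statistics.fmean(z), statistics.fmean(w)
    b = (sum((a - mw) * (c - mz) for a, c in zip(w, z))
         / sum((a - mw) ** 2 for a in w))
    return [c - mz - b * (a - mw) for a, c in zip(w, z)]


def cce_type_slope(rho, beta=1.0, n_units=100, n_periods=60, seed=11):
    """Simulate a one-factor panel and estimate beta after partialling out
    the cross-sectional average of x (a simplified CCE-type proxy)."""
    rng = random.Random(seed)
    f = ar1_factor(n_periods, rho, rng)
    panel = []
    for _ in range(n_units):
        gamma, lam = rng.gauss(1, 1), rng.gauss(1, 1)
        xs = [gamma * ft + rng.gauss(0, 1) for ft in f]
        ys = [beta * x + lam * ft + rng.gauss(0, 1) for x, ft in zip(xs, f)]
        panel.append((xs, ys))
    xbar = [statistics.fmean(col) for col in zip(*(xs for xs, _ in panel))]
    num = den = 0.0
    for xs, ys in panel:
        xr, yr = partial_out(xs, xbar), partial_out(ys, xbar)
        num += sum(a * b for a, b in zip(xr, yr))
        den += sum(a * a for a in xr)
    return num / den


for rho in (0.5, 1.0, 1.03):  # mean-reverting, random walk, explosive
    print(f"rho = {rho}: beta_hat = {cce_type_slope(rho):.3f}")
```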

### Factor Information in Forecasting

While the studies above focus only on controlling for the effect of the factors, the factors clearly contain useful information about the core tendencies in the data. Stauskas and Westerlund (2021) employ this information for forecasting purposes. Since many economic variables can potentially help predict another variable, the factors are useful as a means of reducing the dimension of a large dataset. In other words, instead of using myriad economic variables, Stauskas and Westerlund (2021) show how one can elegantly summarize their predictive content in a relatively small number of factors. In particular, the paper presents a rigorous new theory for factor-augmented forecasts when the dataset at hand exhibits a block structure. An example of such a dataset is the popular FRED-MD database (McCracken and Ng, 2016), which collects monthly macroeconomic indicators for the United States grouped into 8 blocks, such as output and income, prices, and stock market variables. This structure makes it possible to extract the factors in a user-friendly way, which results in a convenient forecasting exercise and a simple procedure for testing competing forecasting models. The user-friendliness stems from adapting the intuitive CCE methodology to the forecasting context, with the averages taken block-wise.
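
The block-wise idea can be sketched in a few lines of Python. The data are simulated under our own assumptions: two hypothetical blocks of 20 predictors each (standing in for FRED-MD groups such as prices or stock market series), each driven by one factor. Each block's factor is proxied by the plain average of its series, and the target is then forecast one period ahead by OLS on those averages. This is a simplified illustration of block-wise averaging with hypothetical function names, not the estimator, theory or forecast test of Stauskas and Westerlund (2021).

```python
import random
import statistics


def simulate_blocks(n_periods=200, n_per_block=20, seed=5):
    """Hypothetical data: two blocks of predictors, each driven by its own
    AR(1) factor, and a target y_next[t] = 0.8 * f1_t - 0.5 * f2_t + noise,
    i.e. the value of y one period after t."""
    rng = random.Random(seed)
    f1, f2, a, b = [], [], 0.0, 0.0
    for _ in range(n_periods):
        a, b = 0.7 * a + rng.gauss(0, 1), 0.7 * b + rng.gauss(0, 1)
        f1.append(a)
        f2.append(b)
    blocks = []
    for f in (f1, f2):
        # Loadings centred on 1 so that each block average is a usable proxy.
        loads = [rng.gauss(1, 0.5) for _ in range(n_per_block)]
        blocks.append([[load * ft + rng.gauss(0, 1) for ft in f] for load in loads])
    y_next = [0.8 * u - 0.5 * w + rng.gauss(0, 0.5) for u, w in zip(f1, f2)]
    return blocks, y_next


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Cramer's rule for a 3x3 system a z = b."""
    d = det3(a)
    return [det3([[b[r] if c == k else a[r][c] for c in range(3)]
                  for r in range(3)]) / d for k in range(3)]


def block_average_factors(blocks):
    """CCE-style proxies: one cross-sectional average per block and period."""
    return [[statistics.fmean(col) for col in zip(*block)] for block in blocks]


def fit_and_forecast(factors, y_next):
    """OLS of y_{t+1} on [1, F1_t, F2_t] using all but the last period, then a
    one-step-ahead forecast from the last period's factor proxies."""
    rows = [[1.0, g1, g2] for g1, g2 in zip(*factors)]
    train, y_train = rows[:-1], y_next[:-1]
    xtx = [[sum(r[i] * r[j] for r in train) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yt for r, yt in zip(train, y_train)) for i in range(3)]
    coef = solve3(xtx, xty)
    return coef, sum(c * v for c, v in zip(coef, rows[-1]))


blocks, y_next = simulate_blocks()
coef, forecast = fit_and_forecast(block_average_factors(blocks), y_next)
print("coefficients:", [round(c, 2) for c in coef])
print(f"forecast: {forecast:.2f}, realised (simulated): {y_next[-1]:.2f}")
```

Averaging within a block only recovers that block's factor when the loadings do not average out to zero, which is why the simulated loadings are centred on 1.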

### Dynamics in Panel Models

Macroeconomic series such as GDP clearly trend over time, and some series even behave like a random walk, stock prices being a well-known example (Phillips, 2001). To put it differently, economic series are usually dynamic and persistent, and they depend on their own past values. Dealing with high persistence is a serious econometric challenge even in a pure time series context. In a panel of such series, the challenge becomes multilevel, since high persistence must be combined with the unobserved heterogeneity discussed above if the aim is realistic economic modelling. To address this, Westerlund et al. (2021) explore the novel Factor Analytical (FA) estimator of Bai (2013). Since Bai (2013), it has been known how individual fixed effects can be controlled for in dynamic panels without harmful distortions to statistical inference. However, the underlying assumption was ‘low’ persistence. Westerlund et al. (2021) derive conditions under which this technique works in a ‘high’ persistence setting, not only with individual fixed effects but also with individual-specific time trends.
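
The kind of distortion at stake can be seen in a small simulation. The sketch below is our own illustration and uses the textbook within estimator, not Bai's (2013) FA estimator: it simulates an AR(1) panel with individual fixed effects and a short time dimension (10 observed transitions per unit), then estimates the autoregressive coefficient after demeaning each unit. Demeaning removes the fixed effects but biases the estimate downward (the classic small-T bias, often called the Nickell bias), and the bias is larger at the higher level of persistence. Estimators such as FA are designed to avoid this kind of distortion. The function names are hypothetical.

```python
import random


def simulate_dynamic_panel(rho=0.9, n_units=500, n_periods=10, burn_in=50, seed=2):
    """y_it = alpha_i + rho * y_{i,t-1} + e_it with an individual fixed effect
    alpha_i; a burn-in lets each series settle before it is observed."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n_units):
        alpha, y, path = rng.gauss(0, 1), 0.0, []
        for t in range(burn_in + n_periods + 1):
            y = alpha + rho * y + rng.gauss(0, 1)
            if t >= burn_in:
                path.append(y)
        panel.append(path)  # n_periods + 1 observations per unit
    return panel


def within_ar1(panel):
    """Within (fixed effects) estimate of rho: regress y_t on y_{t-1} after
    demeaning both unit by unit, which removes alpha_i."""
    num = den = 0.0
    for path in panel:
        lag, cur = path[:-1], path[1:]
        ml, mc = sum(lag) / len(lag), sum(cur) / len(cur)
        num += sum((a - ml) * (b - mc) for a, b in zip(lag, cur))
        den += sum((a - ml) ** 2 for a in lag)
    return num / den


for rho in (0.5, 0.9):
    estimate = within_ar1(simulate_dynamic_panel(rho=rho))
    print(f"true rho = {rho}, within estimate = {estimate:.2f}")
```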

**References:**

Bai, Jushan (2013). “Fixed effects dynamic panel models, a factor analytical method.” Econometrica.

McCracken, Michael W.; Ng, Serena (2016). “FRED-MD: A monthly database for macroeconomic research.” Journal of Business & Economic Statistics.

Pesaran, M. Hashem (2006). “Estimation and inference in large heterogeneous panels with a multifactor error structure.” Econometrica.