PYCD-LINGAM: A PYTHON FRAMEWORK FOR CAUSAL INFERENCE WITH NON-GAUSSIAN LINEAR MODELS

Liang Wu; Anita Sari

doi:10.55640/ijidml-v02i07-01

Authors

Liang Wu, Institute of Data Science, Tsinghua University, China
Anita Sari, Ph.D. Candidate, Department of Computer Science, Universitas Indonesia, Depok, Indonesia

Keywords:

Causal inference, LiNGAM, Non-Gaussian models, Causal discovery, Python framework

Abstract

PyCD-LiNGAM is a Python framework for causal inference from observational data using non-Gaussian linear models. It builds on the Linear Non-Gaussian Acyclic Model (LiNGAM) and provides tools for uncovering causal structure in settings where methods that rely on Gaussian assumptions cannot determine the direction of causal relationships. PyCD-LiNGAM offers efficient implementations of DirectLiNGAM, ICA-LiNGAM, and adaptive algorithms. These methods exploit higher-order statistics of the data to identify the causal ordering of the variables and to estimate the connection strengths among them. The framework integrates with popular scientific computing libraries, so practitioners can perform end-to-end causal discovery, visualize the estimated directed acyclic graphs, and assess model fit using statistical criteria. Benchmark experiments on synthetic and real-world datasets indicate that PyCD-LiNGAM achieves high accuracy and scales well, and that it outperforms baseline methods in recovering true causal relationships under non-Gaussian noise. By lowering the barrier to applying state-of-the-art causal inference techniques, PyCD-LiNGAM helps researchers and data scientists in fields such as econometrics, neuroscience, genomics, and the social sciences draw insights about the underlying causal mechanisms.
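The DirectLiNGAM procedure named in the abstract can be illustrated with a short sketch. LiNGAM models the data as x = Bx + e. Here B can be permuted to strictly lower-triangular form, which is the acyclicity assumption, and the components of e are mutually independent and non-Gaussian. DirectLiNGAM proceeds in three steps:

1. It picks the variable that looks most exogenous.
2. It removes that variable's linear effect from the remaining variables, then repeats.
3. It estimates B by regressing each variable on the variables that precede it in the recovered ordering.

The code below is a minimal NumPy sketch under these assumptions. It is not the PyCD-LiNGAM API, and all function names in it are hypothetical. Note the following design choices:

- Exogeneity is scored with a pairwise likelihood-ratio measure built on a maximum-entropy approximation of differential entropy. This is a swapped-in choice; the original DirectLiNGAM paper uses a kernel-based independence measure instead.
- Connection strengths are fitted by ordinary least squares without the pruning step a full implementation would apply, so small spurious coefficients can remain.
- The sketch also assumes there are no latent confounders.

```python
import numpy as np

# Hypothetical helper names: a sketch of the DirectLiNGAM idea, not the PyCD-LiNGAM API.


def _entropy(u):
    """Maximum-entropy approximation of differential entropy; u must be standardized."""
    k1, k2, gamma = 79.047, 7.4129, 0.37457
    return ((1 + np.log(2 * np.pi)) / 2
            - k1 * (np.mean(np.log(np.cosh(u))) - gamma) ** 2
            - k2 * np.mean(u * np.exp(-(u ** 2) / 2)) ** 2)


def _standardize(x):
    return (x - x.mean()) / x.std()


def _residual(xi, xj):
    """Residual of xi after least-squares regression on xj (both centered)."""
    xi_c, xj_c = xi - xi.mean(), xj - xj.mean()
    return xi_c - (xi_c @ xj_c / (xj_c @ xj_c)) * xj_c


def _pairwise_score(xi, xj):
    """Approximate log-likelihood ratio of xi -> xj versus xj -> xi; positive favours xi -> xj."""
    xi, xj = _standardize(xi), _standardize(xj)
    ri_j = _standardize(_residual(xi, xj))  # xi regressed on xj
    rj_i = _standardize(_residual(xj, xi))  # xj regressed on xi
    return (_entropy(xj) + _entropy(ri_j)) - (_entropy(xi) + _entropy(rj_i))


def direct_lingam(X):
    """Return (causal_order, B) for X of shape (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    X_work = X.copy()
    remaining = list(range(n_features))
    order = []
    while remaining:
        # Penalize every pair in which candidate i does not look like the cause.
        penalties = [
            sum(min(0.0, _pairwise_score(X_work[:, i], X_work[:, j])) ** 2
                for j in remaining if j != i)
            for i in remaining
        ]
        k = remaining[int(np.argmin(penalties))]  # most exogenous candidate
        order.append(k)
        remaining.remove(k)
        # Remove the linear effect of x_k from the variables still to be ordered.
        for i in remaining:
            X_work[:, i] = _residual(X_work[:, i], X_work[:, k])

    # Connection strengths: OLS of each variable on its predecessors (no pruning).
    B = np.zeros((n_features, n_features))
    Xc = X - X.mean(axis=0)
    for pos in range(1, n_features):
        target, parents = order[pos], order[:pos]
        coef, *_ = np.linalg.lstsq(Xc[:, parents], Xc[:, target], rcond=None)
        B[target, parents] = coef
    return order, B


# Toy data with uniform (non-Gaussian) noise; the generating order is [1, 0, 2].
rng = np.random.default_rng(0)
e = rng.uniform(-1, 1, size=(5000, 3))
x1 = e[:, 1]
x0 = 1.5 * x1 + e[:, 0]
x2 = -0.8 * x0 + 0.5 * x1 + e[:, 2]
order, B = direct_lingam(np.column_stack([x0, x1, x2]))
print("estimated order:", order)
print(np.round(B, 2))
```

The toy example relies on uniform noise because the pairwise score only carries directional information when the disturbances are non-Gaussian. With Gaussian noise, the standardized entropies coincide, and the recovered order would essentially be arbitrary.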

International Journal of Intelligent Data and Machine Learning
