A Python Framework for Causal Discovery in Non-Gaussian Linear Models: The PyCD-LiNGAM Library
DOI: https://doi.org/10.55640/
Keywords: Causal discovery, LiNGAM, Python, machine learning
Abstract
Background: Causal discovery from observational data is a critical challenge across scientific disciplines. While traditional methods often rely on correlation, they fail to distinguish between causation and spurious association. The Linear Non-Gaussian Acyclic Model (LiNGAM) addresses this by leveraging the non-Gaussianity of data to uniquely identify the causal structure, but a comprehensive, user-friendly, and open-source implementation in Python has been lacking.
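For reference, the model class the library targets is the standard LiNGAM structural equation model; the notation below is the usual one from the LiNGAM literature rather than anything specific to PyCD-LiNGAM.

```latex
% LiNGAM: linear, acyclic, non-Gaussian structural equation model
x = B x + e
% where the error terms e_i are mutually independent and non-Gaussian, and
% B can be permuted to a strictly lower-triangular matrix (acyclicity).
% Non-Gaussianity of e is what makes B, and hence the causal ordering,
% identifiable from observational data alone; with Gaussian errors only a
% Markov equivalence class of graphs can be recovered.
```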
Methods: We introduce PyCD-LiNGAM, a dedicated Python framework for state-of-the-art causal discovery with LiNGAM-based methods. The library's core is built around specialized algorithms such as ICA-LiNGAM and DirectLiNGAM for robustly inferring the causal ordering and estimating connection strengths. Its modular architecture lets researchers configure parameters, integrate new methods, and handle complex scenarios through advanced features for latent-confounder detection and time-series analysis. For validation, PyCD-LiNGAM includes tools for statistical reliability assessment via bootstrap methods and uses metrics such as the Structural Hamming Distance (SHD) to evaluate performance.
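As a rough, self-contained illustration of the DirectLiNGAM-style ordering step described above (a minimal sketch under simplifying assumptions, not the library's own implementation), the snippet below replaces the mutual-information-based independence measures of the full algorithm with a simple nonlinear-correlation proxy:

```python
import numpy as np

def _ols_beta(y, x):
    """Slope of the least-squares regression of y on x (1-D arrays)."""
    xc = x - x.mean()
    return np.dot(y - y.mean(), xc) / np.dot(xc, xc)

def _dependence(x, r):
    """Crude dependence proxy: for independent x and r,
    E[tanh(x) * r] and E[x * tanh(r)] are both close to zero."""
    x = (x - x.mean()) / (x.std() + 1e-12)
    r = (r - r.mean()) / (r.std() + 1e-12)
    return np.mean(np.tanh(x) * r) ** 2 + np.mean(x * np.tanh(r)) ** 2

def causal_order(X):
    """Greedy DirectLiNGAM-style ordering: repeatedly pick the variable whose
    regression residuals look most independent of it, then regress it out of
    all remaining variables."""
    X = np.asarray(X, dtype=float).copy()
    remaining = list(range(X.shape[1]))
    order = []
    while remaining:
        scores = [
            sum(
                _dependence(X[:, j], X[:, i] - _ols_beta(X[:, i], X[:, j]) * X[:, j])
                for i in remaining if i != j
            )
            for j in remaining
        ]
        root = remaining[int(np.argmin(scores))]
        order.append(root)
        for i in remaining:
            if i != root:
                X[:, i] -= _ols_beta(X[:, i], X[:, root]) * X[:, root]
        remaining.remove(root)
    return order
```

Given the estimated ordering, connection strengths can be obtained by regressing each variable on its predecessors, and bootstrap reliability assessment amounts to rerunning the whole procedure on resampled data and counting how often each edge reappears.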
Results: Benchmark experiments on both synthetic and real-world datasets show that PyCD-LiNGAM achieves high accuracy and scales well. The framework consistently outperforms established baseline methods at recovering the true causal graph, particularly when the error distributions are non-Gaussian. Built-in visualization tools provide clear, interpretable renderings of the discovered directed acyclic graphs.
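The evaluation protocol can be reproduced in outline with synthetic data and an SHD score, as sketched below; the graph size, edge density, sample size, and Laplace noise are illustrative choices rather than the settings of the reported benchmarks.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_lingam_data(n_vars=5, n_samples=2000, edge_prob=0.4):
    """Draw a random strictly lower-triangular weight matrix B and sample
    data from x = B x + e with non-Gaussian (Laplace) errors."""
    B = np.tril(rng.uniform(0.5, 1.5, (n_vars, n_vars)), k=-1)
    B = B * (rng.random((n_vars, n_vars)) < edge_prob)   # sparsify the graph
    E = rng.laplace(size=(n_samples, n_vars))             # non-Gaussian errors
    X = E @ np.linalg.inv(np.eye(n_vars) - B).T           # solve x = B x + e
    return X, B

def structural_hamming_distance(B_true, B_est, tol=1e-8):
    """SHD between two DAG structures: each missing, extra, or reversed edge
    (compared pairwise) adds one to the count."""
    A_true = (np.abs(B_true) > tol).astype(int)
    A_est = (np.abs(B_est) > tol).astype(int)
    diff = np.abs(A_true - A_est)
    n = A_true.shape[0]
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if diff[i, j] or diff[j, i]
    )

X, B_true = random_lingam_data()
# SHD against the empty graph equals the number of edges in the true DAG.
print(structural_hamming_distance(B_true, np.zeros_like(B_true)))
```

In a full benchmark one would estimate a weight matrix from X (for example with the ordering-plus-regression sketch above), compute its SHD against B_true, and repeat over many random graphs and sample sizes.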
Conclusion: PyCD-LiNGAM serves as a foundational and accessible tool for researchers to apply advanced causal discovery techniques. Its specialized design and robust implementation lower the barrier for integrating causal inference into data analysis pipelines across fields such as econometrics, neuroscience, and genomics. While currently focused on linear, acyclic models, future development will aim to extend the framework to include non-linear methods and improve scalability, further solidifying its role in evidence-based scientific research.
License
Copyright (c) 2025 Prof. Elena M. Petrova (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors retain the copyright of their manuscripts, and all Open Access articles are distributed under the terms of the Creative Commons Attribution 4.0 License (CC BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The use of general descriptive names, trade names, trademarks, and so forth in this publication, even if not specifically identified, does not imply that these names are not protected by the relevant laws and regulations.