PRE-PROCESSING TECHNIQUES FOR INTEGRATED ADVERSE DRUG REACTION DATASETS

Authors

  • SHIKSHA DUBEY Department of Computer Application, Thakur Institute of Management, Career Development and Research (TIMSCDR), Mumbai-400101, India https://orcid.org/0000-0002-9320-6408
  • SHIRSHENDU MAITRA Department of Computer Application, Thakur Institute of Management, Career Development and Research (TIMSCDR), Mumbai-400101, India

DOI:

https://doi.org/10.22159/ijpps.2025v17i8.54565

Keywords:

Adverse drug reactions (ADRs), Drugs, Pharmacology

Abstract

Objective: To integrate and preprocess datasets from the FDA Adverse Event Reporting System (FAERS), the Side Effect Resource (SIDER), DrugBank, and PubChem in order to extract meaningful insights into drug interactions, adverse events, and molecular properties, thereby supporting drug discovery and pharmacovigilance.

Methods: The study implements a preprocessing pipeline that includes data cleaning, normalization, and harmonization to ensure consistency across the diverse datasets. Particular emphasis is placed on standardizing drug nomenclature and handling missing or inconsistent information. The integrated data are then examined through exploratory data analysis and visualization techniques to uncover patterns and correlations.
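The cleaning and harmonization steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the record layouts, the field names (`drug`, `event`, `name`, `mol_weight`), the `SYNONYMS` table, and both function names are assumptions made purely for illustration. The sketch normalizes drug names, drops rows that lack a key field, and joins adverse-event reports to molecular records on the normalized name.

```python
import re
from collections import defaultdict

# Hypothetical synonym table (assumption); a real pipeline would derive the
# mapping from a curated vocabulary rather than hard-code it.
SYNONYMS = {"acetylsalicylic acid": "aspirin", "paracetamol": "acetaminophen"}


def normalize_drug_name(raw):
    """Lower-case a drug name, strip dose strings and punctuation, map synonyms."""
    if raw is None:
        return None
    name = raw.strip().lower()
    # Remove dose strings such as "100 mg" or "500mg".
    name = re.sub(r"\b\d+(\.\d+)?\s*(mg|mcg|g|ml)\b", " ", name)
    # Replace anything that is not a letter, digit, space, or hyphen.
    name = re.sub(r"[^a-z0-9\s-]", " ", name)
    # Collapse runs of whitespace.
    name = re.sub(r"\s+", " ", name).strip()
    if not name:
        return None
    return SYNONYMS.get(name, name)


def integrate(event_reports, molecular_records):
    """Join adverse-event reports with molecular records on the normalized name."""
    merged = defaultdict(lambda: {"events": set(), "mol_weight": None})
    for rec in event_reports:
        drug = normalize_drug_name(rec.get("drug"))
        event = (rec.get("event") or "").strip().lower()
        if drug is None or not event:
            continue  # skip rows missing the drug name or the event term
        merged[drug]["events"].add(event)
    for rec in molecular_records:
        drug = normalize_drug_name(rec.get("name"))
        # Attach a molecular weight only when the drug already has reports
        # and the value is present; missing values stay as None.
        if drug in merged and rec.get("mol_weight") is not None:
            merged[drug]["mol_weight"] = float(rec["mol_weight"])
    return dict(merged)


# Made-up example records (not taken from FAERS, SIDER, DrugBank, or PubChem).
event_reports = [
    {"drug": "Aspirin 100 mg", "event": "Nausea"},
    {"drug": "ACETYLSALICYLIC ACID", "event": "nausea "},
    {"drug": "Paracetamol (500mg)", "event": "Rash"},
    {"drug": None, "event": "Headache"},
    {"drug": "Ibuprofen", "event": ""},
]
molecular_records = [
    {"name": "aspirin", "mol_weight": "180.16"},
    {"name": "Acetaminophen", "mol_weight": None},
]

result = integrate(event_reports, molecular_records)
```

In this sketch the two spellings of aspirin collapse into one entry, the rows without a drug name or an event term are discarded, and a missing molecular weight is kept as `None` rather than imputed. Whether to drop, flag, or impute incomplete rows is a design choice that the abstract does not specify.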

Results: The integration and preprocessing of the datasets improved the consistency and quality of the drug-related data. Exploratory analysis revealed patterns and potential associations among drugs, adverse events, and molecular features. Visualization tools effectively conveyed complex relationships and significant trends, enhancing interpretability.

Conclusion: The study demonstrates that integrating and preprocessing multiple drug-related datasets improves data quality and facilitates comprehensive analysis. The resulting resource supports better-informed decision-making in drug development and pharmacovigilance by enabling a deeper understanding of drug interactions and safety profiles.


Published

01-08-2025

How to Cite

DUBEY, SHIKSHA, and SHIRSHENDU MAITRA. “PRE-PROCESSING TECHNIQUES FOR INTEGRATED ADVERSE DRUG REACTION DATASETS”. International Journal of Pharmacy and Pharmaceutical Sciences, vol. 17, no. 8, Aug. 2025, pp. 35-46, doi:10.22159/ijpps.2025v17i8.54565.

Section

Original Article(s)
