Analysis of storage formats for multidimensional data models in the context of multidimensional cubes
- Authors: Frolov V.A., Khayrullin R.Z., Afanasyev G.I.
- Affiliations: Bauman Moscow State Technical University
- Issue: Vol 12, No 4 (2025)
- Pages: 187-194
- Section: INFORMATICS AND INFORMATION PROCESSING
- URL: https://ogarev-online.ru/2313-223X/article/view/380199
- DOI: https://doi.org/10.33693/2313-223X-2025-12-4-187-194
- EDN: https://elibrary.ru/GSYBAH
- ID: 380199
Abstract
The article addresses the efficient storage of multidimensional data models in modern analytical systems. Particular attention is paid to the architecture of multidimensional cubes, which store aggregated facts at the intersections of multiple dimensions. Current data storage formats (Parquet, ORC, Iceberg, Delta Lake, Hudi) are reviewed with respect to their applicability to multidimensional analytics. It is shown that existing solutions are oriented mainly toward tabular structures and do not fully support multidimensional relationships, hierarchies and aggregations. The difficulties of integrating different storage formats and the lack of a unified approach to describing metadata are analyzed. Based on the identified limitations, design objectives for a multidimensional cube storage format are formulated. A conceptual storage model is proposed that combines the principles of relational and multidimensional data organization. The proposed model comprises a fact table, dimension tables, a metadata layer and an API.
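The four components named in the abstract (fact table, dimensions, metadata layer, API) can be illustrated with a minimal Python sketch. It is not the authors' format: the names `Dimension`, `Cube`, `describe` and `rollup`, the in-memory layout and the sample data are all assumptions introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Hypothetical dimension table: member key -> value at each hierarchy level."""
    name: str
    levels: list[str]                    # hierarchy levels, finest to coarsest
    members: dict[str, dict[str, str]]   # member key -> {level: value}


@dataclass
class Cube:
    """Hypothetical cube: fact rows plus the dimensions needed to interpret them."""
    measure: str
    dimensions: dict[str, Dimension]
    facts: list[dict] = field(default_factory=list)

    def describe(self) -> dict:
        """Metadata layer: the measure name and the hierarchy of each dimension."""
        return {
            "measure": self.measure,
            "dimensions": {n: d.levels for n, d in self.dimensions.items()},
        }

    def rollup(self, dim: str, level: str) -> dict[str, float]:
        """API layer: sum the measure at one hierarchy level of one dimension."""
        dimension = self.dimensions[dim]
        totals: dict[str, float] = defaultdict(float)
        for row in self.facts:
            member = dimension.members[row[dim]]
            totals[member[level]] += row[self.measure]
        return dict(totals)


# Illustrative data only; not taken from the article.
geo = Dimension(
    name="geo",
    levels=["city", "country"],
    members={
        "MOW": {"city": "Moscow", "country": "Russia"},
        "KZN": {"city": "Kazan", "country": "Russia"},
        "BER": {"city": "Berlin", "country": "Germany"},
    },
)
cube = Cube(
    measure="sales",
    dimensions={"geo": geo},
    facts=[
        {"geo": "MOW", "sales": 120.0},
        {"geo": "KZN", "sales": 80.0},
        {"geo": "BER", "sales": 50.0},
    ],
)
by_country = cube.rollup("geo", "country")
```

Here the hierarchy (city to country) lives in the dimension rather than in the fact rows, which is the kind of relationship the abstract says purely tabular formats do not describe on their own.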
About the authors
Vladimir A. Frolov
Bauman Moscow State Technical University
Author for correspondence.
Email: vladimir.frolov.99@mail.ru
ORCID iD: 0009-0006-3090-7473
SPIN-code: 8207-1550
postgraduate student, Department of Information Processing and Control Systems
Russian Federation, Moscow
Rustam Z. Khayrullin
Bauman Moscow State Technical University
Email: zrkzrk@list.ru
ORCID iD: 0000-0002-0596-4955
SPIN-code: 6631-0932
Scopus Author ID: 4036
Dr. Sci. (Phys.-Math.), Senior Researcher, Professor
Russian Federation, Moscow
Gennady I. Afanasyev
Bauman Moscow State Technical University
Email: gaipcs@bmstu.ru
SPIN-code: 7790-1645
Cand. Sci. (Eng.), Associate Professor
Russian Federation, Moscow
