Analysis of storage formats for multidimensional data models in the context of multidimensional cubes

Abstract

The article examines the efficient storage of multidimensional data models in modern analytical systems. Particular attention is paid to the architecture of multidimensional cubes, which store aggregated facts at the intersections of multiple dimensions. Modern data storage formats (Parquet, ORC, Iceberg, Delta Lake, Hudi) are reviewed for their applicability to multidimensional analytics tasks. The review shows that existing solutions focus mainly on tabular structures and do not fully support multidimensional relationships, hierarchies, and aggregations. The difficulties of integrating different storage formats and the lack of a unified approach to describing metadata are analyzed. Based on the identified limitations, design objectives for a multidimensional cube storage format are formulated. A conceptual storage model is proposed that combines the principles of relational and multidimensional data organization. The model consists of a fact table, dimensions, a metadata layer, and an API.
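
The abstract names four components of the conceptual model: a fact table, dimensions, metadata, and an API. The following Python sketch is one possible in-memory reading of that layout and is not the authors' format. The class names `Dimension` and `Cube`, the methods `roll_up`, `add_fact`, `aggregate`, and `metadata`, and the sample cities and sales figures are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Dimension:
    """Hypothetical dimension with a one-level hierarchy: member -> parent."""
    name: str
    parents: dict[str, str] = field(default_factory=dict)

    def roll_up(self, member: str) -> str:
        # Members without a recorded parent are returned unchanged.
        return self.parents.get(member, member)


@dataclass
class Cube:
    """Hypothetical cube: fact rows, dimensions, and a small metadata view."""
    dimensions: dict[str, Dimension]
    measure: str
    facts: list[dict] = field(default_factory=list)

    def metadata(self) -> dict:
        # Metadata layer: describes the cube without touching the facts.
        return {"dimensions": sorted(self.dimensions), "measure": self.measure}

    def add_fact(self, coords: dict[str, str], value: float) -> None:
        # Each fact sits at the intersection of exactly one member per dimension.
        if set(coords) != set(self.dimensions):
            raise ValueError("fact must give one member per dimension")
        self.facts.append({**coords, self.measure: value})

    def aggregate(self, by: list[str], roll_up: frozenset = frozenset()) -> dict:
        # API layer: sum the measure grouped by the requested dimensions,
        # optionally replacing members with their hierarchy parents.
        totals: dict[tuple, float] = defaultdict(float)
        for row in self.facts:
            key = tuple(
                self.dimensions[d].roll_up(row[d]) if d in roll_up else row[d]
                for d in by
            )
            totals[key] += row[self.measure]
        return dict(totals)


# Illustrative data only; the values are made up for this sketch.
cube = Cube(
    dimensions={
        "city": Dimension("city", {"Moscow": "Russia", "Kazan": "Russia"}),
        "year": Dimension("year"),
    },
    measure="sales",
)
cube.add_fact({"city": "Moscow", "year": "2024"}, 10.0)
cube.add_fact({"city": "Kazan", "year": "2024"}, 5.0)
cube.add_fact({"city": "Moscow", "year": "2025"}, 7.0)

by_city_year = cube.aggregate(by=["city", "year"])
by_country = cube.aggregate(by=["city"], roll_up=frozenset({"city"}))
```

In this sketch the hierarchy and the aggregation live outside the fact rows, which mirrors the abstract's point that tabular formats alone do not describe hierarchies or aggregations. A real format would presumably persist the facts in a columnar layout and the hierarchy in the metadata layer.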

About the authors

Vladimir A. Frolov

Bauman Moscow State Technical University

Author for correspondence.
Email: vladimir.frolov.99@mail.ru
ORCID iD: 0009-0006-3090-7473
SPIN-code: 8207-1550

Postgraduate student, Department of Information Processing and Control Systems

Russian Federation, Moscow

Rustam Z. Khayrullin

Bauman Moscow State Technical University

Email: zrkzrk@list.ru
ORCID iD: 0000-0002-0596-4955
SPIN-code: 6631-0932
Scopus Author ID: 4036

Dr. Sci. (Phys.-Math.), Senior Researcher, Professor

Russian Federation, Moscow

Gennady I. Afanasyev

Bauman Moscow State Technical University

Email: gaipcs@bmstu.ru
SPIN-code: 7790-1645

Cand. Sci. (Eng.), Associate Professor

Russian Federation, Moscow
