**Berlin**: SAP has introduced the Sales Autocompletion Linked Business Tables (SALT) dataset, aimed at providing high-quality, anonymised enterprise data for AI research. This initiative seeks to address the existing gap between theoretical work and practical applications in real-world business contexts, enhancing AI’s role in enterprise analytics.
Generative artificial intelligence (AI) has proved useful across many domains, raising workplace productivity through tasks such as composing emails, answering complex questions, and even drafting wedding speeches. Central to this progress are large language models (LLMs), which have advanced significantly in processing and generating natural language. Adapting these models beyond text to the structured, tabular datasets that enterprise operations depend on remains difficult, however, particularly because training data of this kind is scarce.
SAP’s initiative addresses this gap with the “Sales Autocompletion Linked Business Tables” (SALT), a curated dataset of anonymised data derived from a customer’s enterprise resource planning (ERP) system. The dataset is intended for researchers who train and benchmark AI models for real-world business contexts.
High-quality enterprise data is hard to obtain mainly because of data privacy, confidentiality, and commercial interests, which restrict access to the large, clean datasets needed for model training. As a result, the gap is widening between researchers’ theoretical work and the complexity of actual enterprise data.
SAP’s SALT dataset addresses this problem by providing realistic enterprise data that reflects genuine industry scenarios, including millions of sales order entries linked through relational tables. It helps researchers understand the characteristics of business data, benchmark model performance, and develop better foundation models.
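To make the idea of linked sales tables and field autocompletion concrete, here is a minimal sketch in plain Python. It is not SALT's actual schema or SAP's method: the table layouts, the field names (`order_id`, `customer_id`, `sales_office`, `product`, `quantity`), and the majority-vote baseline are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical header table: one row per sales order.
orders = [
    {"order_id": 1, "customer_id": "C1", "sales_office": "North"},
    {"order_id": 2, "customer_id": "C1", "sales_office": "North"},
    {"order_id": 3, "customer_id": "C2", "sales_office": "South"},
    {"order_id": 4, "customer_id": "C1", "sales_office": None},  # field to autocomplete
]

# Hypothetical item table: several rows per order, linked by order_id.
items = [
    {"order_id": 1, "product": "P10", "quantity": 5},
    {"order_id": 1, "product": "P11", "quantity": 1},
    {"order_id": 2, "product": "P10", "quantity": 2},
    {"order_id": 3, "product": "P20", "quantity": 7},
    {"order_id": 4, "product": "P10", "quantity": 3},
]


def join_items_to_orders(orders, items):
    """Attach each item row to its parent order via the shared order_id key."""
    by_id = {o["order_id"]: o for o in orders}
    return [{**by_id[i["order_id"]], **i} for i in items if i["order_id"] in by_id]


def fit_majority_baseline(orders, key, target):
    """For each value of `key`, remember the most frequent known `target` value."""
    counts = defaultdict(Counter)
    for o in orders:
        if o[target] is not None:
            counts[o[key]][o[target]] += 1
    return {k: c.most_common(1)[0][0] for k, c in counts.items()}


def autocomplete(order, model, key, target):
    """Fill a missing `target` field from the baseline; leave known values alone."""
    if order[target] is not None:
        return order
    return {**order, target: model.get(order[key])}


joined = join_items_to_orders(orders, items)
model = fit_majority_baseline(orders, "customer_id", "sales_office")
completed = [autocomplete(o, model, "customer_id", "sales_office") for o in orders]
```

A baseline like this only shows the shape of the task: a learned model would replace `fit_majority_baseline` and would typically draw on fields from all of the linked tables rather than a single key.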
Speaking to SAP News Center, Tassilo Klein, one of the researchers behind SALT, stated, “There is a gap between academia and industry in terms of data. It cannot be closed easily because of privacy…But we want to enable the research community to work on real problems, not just simulated problems.” The remark underlines the initiative’s aim of bridging the divide between theoretical research and practical application in the enterprise sector.
SALT serves as a foundational dataset for researchers who want to understand the complex data structures that often define enterprise operations. Johannes Hoffart, CTO of Business AI at SAP, elaborated on the dataset’s significance, noting, “SALT is a first step to providing researchers with authentic representative industry data that gives a glimpse into actual enterprise data; for now, we are starting with just one customer and use case.” Hoffart also said SAP intends to release additional datasets in the future, broadening the scope of research and applications in enterprise contexts.
More broadly, SAP is also developing its SAP Foundation Model, designed specifically for enterprise tabular data. This table-native AI model aims to shorten the time-to-value for predictive tasks while requiring minimal training data. With the model and the accompanying PORTAL paper, which outlines the model’s capabilities, SAP is establishing a framework from which diverse applications can emerge.
Knowledge graphs are also an integral part of this initiative. They capture metadata about the key aspects of data (the who, what, and when) and make the relationships between pieces of information explicit. This structured representation gives AI models additional context to work with across varied enterprise use cases.
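As a rough illustration of how a knowledge graph can record the who, what, and when of business data, the sketch below stores facts as subject-predicate-object triples in plain Python. The entity identifiers, predicate names, and date are invented for the example and do not describe SAP's knowledge graph.

```python
from collections import defaultdict

# Each fact is a (subject, predicate, object) triple; all values are illustrative.
triples = [
    ("order:1", "placed_by", "customer:C1"),  # who
    ("order:1", "contains", "product:P10"),   # what
    ("order:1", "created_on", "2024-01-15"),  # when
    ("order:2", "placed_by", "customer:C1"),
    ("order:2", "contains", "product:P20"),
]


def build_index(triples):
    """Index objects by (subject, predicate) for direct lookups."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index


def subjects_with(triples, predicate, obj):
    """Find every subject linked to `obj` through `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


index = build_index(triples)

# Follow two relationships: customer -> orders -> products.
orders_for_c1 = subjects_with(triples, "placed_by", "customer:C1")
products_for_c1 = sorted(
    product for order in orders_for_c1 for product in index[(order, "contains")]
)
```

Following `placed_by` and then `contains` answers a question (which products did this customer order?) that neither fact states on its own, which is the kind of relationship such a graph makes explicit.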
As SAP continues to invest in the open research community through the SALT dataset and develops its proprietary predictive models, the enterprise sector is moving towards more sophisticated use of AI for managing and analysing structured data.
Source: Noah Wire Services