New partnership announcement with Databricks

Artificial intelligence stopped being merely a vision of the future some time ago: it already plays a key role in the technological development of businesses. Bitrock is well aware of this, and this year it has launched its new Data, AI & ML Engineering area, applying next-generation technologies to one of the activities that most affects business growth: the proper management of data. The topic is dear to the 100% Made in Italy consulting company, and to its entire reference Group (Fortitude), whose sister company Radicalbit has been working on streaming data analysis for several years now.

Collecting the information at hand, cataloging and exploring it, training a model, running it, and maintaining it are all steps in an extremely complex cycle that, when completed correctly, yields countless benefits: from the ability to make timely decisions to minimizing the waste of energy and raw materials, with a corresponding impact on business costs. With this in mind, and to give the new unit greater momentum, a partnership has been signed with Databricks, the company founded by the creators of Apache Spark and known for MLflow and the Delta Lake data lakehouse: a combination of data warehouse and data lake in a single, simple platform to better manage all types of structured, semi-structured, and unstructured data.

Antonio Barbuzzi has been appointed to head the unit. With a degree in telecommunications engineering and a Ph.D. in electrical engineering, he has spent his career in data analytics, both in Italy and abroad, for large companies and emerging startups alike. After several years in France and the UK, he returned to Italy at the end of 2019, joining Unicredit Services as Head of GCC CBK Branch Tools and Head of ICT CRM, and later serving as technical manager for the integration of the bank’s new CRM. He joined Bitrock as Head of Data, AI & ML Engineering in September last year.

“I am delighted to have joined such an innovative company as Bitrock. Helping the company on this new path will certainly be a difficult challenge, but also a very compelling one,” declares Barbuzzi, Head of the Data, AI & ML Engineering Area at Bitrock. “Artificial Intelligence and Machine Learning technologies, together with the Cloud, are crucial for our clients’ business development, particularly when applied to data management and analysis. The goal will therefore be to provide them with tools and skills that can support them in the most suitable way, creating tailor-made services case by case.”

“Automation, simplification, and Artificial Intelligence are, in our view, the pillars of the future on which we base our work to ensure speed of development, cost reduction, and an overall increase in efficiency for businesses,” adds Leo Pillon, CEO of Bitrock. “This is the vision of the entire Fortitude Group, as well as of Bitrock as it begins this new journey. The hope is that, in a short time, we can become an authoritative reference in a specific sector that is becoming more important day by day.”

Scenario

According to recent estimates by Expert Market Research (2022), investment in data management-related activities amounts to about $70 billion, one-fifth of the total spending on infrastructure in 2021 according to Gartner. This fast-growing trend is also reflected in the job market, where data scientists, data engineers, and machine learning engineers are among the most sought-after figures globally. A similar scenario is expected for the future: according to McKinsey, by 2025 companies will base all kinds of decisions on data analysis, relying on real-time processing for increasingly precise insights.


Vision & Offering

This is the second part of our article introducing Bitrock’s vision and offering in the Data, AI & ML Engineering area. The first part delimits the context in which we focus and operate, while this one defines our vision and the proposition that follows.

Vision

Artificial Intelligence (AI) is shaping the future of mankind in nearly all industries, driving advancements in heterogeneous fields such as big data, robotics, and the Internet of Things. We strongly believe that AI will continue to be a driving force of innovation and progress. As a company, we recognize how vital AI and ML are for organizations not just to survive but to thrive in the market.

That’s why we’re committed to providing our customers with the platform, tools, and expertise to harness the full potential of AI, helping them create innovative solutions and operationalize robust and reliable AI-based systems. We tailor our offering to meet each customer’s needs in this field.

AI/ML is the last piece of the puzzle, the last stretch of the race. It needs strong pillars to build upon: a reliable and scalable data platform, designed to evolve rather than just for the latest delivery, where security and governance are central, and where automated tests and continuous integration/deployment are in place. Indeed, for data even more than for code, the motto “garbage in, garbage out” holds.

Data platforms should be tailored to the customer’s needs: there is no one-size-fits-all approach to data engineering problems. Rather, there are companies, customers, and partners with different backgrounds and needs, requiring different solutions. Paraphrasing Maslow’s hammer: not everything is a nail to be pounded with a hammer.

We believe in bespoke solutions for our clients, guiding them through the intricacies of the current data landscape and designing the platform that best fits their existing infrastructure and needs.

Our ambition is also to help our clients define a clear and effective data strategy aligned with their overall business objectives. Organizations should define goals, processes, and business targets, and put in place a data governance framework and processes that balance security and privacy concerns while simplifying the discovery, access, and use of data.

In order to provide the best services, we value our partnerships: as of today, we’re partners with Databricks, Confluent and HashiCorp.

Design Principles

Our solutions follow specific design principles, driving our choices and design:

Cloud first

Cloud first means prioritizing cloud over on-premise solutions. In other words, it is the choice of an on-premise solution that must be justified, not the cloud one.

We’re aware of some companies’ reluctance towards cloud solutions; nevertheless, nowadays there are very few reasons not to embrace the cloud. Its advantages are too many: faster time to market, easy scaling, no upfront license or hardware costs, lower operating costs. Essentially, it allows us to outsource non-core processes and focus on what matters most to the business.

ML/AI from the beginning

Machine Learning (ML) and Artificial Intelligence (AI) have witnessed a tremendous leap forward in recent years, mainly due to the increased availability of computing resources (faster GPUs, larger memory) and data. Artificial intelligence has reached or surpassed human-level performance on many complex tasks: autonomous driving is now a reality, social networks use ML profusely to detect harmful content and target advertisements, and generative networks such as OpenAI’s GPT-3 or Google’s Imagen could be game changers in the quest toward artificial general intelligence (AGI).

AI/ML is no longer the future to look at, it’s the present. 

Some organizations will use it as a competitive advantage over their competitors; others will treat it as homework to keep up and remain competitive in the market. Either way, no one can really afford to ignore it anymore (or maybe just monopolies and the public administration?).

AI and ML have a central role in our vision and shape our architectural and technological choices.

In this context, continuously interpreting data, discovering patterns, and making timely decisions based on historical and real-time data (so-called Continuous Intelligence) will play a crucial role in defining business strategies and will be one of the most widespread applications of machine learning. Indeed, Gartner estimates that within 3 years more than 50% of all business initiatives will require continuous intelligence and that, by 2023, more than one-third of enterprises will have analysts practising decision intelligence, including decision modelling.

MLOps and AI Engineering

MLOps, or Machine Learning Operations, is a field of the ML community that is rapidly gaining momentum. It advocates managing the ML lifecycle following software-engineering best practices and the DevOps philosophy. This approach aims to make ML-powered software reproducible, testable, and evolvable, ensuring that models are deployed and updated in a controlled and efficient manner. The importance of MLOps lies in its ability to improve the speed and reliability of ML model deployment, while reducing the risk of errors and improving the overall performance of models.
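To make this concrete, here is a minimal sketch of experiment tracking with MLflow, one of the building blocks of a typical MLOps stack. It assumes mlflow and scikit-learn are installed; the dataset, model, and parameters are illustrative placeholders, not a prescription.

```python
# A minimal sketch of MLOps-style experiment tracking with MLflow.
# Dataset, model, and parameters are illustrative placeholders.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1_000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run(run_name="baseline"):
    params = {"n_estimators": 100, "max_depth": 5}
    model = RandomForestClassifier(**params).fit(X_train, y_train)

    # Parameters, metrics, and the model artifact are versioned together,
    # making the experiment reproducible and the model deployable.
    mlflow.log_params(params)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))
    mlflow.sklearn.log_model(model, "model")
```

Every run recorded this way can be compared, reproduced, and promoted towards deployment, which is precisely the controlled lifecycle MLOps advocates.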

Data democratization

We’ve already underlined the importance of data democratization. Achieving it requires several key elements to be in place. First, it requires a data culture in which data is seen as a strategic asset, valued and leveraged throughout the company. This, in turn, requires buy-in and commitment from top management.

Widespread access to data calls for the adoption of more robust Data Governance solutions, with data discoverability features, to effectively manage complex data processes and make data available and usable by everybody who needs it.

Making data accessible also means lowering the entry barrier to it, and therefore providing more user-friendly platforms that can be used autonomously, without advanced knowledge (the so-called self-service platform).

Data Mesh is an approach oriented towards large-scale environments that goes in this direction. It addresses silos and bottlenecks in large companies and emphasises the decentralization of data ownership, moving it to the business domain teams.

Data Mesh increases overall complexity and introduces new challenges for the organizations adopting it, but it can help when scalability and data silos effectively represent a barrier to company-wide data usage.

Reference Architecture

We at Bitrock refrain from providing a one-size-fits-all solution; rather, we provide a reference data architecture modelled after technology stacks used across multiple companies, updated with the most recent innovations.

We focus on a multimodal data processing architecture, specialized in AI/ML and operational use cases, yet able to support the analytical needs typical of data warehouses. As previously explained, this is an alternative to a Business Intelligence-oriented architecture based on data warehouses.

At the core of the system lie the concepts of data lake and data lakehouse.

A data lake is a centralised repository that allows you to store and manage all your structured and unstructured data at any scale. Data lakes are traditionally oriented towards advanced processing of operational data and ML/AI. The data lakehouse concept adds a robust storage layer paired with a processing engine (Spark, Presto, ...) to provide data-warehousing capabilities, making data lakes suitable for analytical workloads too.
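As a minimal illustration of the pattern, the sketch below writes and queries a Delta table with PySpark. It assumes the pyspark and delta-spark packages are installed; the path and the data are hypothetical.

```python
# A minimal sketch of lakehouse-style storage with PySpark and Delta Lake.
# Assumes pyspark and delta-spark are installed (pip install pyspark delta-spark).
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("lakehouse-sketch")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write raw events as a Delta table: storage stays cheap files on the lake,
# but writes are ACID and the table is queryable like a warehouse.
events = spark.createDataFrame(
    [(1, "sensor-a", 21.5), (2, "sensor-b", 19.8)],
    ["id", "source", "temperature"],
)
events.write.format("delta").mode("append").save("/tmp/lake/events")

# Analytical query straight on the lake.
spark.read.format("delta").load("/tmp/lake/events") \
    .groupBy("source").avg("temperature").show()
```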

There is growing recognition of this architecture, which is supported by a wide range of vendors, including Databricks, AWS, Google Cloud, Starburst, and Dremio, and by data warehouse vendors like Snowflake too.

For a more detailed introduction to it, please refer to a previous article on our Blog (Data Lakehouse, beyond the hype).

Our processing engine of choice is Apache Spark, the de-facto standard for operational workloads, paired with the battle-tested and reliable Apache Airflow (or Astronomer, its SaaS version). In the orchestration world, Dagster and Prefect are alternatives to Airflow that are gaining a lot of traction; they foster a switch to a higher level of abstraction, from managing workflows to handling dataflows.
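For illustration, a minimal Airflow DAG might look like the sketch below (assuming Airflow 2.x); the task names and Python callables are hypothetical placeholders for real ingestion and transformation jobs.

```python
# A minimal sketch of a daily batch pipeline orchestrated with Apache Airflow.
# The ingest/transform callables are hypothetical placeholders.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest():
    print("pulling raw data from the source system")

def transform():
    print("running the Spark transformation job")

with DAG(
    dag_id="daily_events_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest_task = PythonOperator(task_id="ingest", python_callable=ingest)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    ingest_task >> transform_task  # transform runs only after ingest succeeds
```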

Spark is suitable for both batch and real-time workloads, but for real-time data processing Apache Flink and Kafka Streams may be good alternatives, especially for applications with more stringent latency requirements.
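As an example of the streaming side, here is a minimal Spark Structured Streaming sketch that aggregates events from a hypothetical Kafka topic; it assumes the spark-sql-kafka connector package is available on the Spark classpath.

```python
# A minimal sketch of real-time processing with Spark Structured Streaming,
# reading from a hypothetical Kafka topic "events" on localhost.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

schema = StructType() \
    .add("source", StringType()) \
    .add("temperature", DoubleType())

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Running average per source, written to the console for demonstration.
query = (
    stream.groupBy("source").avg("temperature")
    .writeStream.outputMode("complete").format("console").start()
)
query.awaitTermination()
```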

In the streaming world applied to AI and ML, another option is Helicon from Radicalbit, a solution aimed at bridging the gap between data science and data engineering through a no-code/low-code approach. There’s a revived interest in no-code/low-code solutions, which are bringing new users (i.e. analysts and software developers) into the ML market, pushed by new low-code ML solutions like Databricks AutoML, H2O, DataRobot, etc.

Quick data exploration can be achieved either with ad-hoc query engines like Trino/Presto/Starburst/Databricks SQL or with notebooks like Jupyter and their managed versions.
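For instance, a quick exploration against a lake table through Trino's Python client could look like the following sketch; host, catalog, and table names are hypothetical.

```python
# A minimal sketch of ad-hoc exploration with the Trino Python client
# (pip install trino). Host, catalog, and table are hypothetical.
import trino

conn = trino.dbapi.connect(
    host="trino.internal",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cursor = conn.cursor()

# Interactive SQL straight on the lake, with no data movement required.
cursor.execute("SELECT source, avg(temperature) FROM events GROUP BY source")
for source, avg_temp in cursor.fetchall():
    print(source, avg_temp)
```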

Integration is the boring homework that precedes the fun part. Yet it represents the largest fraction of the cost of most data projects, ranging from 20-30% on average up to 70% in pessimistic cases.

From a technical point of view, the ingestion layer is quite diversified and is generally shaped by the organization’s data sources and infrastructure.

Traditionally, data is extracted from operational data sources and transformed before being loaded into a data warehouse, the so-called ETL pattern. Cheap cloud storage and the separation of storage and computing laid the foundation for a paradigm shift that moves the loading phase ahead of the transformation phase (ELT). This pattern, not actually new for data lakes, shines because it removes business logic from the loading phase in the ingestion layer, making it possible to simplify integration by outsourcing it.

Fivetran, along with Airbyte, Matillion, and many others, is an example of an ELT tool. Strictly speaking, the term ETL is generally used in the data-warehousing context; however, these integration tools benefit lake and lakehouse architectures too: Fivetran, for example, has recently become a Databricks partner.

In the ingestion layer, Confluent is also playing an increasingly important role with Kafka Connectors, which allow data to be pulled from (and pushed to) a variety of sources. The pairing of Kafka with CDC (Change Data Capture), using software like Debezium, Qlik, or Fivetran, is an increasingly common integration pattern in this context.
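As a sketch of how this looks in practice, the snippet below registers a hypothetical Debezium connector for PostgreSQL through the Kafka Connect REST API; hostnames, credentials, and table names are placeholders, and the exact configuration keys depend on the Debezium version in use.

```python
# A minimal sketch of registering a Debezium CDC connector through the
# Kafka Connect REST API. Host, database coordinates, and table name
# are hypothetical placeholders.
import requests

connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "db.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "secret",
        "database.dbname": "shop",
        "topic.prefix": "shop",
        "table.include.list": "public.orders",
    },
}

resp = requests.post("http://connect.internal:8083/connectors", json=connector)
resp.raise_for_status()  # each change to public.orders now lands on a Kafka topic
```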

The following figure, based on the unified data platform from Andreessen Horowitz (Bornstein, Li, and Casado 2020), exemplifies our architecture, in particular the boxes highlighted in yellow:

Emerging Architectures for Modern Data Infrastructure

ML Platform

A central role in our platform is reserved to the operationalization of ML models and AI-based software.

As discussed above, MLOps advocates managing the ML lifecycle following software-engineering best practices and the DevOps philosophy, making ML-powered software reproducible, testable, and evolvable, and ensuring that models are deployed and updated in a controlled and efficient manner. Our idea of a generic machine learning platform, providing all the tools to operationalize the ML lifecycle, is best described by the following figure, based on (Bornstein, Li, and Casado 2020).

Emerging Architectures for Modern Data Infrastructure
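To give a concrete flavour of the operationalization step, the sketch below loads a model from the MLflow Model Registry and scores new data; the model name, stage, and features are hypothetical placeholders.

```python
# A minimal sketch of the serving side of the ML lifecycle: loading a model
# previously registered in the MLflow Model Registry and scoring new data.
# The registry URI, model name, and features are hypothetical placeholders.
import mlflow.pyfunc
import pandas as pd

# "models:/..." URIs resolve against the registry, decoupling the consuming
# service from the specific training run that produced the model.
model = mlflow.pyfunc.load_model("models:/churn-classifier/Production")

new_customers = pd.DataFrame(
    {"tenure_months": [3, 48], "monthly_spend": [79.0, 23.5]}
)
print(model.predict(new_customers))
```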

Conclusions

We believe AI and ML are crucial for any organization and will be fundamental to succeed and thrive in the market.

Bitrock is committed to providing customers with the platform, tools, and expertise to harness the full potential of Artificial Intelligence (AI) and Machine Learning (ML) and operationalize it through AI engineering and MLOps.

We tailor our offering to meet the unique needs of our customers and believe in providing bespoke solutions for our clients. Our ambition is to jointly define a clear and effective data strategy that aligns with their overall business objectives. 

If you have any questions, doubts or just want to discuss data-related topics, please feel free to get in touch: we’d be more than happy to help or just chat!

References

Author: Antonio Barbuzzi, Head of Data, AI & ML Engineering @Bitrock


Vision & Offering

In this blog post we’re introducing Bitrock’s vision and offering in the Data, AI & ML Engineering area. We’ll provide an overview of the current data landscape, delimit the context where we focus and operate, and define our proposition.

This first part describes the technical and cultural landscape of the data and AI world, with an emphasis on market and technology trends. The second part, which defines our vision and technical offering, is available here.

A Cambrian Explosion

The Data & AI landscape is rapidly evolving, with heavy investments in data infrastructure and an increasing recognition of the importance of data and AI in driving business growth.

Investment in managing data has been estimated to be worth over $70B (Expert Market Research 2022), accounting for over one-fifth of all enterprise infrastructure spend in 2021 (Gartner 2021).

This trend is tangible in the job market too: indeed, data scientists, data engineers, and machine learning engineers are listed among LinkedIn’s fastest-growing roles globally (LinkedIn 2022).

And this trend doesn’t seem to be slowing down. According to (McKinsey 2022), by 2025 organizations will leverage data for every decision, interaction, and process, shifting towards real-time processing to get faster and more powerful insights.

This growth is also reflected in the number of tools, applications, and companies in the area, in what is generally called a “Cambrian explosion”, comparing it to the explosion of diverse life forms during the Cambrian period, when many new types of organisms appeared in a relatively short time. This is clearly depicted in the following figure, based on (Turk 2021).

A Cambrian Explosion

The Technological Scenario

Data architectures serve two main objectives: helping the business make better decisions by exploiting and analyzing data (the so-called analytical plane), and providing intelligence to customer-facing applications (the so-called operational plane).

These two use-cases have led to two different architectures and ecosystems around them: analytical systems, based on data warehouses, and operational systems, based on data lakes.

The former, built upon data warehouses, have grown rapidly. They’re focused on Business Intelligence and on business users and analysts, typically familiar with SQL. Cloud warehouses like Snowflake are driving this growth, and the shift from on-prem to cloud is at this point relentless.

Operational systems have grown too. Based on data lakes, their growth is driven by the emerging lakehouse pattern and by the huge interest in AI/ML. They specialize in dealing with unstructured and structured data, while supporting BI use cases too.

In the last few years, a path towards the convergence of the two technologies has emerged: data lakehouses added ACID transactions and data-warehousing capabilities to data lakes, while warehouses have become capable of handling unstructured data and AI/ML workloads. Still, the two ecosystems remain quite different, and may or may not converge in the future.

On the ingestion and transformation side, there’s a clear architectural shift from ETL to ELT (that is, data is first ingested and then transformed). This trend, made possible by the separation between storage and computing brought by the cloud, is pushed by the rise of CDC technologies and by the promise of offloading non-business details to external vendors.

In this context, Fivetran and dbt shine in the analytical world (along with new players like Airbyte and Matillion), while Databricks/Spark, Confluent/Kafka, and Astronomer/Airflow are the de-facto standards in the operational world.

It is also noteworthy that stream processing for real-time data analysis is on the rise: stream processing products from companies such as Databricks and Confluent have gained considerable momentum.

Artificial Intelligence (AI) topics are gaining momentum too: Gartner, in its annual report on strategic technology trends (Gartner 2021), lists Decision Intelligence, AI Engineering, and Generative AI among the priorities to accelerate growth and innovation.

Decision Intelligence involves the use of machine learning, natural language processing, and decision modelling to extract insights and inform decision-making. According to the report, within the next two years a third of large organisations will be using it to gain a competitive advantage.

AI Engineering focuses on the operationalization of AI models, integrating them into the software development lifecycle and making them robust and reliable. According to Gartner analysts, enterprises that adopt it will generate at least three times more value from their AI efforts than those that don’t.

Generative AI is one of the most exciting and powerful examples of AI. It learns the context from training data and uses it to generate brand-new, completely original, realistic artefacts, and it will be used in a multitude of applications. According to Gartner, it will account for 10% of all data produced by 2025.

Data-driven Culture and Democratization

Despite the clear importance of data, it’s a common experience that many data initiatives fail. Gartner has estimated that 85% of big data projects fail (O’Neill 2019) and that through 2022 only 20% of analytic insights will deliver business outcomes (White 2019).

What goes wrong? Problems rarely lie in the inadequacy of the technical solutions; technical problems are usually the simplest ones. Indeed, over the last ten years technologies have evolved tremendously and Big Data technologies have matured considerably. More often, the problems are cultural.

It’s no mystery that a data lake by itself does not provide any business value. Collecting, storing, and managing data is a cost. Data become (incredibly) valuable when they are used to produce knowledge, insights, and actions. For the magic to happen, data must be accessible and available to everybody in the company. In other words, organizations should invest in a company-wide data-driven culture and aim at true data democratization.

Data should be considered a strategic asset, valued and leveraged throughout the organization. Managers, starting from the C-level, should create the conditions for people who need data to access it, removing obstacles and bottlenecks and simplifying processes.

Creating a data culture and democratizing data allows organizations to fully leverage their data assets and make better use of data-driven insights. By empowering employees with data, organizations can improve decision-making, foster innovation, and drive business growth.

Last but not least, Big Data’s power does not erase the need for vision or human insight (Waller 2020). It is fundamental to have a data strategy that defines how the company needs to use data and how this links to the business strategy. And, of course, buy-in and commitment from all management levels, starting from the top.

The second part of this article can be found here.

References

Author: Antonio Barbuzzi, Head of Data, AI & ML Engineering @ Bitrock

An advanced Data Architecture for Manufacturing 4.0

Leveraging A.I., IoT and Stream Processing to enable the Smart Manufacturing paradigm


On June 3rd we hosted our webinar “An advanced Data Architecture for Manufacturing 4.0” in collaboration with our Partner Radicalbit.

During the event, attendees had the chance to learn more about the potential of combining A.I., IIoT, and Stream Processing: a perfect blend of cutting-edge technologies that can enable the Smart Manufacturing paradigm, allowing companies to forecast production behavior at run-time, instantly predict economic and timing impacts, analyze the correct functioning of equipment and, last but not least, track the predictive maintenance status in (near) real-time.

More specifically, attendees discovered how to transform data into information as quickly as possible, and to interpret it correctly, thanks to RNA, an enterprise-grade, end-to-end DataOps & MLOps platform designed to combine streaming event analysis and A.I., simplifying and accelerating the development of Advanced Analytics projects and Machine Learning-enabled Decision Support Systems.

By adopting these cutting-edge technologies, companies in the Manufacturing industry can become more efficient, less resource-consuming, service-oriented (instead of merely product-oriented), and quickly adaptable to business needs, acting autonomously while reducing the need for human intervention.

Thanks to Roberto Mariotti (Technical Presales Manager at Radicalbit), Davide Fiacconi (Data Scientist at Radicalbit), and Cosma Rizzi (International Business Development at Bitrock) for sharing their expertise in the field and for giving useful insights into the main challenges and trending topics of Manufacturing 4.0.

If you didn’t have the chance to attend the webinar live, or if you want to go back through the slides that were shown, you can access the presentation at the following link: https://bit.ly/3g4TR2v

If you want to access the webinar recording, or simply know more about RNA and Bitrock technology offering, send an email to info@bitrock.it


Davide Fiacconi, Data Scientist

Roberto Mariotti, Technical Presales Manager

Cosma Rizzi, International Business Development Manager
