A few months ago, we began an incredible journey into the world of Java through a compelling retrospective written in collaboration with Danilo Ventura, Senior Software Engineer at Bitrock.

Enjoy the fourth and last episode following below!

In 2014 a revolutionary new version of Java arrived: Java 8. Java versions are backward compatible, meaning that code compiled with an older version of Java can still run on newer Java VMs. Java 8 brought a leap towards functional programming: new tools such as streams, lambda expressions and optionals made it possible to write code that combines the object-oriented and the functional paradigms. Java 8 was a real milestone because it introduced entirely new constructs, and many Java developers took years to get used to them.
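To make these constructs more concrete, here is a minimal sketch of a stream pipeline, a lambda expression and an `Optional`. The class name, method names and sample data are our own illustrative choices, not taken from any real project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

class Java8Features {

    // Stream + lambda: keep the names longer than minLength and upper-case them.
    static List<String> longNamesUpperCase(List<String> names, int minLength) {
        return names.stream()
                .filter(name -> name.length() > minLength) // lambda expression
                .map(String::toUpperCase)                  // method reference
                .collect(Collectors.toList());
    }

    // Optional: the first name starting with prefix, or an empty Optional instead of null.
    static Optional<String> firstStartingWith(List<String> names, String prefix) {
        return names.stream()
                .filter(name -> name.startsWith(prefix))
                .findFirst();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Ada", "Grace", "Linus", "James");
        System.out.println(longNamesUpperCase(names, 3));                    // [GRACE, LINUS, JAMES]
        System.out.println(firstStartingWith(names, "Li").orElse("nobody")); // Linus
        System.out.println(firstStartingWith(names, "Zo").orElse("nobody")); // nobody
    }
}
```

The same logic written before Java 8 would need explicit loops, temporary lists and `null` checks, which gives an idea of why the new style took some getting used to.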

The first Java application frameworks, such as Spring Boot

From this point on, Java is mainly used to create services and web applications. The first Java frameworks for quickly creating applications that expose REST services exchanging JSON begin to emerge. For convenience, we will mention only the most important one: Spring Boot. It is built on top of a leaner library, the Spring Framework (whose version 1.0 dates back to 2004, while Spring Boot 1.0 followed in 2014), and it made developers' lives easier: instead of using the complex EJBs, they could create self-executing JARs, without having to install and run a separate server.
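The idea of an application that carries its own server can be sketched with the JDK alone. Note that the example below does not use Spring Boot (which requires external dependencies): it relies on the JDK's built-in `com.sun.net.httpserver.HttpServer` purely to illustrate the concept. The class name, the `/hello` path and the JSON payload are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

class SelfContainedService {

    // Starts an embedded HTTP server on the given port (0 means "any free port").
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // Hypothetical endpoint that always answers with a small JSON document.
        server.createContext("/hello", exchange -> {
            byte[] body = "{\"message\":\"hello\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = start(8080);
        System.out.println("Listening on port " + server.getAddress().getPort());
    }
}
```

Packaged as a JAR, a program like this starts with a plain `java -jar` command and needs no application server installed on the machine, which is the deployment model that Spring Boot made popular.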


Cloud, Container and Docker

It is precisely with this in mind that a new revolution takes place, that of the cloud and containers. Before, servers and their maintenance were 'in house' - just think of banks and insurance companies; right around this time, the concept of Cloud was born mainly with giants such as Amazon, Google, and Microsoft providing the necessary tools to create 'machines' on the web where you can install everything you need to run your website or web application. In this way, all the maintenance of the '(virtual) machine' is entrusted to the Cloud Manager.

The other major aspect is the container, since it changes the way applications are distributed. An application is now shipped as a container, which can basically be seen as a 'machine' with an operating system inside, and it is run by a container engine called Docker that guarantees the security and functionality of the application.


“Write once, run anywhere (WORA)” or not?

All of this shakes the concept and motto of Java: "Write once, run anywhere". If the application can be distributed in a container with the preferred operating system, what are the advantages of still using Java? At first sight, there are none, and this is precisely why new and alternative languages such as Rust and Google's Go emerged: a containerized Go service can start in about 5 milliseconds, whereas a traditional Java service may need around 30 seconds.

It seemed that Java would have no future in this cloud-native world. That turned out not to be true. Along came a revolutionary technology for Java: GraalVM, which can generate native code from bytecode. Source code in different languages (Java, but also C, for example) is turned into code for a special virtual machine, GraalVM, which is in turn capable of generating an executable file for a given operating system. Why is it revolutionary? Because Java can now be used to create a container with a native executable in it that starts in times comparable to services written in Go.

Java continues to move forward by adapting to new frameworks that allow services to be written directly cloud-native. For instance, one noteworthy framework is Quarkus, which allows the generation of cloud-native containers that can run on Docker or any container orchestrator.

The new frontier of Web 3.0

We are on the threshold of Web 3.0, and the new frontier is the blockchain-based web. Perspectives are starting to change and new architectures are being released: server-side, there will no longer be a single server managed by a platform. Just to give you an example: YouTube is run by Google, owned by Google, and the videos that are uploaded are on Google's servers as well as owned by Google.

With Web 3.0, the perspective changes, since the platform is not owned by a single owner anymore, but distributed on the blockchain.

In line with the times and needs, the Web continues to evolve while facing many difficulties and uncertainties. How will Java adapt to this new type of architecture? We will see! In the meantime, we can say that we are not standing still but moving with this evolution; there are already several pieces of evidence, such as new framework libraries (e.g. Spring) that allow interaction with popular blockchains.

We have come to the end of this fourth and final episode of the retrospective dedicated to Java. We thank Danilo Ventura for sharing his valuable insights on Java technologies and wrapping up 20 years of web history.

You can also read the first, second and third parts. Follow us on LinkedIn and stay tuned for upcoming interesting technology articles on our blog!

Mobile App

Mobile applications are software programs that run on mobile devices such as smartphones and tablets. Developing a mobile app is a great way to keep your clients always connected, just one tap away from your business. With a tool like this you can open new ways to support your customers with your product wherever they are, create more engagement and better understand what they like most by collecting analytics.

Do you want to learn more about our Mobile App Development solutions? Visit our dedicated page and submit the form. One of our consultants will get back to you right away!

Not every app is built in the same way

A mobile app can be built in different ways: it all depends on what the software aims to do, and nowadays it is never an easy choice. Let's look at the mobile app development approaches that exist at the moment and understand their main differences by weighing the pros and cons of each.


A native mobile app is software built for a specific platform, iOS or Android, using the tooling provided by Apple or Google respectively.

These kinds of apps are the most powerful: the code is optimized to work on a specific operating system in order to achieve the best performance. By using the native tooling with its latest updates, developers can more easily build a UI/UX that follows all the styling guidelines provided by Apple and Google. This can still be the best option for apps that need to do a lot of computation or require deep integration with system APIs such as Bluetooth, GPS, NFC, and so on.

Everything looks great but... where is the catch? Well, unfortunately these kinds of apps are the most expensive, since both iOS and Android developers are needed to build and maintain two completely separate and different apps.

iOS developers use Xcode as their main IDE (integrated development environment) and the main languages are Swift and Objective-C, while Android developers can use Android Studio and write Kotlin and Java code.


In the last few years a new technology has come from JetBrains: Kotlin Multiplatform (KMP).

This is a new way of developing apps in which developers can share part of the code, written in Kotlin, between the iOS and the Android app while keeping the code native, and therefore with roughly the same performance as a native implementation. Usually developers share the business logic and the communication with servers or local storage, and leave the UI/UX development native. Of course, depending on the project, more or fewer modules can be shared between the two apps. A good architect will choose the best structure for the project.

From an Android point of view, development is close to that of a native app, while for iOS developers the UI will be written in Swift, as always, but it will connect to logic written in Kotlin.

The Kotlin Multiplatform framework will compile both apps to native code for you. Looks like magic, right? Yes, it does, but in reality KMP is not all pros. You will still need both iOS and Android developers to build and maintain the apps; moreover, the framework is still new and there are some technical downsides to this approach. Depending on the project, a native approach could turn out to be even less expensive than a KMP one, especially when it comes to keeping the project itself up to date.

We at Bitrock love to explore new technologies: we have already used this framework on some projects, and we're looking forward to seeing whether it will become the new standard of mobile app development.



Cross-platform apps are built in a different way compared to multiplatform ones. At the moment two main frameworks lead the cross-platform approach: Flutter and React Native. With both frameworks developers can write the code only once for the two apps, usually reducing cost and time to market for the clients. React Native is the older one: it was started back in 2015 by Facebook and was a huge step forward in cross-platform development, reaching performance close to that of a native app. Then, in 2018, a new player joined: Flutter, built by Google. It rapidly gained popularity because of its excellent performance and because of how quickly beautiful UIs can be built with it.

Under the hood, React Native is close to ReactJS: the UI is built with component blocks and the main language is JavaScript. Flutter, on the other hand, composes the UI out of widgets and its main language is Dart. Each React Native UI component is mapped to a native one for the specific platform, whereas each Flutter component is drawn from scratch by the Flutter team and rendered on a canvas with the Skia graphics engine.

They are both great frameworks for building cross-platform apps with performance really close to a native implementation but at a lower cost.

The downsides of both frameworks depend on your project needs. These technologies are relatively new, and we cannot say for how long they will be updated and maintained. If a framework is shut down, it will not be as easy as with the approaches above to keep working with a deprecated technology or, in the worst case, to rebuild the app as a native implementation by rewriting it from scratch.

Our feeling is that these kinds of approaches are here to stay and they are and will be the best choice for a lot of beautiful projects.

Flutter and React Native

WebApp and PWA

These are websites that act like mobile apps when opened from a browser on a mobile device.

PWAs (Progressive Web Applications) are web apps with superpowers: they can also interact with part of the device hardware and can behave differently depending on the device running the software.

Usually they are not available on the App Store and Play Store, even though on Android things have started moving with TWA (Trusted Web Activity).

These kinds of apps are usually written with web technologies: HTML, CSS and JavaScript, like a normal website. At the moment these are the apps with the lowest performance, but they are easier to maintain and release, since no updates need to go through the official stores. App store updates can sometimes be tricky, and with these solutions we avoid them completely.

PWA (Progressive Web Application)


Native vs Multiplatform vs Cross-Platform vs WebApp.

What is the best development approach for your app in 2023?

“It depends!” as developers love to say.

Choosing the right approach depends a lot on your project, needs, goals and target audience, and above all on the budget you decide to invest.

We at Bitrock would love to hear more about your project idea and support you with all our expertise and experience in this jungle of code and frameworks. In the meantime, we will keep ourselves updated on what's coming next, and we are excited to see what the future of mobile apps will look like.

Special thanks to the main author of this article: Emanuele Maso, Mobile Developer at Bitrock. 


Java 3

A few months ago, we began an incredible journey into the world of Java through a compelling retrospective written in collaboration with Danilo Ventura, Senior Software Engineer at Bitrock. We can move on with our narrative by depicting what happened in the mid- and late 2000s and by highlighting the developments of Java with the advent of Web 2.0.

Back in early 2005, a major revolution kicked off what has been called Web 2.0. What is it? Web 2.0 is the revolution in the computer industry caused by the move to the internet as a platform. It is an improved version of the first World Wide Web, characterized specifically by the shift from static to dynamic or user-generated content and by the growth of social media, one of the first examples being YouTube for publishing videos. From these years on, the web begins to transform itself into a container of platforms.

Ajax and JSON

What was the technical evolution that allowed this to happen? New browsers like Firefox and Chrome took advantage of a technology called Ajax (Asynchronous JavaScript and XML), which allowed for asynchronous calls. What does that mean? The browser launches a request and, while waiting for the response, can launch another one. All this makes web pages faster and more interactive: pages populate as requests are completed. It also changes the very structure of web applications, because the server no longer has to provide the entire page but only the data.
Web application development changed, and new frameworks based on JavaScript appeared. Between 2005 and 2010 came one of the first widely adopted JavaScript libraries for web applications distributed as free software: jQuery. Later came more powerful open source web application frameworks, such as AngularJS, followed by Angular 2, and React.
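Ajax itself lives in the browser and is written in JavaScript, but the asynchronous request model described above can be illustrated in Java with `CompletableFuture`. This is only a sketch: the `fetch` method is a hypothetical stand-in for a network call, and the resource names and delays are invented.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class AsyncRequests {

    // Hypothetical stand-in for a network call: waits, then returns a small payload.
    static CompletableFuture<String> fetch(String resource, long delayMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "data for " + resource;
        });
    }

    // Launches both requests without waiting and records each response as it arrives.
    static List<String> loadPage() {
        List<String> rendered = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> comments = fetch("comments", 200).thenAccept(rendered::add);
        CompletableFuture<Void> profile = fetch("profile", 50).thenAccept(rendered::add);
        // Block here only so that the demo can print the final result.
        CompletableFuture.allOf(comments, profile).join();
        return rendered;
    }

    public static void main(String[] args) {
        // The faster "profile" response typically arrives first,
        // even though its request was launched second.
        System.out.println(loadPage());
    }
}
```

The second request is launched while the first one is still pending, and each part of the "page" is filled in when its own response arrives, which mirrors the way an Ajax page populates.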

Hence, server-side applications become just data providers. The new interchange of data between client/server applications was based on JSON, which stands for JavaScript Object Notation. This was an open standard file format and data interchange format that used human-readable text to store and transmit data objects consisting of attribute-value pairs and arrays (or other serialisable values).
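As an illustration of attribute-value pairs and arrays, the sketch below turns a Java `Map` and `List` into JSON text. It is a deliberately tiny serializer written for this article (the class and method names are ours); real projects would normally use a dedicated JSON library, which also handles control characters and special number values.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

class JsonSketch {

    // Serialises strings, numbers, booleans, null, lists and maps to JSON text.
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof String) return quote((String) value);
        if (value instanceof Number || value instanceof Boolean) return value.toString();
        if (value instanceof List) {
            StringJoiner array = new StringJoiner(",", "[", "]");
            for (Object item : (List<?>) value) array.add(toJson(item));
            return array.toString();
        }
        if (value instanceof Map) {
            StringJoiner object = new StringJoiner(",", "{", "}");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                object.add(quote(String.valueOf(e.getKey())) + ":" + toJson(e.getValue()));
            }
            return object.toString();
        }
        throw new IllegalArgumentException("Unsupported type: " + value.getClass());
    }

    // Escapes only backslashes and double quotes (a simplification).
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("title", "Java retrospective");
        post.put("episode", 3);
        post.put("tags", Arrays.asList("web", "json"));
        // {"title":"Java retrospective","episode":3,"tags":["web","json"]}
        System.out.println(toJson(post));
    }
}
```

The printed text is what a server acting as a pure data provider would send to the browser, leaving the page rendering to JavaScript.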

Java and the Mobile Operating Systems

In 2008, the first commercial Android device was unveiled: this was another important moment for Java. Android is an operating system that basically runs programs written in Java: the Java code is compiled to bytecode, converted into a format designed for the Dalvik virtual machine, and then executed by that virtual machine on the Android operating system. The competitor Apple developed iOS, a parallel proprietary operating system, while Google based everything on Android and the open source Java language. Once again, Java played a major role as a programming language for writing mobile applications.

Criticism of Java

In 2010 Oracle, the largest database software and technology company, acquired Sun Microsystems; consequently, Java became the property of Oracle. What changed? Java became a proprietary language and only the latest version remained free. For older versions you still have to go through Java distributors.

Later, the Java programming language and the Java software platform were criticized by developer communities for some design choices, including the implementation of generics and the forced use of object-oriented programming. New high-level programming languages were designed to fully interoperate with the Java Virtual Machine: we can mention Scala, released in 2004, and Kotlin, in 2011.

We have to wait until 2014 for another revolutionary version of Java.

You can also read the first, second and fourth parts. Follow us on LinkedIn and stay tuned for upcoming interesting technology articles on our blog!

Thanks to Danilo Ventura for the valuable contribution to this article series.

Java | Part 2 | 2000s

A few months ago, we began an incredible journey into the world of Java through a compelling retrospective written in collaboration with Danilo Ventura, Senior Software Engineer at Bitrock. We can move on in our narrative by depicting what happened in the early 2000s.

From Java Server Pages to eXtensible markup language

We saw that Java servlets allowed higher degrees of interactivity and dynamicity in web design. Yet a significant limitation still existed. Servers could only serve static HTML pages, with no variable elements: inside a servlet you had to write all the code of each HTML page, putting together variable data (for example, coming from a DB) and static data.

To simplify the process, in 1999 Sun Microsystems released the first version of a revolutionary new technology: Java Server Pages (JSP). With JSP it was possible to insert scripting sections written in Java into an HTML page, creating a “JSP page”. The JSP page was translated into a servlet and then compiled. Java Server Pages and servlets led to the birth of server-side pages. This means that pages were no longer static (fixed and immutable, as their creators had written them): on the contrary, they were dynamically generated on the server side, using the JSP as a template with dynamic content inside.

We’ve come to the early 2000s: during these years, the first Java frameworks for writing web applications emerge. The most relevant ones we must mention are Velocity (a template engine, rather than a framework), Struts and then Struts2. Tools like these were the first attempts to apply the MVC (Model-View-Controller) pattern in web application development. Struts and Struts2 relied on XML (“eXtensible Markup Language”), which was designed to be a human- and machine-readable markup language for handling data transport. XML was born as an exchange format for data but it was also (ab)used as a configuration language.

The Enterprise Java Beans and the Web 1.0 age

Distributed applications still needed to evolve, and this was achieved thanks to another Java technology: Enterprise Java Beans (EJB). We can describe EJB as an architecture for setting up program components, written in the Java programming language, that run in the server parts of a computer network that uses the client/server model.

EJB was a very powerful, complex and also widely used technology. Enterprise Java Beans were based on application servers: we are no longer talking about simple servers or servlet engines. In order to run EJBs, developers required application servers with an entire substrate (called the EJB container) thanks to which, whenever the application was deployed, a wide array of runtime components was generated, creating an actual ecosystem that allowed the application to interact with server resources. Many software houses at that time had their own EJB application server: WebSphere from IBM, WebLogic from BEA Systems (later acquired by Oracle) and, of course, there were open source products too, such as JBoss, Apache TomEE and GlassFish.

Developing applications for this technology was rather complicated, since IT professionals had to deal not only with writing code, but also with repetitive actions: launching programs, packaging the application in a certain way, having the correct library dependencies for the projects and so on. All these tasks were done manually: as you can imagine, the chance of error was particularly high. Moreover, generating all the packages to be deployed on application servers was a huge waste of time that developers could have dedicated to development.

This led to the creation of new tools able to solve these needs, starting from Ant. Ant was a very innovative tool since, always through XML, it was possible to write procedures to be executed inside the Java world. Through an Ant script in XML, you could run a sequence of operations, such as copying and filing out files, moving some parts to a specific folder, creating a Java archive from that folder and moving it elsewhere. It was thus possible to automate a number of tasks.


Ant was first released in 2000, followed by Maven in 2004 (which is still used nowadays). Last but not least, Gradle appeared around 2012. Maven, today, allows Java developers to manage all the dependencies and the lifecycle of their applications in a standard way. All these tools, by solving a series of issues, have been essential to Java's survival; without them it might have gone extinct.

In the early 2000s - the Web 1.0 age - there were some alternatives to Java. Some examples? PHP, C#, and Microsoft .NET (the Microsoft technology for building Web Applications in Windows environments).

This was also the glorious time of other technologies that were expected to have explosive success, starting with Adobe Flash. At the time, if you didn’t have a Flash animation on your website, you were basically considered a loser! For this reason, Adobe launched this technology, which allowed programmers to write, in a language mainly based on JavaScript and XML, code that was executed by browsers through a dedicated HTML tag.

Java 5

2004 was a breakthrough year, marked by the arrival of a version of Java that is still considered game-changing in many respects: Java 5. Java 5 introduced several new features: generics (similar in spirit to C++ templates), annotations, enumerations, varargs and static imports. Programmers at Sun Microsystems and the Java community realized that things were evolving fast in those years and that, if they wanted to remain competitive and keep Java a modern language, they would have to introduce additional features.
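The following minimal sketch puts the Java 5 features listed above side by side. The class, the enum and the method names are invented for illustration only.

```java
import static java.lang.Math.max; // static import

import java.util.ArrayList;
import java.util.List;

class Java5Features {

    // Enumeration: a fixed, type-safe set of constants.
    enum Level { LOW, MEDIUM, HIGH }

    // Generics: the compiler checks the element type, so no casts are needed.
    static <T extends Comparable<T>> T largest(List<T> items) {
        T best = items.get(0);
        for (T item : items) { // enhanced for loop, also introduced in Java 5
            if (item.compareTo(best) > 0) {
                best = item;
            }
        }
        return best;
    }

    // Varargs: callers may pass any number of additional int arguments.
    static int maxOf(int first, int... rest) {
        int result = first;
        for (int value : rest) {
            result = max(result, value); // resolved through the static import
        }
        return result;
    }

    // Annotation: @Override asks the compiler to verify that a method is really overridden.
    @Override
    public String toString() {
        return "Java5Features demo";
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<String>();
        words.add("pear");
        words.add("apple");
        words.add("zebra");
        System.out.println(largest(words));                  // zebra
        System.out.println(maxOf(3, 9, 4));                  // 9
        System.out.println(Level.valueOf("HIGH").ordinal()); // 2
    }
}
```

Before Java 5, collections held plain `Object` references and every read required a cast, so a mistake in the element type only surfaced at runtime.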

Not being a static language, but one that continuously evolves over time, was the trump card that made Java survive in the long run. Through the implementation of new features, developers' needs have always been met. Java, as a matter of fact, is open source: all Java developers can download the sources and tell the community which features they would like to see implemented. Even back then, the developer community put the implementation of new features to a vote.

In those years service-based architectures also began to evolve and companies started realizing that the mainframe was likely to fade away soon. However, there was the need within companies to maintain data sharing available through services.

At this stage, problems and questions begin to arise: how do we exchange data? Which language do we use to make different technologies interact with each other? How do we structure data? XML then becomes established as an exchange language that allows data to be structured and checked.

The first service-oriented architectures (SOA) arose, based on the SOAP protocol, which is able to exchange data among multiple technologies and in different formats. Between 2006 and 2009, new solutions for more complex situations were introduced, such as the enterprise service bus (ESB), which brilliantly solves the problem of point-to-point communication between different software technologies.

Elements of SOA, by Dirk Krafzig, Karl Banke, and Dirk Slama (Source: Wikipedia)

Java is part of this revolution: it is no longer just a language for web applications, but is evolving to follow the new requirements of the server world.

You can also read the first, third and fourth parts. Follow us on LinkedIn and stay tuned for upcoming interesting technology articles on our blog!

Thanks to Danilo Ventura for the valuable contribution to this article series.

New partnership announcement with Databricks

Artificial intelligence ceased a while ago to represent only and exclusively the future, playing a key role in the technological development of businesses already today. Bitrock is aware of this, and starting this year it has decided to launch its new Data, AI & ML Engineering area, precisely to apply next-generation technologies to one of the activities that most affect business growth: the proper management of data. A topic dear to the 100% Made in Italy consulting company, but also to the entire reference Group (Fortitude), which with its sister company Radicalbit has been dealing with streaming data analysis for several years now.

Collecting the information at hand, cataloging, and exploring it, training a model, running it and maintaining it, are all steps in an extremely complex cycle that, if completed correctly, yields countless benefits: from the ability to make timely decisions to the ability to minimize the waste of energy and raw materials, with related impact on business costs. With this in mind, and to give the new unit greater momentum, a partnership has been signed with Databricks, the company known for creating Apache Spark, MLflow and the data lakehouse with Delta Lake: a combination of data warehouse and data lake in a single, simple platform to better manage all types of structured, semi-structured and unstructured data.

Antonio Barbuzzi has been appointed to head the unit. The manager, who has a degree in telecommunications engineering and a Ph.D. in electrical engineering, has always been involved, including abroad, in everything related to data analytics, both for large companies and emerging startups. After several years in France and the UK, he returned to Italy at the end of 2019, joining Unicredit Services as Head of GCC CBK Branch Tools and Head of ICT CRM, and later as technical manager of the integration of the bank's new CRM. He joined Bitrock as Head of Data, AI & ML Engineering in September last year.

"I am delighted to have joined such an innovative company as Bitrock. Helping the company in this new path will certainly be a difficult challenge but also a very compelling one. - declares Barbuzzi, Head of Data, AI & ML Engineering Area at Bitrock - Artificial intelligence and Machine Learning technologies, together with the Cloud, are crucial for our clients' business development, particularly when applied to data management and analysis. The goal will, therefore, be to provide them with tools and skills that can support them in the most congenial way, creating tailor-made services from time to time."

"Automation, simplification, and Artificial Intelligence are in our view the pillars of the future on which we base our work to ensure speed of development, cost reduction, and overall increase in efficiency for businesses. - Adds Leo Pillon, CEO of Bitrock - This is the vision of the entire Fortitude Group, as well as of Bitrock as it begins this new journey. The hope is that in a short time we can become an authoritative reference in a specific sector that is becoming more and more important day by day."


According to recent estimates by Expert Market Research (2022), investment in data management-related activities amounts to about $70 billion, one-fifth of the total spending used for infrastructure creation in 2021 according to Gartner. A fast-growing trend that is also reflected in the job market, where data scientist, data engineer and machine learning engineer are among the most sought-after figures globally. A similar scenario is expected for the future. According to McKinsey, by 2025 companies will base all kinds of decisions on data analysis, relying on real-time processing for increasingly precise insights.


Vision & Offering

This is the second part of our article which introduces Bitrock’s vision and offering in the Data, AI & ML Engineering area. The first part delimits the context where we focus and operate, while this one defines our vision and the proposition that follows.


Artificial Intelligence (AI) is shaping the future of mankind in nearly all industries, and it is driving advancements in heterogeneous fields such as big data, robotics, and Internet of Things. We have a strong conviction that AI will continue to be a driving force of innovation and progress in the future. As a company, we recognize the vital importance of AI and ML for organizations to not just survive but thrive in the market. 

That’s why we’re committed to providing our customers with the platform, tools, and expertise to harness the full potential of AI, helping them create innovative solutions and operationalize robust and reliable AI-based systems. We tailor our offering to meet the needs of customers in this field.

AI/ML is the last piece of the puzzle, the last stretch in a race. It needs strong pillars to build upon: a reliable and scalable data platform, designed to evolve and not just for the latest delivery, where security and governance are central and where automatic tests and continuous integration/deployment are in place. Indeed, the motto “garbage in, garbage out” is even more valid for data.

Data platforms should be tailored to the customer's needs: there is no one-size-fits-all approach to data engineering problems; rather, there are companies, customers and partners with different backgrounds and needs, requiring different solutions. Paraphrasing Maslow's hammer, not everything is a nail that can be pounded with a hammer.

We believe in bespoke solutions for our clients, guiding them through the intricacies of the current data landscape and designing the platform that best fits their existing infrastructure and needs.

Our ambition is also to help our clients define a clear and effective data strategy that aligns with the overall business objective. Organizations should define goals, processes and business targets, and provide a data governance framework and processes that balance security and privacy concerns while simplifying the discovery, access and use of data.

In order to provide the best services, we value our partnerships: as of today, we’re partners with Databricks, Confluent and HashiCorp.

Design Principles

Our solutions follow specific design principles, driving our choices and design:

Cloud first

Cloud first means prioritizing cloud over on-premise solutions. In other words, having to justify picking on-premise solutions rather than making a case for cloud ones.

We’re aware of the reluctance of some companies towards cloud solutions: nevertheless, nowadays there are still very few reasons to not embrace cloud. The advantages provided by the cloud are too many: faster time to market, easy scaling, no upfront license/hardware costs, lower operative cost. Basically, it allows us to outsource non-core processes and focus on what matters the most to the business.

ML/AI from the beginning

Machine Learning (ML) and Artificial Intelligence (AI) have witnessed a tremendous leap forward in recent years, mainly due to the increased availability of computing resources (faster GPUs, bigger memories) and data. Artificial intelligence has reached or surpassed human-level performance in many complex tasks: autonomous driving is now a reality and social networks use ML profusely to detect harmful content and target advertisements, while generative networks such as OpenAI’s GPT-3 or Google’s Imagen could be game changers in the quest toward artificial general intelligence (AGI).

AI/ML is no longer the future to look at, it’s the present. 

Some organizations will use it as a competitive advantage over their competitors; others will see it as homework needed to keep up and remain competitive on the market. For sure, no one can really afford to ignore it anymore (or maybe just monopolies and the public administration?).

AI and ML have a central role in our vision and shape our architectural and technological choices.

In this context, continuously interpreting data, discovering patterns and making timely decisions based on historical and real-time data, the so-called Continuous Intelligence, will play a crucial role in defining the business strategies and will be one of the most widespread applications of machine learning. Indeed, Gartner estimates that, within 3 years, more than 50% of all business initiatives will require continuous intelligence and, by 2023, more than one-third of enterprises will have analysts practising decision intelligence, including decision modelling.

MLOps and AI Engineering

MLOps, or Machine Learning Operations, is a field in the ML community that is rapidly gaining momentum. It advocates for the need to manage the ML lifecycle following software-inspired best practices and DevOps philosophy. This approach aims to make ML-powered software reproducible, testable, and evolvable, ensuring that models are deployed and updated in a controlled and efficient manner. The importance of MLOps lies in the ability to improve the speed and reliability of ML model deployment, while reducing the risk of errors and improving the overall performance of models.

Data democratization

We’ve already underlined the importance of data democratization. Achieving it requires several key elements to be in place. Firstly, it requires a data culture where data is seen as a strategic asset valued and leveraged throughout the company. This requires a buy-in and commitment from top management.

Widespread access to data calls for widespread adoption of more robust Data Governance solutions, with data discoverability features, to effectively manage complex data processes and make data available and usable by everybody who needs it.

Making data accessible also means lowering the entry barrier to it, and therefore providing more user-friendly platforms that can be used autonomously, without advanced knowledge (the so-called self-service platforms).

Data Mesh is an approach oriented towards large-scale environments, going in this direction. It addresses silos and bottlenecks in large companies and emphasises the decentralization of data ownership, moving data ownership to the business domain teams.

Data Mesh increases overall complexity and introduces new challenges in the organizations adopting it, but it may help them when scalability and data silos are an actual barrier to company-wide data usage.

Reference Architecture

We at Bitrock refrain from providing a one-size-fits-all solution; we rather provide a reference data architecture modelled after technology stacks used across multiple companies, updated with more recent innovations.

We focus on a multimodal data processing architecture, specialized in AI/ML and operational use-cases, able to support the analytical needs typical of data warehouses. As previously explained, this is an alternative to a Business Intelligence oriented architecture based on data warehouses.

At the core of the system there are the concepts of data lake and data lakehouse.

A data lake is a centralised repository that allows you to store and manage all your structured and unstructured data at any scale. Data lakes are traditionally oriented towards advanced processing of operational data and ML/AI. The data lakehouse concept adds to them a robust storage layer paired with a processing engine (Spark, Presto, …) to provide data-warehousing capabilities, making data lakes suitable for analytical workloads too.

There is growing recognition for this architecture, which is supported by a wide range of vendors, including Databricks, AWS, Google Cloud, Starburst, and Dremio - and by data warehouse vendors like Snowflake too.

For a more detailed introduction to it, please refer to a previous article on our Blog (Data Lakehouse, beyond the hype).

Our processing engine of choice is Apache Spark, which is the de-facto standard for operational workloads. For orchestration, we pair it with the battle-tested and reliable Apache Airflow, or with Astronomer, its SaaS version. In the orchestration world, Dagster and Prefect are alternatives to Airflow which are gaining a lot of traction. They foster a switch to a higher-level abstraction, from managing workflows to handling dataflows.

Spark is suitable for both batch and real-time workloads, but for real-time data processing Apache Flink and Kafka Streams may be good alternatives, especially for applications with more stringent latency requirements.

In the streaming world applied to AI and ML, another option is Helicon from Radicalbit, a solution aimed at reducing the gap between data science and data engineering using a no-code/low-code approach. There’s a revived interest in no-code/low-code solutions, which are bringing new users (i.e. analysts and software developers) into the ML market, pushed by new low-code ML solutions like Databricks AutoML, H2O, Datarobot, etc.

Quick data exploration may be achieved by either the use of ad-hoc query engines like Trino/Presto/Starburst/Databricks SQL or using notebooks like Jupyter or their managed versions.

The integration is the boring homework preceding the fun part. However, it represents the largest fraction of cost of most data projects, ranging from 20-30% on average up to 70% for some pessimistic cases.

From a technical point of view, the ingestion layer is quite diversified and is generally shaped by the organization's data sources and infrastructure.

Traditionally, data is extracted from operational data sources and transformed before being loaded into a data warehouse, the so-called ETL. Cheap cloud storage and the separation of storage and computing laid the foundation for a paradigm shift that moves the loading phase ahead of the transformation phase (ELT). This pattern, not entirely new for data lakes, shines because it removes the business logic from the loading phase in the ingestion layer, making it possible to simplify the integration by outsourcing it.
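To make the "load first, transform later" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the target system. The table names, columns and sample rows are made up for illustration; a real pipeline would use a dedicated integration tool and a warehouse or lakehouse engine.

```python
import sqlite3


def load_raw(conn, rows):
    """Load step: copy source records as they are, with no business logic."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(customer TEXT, amount_cents INTEGER, status TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn):
    """Transform step: runs afterwards, inside the target system, in SQL."""
    conn.execute("DROP TABLE IF EXISTS revenue_by_customer")
    conn.execute(
        """
        CREATE TABLE revenue_by_customer AS
        SELECT customer, SUM(amount_cents) AS revenue_cents
        FROM raw_orders
        WHERE status = 'paid'
        GROUP BY customer
        """
    )


def run_elt(rows):
    """Run load then transform and return the aggregated result."""
    conn = sqlite3.connect(":memory:")
    load_raw(conn, rows)
    transform(conn)
    result = conn.execute(
        "SELECT customer, revenue_cents FROM revenue_by_customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data.
SAMPLE_ROWS = [
    ("alice", 1000, "paid"),
    ("bob", 500, "cancelled"),
    ("alice", 250, "paid"),
    ("bob", 700, "paid"),
]

if __name__ == "__main__":
    print(run_elt(SAMPLE_ROWS))
```

Because the load step carries no business rules, it can be delegated to a generic tool, while the transformation can be changed and re-run at any time on the raw data already stored.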

Fivetran, along with Airbyte, Matillion and many others, is an example of an ELT tool. Strictly speaking, the term is more common in a data-warehousing context; however, these integration tools are beneficial to lake and lakehouse architectures too: Fivetran, for example, has recently become a Databricks partner.

In the ingestion layer, Confluent is also playing an increasingly important role with Kafka connectors, which allow it to pull (and push) data from a variety of sources. Pairing Kafka with CDC (Change Data Capture), through software like Debezium/Qlik/Fivetran, is an increasingly common integration pattern in this context.
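As a rough illustration of what Change Data Capture means for a consumer, the sketch below replays a stream of change events to rebuild the current state of a source table. The event shape (`op`, `key`, `after`) is a simplified, hypothetical one chosen for this example; it is not the actual envelope produced by Debezium or any other tool.

```python
def apply_change(replica, event):
    """Apply one change event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["after"]  # keep the latest row image
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return replica


def replay(events):
    """Rebuild the table state by applying events in the order they occurred."""
    replica = {}
    for event in events:
        apply_change(replica, event)
    return replica


# Hypothetical change stream for a "customers" table.
EVENTS = [
    {"op": "insert", "key": 1, "after": {"name": "Ada", "city": "Milan"}},
    {"op": "insert", "key": 2, "after": {"name": "Bob", "city": "Rome"}},
    {"op": "update", "key": 1, "after": {"name": "Ada", "city": "Turin"}},
    {"op": "delete", "key": 2, "after": None},
]

if __name__ == "__main__":
    print(replay(EVENTS))
```

In a real deployment the events would typically travel through Kafka topics, and ordering per key matters: applying the same events in a different order could produce a different final state.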

The following figure, based on the unified data platform from Andreessen Horowitz (Bornstein, Li, and Casado 2020), exemplifies our architecture, in particular the boxes highlighted in yellow:

Emerging Architectures for Modern Data Infrastructure


A central role in our platform is reserved to the operationalization of ML models and AI-based software.

MLOps, introduced above, is the discipline we rely on to manage the ML lifecycle following software-inspired best practices and the DevOps philosophy. Our idea of a generic platform for machine learning providing all the tools to operationalize the ML lifecycle is best described by the following figure, based on (Bornstein, Li, and Casado 2020).

Emerging Architectures for Modern Data Infrastructure


We believe AI and ML are crucial for any organization and will be fundamental to succeed and thrive in the market.

Bitrock is committed to providing customers with the platform, tools, and expertise to harness the full potential of Artificial Intelligence (AI) and Machine Learning (ML) and operationalize it through AI engineering and MLOps.

We tailor our offering to meet the unique needs of our customers and believe in providing bespoke solutions for our clients. Our ambition is to jointly define a clear and effective data strategy that aligns with their overall business objectives. 

If you have any questions, doubts or just want to discuss data-related topics, please feel free to get in touch: we’d be more than happy to help or just chat!


Author: Antonio Barbuzzi, Head of Data, AI & ML Engineering @Bitrock


Vision & Offering

In this blog post we’re introducing Bitrock’s vision and offering in the Data, AI & ML Engineering area. We’ll provide an overview of the current data landscape, delimit the context where we focus and operate, and define our proposition.

This first part describes the technical and cultural landscape of the data and AI world, with an emphasis on the market and technology trends. The second part that defines our vision and technical offering is available here.

A Cambrian Explosion

The Data & AI landscape is rapidly evolving, with heavy investments in data infrastructure and an increasing recognition of the importance of data and AI in driving business growth.

Investment in data management has been estimated to be worth over $70B (Expert Market Research 2022), accounting for over one-fifth of all enterprise infrastructure spend in 2021 according to (Gartner 2021).

This trend is tangible in the job market too: indeed, data scientists, data engineers, and machine learning engineers are listed among LinkedIn’s fastest-growing roles globally (LinkedIn 2022).

And this trend doesn’t seem to slow down. According to (McKinsey 2022), by 2025 organizations will leverage data for every decision, interaction, and process, shifting towards real-time processing to get faster and more powerful insights.

This growth is also reflected in the number of tools, applications, and companies in this area, in what is generally called a “Cambrian explosion”: a comparison with the explosion of diverse life forms during the Cambrian period, when many new types of organisms appeared in a relatively short period of time. This is clearly depicted in the following figure, based on (Turk 2021).

A Cambrian Explosion

The Technological Scenario

Data architectures serve two main objectives: helping the business make better decisions by exploiting and analyzing data - the so-called analytical plane - and providing intelligence to customer-facing applications - the so-called operational plane.

These two use-cases have led to two different architectures and ecosystems around them: analytical systems, based on data warehouses, and operational systems, based on data lakes.

The former, built upon data warehouses, have grown rapidly.  They’re focused on Business Intelligence, business users and business analysts, typically familiar with SQL. Cloud warehouses, like Snowflake, are driving this growth; the shift from on-prem towards cloud is at this point relentless.

Operational systems have grown too. These are based on data lakes; their growth is driven by the emerging lakehouse pattern and the huge interest in AI/ML. They are specialized in dealing with unstructured and structured data, supporting BI use cases too.

Over the past few years, a path towards a convergence of both technologies has emerged. Data lakehouses added ACID transactions and data-warehousing capabilities to data lakes, while warehouses have become capable of handling unstructured data and AI/ML workloads. Anyway, the two ecosystems are still quite different, and may or may not converge in the future.

On the ingestion and transformation side, there’s a clear architectural shift from ETL to ELT (that is, data is first ingested and then transformed). This trend, made possible by the separation between storage and computing brought by the cloud, is pushed by the rise of CDC technologies and the promise to offload the non-business details to external vendors.

In this context Fivetran/dbt shine in the analytical world (along with new players like Airbyte/Matillion), while Databricks/Spark, Confluent/Kafka and Astronomer/Airflow are the de-facto standards in the operational world.

It is also noteworthy that there has been an increase in the use of stream processing for real-time data analysis. For instance, the usage of stream processing products from companies such as Databricks and Confluent has gained momentum.

Artificial Intelligence (AI) topics are gaining momentum too, and Gartner, in its annual report on strategic technological trends (Gartner 2021), lists Decision Intelligence, AI Engineering, and Generative AI as priorities to accelerate growth and innovation.

Decision Intelligence involves the use of machine learning, natural language processing, and decision modelling to extract insights and inform decision-making. According to the report, in the next two years, a third of large organisations will be using it as a competitive advantage.

AI Engineering focuses on the operationalization of AI models, integrating them with the software development lifecycle and making them robust and reliable. According to Gartner analysts, enterprises adopting it will generate three times more value than most enterprises not using it.

Generative AI is one of the most exciting and powerful examples of AI. It learns the context from training data and uses it to generate brand-new, completely original, realistic artefacts and will be used for a multitude of applications. It will account for 10% of all data produced by 2025 according to Gartner.

Data-driven Culture and Democratization

Despite the clear importance of data, it's a common experience that many data initiatives fail. Gartner has estimated that 85% of big data projects fail (O'Neill 2019) and that through 2022 only 20% of analytic insights will deliver business outcomes (White 2019).

What goes wrong? The problems rarely lie in the inadequacy of the technical solutions; technical problems are probably the simplest ones. Indeed, over the past ten years, technologies have evolved tremendously fast and Big Data technologies have matured a lot. More often, the problems are cultural.

It’s not a mystery that a data lake by itself does not provide any business value. Collecting, storing, and managing data is a cost. Data become (incredibly) valuable when they are used to produce knowledge, hints, actions. To make the magic happen, data should be accessible and available to everybody in the company. In other words, organizations should invest in a company-wide data-driven culture and aim at a true data democratization.

Data should be considered a strategic asset that is valued and leveraged throughout the organization. Managers, starting from the C-levels, should create the conditions for people in need of data to access them, by removing obstacles and bottlenecks and by simplifying processes.

Creating a data culture and democratizing data allows organizations to fully leverage their data assets and make better use of data-driven insights. By empowering employees with data, organizations can improve decision-making, foster innovation, and drive business growth.

Last but not least, Big Data’s power does not erase the need for vision or human insight (Waller 2020). It is fundamental to have a data strategy in mind to define how the company needs to use data and the link to the business strategy. And, of course, a buy-in and commitment from all management levels, starting from the top. 

The second part of this article can be found here.


Author: Antonio Barbuzzi, Head of Data, AI & ML Engineering @ Bitrock


Why the Lakehouse is here to stay


The past few years witnessed a contrast between two different ecosystems, data warehouses and data lakes - the former designed as the core for analytics and business intelligence, generally SQL centred, and the latter providing the backbone for advanced processing and AI/ML, operating on a wide variety of languages ranging from Scala to Python, R and SQL.

Despite the rivalry between the respective market leaders (think, for example, of Snowflake vs Databricks), the emerging pattern also shows a convergence between these two core architectural patterns [Bor20].

The lakehouse is the new concept that moves data lakes closer to data warehouses, making them able to compete in the BI and analytical world.

Of course, as with any emerging technical innovations, it is hard to separate the marketing hype from the actual technological value, which, ultimately, only time and adoption can prove. While it is undeniable that marketing is playing an important role in spreading the concept, there’s a lot more in this concept than just buzzwords.

Indeed, the Lakehouse architecture has been introduced separately and basically in parallel by three important and trustworthy companies, and with three different implementations. 

Databricks published its seminal paper on the lakehouse [Zah21], and open sourced the Delta Lake framework [Delta, Arm20].

In parallel Netflix, in collaboration with Apple, introduced Iceberg [Iceberg], while Uber introduced Hudi [Hudi] (pronounced “Hoodie”), both becoming top-level Apache projects in May 2020.

Moreover, all major data companies are competing to support it, from AWS to Google Cloud, passing through Dremio, Snowflake and Cloudera, and the list is growing.

In this article, I will try to explain, in plain language, what a lakehouse is, why it is generating so much hype, and why it is rapidly becoming a centerpiece of modern data platform architectures.

What is a Lakehouse?

In a single sentence, a lakehouse is a “data lake” on steroids, unifying the concept of “data lake” and “data warehouse”.

In practice, the lakehouse leverages a new metadata layer providing a “table abstraction” and some features typical of data warehouses on top of a classical Data Lake.

This new layer is built on top of existing technologies: in particular, on a binary, often columnar, file format, which can be Parquet, ORC or Avro, and on a storage layer.

Therefore, the main building blocks of a lakehouse platform (see figure 1.x), from a bottom-up perspective, are:

  • A File Storage layer, generally cloud based, for example AWS S3 or GCP Cloud Storage or Azure Data Lake Storage Gen2.
  • A binary file format like Parquet or ORC used to store data and metadata
  • The new table file format layer, Delta Lake, Apache Iceberg or Apache Hudi
  • A processing engine supporting the above table format, for example Spark or Presto or Athena, and so on.

To better understand the idea behind the lakehouse and the evolution towards it, let’s start with the background.

First generation, the data warehouse

Data Warehouses have been around for 40+ years now. 

They were invented to answer business questions which were too computationally intensive for the operational databases, and to be able to join datasets coming from multiple sources.

The idea was to extract data from the operational systems, transform them into a format more suitable to answer those questions and, finally, load them into a single specialised database. This process is called ETL (Extract, Transform, Load).

This is sometimes also referred to as the first generation.

To complete the concept, a data mart is a portion of a data warehouse focused on a specific line of business or department.

The second generation, data lakes

The growing volume of data to handle, along with the need to deal with unstructured data (e.g. images, videos, text documents, logs, etc.), made data warehouses more and more expensive and inefficient.

To overcome these problems, the second generation data analytics platforms started offloading all the raw data into data lakes, low-cost storage systems providing a file-like API. 

Data lakes started with MapReduce and Hadoop (even if the name data lake came later) and were later followed by cloud data lakes, such as the ones based on S3, ADLS and GCS.

Lakes feature low-cost storage, higher speed, and greater scalability but, on the other hand, they give up many of the advantages of warehouses.

Data Lakes and Warehouses

Lakes did not replace warehouses: they were complementary, each of them addressed different needs and use cases. Indeed, raw data was initially imported into data lakes, manipulated, transformed and possibly aggregated. Then, a small subset of it would later be ETLed to a downstream data warehouse for decision intelligence and BI applications.

This two-tier data lake + warehouse architecture is now largely used in the industry, as you can see in the figure below:

Source: Martin Fowler

Problems with two-tiered Data Architectures

A two-tier architecture comes with additional complexity and in particular it suffers from the following problems:

  • Reliability and redundancy, as more copies of the same data exist in different systems and they need to be kept available and consistent across each other;
  • Staleness, as the data needs to be loaded first in the data lakes and, only later, into the data warehouse, introducing additional delays from the initial load to when the data is available for BI;
  • Limited support for AI/ML on top of BI data: business requires more and more predictive BI analysis, for example, “which customers should we offer discounts to”. AI/ML libraries do not run on top of warehouses, so vendors often suggest offloading data back to the lakes, adding additional steps and complexity to the pipelines.
    Modern data warehouses are adding some support for AI/ML, but they’re still not ideal to cope with binary formats (video, audio, etc.).
  • Cost: of course, keeping up two different systems increases the total cost of ownership, which includes administration, licence costs and additional expertise costs.

The third generation, the Data Lakehouse

A data lakehouse is an architectural paradigm that adds a table layer, backed by file metadata, to a data lake, in order to provide traditional analytical DB features such as ACID transactions, data versioning, auditing, indexing, caching and query optimization.

In practice, it may be considered as a data lake on steroids, a combination of both data lakes and data warehouses.

This pattern makes it possible to move many of the use cases traditionally handled by data warehouses into data lakes, and it simplifies the implementation by moving from a two-tier pipeline to a single-tier one.

In the following figure you can see a summary of the three different architectures.

Source: Databricks

Additionally, lakehouses move the implementation and support of data warehouse features from the processing engine to the underlying file format. As such, more and more processing engines are able to capitalise on the new features. Indeed, most engines are adding support for lakehouse formats (Presto, Starburst, Athena, …), contributing to the hype. The benefit for users is that the existence of multiple engines featuring data warehouse capabilities allows them to pick the most suitable solution for each use case: for example, Spark for more generic data processing and AI/ML problems, or Trino/Starburst/Athena/Photon/etc. for quick SQL queries.

Characteristics of Data Lakehouse

For those who may be interested, let’s dig (slightly) deeper into the features provided by lakehouses and into their role.


The most important feature, available across all the different lakehouse implementations, is the support of ACID transactions.

ACID, standing for atomicity, consistency, isolation, durability, is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps.

Indeed, cloud object stores haven't always provided strong consistency, so stale reads were possible - this is called eventual consistency.

In any case, there are no mutual exclusion guarantees, so multiple writers can update the same file without external coordination, and there’s no atomic update support across multiple keys, so updates to multiple files may become visible at different times.

Lakehouse implementations guarantee ACID transactions on a single table, whatever the underlying storage and regardless of the number of files the table is made of.

This is achieved in different ways in the three major players, but generally speaking, they all use metadata files to identify which files are part of a table snapshot and some WAL-like file to track all the changes applied to a table.
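The general idea of a log of commits plus metadata that identifies the files of a snapshot can be sketched in plain Python. This is a deliberately simplified toy, not the actual protocol of Delta Lake, Iceberg or Hudi: the `TinyTable` class, the `_log/` path layout and the dict standing in for the object store are all assumptions made for illustration.

```python
import json


class TinyTable:
    """Toy table format: a dict stands in for the object store (hypothetical)."""

    def __init__(self, store):
        self.store = store  # path -> content

    @staticmethod
    def _log_path(version):
        return f"_log/{version:08d}.json"

    def latest_version(self):
        # Log entries are numbered 0, 1, 2, ... without gaps.
        return sum(1 for path in self.store if path.startswith("_log/")) - 1

    def commit(self, expected_version, add=(), remove=()):
        """Publish added/removed data files as one atomic log entry."""
        path = self._log_path(expected_version + 1)
        if path in self.store:
            # Put-if-absent: another writer already claimed this version.
            raise RuntimeError("concurrent commit detected, re-read and retry")
        self.store[path] = json.dumps({"add": list(add), "remove": list(remove)})
        return expected_version + 1

    def snapshot(self, version=None):
        """Replay the log up to `version` to list the files forming the table."""
        if version is None:
            version = self.latest_version()
        files = set()
        for v in range(version + 1):
            entry = json.loads(self.store[self._log_path(v)])
            files |= set(entry["add"])
            files -= set(entry["remove"])
        return sorted(files)


table = TinyTable(store={})
v0 = table.commit(-1, add=["part-0.parquet", "part-1.parquet"])
v1 = table.commit(v0, add=["part-2.parquet"], remove=["part-0.parquet"])
print(table.snapshot())    # files visible in the latest snapshot
print(table.snapshot(v0))  # files visible in the first snapshot
```

Readers only ever see the files listed by a committed log entry, so a write touching many files becomes visible all at once. In this sketch the existence check and the write are two separate steps; a real implementation needs the storage layer (or a small coordination service) to make that put-if-absent step atomic.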

Note that there are alternative ways to provide ACID consistency, in particular by using an external ACID-consistent metadata store, like an external DB. This is what Hive 3 ACID does, for example, or Snowflake. However, not having to depend on an external system removes a bottleneck and a single point of failure, and allows multiple processing engines to leverage the same data structure.


Automatic partitioning is another fundamental feature, used to reduce the processing requirements of queries and to simplify table maintenance. This feature is implemented by partitioning data into multiple folders; while it can be implemented at application level, the lakehouse provides it transparently. Moreover, some lakehouses (see Iceberg) can support partition evolution automatically.
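As a small illustration of folder-based partitioning, the sketch below derives a partition folder from a record's date and then skips every file outside the requested partition. The `year=/month=` naming follows a common convention, while the bucket name and file names are hypothetical; this shows the application-level version of the idea, not how a specific lakehouse implements it.

```python
from datetime import date


def partition_path(table_root, event_date):
    """Folder where a record with the given date would be written."""
    return f"{table_root}/year={event_date.year}/month={event_date.month:02d}"


def prune(paths, year, month):
    """Keep only the files stored under the matching partition folder."""
    wanted = f"/year={year}/month={month:02d}/"
    return [path for path in paths if wanted in path]


# Hypothetical file listing of a table partitioned by year and month.
FILES = [
    "s3://bucket/events/year=2022/month=12/part-0.parquet",
    "s3://bucket/events/year=2023/month=01/part-0.parquet",
    "s3://bucket/events/year=2023/month=01/part-1.parquet",
    "s3://bucket/events/year=2023/month=02/part-0.parquet",
]

if __name__ == "__main__":
    print(partition_path("s3://bucket/events", date(2023, 1, 15)))
    print(prune(FILES, 2023, 1))
```

A query filtering on January 2023 only needs to open two of the four files; with a table format, this pruning happens from metadata without the user having to know the folder layout.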

Time Travel

Time Travel is the ability to query/restore a table to a previous state in time.

This is achieved by keeping metadata containing snapshot information for longer time periods.

Time travel is a feature provided by traditional DBs oriented to OLAP workloads too, as it may be implemented relying on write-ahead logs. Indeed, it was available in PostgreSQL, for example, until version 6.2, and it is available in SQL Server. The separation between storage and processing makes this feature easier to support in lakehouses, since they rely on cheap underlying storage.

Of course, to reduce cost/space usage, you may want to periodically clean up past metadata, so that time travel is possible only back to the oldest available snapshot.
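The trade-off between retention and time travel can be sketched as follows. `SnapshotHistory` is a hypothetical in-memory structure used only for illustration: it keeps one set of file names per commit timestamp, answers "as of" queries with a binary search, and forgets old snapshots on request.

```python
import bisect


class SnapshotHistory:
    """Toy snapshot metadata: commit timestamps and the files of each snapshot."""

    def __init__(self):
        self.timestamps = []  # commit times, ascending
        self.file_sets = []   # files that made up the table at each commit

    def commit(self, timestamp, files):
        if self.timestamps and timestamp <= self.timestamps[-1]:
            raise ValueError("snapshots must be committed in time order")
        self.timestamps.append(timestamp)
        self.file_sets.append(frozenset(files))

    def as_of(self, timestamp):
        """Return the files of the latest snapshot committed at or before `timestamp`."""
        index = bisect.bisect_right(self.timestamps, timestamp) - 1
        if index < 0:
            raise LookupError("no retained snapshot is that old")
        return self.file_sets[index]

    def expire_older_than(self, timestamp):
        """Drop snapshots committed before `timestamp`, always keeping the latest."""
        if not self.timestamps:
            return
        first_kept = min(
            bisect.bisect_left(self.timestamps, timestamp),
            len(self.timestamps) - 1,
        )
        del self.timestamps[:first_kept]
        del self.file_sets[:first_kept]
```

A short usage example, with made-up timestamps and file names:

```python
history = SnapshotHistory()
history.commit(100, {"a.parquet"})
history.commit(200, {"a.parquet", "b.parquet"})
history.commit(300, {"b.parquet", "c.parquet"})
print(sorted(history.as_of(250)))  # state of the table at time 250
history.expire_older_than(200)     # time 100 can no longer be queried
```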

Schema Evolution and Enforcement

Under the hood, Iceberg, Delta and Hudi rely on binary file formats (Parquet/ORC/Avro), which are compatible with most of the data processing frameworks.

The lakehouse provides an additional abstraction layer: a mapping between the underlying files’ schemas and the table schema, so that schema evolution can be done in place, without rewriting the entire dataset.
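One possible way to picture such a mapping is to identify columns by a stable numeric id rather than by name. The sketch below is an assumption made for illustration, not a description of any specific format's internals: data files store values by column id, and the table schema maps each id to its current name.

```python
def read_rows(file_rows, table_schema):
    """Project rows stored by column id onto the current table schema.

    Columns missing from an older file are returned as None.
    """
    return [
        {name: row.get(col_id) for col_id, name in table_schema.items()}
        for row in file_rows
    ]


# Hypothetical data: the old file predates the "age" column (id 3).
OLD_FILE = [{1: "Ada", 2: "Milan"}]
NEW_FILE = [{1: "Bob", 2: "Rome", 3: 42}]

# Schema v2 renamed column 1 from "name" to "full_name" and added column 3.
SCHEMA_V2 = {1: "full_name", 2: "city", 3: "age"}

if __name__ == "__main__":
    print(read_rows(OLD_FILE + NEW_FILE, SCHEMA_V2))
```

Renaming or adding a column only changes the schema metadata; the old file is read as it is, which is why the dataset does not need to be rewritten.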

Streaming support

Data lakes are not well suited for streaming applications, for multiple reasons: to name a few, cloud object stores do not allow appending data to files, and for a long time they did not provide a consistent view across multiple written files. Yet this is a common need: for example, offloading Kafka data into a storage layer is a fundamental part of the lambda architecture.

Lakehouses make it possible to use Delta tables as both input and output. This is achieved by an abstraction layer masking the use of multiple files and a background compaction process joining small files into larger ones, in addition to “Exactly-Once Streaming Writes” and “efficient log tailing”. For details please see [Arm20].
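Streaming writes tend to produce many small files, which is why a background process that merges them matters. The following is a minimal sketch of how such a process might plan its work, using a simple greedy grouping; the function name, the sizes and the target are hypothetical, and real implementations use their own, more sophisticated strategies.

```python
def plan_compaction(file_sizes, target_size):
    """Group small files into batches whose total size stays within target_size.

    `file_sizes` maps file name -> size in bytes. Files already at or above the
    target are left alone, and batches with a single file are skipped because
    rewriting them would bring no benefit.
    """
    batches, current, current_size = [], [], 0
    for name, size in sorted(file_sizes.items()):
        if size >= target_size:
            continue  # already large enough
        if current and current_size + size > target_size:
            batches.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        batches.append(current)
    return [batch for batch in batches if len(batch) > 1]


if __name__ == "__main__":
    sizes = {"a": 10, "b": 20, "c": 80, "d": 30, "e": 200}
    print(plan_compaction(sizes, target_size=100))
```

Each planned batch would then be rewritten as one larger file and swapped in with a single table commit, so readers never observe a half-compacted table.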

The great convergence

Will lakehouse-based platforms completely get rid of data warehouses? I believe this is unlikely. What’s sure at the moment is that the boundaries between the two technologies are becoming more and more blurred.

Indeed, while data lakes, thanks to Delta Lake, Apache Iceberg and Apache Hudi, are moving into data warehouse territory, the opposite is true as well.

Indeed, Snowflake has added support for the lakehouse table layer (Apache Iceberg/Delta at the time of writing), becoming one of the possible processing engines supporting the lakehouse table layer.

At the same time, warehouses are moving into AI/ML applications, traditionally a monopoly of data lakes: Snowflake released Snowpark, an AI/ML Python library that allows writing data pipelines and ML workflows directly in Snowflake. Of course it will take a bit of time for the data science community to accept and master yet another library, but the direction is set.

But what’s interesting is that warehouses and lakes are becoming more and more similar: they both rely on commodity storage, offer native horizontal scaling, support semi-structured data types, ACID transactions, interactive SQL queries, and so on.

Will they converge to the point where they’ll become interchangeable in the data stacks? This is hard to tell and experts have different opinions: while the direction is undeniable, differences in languages, use cases or even marketing will play an important role in defining what future data stacks will look like. Anyway, it is a safe bet to say that the lakehouse is here to stay.


Author: Antonio Barbuzzi, Head of Data, AI & ML Engineering @ Bitrock

Mobile Application Development

Interview with Samantha Giro, Team Lead Mobile Engineering @ Bitrock

A few months ago, we decided to further invest in Bitrock’s User-Experience & Front-end Engineering area by creating a vertical unit dedicated to Mobile Application Development.

The decision stemmed from several inputs: first of all, we perceived a high demand from organizations looking for professionals specialized in mobile app development who could support them in their digital evolution journey. 

Secondly, we already had the chance to implement successful projects related to mobile app development for some of our clients, primarily in the fintech and banking sectors.

Furthermore, since we are a team of young entrepreneurs and technicians continuously looking for new opportunities and challenges, we deeply wanted to explore this area within Front-end engineering, which we found extremely interesting and could perfectly fit in the 360° technology consulting approach offered by Bitrock.

Creating a unit specifically dedicated to mobile programming was thus a natural step towards continuous improvement and growth.

We are now ready to delve deeper into the world of mobile application development by asking a few questions to Samantha Giro, Team Lead Mobile Engineering at Bitrock. 

What is mobile application development? And what are the main advantages of investing in an app?

Mobile application development is basically the set of processes and procedures involved in writing software for small, wireless computing devices, such as smartphones, tablets and other hand-held devices.

Mobile applications offer a wide range of opportunities. First of all, they are installed on mobile devices - smartphones, iPhones, tablets, iPads - that users can easily bring with them, anywhere they go. 

It is thus possible to use them in work environments, such as manufacturing industries (just think about when workers control loading and unloading operations from a single device), to manage sales workflows, or events. Many solutions work even offline, allowing people to use them continuously and without interruption.

Moreover, mobile apps give users the opportunity to interact with the product readily and effectively. Through a push-notification campaign, for instance, it is possible to activate typical marketing affiliation mechanisms. These allow companies to run advertising campaigns and retain clients through continuous usage - for example by inviting users to discover new product functionalities or other related solutions.

Mobile technologies can also be associated with other external hardware solutions through a Bluetooth or Wi-Fi connection, thus widening the range of usage possibilities.

The sensors and hardware of the device, such as the integrated camera, increase the number and type of functionalities that a product can perform. This brings great advantages and comfort to our daily lives: for example, if you need the digital copy of a signed paper, with the camera of your mobile device you can easily scan it and have the document online in real-time, ready to be shared with other users.

Last but not least, the interaction and integration with AI systems, augmented reality and vocal assistants grant easier access and an up-to-date user experience. 

Users can, for instance, control their houses remotely: as we all know, nowadays we can easily turn on the lights of our house or activate the alarm system simply by accessing an app on our mobile device.

What types of mobile applications are there?

There are different ways to develop a product dedicated to mobile: native, hybrid and web applications. 

Native mobile app development requires separate resources: some dedicated to devices running Android, others to devices running Apple’s iOS. There are no limits to the customization of the product, apart from those defined by the operating systems themselves (Android and iOS).

Native apps have to be approved by the stores before being published, and require a different knowledge base, since each platform has its specific operating system, integrated development environment (IDE) and language that must be taken into account.

They imply higher costs in terms of update and maintenance than hybrid apps since they usually require at least two different developers dedicated to the product. 

Native apps can take full advantage of the hardware with which they interact, and they are usually faster and more performant than their hybrid counterparts. The final app size also benefits from the fact that no framework sits in between to convert them. Compatibility is always granted over time, apart from the updates that need to be carried out following the guidelines issued by the platform owner.

Thanks to specific frameworks, hybrid mobile app development allows the creation of applications for both operating systems from a single shared codebase, thus reducing maintenance costs. Development is subject to the limitations of the framework and to its updates, which must be frequent and keep up with the native ones. Complex functionalities still require customization of the native part. Lastly, hybrid apps must also be approved by the stores before being published.

The most popular development frameworks are React Native and Flutter. Based on JavaScript, React Native is widely used and known by many web developers. It is well suited to the development of hybrid applications and performs well. Its nature as an interface to native development makes applications less performant than purely native ones; nevertheless, it is a good product since it also facilitates sharing code with web applications. The community is wide, and the high number of open-source libraries covers almost any functionality you may need to integrate. It has two different development modes, which allow you to create applications entirely in JavaScript or with the possibility of customizing the native part.

Flutter is a more recent product than React Native, and it is based on the Dart language developed by the Google team. The ease of the language and of the tool is convincing more and more developers to use it. Unlike React Native, Flutter’s components do not depend on native ones: for this reason, when the operating systems are updated, the product keeps on functioning well. The plugins for specific functionalities, such as localization and maps, are created and managed by the Google team, which ensures reliability, compatibility and constant updates. Dart, however, is still a little-known language compared to JavaScript, and it requires very specific knowledge.

Last but not least, there are web applications, which are similar to mobile apps but are developed using web technologies. They are not installed through the stores but accessed like a website: the user can add a shortcut to the home screen and launch the web application from there. In this case, offline usability is limited and not guaranteed. Moreover, it is not possible to take full advantage of the hardware resources.

How is a mobile app created? And what are the most widely used programming languages?

The development of a mobile app generally starts with the study of its key functionalities and an analysis of what the customer needs, along with the creation of a dedicated design. Another preliminary step consists in working on the user experience: for this reason, a close collaboration with an expert of UI/UX is essential.

When all this is defined, the team decides what the best technologies and solutions to develop that specific application are. Then, developers write code, recreate the design and functionality, and run some tests to prove that everything works properly. 

Once the whole package is built and approved by the customer, it is published on the relevant stores (Google or Apple), where applicable.

Let’s now have a quick look at the main programming languages.

If we’re talking about hybrid development, the main tools are Ionic (JavaScript), Flutter (Dart) and React Native (JavaScript). As for native apps, the top iOS development language is Swift or, alternatively, its predecessor Objective-C. The most popular Android development language is Kotlin, although some still use Java. Of course, developers must rely on an IDE. Although there are many other alternatives, the ones mentioned above can be considered the most widely used.

What are the main market differences compared to website development?

Let me start by saying that websites will never die, for the simple reason that when you need information rapidly and for a very specific circumstance, the product or service’s website is always available. However, websites cannot take advantage of all the instruments and hardware available for a mobile application. 

Mobile apps can store a certain amount of data locally - this is something that a website cannot always provide (unless you have your own account and you always work online).

They can quickly access information from hardware such as the accelerometer, the gyroscope and other sensors. Other tools enable the adoption of customer retention strategies (even though nowadays there are push notifications for websites, too). With a mobile app, it is thus possible to grow customer loyalty with specific features and functionalities.

Furthermore, mobile applications are specifically designed to grant ease of use, while websites traditionally provide a different kind of usability. 

Most of the time, web and mobile can work perfectly together (see, for example, what happens with Amazon: users can buy an item via the website or by using the mobile app); other times, a mobile app can outperform its web counterpart, especially when users have to manage specific things or data, or use specific technologies. For example, you can hardly perform biometric authentication via a website.

At Bitrock, we always put clients’ needs first: the creation of the brand-new Mobile Application Development unit within our User Experience & Front-end Engineering team has the goal to widen our technology offering in the field. In this way, we can provide a broad range of cutting-edge, versatile and scalable mobile technology solutions to a variety of markets and businesses. 

We always collaborate closely with our clients to plan, create, and deploy solutions that are tailored to their specific requirements and in line with their contingent needs.

If you want to find out more about our proposition, visit our dedicated page on Bitrock’s website or contact us by sending an email!

Thanks to Samantha Giro > Team Lead Mobile Engineering @ Bitrock

Apache Airflow


Apache Airflow is one of the most used workflow management tools for data pipelines - both AWS and GCP have a managed Airflow solution in addition to other SaaS offerings (notably Astronomer).

It allows developers to define, schedule and monitor data workflows programmatically, using Python. It is based on the concept of directed acyclic graphs (DAGs), where all the different steps (tasks) of the data processing (wait for a file, transform it, ingest it, join it with other datasets, process it, etc.) are represented as the graph’s nodes.

Each node can be either an “operator”, that is, a task doing some actual work (e.g. transforming or loading data), or a “sensor”, a task waiting for some event to happen (e.g. a file arrival or a REST API call).

In this article we will discuss sensors and tasks controlling external systems and, in particular, the internals of some of the most interesting (relatively) new features: reschedule sensors, Smart Sensors and Deferrable Operators.

Sensors are synchronous by default

Sensors are a special type of Operator designed to wait for an event to occur and then succeed, so that their downstream tasks can run.

Sensors are a fundamental building block to create pipelines in Airflow; however, historically, as they share the Operator’s main execution method, they were (and by default still are) synchronous. 

By default, they busy-wait for an event to occur consuming a worker’s slot.

If the system is not properly sized, too many busy-waiting sensors may use up all the workers’ slots and lead to starvation and deadlocks (for example, when ExternalTaskSensor is used). Even when enough slots are available, workers may be hogged by tons of sleeping processes.

Working around it

The first countermeasure is to confine sensors in separate pools. This only partially limits the problems.

A more efficient workaround exploits Airflow’s ability to retry failed tasks. The idea is to make the sensor fail when the sensing condition is unmet, and to set the sensor’s number of retries and retry delay accordingly: number_of_retries * retry_delay should be equal to the sensor’s timeout. Each failed attempt frees the worker’s slot, making it possible to run other tasks in the meantime.

This solution works like a charm, and it doesn’t require any Airflow code change.

Main drawbacks are:

  • Bugs and errors in the sensors may be masked by timeouts; this can, however, be mitigated by properly written unit tests.
  • Some overhead is added to the scheduler, and a separate process is spawned at each retry, so polling intervals cannot be too short.
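
The sizing rule for this workaround can be worked out with a small helper. This is only a minimal sketch in plain Python: retries_for_timeout is a hypothetical name, and the assumption is that the resulting values are then passed to the sensor as its retries and retry_delay arguments.

```python
import math
from datetime import timedelta


def retries_for_timeout(timeout: timedelta, retry_delay: timedelta) -> int:
    """Number of retries so that retries * retry_delay covers the timeout."""
    if retry_delay <= timedelta(0):
        raise ValueError("retry_delay must be positive")
    # Dividing two timedeltas gives a float; round up so the timeout is covered.
    return math.ceil(timeout / retry_delay)


# Example: wait up to 6 hours for the event, checking every 5 minutes.
retry_delay = timedelta(minutes=5)
retries = retries_for_timeout(timedelta(hours=6), retry_delay)
print(retries)  # 72
```

A longer retry delay lowers the scheduler overhead but also delays the moment the event is detected, so the two values have to be balanced against each other.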

Reschedule mode

Sensor’s reschedule mode is quite similar to the previous workaround.

In practice, sensors have a new “mode” attribute which may have two values, “poke”, the default one, providing the old synchronous behaviour, and “reschedule”.

When mode is set to reschedule:

  • BaseSensorOperator’s “execute” method raises an AirflowRescheduleException, containing the reschedule_date, when the sensing condition is unmet.
  • This exception is caught by the TaskInstance run method, which persists it in the TaskReschedule table along with the id of the associated task and updates the task state to “UP_FOR_RESCHEDULE”.
  • When the TaskInstance run method is called for a task in the “UP_FOR_RESCHEDULE” state, the task is run only if the reschedule_date allows it.
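
The flow described in these steps can be illustrated with a toy model. The sketch below is plain Python and is not Airflow’s code: RescheduleRequested, ToySensor and run_task are hypothetical stand-ins for AirflowRescheduleException, BaseSensorOperator and the TaskInstance run method, and a simple list plays the role of the TaskReschedule table.

```python
from datetime import datetime, timedelta, timezone


class RescheduleRequested(Exception):
    """Toy stand-in for AirflowRescheduleException: carries the next check time."""

    def __init__(self, reschedule_date: datetime):
        super().__init__(f"reschedule at {reschedule_date.isoformat()}")
        self.reschedule_date = reschedule_date


class ToySensor:
    """A sensor in 'reschedule' mode: one check per run, no sleeping in between."""

    def __init__(self, poke, poke_interval: timedelta):
        self.poke = poke
        self.poke_interval = poke_interval

    def execute(self, now: datetime) -> None:
        if not self.poke():
            raise RescheduleRequested(now + self.poke_interval)


def run_task(sensor: ToySensor, now: datetime, reschedule_table: list) -> str:
    """Toy version of the task run: returns the resulting task state."""
    try:
        sensor.execute(now)
    except RescheduleRequested as exc:
        reschedule_table.append(exc.reschedule_date)  # persisted for the scheduler
        return "up_for_reschedule"  # the worker slot is released here
    return "success"


# The sensing condition is met only on the third check.
answers = iter([False, False, True])
sensor = ToySensor(poke=lambda: next(answers), poke_interval=timedelta(minutes=1))
table = []
now = datetime(2022, 1, 1, tzinfo=timezone.utc)
states = []
while True:
    state = run_task(sensor, now, table)
    states.append(state)
    if state == "success":
        break
    now = table[-1]  # the scheduler re-runs the task at the reschedule date
print(states)  # ['up_for_reschedule', 'up_for_reschedule', 'success']
```

Unlike a plain failure, the dedicated exception lets the runner tell “not yet” apart from a real error, which is the main improvement over the retry-based workaround.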

This approach improves on the above-mentioned workaround, as it distinguishes between actual errors and an unmet sensor condition. Otherwise it shares the same limitations: even lightweight checks remain quite resource-intensive.

Smart sensors

In parallel to the “reschedule” mode, a different approach called Smart Sensors was proposed in AIP-17. It was merged in release 2.0.0, is already deprecated, and is planned to be removed in the next Airflow 2.4.0 release (it is no longer in the main branch).

All smart sensor poke-contexts are serialised in the DB and picked up by a separate process, running in special built-in smart sensor DAGs.

I won’t add any additional details on them, as they’ve been replaced by Deferrable Operators.

Smart Sensors were a sensible solution; however, despite requiring considerable changes to the Airflow code, they have two main pitfalls:

  • No High Availability support
  • Sensor’s suspension is a subset of a more generic problem, suspension of tasks - this solution can’t be easily extended to it.

For reference, please see the AIP-17 proposal.

Deferrable Operators

Deferrable Operators, introduced in AIP-40, are instead a more generic solution: they’re a superset of Smart Sensors, supporting broader task suspension, and were designed from the start to be highly available. It is therefore no surprise that they’ve replaced Smart Sensors.

Albeit quite elegant, this solution is slightly more complex. To fully understand it, let’s start from a use case.

A typical airflow use-case is to orchestrate jobs running on external systems (for example, a Spark Job runs on Yarn/EMR/…). More and more frequently, those systems offer an asynchronous API returning a job id and a way to poll its status.

Without Deferrable Operators, a common way to implement it is through a custom operator that triggers the job in the execute method, gets the job id, and polls for it in a busy-wait loop until it finishes. One may be tempted to use two separate operators, one for the “trigger” call and one for the “poll” calls, but this would invalidate the Airflow retry mechanism.

Deferrable Operators solve this problem and add to the tasks the ability to suspend themselves. If the polling condition is unmet, task execution may be suspended and resumed after a configurable delay.
Suspension of tasks is achieved by raising a TaskDeferred exception in a deferrable operator. A handy “defer” method is added to the BaseOperator to do it. This exception contains the following information:

  • The function to resume, along with the needed arguments.
  • A Trigger object, containing the details on when to trigger the next run.

The function arguments are a simple way to keep the task state, for example the job_id of the triggered spark job to poll.
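
To make the idea concrete, here is a toy model of a task that suspends itself between polls. It is a sketch in plain Python, not Airflow’s implementation: ToyTaskDeferred, ToySparkJobOperator, FakeClient and run_until_done are hypothetical names, the trigger is reduced to a plain dictionary, and the runner resumes immediately instead of waiting for the trigger to fire. Real resume methods in Airflow also have a richer signature than the one shown here.

```python
class ToyTaskDeferred(Exception):
    """Toy stand-in for TaskDeferred: what to wait for and where to resume."""

    def __init__(self, trigger, method_name, kwargs):
        super().__init__(method_name)
        self.trigger = trigger
        self.method_name = method_name
        self.kwargs = kwargs


class ToySparkJobOperator:
    """Submits a job once, then suspends itself between status polls."""

    def __init__(self, client, poll_delay_seconds=60):
        self.client = client
        self.poll_delay_seconds = poll_delay_seconds

    def defer(self, method_name, kwargs):
        trigger = {"type": "delay", "seconds": self.poll_delay_seconds}
        raise ToyTaskDeferred(trigger, method_name, kwargs)

    def execute(self):
        job_id = self.client.submit()
        self.defer("poll", {"job_id": job_id})  # the job_id is the kept state

    def poll(self, job_id):
        if self.client.status(job_id) != "FINISHED":
            self.defer("poll", {"job_id": job_id})
        return job_id


class FakeClient:
    """Hypothetical external system with an asynchronous submit/status API."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.submissions = 0

    def submit(self):
        self.submissions += 1
        return "job-1"

    def status(self, job_id):
        return next(self._statuses)


def run_until_done(operator):
    """Toy runner: stores the method and arguments to resume, then calls them."""
    next_method, next_kwargs, deferrals = "execute", {}, 0
    while True:
        try:
            return getattr(operator, next_method)(**next_kwargs), deferrals
        except ToyTaskDeferred as deferred:
            next_method, next_kwargs = deferred.method_name, deferred.kwargs
            deferrals += 1  # a real runner would free the worker slot here


client = FakeClient(["RUNNING", "RUNNING", "FINISHED"])
result, deferrals = run_until_done(ToySparkJobOperator(client))
print(result, deferrals, client.submissions)  # job-1 3 1
```

Note that the job is submitted only once: every resumption goes straight to the poll method with the saved job_id, which is what a single operator with a busy-wait loop achieved at the cost of a permanently occupied slot.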

The most useful trigger objects are generally time-based, and the most common ones are already provided by Airflow: DateTimeTrigger, triggering at a specific time, and TimeDeltaTrigger, triggering after a delay. It is therefore generally not necessary to implement them.

The implementation of Triggers and of the Triggerer leverages Python’s asyncio library, using the async/await syntax introduced with Python 3.5 (Airflow 2.0.0 requires Python version 3.6 or higher). A trigger extends BaseTrigger and provides an async-compatible “run” method, which yields control when idle.

Time-based triggers are implemented as a while loop using await asyncio.sleep rather than the blocking time.sleep.

This allows them to coexist with thousands of other Triggers within one process.
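
The following sketch shows why this matters. It uses only the standard library: delay_trigger, first_event and main are hypothetical names, and the trigger is a bare async generator rather than a BaseTrigger subclass. A thousand of these toy triggers wait concurrently in a single process and a single thread.

```python
import asyncio
import time


async def delay_trigger(trigger_id: int, delay: float):
    """Toy time-based trigger: yields one event once `delay` seconds have elapsed."""
    deadline = time.monotonic() + delay
    while time.monotonic() < deadline:
        # await hands control back to the event loop; time.sleep would block it
        await asyncio.sleep(0.01)
    yield {"trigger_id": trigger_id}


async def first_event(trigger):
    """Consume a trigger until it emits its first event."""
    async for event in trigger:
        return event


async def main(count: int, delay: float):
    start = time.monotonic()
    # All triggers share one event loop; gather keeps results in input order.
    events = await asyncio.gather(
        *(first_event(delay_trigger(i, delay)) for i in range(count))
    )
    return events, time.monotonic() - start


events, elapsed = asyncio.run(main(count=1000, delay=0.1))
print(len(events), f"{elapsed:.2f}s")
```

Run one after another, these triggers would need 100 seconds in total; run as coroutines, the whole batch completes in roughly the time of a single delay plus some scheduling overhead.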

Note that, to limit the number of triggers, there is a one-to-many relationship between Triggers and TaskInstances: the same trigger may be shared by multiple tasks.

Let’s see how everything is orchestrated.

When a TaskDeferred exception is caught in the run method of TaskInstance, these steps are followed:

  • TaskInstance state is updated to DEFERRED.
  • The method and the arguments to resume the execution of the task are serialised in the TaskInstance (and not in the Trigger), in the next_method and next_kwargs columns. The task instance is linked to the trigger through a trigger_id attribute on the TaskInstance.
  • The Trigger is persisted in the DB, in a separate table, Trigger.

A separate Airflow component, the Triggerer, is in charge of executing the triggers. It runs as a new, continuously running process that is part of an Airflow installation.

This process contains an async event loop which loads all the triggers serialised in the DB, creates those that have not been created yet, and runs their coroutines concurrently. Thousands of triggers may run at once efficiently.

A trigger performs some lightweight check. For example, the DateTimeTrigger verifies whether the triggering date has passed; if so, it yields a “TriggerEvent”.

All events are handled by the Triggerer: for each TriggerEvent, all the corresponding TaskInstances to schedule are picked up, and their state is updated from DEFERRED to SCHEDULED.
The TaskInstance run method has been updated to check whether the task should resume (it checks if “next_method” is set); if so, it resumes it, otherwise it proceeds as usual.
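
The event handling described above can be pictured with the following toy model. It is a plain-Python sketch with hypothetical names (ToyTaskInstance, handle_trigger_event, method_to_run), an in-memory list in place of the database, and lowercase strings in place of Airflow’s task states; clearing trigger_id once a task is woken up is a choice made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTaskInstance:
    task_id: str
    state: str
    trigger_id: Optional[int] = None
    next_method: Optional[str] = None


def handle_trigger_event(trigger_id, task_instances):
    """Move every deferred task waiting on this trigger back to 'scheduled'."""
    woken = []
    for ti in task_instances:
        if ti.state == "deferred" and ti.trigger_id == trigger_id:
            ti.state = "scheduled"
            ti.trigger_id = None  # in this sketch, the link to the trigger is dropped
            woken.append(ti.task_id)
    return woken


def method_to_run(ti):
    """On the next run, resume where the task left off if next_method is set."""
    return ti.next_method or "execute"


# Two tasks share trigger 7 (one-to-many), one waits on trigger 8, one is new.
tis = [
    ToyTaskInstance("poll_job_a", "deferred", trigger_id=7, next_method="poll"),
    ToyTaskInstance("poll_job_b", "deferred", trigger_id=7, next_method="poll"),
    ToyTaskInstance("other", "deferred", trigger_id=8, next_method="poll"),
    ToyTaskInstance("fresh", "scheduled"),
]
woken = handle_trigger_event(7, tis)
print(woken)  # ['poll_job_a', 'poll_job_b']
print([ti.state for ti in tis])  # ['scheduled', 'scheduled', 'deferred', 'scheduled']
print(method_to_run(tis[0]), method_to_run(tis[3]))  # poll execute
```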

The availability of the system is increased by allowing multiple Triggerers to run in parallel. This is implemented by adding to each Trigger the id of the Triggerer in charge of it, and by adding a heartbeat to each Triggerer, serialised in the DB. Each Triggerer will pick up only the triggers assigned to it.

Author: Antonio Barbuzzi, Head of Data Engineering @ Bitrock
