Streamlining Nithio’s Data Pipeline Using Databricks

 

A case study coauthored by:
Andrei Ismail, George Voicu and Emma Ungureanu.

Nithio is an AI-enabled energy financing platform whose main objective is to achieve universal energy access. By standardizing credit risk analysis, Nithio catalyzes billions of dollars of capital to address climate change and expand access to energy.

Nithio needed to scale their data infrastructure to better manage the lifecycle of their AI capabilities and to allow their data science team to focus on developing valuable insights. The company partnered with Vitamin Software to align internal processes with industry best practices and improve their data pipeline.

 

Vitamin Software identified the main challenge typical of this vertical and applied data engineering best practices that made it easier to onboard customer data and extract insights, without compromising the accuracy of Nithio’s machine learning models.

 

About Nithio

Nithio is a company that supports the fight for universal energy access. Hundreds of millions of people worldwide still live without power, and most of this population is on the African continent. Nithio offers financing solutions to operators that produce and distribute renewable, off-grid energy systems to households across Africa.

To scale energy access in a sustainable, environmentally conscious manner that is adapted to climate change, Nithio helps investors understand risks and opportunities in the sector. Combining localized geospatial population data, customer repayment data, and artificial intelligence (AI), the Nithio Risk Analytics Engine offers a standardized and purpose-built view of credit risk. The Portfolio Portal tool allows investors and grant providers to assess end users’ repayment ability, forecast cash flows, and track portfolio health and social impact in real-time.

“Investment to the off-grid solar sector has stalled over the last five years and, at this rate, we will not achieve universal energy access. […] Nithio built an AI-enabled solution to solve this challenge.”

nithio.com/about

 

Customer Challenge

Everything Nithio does is backed by data. Their team of data scientists combines hyper-localized consumer and geospatial data with customized AI methods to generate actionable intel for impact-minded investors.

Nithio’s main challenge was enabling their data scientists to focus on what they do best (analyzing and modeling data, then interpreting the results) rather than being distracted by the work of getting hold of the data. Because the work of a data scientist is only as good as the data they work with, Nithio needed a technical partner that would help improve the way data is acquired, cleaned, stored, and transferred between systems.

Aiming to expand their machine learning models and deliver insights to their clients in a secure, efficient way, Nithio first had to align its internal processes with industry best practices by adding structure into their data flows, building reliable data pipelines, and bringing all the data into one place.

Just like you sometimes use GPS in your own city to find the fastest route, Nithio’s data science team needed support to connect the data scientist and the data source in the most efficient, scalable way possible.

 

Why Nithio Chose Vitamin

They say you should do one thing and do it well. In a construction team everyone technically knows how to build that bridge, but responsibilities are split between an architect, an engineer, a project manager, and a contractor for efficiency. Similarly, Nithio’s data scientists could very well build the necessary infrastructure and integrations, but that would have taken focus away from their core activity. Vitamin Software was brought in to help Nithio for one main reason: we are good at data engineering.

First, we at Vitamin know what such a system should look like, end-to-end. We design, build, and operate systems that collect, store, and analyze data at scale, preparing it for further processing by data analysts and scientists. Although it’s a relatively new and dynamic discipline, our team is familiar with the data engineering best practices and tools that would best serve Nithio.

Second, we have experience with data integration. Vitamin data engineers have connected Troy Medicare’s system to their partners’ using data exchange standards like EDI and FHIR. Not only did we retrieve data from disparate sources for Troy Medicare, we also cleaned, transformed, and stored it in a unified data warehouse.

Last but not least, we’ve successfully collaborated with other data science teams from organizations like Fraym. We understand the activity of a data scientist and we are mindful of the challenges they face. We therefore know how to ask the right questions, ease pain points, and set them up for success.

 

Partner Solution

Nithio required technical expertise for scaling their data infrastructure to support both the initial sales cycle and continuous AI R&D initiatives. Specifically, the goal was to quickly onboard customer data and extract insights, while maintaining reliable and accurate machine learning models to ensure the relevancy of those insights.

A big contributor to the success of the project was that our engineers recognized a particular challenge in this vertical early on. Simply applying proven best practices from the product development and software development world would not be enough to empower Nithio’s data science team to create a successful data intelligence product. The industry has recently recognized this as a common problem, and ModelOps promises to be the methodology to overcome it. To ensure that Nithio’s full AI capabilities were deployed into production and would truly bring value to the company, we applied the ModelOps framework for rapidly and iteratively moving models through the analytics life cycle.

This work entailed structuring and adapting the main data pipeline and the processes around it with the goal of consolidation in Databricks, a PaaS tool for data, analytics, and AI. Databricks leverages mainstream cloud resources, including AWS, offering Nithio’s data science team a familiar yet powerful development environment that allowed Nithio to immediately streamline and standardize their day-to-day activities. To manage the ML lifecycle, including experimentation, deployment, and tracking, we used MLflow. To be specific:

  • We transitioned Nithio’s mixed R / Python codebase to standardized model training and data analytics pipelines, and we orchestrated both pipelines in Databricks;

  • We leveraged Databricks-managed clusters to build a custom remote development environment tailored for the specific needs of the data science team;

  • We introduced the industry best practices that we consistently use for all our customers: a documented individual contributions flow, separate development, testing, and production environments, a testing flow, and package management;

  • We helped Nithio port their machine learning codebase to MLflow, where the MLflow model registry is used to store trained machine learning models, enabling the data science team to easily manage their models' lifecycle.

The main advantage of the solution implemented by Vitamin Software is that Nithio’s operations are more streamlined. Various stages of the data pipeline can now be combined and scheduled as jobs running on managed clusters with tailored resources to keep costs down. Reduced delivery time, coupled with cost efficiency in infrastructure, operations, and maintenance, helps Nithio navigate the initial sales phase and provides a solid foundation for the transition to economies of scale.
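As a rough illustration of what "stages combined and scheduled as a job on a right-sized cluster" can look like, the Python sketch below assembles a multi-task job definition in the general shape accepted by the Databricks Jobs API. It is not Nithio's actual configuration: the helper `build_job_spec`, the stage names, notebook paths, node type, worker count, runtime version, and schedule are all hypothetical values chosen for the example.

```python
import json


def build_job_spec(name, stages, node_type_id, num_workers, cron, timezone="UTC"):
    """Chain notebook stages into one scheduled, multi-task job definition.

    `stages` is an ordered list of (task_key, notebook_path) pairs; each stage
    is made to depend on the one before it, so the stages run in sequence.
    """
    cluster_key = "pipeline_cluster"  # hypothetical name for the shared job cluster
    tasks = []
    previous_key = None
    for task_key, notebook_path in stages:
        task = {
            "task_key": task_key,
            "job_cluster_key": cluster_key,
            "notebook_task": {"notebook_path": notebook_path},
        }
        if previous_key is not None:
            # Run this stage only after the previous one has finished.
            task["depends_on"] = [{"task_key": previous_key}]
        tasks.append(task)
        previous_key = task_key

    return {
        "name": name,
        # One job cluster, sized for this pipeline and released after the run.
        "job_clusters": [
            {
                "job_cluster_key": cluster_key,
                "new_cluster": {
                    "spark_version": "13.3.x-scala2.12",  # assumed runtime version
                    "node_type_id": node_type_id,
                    "num_workers": num_workers,
                },
            }
        ],
        "tasks": tasks,
        "schedule": {"quartz_cron_expression": cron, "timezone_id": timezone},
    }


# Hypothetical three-stage pipeline: ingest, clean, then train.
spec = build_job_spec(
    name="client-onboarding-pipeline",
    stages=[
        ("ingest", "/Pipelines/ingest"),
        ("clean", "/Pipelines/clean"),
        ("train", "/Pipelines/train"),
    ],
    node_type_id="m5.xlarge",
    num_workers=2,
    cron="0 0 2 * * ?",  # Quartz syntax: every day at 02:00
)

# The resulting dictionary can be serialized to JSON and submitted to the
# workspace through whichever client or deployment tool the team prefers.
payload = json.dumps(spec, indent=2)
print(payload)
```

Keeping the cluster definition next to the task list is what makes the "tailored resources" part explicit: a lightweight pipeline can request a couple of small workers, while a heavier training job can declare a larger cluster without affecting anything else.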

 

Results and Benefits

A 2019 Kaggle survey uncovered that almost half of data scientists spend a significant amount of time building and operating data infrastructure. Considering this is not the core activity of a data scientist, it’s safe to say that data engineering challenges are a major pain point. By partnering with Vitamin Software, Nithio helped their data science team avoid this pain.

Because we helped the Nithio team pick the right technology, set up their data processing pipeline, and streamline the usage of this pipeline, Nithio's data scientists now spend less time on infrastructure-related tasks and more time developing algorithms, running experiments, and training ML models.


"Working with Vitamin has enabled us to streamline our data pipeline from a repetitive, script-based approach to a truly scalable system. We’ve reduced the amount of time it takes to onboard a new client’s data from 4+ weeks to as little as three days, the majority of which is now focused on substantive data analysis rather than infrastructure set-up."

— Madeleine Gleave, Chief Data Scientist at Nithio


 

Our engineering team bridged the gap between data scientist and data source in an ideal way. Nithio’s highly qualified data science team no longer worries how to pull, clean, and store data, and can channel all their efforts into creating the valuable insights that enable Nithio to achieve their goals.

 

Next Steps

The partnership between Nithio and Vitamin Software has come a long way, and our skills continue to be at Nithio’s service. Beyond data infrastructure, we at Vitamin have experience building web-based tools that deliver visual representations of data and allow users to interact with it, and we’ve worked with Nithio to develop and launch the Portfolio Portal. Future plans include continuing to refine the front-end delivery of Nithio’s insights to be even more user-friendly, and increasing the speed and automation of data onboarding to make Nithio’s MLOps even more efficient.

 

About Vitamin Software

A product development agency, Vitamin Software specializes in taking over your product and supporting it until it reaches Product-Market Fit and beyond. The customer leads and we drive the product to follow.

Vitamin Software is an Amazon Web Services (AWS) Partner Network Select Consulting Partner and is Cyber Essentials Plus certified. We specialize in building, operating, and maintaining robust and secure products.

Vitamin Software builds bridges: From problem to solution, from difficult to feasible, from impossible to outstanding.

Want to know how?