
Revolutionizing data architecture: how we built a self-managing data platform at Akua

Juan Jose Behrend

Have you ever felt that traditional data solutions are like using a rocket launcher to light a candle? Yes, we’ve been there too. At Akua, we have been through the journey of complex data architectures, from data warehouses powered by Pentaho (a complete data integration and Business Intelligence suite covering ETL, reporting, and visualization) to near real-time data lakes built on:

  • Apache Spark: A distributed processing engine that enables processing large volumes of data in memory

  • Amazon EMR (Elastic MapReduce): AWS managed service for running frameworks such as Spark, Hive, and others

  • Delta Lake: A storage layer that brings ACID transactionality to data lakes

  • Amazon Kinesis: Service for real-time data ingestion and processing

  • Amazon S3: Scalable Storage

While these are powerful tools that make sense to implement in many cases, for us they felt like taking a Formula 1 car to the grocery store: impressive but impractical for our current needs.

The challenge: finding our "levitating train"

As Mathias Parodi, our Head of Engineering, often says, we needed a solution that “levitates like a train”: something that would effortlessly glide through our data needs while keeping its feet on the ground. Our requirements were crystal clear:

  • Maintenance should be a piece of cake, not a nightmare

  • Perfect fit for our current scale and 2-3 year horizon

  • Economical but lightning fast

  • Delivering immediate value to the business

The solution: embracing simplicity with power

The foundation: PostgreSQL as our data warehouse

Instead of jumping on the latest data fad bandwagon, we took a step back and looked at traditional PostgreSQL with fresh eyes. This battle-tested database became our data warehouse, starting with a “raw data” layer that mirrors our source operational databases exactly. It is fed by real-time replication: AWS DMS (Database Migration Service, which replicates data in real time between different databases) for our relational sources, and DynamoDB Streams (which captures real-time changes to DynamoDB tables) together with AWS Lambda (a serverless service that runs code in response to events) for our NoSQL data.
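
As a rough sketch of the NoSQL leg (not our production code: the raw.dynamo_events table, the WAREHOUSE_DSN variable, and the id key are all illustrative), a Lambda handler along these lines takes stream events and lands them as JSONB:

```python
# Sketch: DynamoDB Streams -> PostgreSQL raw layer.
# All names here are hypothetical, for illustration only.
import json
import os

import psycopg2  # packaged with the function (e.g. as a Lambda layer)
from boto3.dynamodb.types import TypeDeserializer

DESERIALIZER = TypeDeserializer()

def handler(event, context):
    conn = psycopg2.connect(os.environ["WAREHOUSE_DSN"])  # hypothetical env var
    with conn, conn.cursor() as cur:
        for record in event["Records"]:
            if record["eventName"] not in ("INSERT", "MODIFY"):
                continue
            # Convert DynamoDB's typed attribute format into plain Python values
            image = record["dynamodb"]["NewImage"]
            doc = {k: DESERIALIZER.deserialize(v) for k, v in image.items()}
            cur.execute(
                """
                INSERT INTO raw.dynamo_events (item_id, payload)
                VALUES (%s, %s::jsonb)
                ON CONFLICT (item_id) DO UPDATE SET payload = EXCLUDED.payload
                """,
                # default=str handles Decimal values from the deserializer
                (str(doc.get("id")), json.dumps(doc, default=str)),
            )
    conn.close()
    return {"processed": len(event["Records"])}
```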

This approach gave us:

  • Real-time replication with latency less than 1 second

  • Zero data loss thanks to DMS checkpoint mechanisms

  • In-flight transformations using DMS mapping capabilities (sketched just after this list)

  • Serverless processing that automatically scales with load
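
To give a flavor of those mapping capabilities, here is a hedged sketch of creating a DMS task whose rules select every source table and rename the target schema to our raw layer; all ARNs and identifiers are placeholders:

```python
# Sketch: DMS table-mapping rules with an in-flight schema rename.
import json

import boto3

TABLE_MAPPINGS = {
    "rules": [
        {   # replicate every table in the source schema
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "public", "table-name": "%"},
            "rule-action": "include",
        },
        {   # in-flight transformation: land everything in the raw layer
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "rename-to-raw",
            "rule-target": "schema",
            "object-locator": {"schema-name": "public"},
            "rule-action": "rename",
            "value": "raw",
        },
    ]
}

dms = boto3.client("dms")
dms.create_replication_task(
    ReplicationTaskIdentifier="ops-to-warehouse",       # placeholder
    SourceEndpointArn="arn:aws:dms:...:endpoint/src",   # placeholder
    TargetEndpointArn="arn:aws:dms:...:endpoint/tgt",   # placeholder
    ReplicationInstanceArn="arn:aws:dms:...:rep/main",  # placeholder
    MigrationType="full-load-and-cdc",  # initial load plus ongoing replication
    TableMappings=json.dumps(TABLE_MAPPINGS),
)
```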


Handling semi-structured data like a champ


One of our biggest successes? Leveraging PostgreSQL’s JSONB capabilities (a binary data type that stores JSON documents in an optimized way) to handle DynamoDB data without breaking a sweat. Advantages of the JSONB type include:

  • Compressed and efficient storage

  • GIN indexing for fast searches within JSON

  • Native operators for querying and manipulating JSON data

  • Full support for standard SQL queries

The performance? Simply mind-blowing. In our tests, we got:

  • Queries with predicates on JSON fields in less than 50ms

  • Aggregates over millions of records in seconds

  • Efficient joins between structured and semi-structured data

This approach gave us the flexibility of NoSQL with the reliability of a traditional warehouse.
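
As a minimal sketch of the pattern, here is how a GIN index and a JSONB containment query look in practice (raw.dynamo_events and its fields are the same hypothetical names as in the replication sketch above):

```python
# Sketch: indexing and querying JSONB in PostgreSQL.
import psycopg2

conn = psycopg2.connect("dbname=warehouse")  # placeholder DSN
with conn, conn.cursor() as cur:
    # A GIN index makes containment (@>) predicates on the document fast
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_dynamo_payload "
        "ON raw.dynamo_events USING GIN (payload)"
    )
    # Native JSONB operators: @> for containment, ->> to project a field
    cur.execute(
        """
        SELECT payload->>'merchant_id' AS merchant,
               count(*)                AS payments
        FROM raw.dynamo_events
        WHERE payload @> '{"status": "approved"}'
        GROUP BY 1
        ORDER BY payments DESC
        LIMIT 10
        """
    )
    for merchant, payments in cur.fetchall():
        print(merchant, payments)
conn.close()
```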


Infrastructure as Code: the magic of the IDP


Remember our Internal Development Platform (IDP)? Our platform team put on their data engineering hats and worked some magic. In just two weeks, we had a fully automated real-time data replication system. No manual interventions, just good old automation.


AI-powered data modeling


This is where things get interesting. We combined our carefully crafted entity-relationship diagrams with AI to design a future-proof data mart of facts and dimensions. But we didn’t stop there – we built an AI-powered system that automatically detects and adapts to new tables. It’s like having a data model that evolves on its own and grows with your business.
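
We won’t unpack the AI side here, but the detection half of such a system can be surprisingly simple; a sketch of diffing information_schema against a (hypothetical) registry of already-modeled tables:

```python
# Sketch: detect raw tables that have not been modeled yet.
# meta.modeled_tables is a hypothetical registry; the AI-assisted
# modeling step that follows detection is not shown.
import psycopg2

conn = psycopg2.connect("dbname=warehouse")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT t.table_schema, t.table_name
        FROM information_schema.tables t
        LEFT JOIN meta.modeled_tables m   -- hypothetical registry
               ON m.table_schema = t.table_schema
              AND m.table_name   = t.table_name
        WHERE t.table_schema = 'raw'
          AND t.table_type = 'BASE TABLE'
          AND m.table_name IS NULL
        """
    )
    new_tables = cur.fetchall()
conn.close()

for schema, table in new_tables:
    print(f"new source table detected: {schema}.{table}")
    # ...hand off to the modeling workflow from here
```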


Orchestration: keep it simple, keep it real-time


While tools like Airflow are great, we chose a different path: n8n (an open source automation platform for building complex workflows through a visual interface). n8n gives us:

  • Over 200 pre-built integrations with popular services

  • Ability to run custom code in JavaScript

  • Visual interface for designing and debugging flows

  • Webhooks and scheduled triggers

  • Integrated queuing system for asynchronous processes

  • Error handling and automatic retries

With these capabilities, we built an orchestration system that includes:

  • Real-time data synchronization

  • Slack notifications for data quality issues (see the sketch after this list)

  • Automated monitoring and alerts

  • Integrated data quality controls
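
For illustration, here is the kind of freshness check a flow can run before notifying Slack. In our setup this logic lives in n8n nodes, so the standalone script below is only a sketch; the webhook URL, DSN, table, and threshold are placeholders:

```python
# Sketch: a data-quality check that alerts Slack when replication lags.
import psycopg2
import requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/..."  # placeholder

def check_freshness(max_lag_seconds: int = 60) -> None:
    conn = psycopg2.connect("dbname=warehouse")  # placeholder DSN
    with conn, conn.cursor() as cur:
        # hypothetical raw table with a replication timestamp column
        cur.execute(
            "SELECT extract(epoch FROM now() - max(replicated_at)) "
            "FROM raw.payments"
        )
        lag = cur.fetchone()[0] or 0
    conn.close()
    if lag > max_lag_seconds:
        # Slack incoming webhooks accept a simple JSON payload
        requests.post(
            SLACK_WEBHOOK,
            json={"text": f"Replication lag is {lag:.0f}s on raw.payments"},
            timeout=10,
        )

check_freshness()
```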



[Image: Akua’s data platform]

The result: a data platform that just works


Our final architecture delivers:

  1. A clean and well-structured data mart (see the DDL sketch below) with:

    • Fact tables for payments

    • Dimension tables for customers, merchants, and payment instruments

    • Complete denormalization for ultra-fast queries

  2. Zero ETL integration with Amazon Redshift (a cloud data warehouse service optimized for analytics), letting it do what it does best: ultra-fast columnar queries without the overhead of joins. Redshift provides us with:

    • Columnar storage that dramatically reduces I/O on analytical queries

    • Massively parallel processing (MPP) for distributing queries

    • Automatic compression based on data type

    • Separate scaling of compute and storage

    • The ability to query data directly in S3 (Redshift Spectrum)

    • Automatic query optimization and maintenance
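
To make the shape of point 1 concrete, here is a minimal DDL sketch of a payments star schema along these lines; names and columns are illustrative, and the real mart is considerably wider:

```python
# Sketch: one fact table and one dimension table for the payments mart.
import psycopg2

DDL = """
CREATE SCHEMA IF NOT EXISTS mart;

CREATE TABLE IF NOT EXISTS mart.dim_merchant (
    merchant_key  bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    merchant_id   text UNIQUE NOT NULL,
    merchant_name text,
    country       text
);

CREATE TABLE IF NOT EXISTS mart.fact_payment (
    payment_id   text PRIMARY KEY,
    merchant_key bigint REFERENCES mart.dim_merchant (merchant_key),
    amount       numeric(18, 2) NOT NULL,
    currency     text NOT NULL,
    status       text NOT NULL,
    created_at   timestamptz NOT NULL
);
"""

conn = psycopg2.connect("dbname=warehouse")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute(DDL)
conn.close()
```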

With Zero ETL functionality, we achieve:

  • Automatic replication from PostgreSQL to Redshift

  • Near real-time synchronization (under 2 minutes of latency; see the check sketched after this list)

  • No need for additional ETL pipelines

  • Guaranteed consistency between source and destination
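
A simple way to keep an eye on that latency figure is to compare a watermark on both ends. A sketch, with hosts and tables as placeholders; psycopg2 works on both connections since Redshift speaks the Postgres wire protocol:

```python
# Sketch: compare a freshness watermark between source and replica.
import psycopg2

QUERY = "SELECT max(created_at) FROM mart.fact_payment"  # hypothetical table

with psycopg2.connect("dbname=warehouse") as pg, \
     psycopg2.connect("host=redshift.example port=5439 dbname=analytics") as rs:
    with pg.cursor() as cur:
        cur.execute(QUERY)
        source_ts = cur.fetchone()[0]
    with rs.cursor() as cur:
        cur.execute(QUERY)
        replica_ts = cur.fetchone()[0]

lag = (source_ts - replica_ts).total_seconds() if source_ts and replica_ts else 0
print(f"replication lag: {lag:.0f}s")  # expected to stay under ~2 minutes
```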


Visualization: Metabase for the win

After years of struggling with “world-class” (read: complicated and expensive) BI tools, we found our perfect match in Metabase, an open source Business Intelligence platform designed to be easy to use without sacrificing power. Why?

Metabase gives us enterprise features without the traditional complexity:

  • Query builder that lets you create analyses without knowing SQL

  • Ability to write direct SQL queries when needed

  • Smart cache system for frequent queries

  • Granular access control at table and row level

  • SSO and LDAP authentication

  • Full API for automation and integration (example after this list)

  • Embedded Analytics via SDK

  • One-click dashboard creation

  • Native support for SQL queries

  • AI-powered features that are constantly improving
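
As a taste of what the API makes possible, a short sketch that authenticates and runs a saved question; the base URL, credentials, and card id are placeholders:

```python
# Sketch: run a saved Metabase question over the REST API.
import requests

BASE = "https://metabase.example.com"  # placeholder

# POST /api/session returns a session token
session = requests.post(
    f"{BASE}/api/session",
    json={"username": "analytics-bot@example.com", "password": "..."},
    timeout=10,
).json()["id"]

headers = {"X-Metabase-Session": session}

# Run saved question (card) 42 and fetch its result set as JSON rows
rows = requests.post(
    f"{BASE}/api/card/42/query/json",
    headers=headers,
    timeout=30,
).json()

for row in rows[:5]:
    print(row)
```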


The impact: from zero to live in 30 days


In just one month, we built:

  • A self-managing data warehouse

  • Automatic ingestion for new data sources

  • Self-evolving data mart

  • Sub-100ms response times on millions of records


The secret? Simplicity


This wasn’t just another technology implementation – it was rethinking how modern data platforms should work. By choosing simplicity over complexity, automation over manual processes, and practical solutions over trendy technology, we built something that truly serves our needs.

The result? A data platform as agile as a startup needs to be, but robust enough to handle enterprise-scale data. It's living proof that sometimes the best solutions aren't the most complex - they're the ones that fit your needs perfectly while leaving room to grow.

Special thanks to the amazing team at Akua, most notably Luispe Toloy and German Yepes, and to our extended family of consultants who helped shape this elegant, efficient, and remarkably simple solution. This is what happens when experience meets innovation, and we couldn’t be more proud of the result.


