September 2024

Think about RIOTS

CRUD describes what happens to stored records. RIOTS helps developers trace how data moves and changes between those operations.

I consider myself a teacher, and teaching has taught me that one explanation is rarely enough. A concept that feels obvious through one lens may not connect with the person hearing it. Sometimes the answer is not more repetition. It is a different way of framing the same idea.

Most developers learn early that nearly every business application is a CRUD application.

Create. Read. Update. Delete.

That is broadly true, but it can be too compressed to help a junior developer reason through a system. CRUD tells you what happens to stored records. It does not always make the journey of the data obvious.

Where did the data come from? What shape crossed into the process? What changed? What came out? Where did the result go?

I use a second mnemonic to make that journey easier to see:

  • Retrieval: Acquire data from a source.
  • Input: Pass that data across a boundary into a process.
  • Output: Produce a result that another process or user can consume.
  • Transformation: Change, combine, filter, interpret, or evaluate the data.
  • Storage: Persist, update, or delete state.

RIOTS is not a complete theory of software. It does not replace CRUD, ETL, control flow, error handling, concurrency, state management, or architecture. It is another lens for tracing data through a system when the first explanation is too abstract.
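As a small illustration, here is a minimal Python sketch that breaks a "process this file" task into one function per operation. The CSV contents, field names, and the in-memory dict used as storage are all hypothetical. The functions run in flow order (retrieve, input, transform, output, store) rather than acronym order.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical source: a CSV export held in memory so the sketch is self-contained.
RAW_CSV = (
    "order_id,customer,amount_cents,internal_note\n"
    "1,acme,1250,rush\n"
    "2,globex,0,test order\n"
    "3,acme,800,\n"
)


@dataclass(frozen=True)
class OrderInput:
    """Input contract: only the fields the transformation needs."""
    customer: str
    amount_cents: int


def retrieve(source: str) -> list[dict[str, str]]:
    # Retrieval: acquire raw rows from the source.
    return list(csv.DictReader(io.StringIO(source)))


def to_input(rows: list[dict[str, str]]) -> list[OrderInput]:
    # Input: cross the boundary with a narrow, typed shape.
    return [OrderInput(r["customer"], int(r["amount_cents"])) for r in rows]


def transform(orders: list[OrderInput]) -> dict[str, int]:
    # Transformation: drop zero-value orders and total the rest per customer.
    totals: dict[str, int] = {}
    for order in orders:
        if order.amount_cents > 0:
            totals[order.customer] = totals.get(order.customer, 0) + order.amount_cents
    return totals


def output(totals: dict[str, int]) -> list[tuple[str, int]]:
    # Output: a sorted result another process can consume.
    return sorted(totals.items())


def store(result: list[tuple[str, int]], sink: dict[str, int]) -> None:
    # Storage: persist state (a dict stands in for a real database).
    sink.update(result)


sink: dict[str, int] = {}
store(output(transform(to_input(retrieve(RAW_CSV)))), sink)
print(sink)  # {'acme': 2050}
```

Each function can be read, tested, and replaced on its own, which is the point of naming the stages separately.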

Retrieval and Input Are Different

Retrieval and Input sound redundant until you look at the boundary between them.

Retrieval describes how the data is acquired. A query reads rows from a database. An API client fetches a response. A process reads bytes from a file. A message consumer takes an event from a queue.

Input describes what the next process receives. The database rows may become domain objects. The API response may be reduced to a smaller contract. The file bytes may become a block with an offset and metadata. The event may be validated before entering a transformation pipeline.

That distinction forces useful questions. Are we passing the entire source record into the next step, or only the fields it needs? Is the input contract stable? Does the process know too much about how the data was retrieved? Can we test the transformation without connecting to the original source?
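A minimal sketch of that boundary, assuming a hypothetical customer record and credit check: retrieval returns the whole source record, a separate function narrows it to an input contract, and the transformation only ever sees the contract. That last property is what lets the transformation be tested without the original source.

```python
from dataclasses import dataclass
from typing import Any


def fetch_customer_record(customer_id: int) -> dict[str, Any]:
    # Retrieval (hypothetical): stands in for an API client or database query.
    # The source returns far more than the next step needs.
    return {
        "id": customer_id,
        "name": "Acme Corp",
        "credit_limit_cents": 500_000,
        "balance_cents": 420_000,
        "audit": {"created_by": "etl", "etag": "abc123"},
    }


@dataclass(frozen=True)
class CreditCheckInput:
    # Input contract: the only fields the transformation is allowed to see.
    credit_limit_cents: int
    balance_cents: int


def to_credit_input(record: dict[str, Any]) -> CreditCheckInput:
    # The boundary between Retrieval and Input: narrow the record to the contract.
    return CreditCheckInput(record["credit_limit_cents"], record["balance_cents"])


def can_extend_credit(inp: CreditCheckInput, amount_cents: int) -> bool:
    # Transformation: knows nothing about how the data was retrieved.
    return inp.balance_cents + amount_cents <= inp.credit_limit_cents


# Production path: retrieve, then narrow.
print(can_extend_credit(to_credit_input(fetch_customer_record(7)), 50_000))  # True
# Test path: build the input directly; no connection to the source is needed.
print(can_extend_credit(CreditCheckInput(100, 90), 20))  # False
```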

CRUD says the data was read. RIOTS asks what happened after the read.

Using RIOTS to Explain a Data Warehouse

I am currently building a data warehouse that must consolidate data from 847 tables across 12 databases. The two data analysts working with me are capable people, but they are only about three years into their careers. This company is the only place either has worked, and neither has deep experience designing a warehouse.

I could tell them we are building an ETL process: extract, transform, and load.

That names the stages. It does not necessarily help someone inexperienced picture the work inside them.

RIOTS lets us slow the discussion down:

  • Where will each dataset be retrieved from?
  • What shape becomes the input to the pipeline?
  • Which transformations clean, normalize, join, or derive values?
  • What output contract should the warehouse receive?
  • Where and how will the result be stored?
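One way to make those questions concrete for a single source table is to give each answer its own function. This sketch uses in-memory SQLite databases as hypothetical stand-ins for one source database and for the warehouse; the table names, columns, and the deduplication rule (normalize the email, keep the first row) are illustrative assumptions, not the project's actual design.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical stand-ins: one source database and the warehouse, both in memory.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE customers (id INTEGER, email TEXT, name TEXT, legacy_flags TEXT);
INSERT INTO customers VALUES
  (1, ' Ann@Example.com ', 'Ann', 'x'),
  (2, 'bob@example.com', 'Bob', NULL),
  (3, 'ANN@example.com', 'Ann B.', 'y');
""")
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE dim_customer (email TEXT PRIMARY KEY, name TEXT, source_db TEXT)"
)


@dataclass(frozen=True)
class CustomerInput:
    source_id: int
    email: str
    name: str


def retrieve(conn: sqlite3.Connection) -> list[tuple]:
    # Retrieval: select only the columns the pipeline needs (legacy_flags is left behind).
    return conn.execute("SELECT id, email, name FROM customers ORDER BY id").fetchall()


def to_input(rows: list[tuple]) -> list[CustomerInput]:
    # Input: rows become a typed contract the transformation can rely on.
    return [CustomerInput(*row) for row in rows]


def transform(items: list[CustomerInput]) -> dict[str, CustomerInput]:
    # Transformation: normalize emails and keep the first row seen per email.
    deduped: dict[str, CustomerInput] = {}
    for item in items:
        deduped.setdefault(item.email.strip().lower(), item)
    return deduped


def output(deduped: dict[str, CustomerInput], source_db: str) -> list[tuple[str, str, str]]:
    # Output contract: exactly the columns of the warehouse table.
    return [(email, item.name, source_db) for email, item in sorted(deduped.items())]


def store(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    # Storage: replace on the primary key so reruns do not duplicate rows.
    with conn:
        conn.executemany("INSERT OR REPLACE INTO dim_customer VALUES (?, ?, ?)", rows)


store(warehouse, output(transform(to_input(retrieve(source))), "crm"))
print(warehouse.execute("SELECT * FROM dim_customer ORDER BY email").fetchall())
# [('ann@example.com', 'Ann', 'crm'), ('bob@example.com', 'Bob', 'crm')]
```

Each RIOTS question now has a single place where its answer lives, which also gives the team a natural place to discuss it: the deduplication rule is a Transformation decision, and the warehouse columns are an Output decision.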

The warehouse is still being built, so I do not have a finished result to claim. The value so far has been shared understanding. RIOTS gave the analysts a more concrete way to discuss how source data becomes warehouse data and what decisions belong at each boundary.

ETL remained the architecture. RIOTS made the mechanics easier to teach.

A Sparse File Is More Than an Upload

Years ago, at an insurance software provider, we needed to upload multi-gigabyte files to Azure Blob Storage as quickly as possible. Treating the problem as a normal file upload left performance on the table.

The files contained large regions filled entirely with zeros. Uploading those blocks would consume time and bandwidth without transferring meaningful data.

We scanned the source file and divided it into blocks sized to align with Azure Blob Storage chunks, approximately 64 KB as I recall. Each block retained its offset from the original file stream. The process inspected the contents and skipped blocks containing only zeros.
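The scanning step can be sketched in a few lines of Python. This is a reconstruction under assumptions, not the original code: the block size follows the "approximately 64 KB" recollection, and the function names are hypothetical. Each block carries its offset, so skipping a block never shifts the position of the blocks after it.

```python
import io
from typing import BinaryIO, Iterator

BLOCK_SIZE = 64 * 1024  # assumption: "approximately 64 KB" as recalled


def iter_blocks(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[tuple[int, bytes]]:
    # Read fixed-size blocks, pairing each with its offset in the original stream.
    offset = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        yield offset, block
        offset += len(block)  # use the actual length; the final block may be short


def is_all_zero(block: bytes) -> bool:
    # bytes.count runs in C, which matters for large blocks.
    return block.count(0) == len(block)


def meaningful_blocks(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> Iterator[tuple[int, bytes]]:
    # Yield only blocks worth uploading, each still tagged with its original offset.
    for offset, block in iter_blocks(stream, block_size):
        if not is_all_zero(block):
            yield offset, block


# Tiny example with 4-byte blocks: only the middle block has data.
sample = bytes(4) + b"ab" + bytes(6)
print(list(meaningful_blocks(io.BytesIO(sample), block_size=4)))  # [(4, b'ab\x00\x00')]
```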

The remaining blocks could be uploaded in parallel at their original offsets. Azure received only the meaningful data while preserving the logical shape of the file.

RIOTS provides a useful way to explain the design:

  • Retrieve a block from the file.
  • Pass the bytes, offset, and metadata as input to the inspection step.
  • Transform the block into a decision: upload or skip.
  • Output meaningful blocks with their original offsets.
  • Store those blocks in Azure Blob Storage.
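Those five steps can be sketched end to end. `FakeSparseBlob` is a hypothetical in-memory stand-in for the Azure target, created at the full file size so that unwritten ranges read back as zeros; the SDK calls the original uploader used are not described here, so none are shown. A thread pool provides the parallelism, and a semaphore (an assumption of this sketch) caps how many blocks are held in memory at once.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import BinaryIO, Iterator


@dataclass(frozen=True)
class BlockInput:
    offset: int       # position in the original file stream
    data: bytes       # raw block contents
    source_name: str  # metadata carried alongside the bytes


class FakeSparseBlob:
    """Hypothetical stand-in for the remote blob, sized to the whole file.

    Ranges that are never written read back as zeros.
    """

    def __init__(self, size: int) -> None:
        self._buf = bytearray(size)
        self._lock = threading.Lock()
        self.writes = 0

    def write_at(self, offset: int, data: bytes) -> None:
        with self._lock:
            self._buf[offset:offset + len(data)] = data
            self.writes += 1

    def read_all(self) -> bytes:
        return bytes(self._buf)


def retrieve(stream: BinaryIO, block_size: int) -> Iterator[tuple[int, bytes]]:
    # Retrieve: read the next block and note where it starts.
    offset = 0
    while block := stream.read(block_size):
        yield offset, block
        offset += len(block)


def transform(block: BlockInput) -> bool:
    # Transform: turn the block into a decision, upload (True) or skip (False).
    return block.data.count(0) != len(block.data)


def upload_sparse(stream: BinaryIO, name: str, blob: FakeSparseBlob,
                  block_size: int = 64 * 1024, workers: int = 8) -> int:
    in_flight = threading.BoundedSemaphore(workers * 2)  # caps buffered blocks

    def store(block: BlockInput) -> None:
        # Store: write the block at its original offset.
        try:
            blob.write_at(block.offset, block.data)
        finally:
            in_flight.release()

    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for offset, data in retrieve(stream, block_size):
            block = BlockInput(offset, data, name)  # Input: bytes, offset, metadata
            if transform(block):
                in_flight.acquire()
                futures.append(pool.submit(store, block))  # Output: meaningful blocks only
    for future in futures:
        future.result()  # re-raise any write error
    return len(futures)


data = b"HEAD" + bytes(8) + b"TAIL"  # 16 bytes with a zero-filled middle
blob = FakeSparseBlob(len(data))
sent = upload_sparse(io.BytesIO(data), "claims.dat", blob, block_size=4)  # hypothetical name
print(sent, blob.read_all() == data)  # 2 True
```

Because every block carries its original offset, the order in which the parallel writes finish does not matter, and the blob reads back byte for byte equal to the source even though half of it was never sent.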

The design uploaded multi-gigabyte files in a fraction of the time a straightforward use of the Azure SDK required. I no longer have the benchmark numbers, so I will not manufacture precision. The important design decision was avoiding work that did not need to happen.

The fastest block to upload was the block we never sent.

A Teaching Tool, Not a Law

RIOTS is useful when a developer understands the nouns in a requirement but cannot yet see the data flow between them.

It helps turn “process this file” into a sequence of boundaries and decisions. It makes “build an ETL pipeline” more concrete. It helps a reviewer ask whether retrieval, transformation, output, and persistence have become tangled together.

It is not the only way to model software, and not every design needs all five operations. Some steps combine them. Some systems perform several transformations before producing an output. Some inputs arrive without an explicit retrieval step inside the application.

That flexibility is the point. The acronym is a prompt for questions, not an architecture to impose. It can help one developer understand CRUD and help another picture what happens inside an ETL pipeline. Neither explanation has to defeat the other. A teacher needs enough ways into the concept to find the one that lands.

CRUD tells you what happened to the stored data.

RIOTS helps you trace the journey.

Receipts

  • Current warehouse project: The teaching model is being used with two junior data analysts to reason through a warehouse that will consolidate 847 tables across 12 databases. The warehouse is still under development, so no delivery outcome is claimed.
  • Sparse-file upload: At an insurance software provider, a file uploader scanned multi-gigabyte files, preserved block offsets, skipped zero-filled regions, and uploaded meaningful blocks to Azure in parallel.
  • Measurement limit: The sparse-file approach completed uploads in a fraction of the time required by the straightforward SDK path, but the original benchmark numbers are no longer available.