Ansible Data Manipulation with Modules

Ansible loves to pretend that YAML is a programming language. It isn’t. And every engineer who has ever tried to munge data inside a playbook knows the pain. You have filters everywhere, Jinja spaghetti, and tasks that look like they were written during a period of sleep deprivation. How do I know? Guilty as charged.

Just to be clear, what I’m saying is that YAML and Jinja were never intended to be a data-processing stack.

The usual Ansible talking point is that “Ansible is declarative, not imperative”. Sure. But then I immediately need to write imperative logic in Jinja because the playbook layer simply isn’t built for data transformation. I do it, you do it, we’ve all done it. It’s usually quick and, depending on the use case, relatively painless, but at some point you’ve taken it too far. I know I have, so I’m talking about it now. Automation should be declarative, but you need imperative logic to get there: things have to be queried and computed before a desired state can be achieved. Ansible provides all the batteries needed to do this.

The pain points you run into are real. You end up with:

  • Complex list/dict transformations
  • Conditional logic that becomes unreadable in YAML
  • Repeated filter chains that break the moment your data shape changes
  • Playbooks that become untestable because the logic is embedded in templates

Fundamentally, if you’re doing anything non-trivial with data, YAML-based tasks and Jinja are the wrong tool.

Ansible does have a solution: move the logic into a module.

Stop abusing filters, YAML, and Jinja. Write a module. If your playbook contains more than two chained filters, chains of set_fact tasks, or complex Jinja, you probably should have written a module.

I’ve written my fair share of modules, and they aren’t that difficult, but my mindset has always gone to the convoluted set_fact, conditional, filter, Jinja fiasco, because somehow it seems easier at the time. Perhaps it is when you’re trying to capture that initial thought process. But at some point you need to give yourself a reality check: maybe it’s simpler to start with modules than to convert later. That’s the thought experiment I’d like you to consider. A module gives you:

  • Real programming constructs
  • Real error handling
  • Real testability
  • Real maintainability
  • Real version control and reuse

Why is this a better pattern?

Input validation: YAML doesn’t do it. Playbooks don’t either (and don’t say assert to me, as I’ve abused that as well). Jinja definitely doesn’t.

Modules let you validate input before you do your thing with it.
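As a minimal sketch of what "validate before you do your thing" can look like, here is a plain-Python validator. In a real module, AnsibleModule's argument_spec covers most of this; the version below avoids importing Ansible so it runs on its own. The parameter names (api_host, node, window_days) are hypothetical and only serve the illustration.

```python
def validate_params(params):
    """Return a cleaned copy of params, or raise ValueError naming the problem."""
    # Hypothetical parameters for a backup-selection module.
    required = {"api_host": str, "node": str, "window_days": int}
    cleaned = {}
    for name, expected in required.items():
        if name not in params:
            raise ValueError(f"missing required parameter: {name}")
        value = params[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(
                f"{name} must be {expected.__name__}, got {type(value).__name__}"
            )
        cleaned[name] = value
    if cleaned["window_days"] < 1:
        raise ValueError("window_days must be at least 1")
    return cleaned
```

The point is that bad input fails once, up front, with a message that names the parameter, instead of surfacing three tasks later as an undefined-variable error.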

Modules are testable

You can unit test a module. You cannot unit test a Jinja filter chain inside a task, and when your processing is a sequence of knitted tasks full of set_facts and recursive playbook calls, you’ve crossed the line into prayer-based testing.
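To make that concrete, here is a hedged sketch of a unit test for one small piece of logic that could live inside a module: tag normalization. It assumes tags arrive as a single delimiter-separated string; normalize_tags is a hypothetical helper, not part of Ansible or Proxmox.

```python
import re
import unittest


def normalize_tags(raw):
    """Split a delimiter-separated tag string into sorted, unique, lowercase tags."""
    if not raw:
        return []
    # Accept semicolons, commas, or whitespace as separators (an assumption).
    parts = re.split(r"[;,\s]+", raw.strip())
    return sorted({part.lower() for part in parts if part})


class NormalizeTagsTest(unittest.TestCase):
    def test_mixed_delimiters_and_case(self):
        self.assertEqual(
            normalize_tags("Backup; weekly,LAB"), ["backup", "lab", "weekly"]
        )

    def test_empty_input(self):
        self.assertEqual(normalize_tags(""), [])
        self.assertEqual(normalize_tags(None), [])

    def test_duplicates_collapse(self):
        self.assertEqual(normalize_tags("daily;Daily"), ["daily"])
```

Saved as a file, this runs with python -m unittest and needs no inventory, no hosts, and no playbook run.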

Modules are reusable across roles and playbooks

Copy-pasting filter chains, Jinja computations, or those wonderful blocks of set_fact tasks and conditionals is how outages happen.

Modules reduce cognitive load

A 20-line Python function is easier to understand than a 20-line set_fact, conditional, Jinja monstrosity.

The summary: playbooks orchestrate, modules compute. This is how Ansible should always have been used.

So, what is my example problem, and how do I fix it with modules?

My Ansible role was running Proxmox backups in my home lab. I was only backing up systems that had been powered on since the last backup, either daily or weekly. My pve_backup role was doing all of the following in YAML:

  • Multi‑node API discovery
  • Cross‑node VM enumeration
  • Tag parsing and normalization
  • Per‑VM filtering
  • Per‑VM state evaluation
  • Time‑window logic
  • Task‑history correlation
  • Backup triggering
  • UPID polling
  • Error handling

This is imperative logic. YAML + Jinja is not an imperative language. I had effectively built a Python program using a markup language. Yay me!

Applying the thinking described above, I could identify many ‘code smells’ in the role:

Excessive set_fact

This is always a sign the playbook is doing computation, not orchestration.

Nested loops + subelements

This is a red flag that the data model is too complex for YAML.

Repeated REST calls with identical headers

Modules handle this cleanly; playbooks do not.
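A rough sketch of what "cleanly" can mean here: build the auth header once in a small client object and reuse it for every call. PveClient is a hypothetical class, not an existing library. The sketch assumes API-token authentication and the default Proxmox API port 8006, and it leaves out TLS handling for self-signed certificates.

```python
import json
import urllib.request


class PveClient:
    """Tiny API client that builds the auth header once and reuses it."""

    def __init__(self, host, token_id, token_secret, port=8006):
        self.base_url = f"https://{host}:{port}/api2/json"
        # Token header format: PVEAPIToken=<user>@<realm>!<tokenid>=<secret>
        self.headers = {
            "Authorization": f"PVEAPIToken={token_id}={token_secret}"
        }

    def build_request(self, path, method="GET"):
        """Create a request for an API path with the shared headers attached."""
        return urllib.request.Request(
            self.base_url + path, headers=self.headers, method=method
        )

    def get(self, path, timeout=10):
        """Perform a GET and return the payload under the 'data' key."""
        with urllib.request.urlopen(self.build_request(path), timeout=timeout) as resp:
            return json.load(resp)["data"]
```

Every call site shrinks to something like client.get("/nodes"), and a header change happens in exactly one place.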

Per‑VM include files

This is a workaround for the fact that YAML cannot express real logic.

State accumulation (vms_powered_on_last_week)

This is business logic, not orchestration.

UPID polling in YAML

This is the worst possible place to do it.
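For comparison, a polling loop in Python is a few lines and can be tested without a cluster. The sketch below assumes the task-status lookup returns a dict with a "status" field that becomes "stopped" when the task ends and an "exitstatus" field that is "OK" on success. The get_status callable is a hypothetical stand-in for whatever performs the API call.

```python
import time


def wait_for_task(get_status, upid, timeout=3600, interval=5,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll get_status(upid) until the task stops; raise on failure or timeout."""
    deadline = clock() + timeout
    while True:
        status = get_status(upid)
        if status.get("status") == "stopped":
            # Strict choice: anything other than "OK" counts as a failure.
            if status.get("exitstatus") != "OK":
                raise RuntimeError(
                    f"task {upid} failed: {status.get('exitstatus')}"
                )
            return status
        if clock() >= deadline:
            raise TimeoutError(f"task {upid} still running after {timeout}s")
        sleep(interval)
```

Injecting sleep and clock is a design choice for testability: a unit test can pass a fake sleep and canned status responses and finish instantly.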

Debug statements everywhere

Because debugging YAML logic is hell.

My role wasn’t bad in the sense that it did ‘work’. It was simply doing something Ansible playbooks were never designed to do.

How did I refactor this mess?

A good module model I use is:

One module = one conceptual operation

My conceptual operations are:

“Given a Proxmox cluster, return the list of VMIDs that should be backed up.”

“Given a VMID, run vzdump and wait for completion.”

That’s it. Two modules replace ~300 lines of gnarly YAML.
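To give a feel for the first operation, here is a minimal sketch of the selection logic as a pure function. The VM record shape (vmid, tags, status, last_powered_on) and the "backup" tag are assumptions for illustration, not the actual data model of my role or of the Proxmox API.

```python
from datetime import datetime, timedelta, timezone


def select_vmids(vms, now, window, required_tag="backup"):
    """Return sorted VMIDs that carry the tag and were powered on within the window."""
    cutoff = now - window
    selected = []
    for vm in vms:
        if required_tag not in vm.get("tags", []):
            continue
        last_on = vm.get("last_powered_on")
        # A running VM qualifies; so does one that was powered on since the cutoff.
        if vm.get("status") == "running" or (last_on is not None and last_on >= cutoff):
            selected.append(vm["vmid"])
    return sorted(selected)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    example = [
        {"vmid": 101, "tags": ["backup"], "status": "running", "last_powered_on": None},
        {"vmid": 102, "tags": [], "status": "running", "last_powered_on": now},
    ]
    print(select_vmids(example, now, timedelta(days=7)))
```

The daily and weekly cases become the same function called with a different window, rather than two diverging sets of set_fact tasks.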

Why this is objectively better (oh, and I simply feel better about it)

Testable

You can unit‑test the module logic without running Ansible.

Faster

Fewer tasks, leading to fewer forks, leading to fewer HTTP sessions.

Maintainable

No more Jinja filter soup.

Debuggable

You can print structured Python objects, not YAML hacks.

Reusable

Other roles can use the same modules.

Correct abstraction

Playbooks orchestrate. Modules compute.

In summary

Think of your future self now 🙂

Train yourself to spot the above code smells sooner rather than later.
