Step-by-step guide for writing a custom babel transformation

Today, I will share a step-by-step guide for writing a custom babel transformation. You can use this technique to write your own automated code modifications, refactoring and code generation.

What is babel?

Babel is a JavaScript compiler that is mainly used to convert ECMAScript 2015+ code into a backward-compatible version of JavaScript that runs in current and older browsers or environments. Babel uses a plugin system to do code transformation, so anyone can write their own transformation plugin for babel.

Before you start writing a transformation plugin for babel, you need to know what an Abstract Syntax Tree (AST) is.

What is an Abstract Syntax Tree (AST)?

I am not sure I can explain this better than the amazing articles out there on the web:

Leveling Up One’s Parsing Game With ASTs by Vaidehi Joshi (Highly recommend this one! 👍)
Wikipedia's Abstract syntax tree
What is an Abstract Syntax Tree by Chidume Nnamdi

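To summarize, an AST is a tree representation of your code. In the case of JavaScript, the AST follows the ESTree specification. (Babel's AST is based on ESTree with a few deviations; for example, it uses StringLiteral and NumericLiteral nodes where ESTree has a single Literal node.)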

An AST represents your code: its structure and its meaning. That is what allows a compiler like babel to understand the code and make specific, meaningful transformations to it.
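To make this concrete, here is roughly what the AST for `const n = 1` looks like, written by hand as a plain object. Treat it as a simplified sketch: the real output of @babel/parser also carries location data (start, end, loc) and a few more fields that I have left out.

```javascript
// Simplified, hand-written AST for: const n = 1
// (position info and other extra fields omitted for brevity)
const ast = {
  type: 'File',
  program: {
    type: 'Program',
    sourceType: 'script',
    body: [
      {
        type: 'VariableDeclaration',
        kind: 'const',
        declarations: [
          {
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: 'n' },
            init: { type: 'NumericLiteral', value: 1 },
          },
        ],
      },
    ],
  },
};

// Every node has a `type`; the rest of its shape depends on that type.
const declarator = ast.program.body[0].declarations[0];
console.log(declarator.id.type, declarator.id.name); // Identifier n
console.log(declarator.init.type, declarator.init.value); // NumericLiteral 1
```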

So now that you know what an AST is, let's write a custom babel transformation to modify your code using the AST.

How to use babel to transform code

The following is the general template for using babel to do code transformation:

import { parse } from '@babel/parser';
import traverse from '@babel/traverse';
import generate from '@babel/generator';

const code = 'const n = 1';

// parse the code -> ast
const ast = parse(code);

// transform the ast
traverse(ast, {
  enter(path) {
    // in this example change all the variable `n` to `x`
    if (path.isIdentifier({ name: 'n' })) {
      path.node.name = 'x';
    }
  },
});

// generate code <- ast
// generate() takes (ast, options, code), so the options object comes second
const output = generate(ast, {}, code);
console.log(output.code); // 'const x = 1;'

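You need to install @babel/core to run this. @babel/parser, @babel/traverse and @babel/generator are all dependencies of @babel/core, so with npm's default flat node_modules layout, installing @babel/core is usually enough. With a stricter package manager, you may have to install them explicitly.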

So the general idea is to parse your code to AST, transform the AST, and then generate code from the transformed AST.

code -> AST -> transformed AST -> transformed code

However, we can use another API from babel to do all the above:

import babel from '@babel/core';

const code = 'const n = 1';

const output = babel.transformSync(code, {
  plugins: [
    // your first babel plugin 😎😎
    function myCustomPlugin() {
      return {
        visitor: {
          Identifier(path) {
            // in this example change all the variable `n` to `x`
            if (path.isIdentifier({ name: 'n' })) {
              path.node.name = 'x';
            }
          },
        },
      };
    },
  ],
});

console.log(output.code); // 'const x = 1;'

Now you have written your first babel transform plugin, one that replaces every variable named n with x. How cool is that?!

Extract the function myCustomPlugin into a new file and export it. Package and publish the file as an npm package, and you can proudly say you have published a babel plugin! 🎉🎉
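As a rough sketch, the extracted file could look like the following. The file name my-custom-plugin.js is just something I made up, and the stub path at the bottom is not a real babel path; it only mimics the `path.node` part so the visitor can be poked at without running babel.

```javascript
// my-custom-plugin.js (hypothetical file name)
function myCustomPlugin() {
  return {
    visitor: {
      Identifier(path) {
        // rename every identifier `n` to `x`
        if (path.node.name === 'n') {
          path.node.name = 'x';
        }
      },
    },
  };
}

// In a CommonJS file you would export it like this:
// module.exports = myCustomPlugin;

// Quick sanity check without babel: call the visitor by hand with a stub
// that only mimics the `path.node` part of a real path.
const stubPath = { node: { type: 'Identifier', name: 'n' } };
myCustomPlugin().visitor.Identifier(stubPath);
console.log(stubPath.node.name); // x
```

Once exported, you could point babel at the file from your config, for example with "plugins": ["./my-custom-plugin.js"].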

At this point, you may be thinking: "Yes, I've just written a babel plugin, but I have no idea how it works...". Fret not, let's dive into how you can write a babel transformation plugin yourself!

So, here is the step-by-step guide to do it:

1. Have in mind what you want to transform from and transform into

In this example, I want to prank my colleague by creating a babel plugin that will:

reverse the names of all variables and functions
split strings into individual characters

function greet(name) {
  return 'Hello ' + name;
}

console.log(greet('tanhauhau')); // Hello tanhauhau

into

function teerg(eman) {
  return 'H' + 'e' + 'l' + 'l' + 'o' + ' ' + eman;
}

console.log(teerg('t' + 'a' + 'n' + 'h' + 'a' + 'u' + 'h' + 'a' + 'u')); // Hello tanhauhau

Well, we have to keep the console.log, so that even though the code is hardly readable, it still works fine. (I wouldn't want to break the production code!)
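Before touching any AST, it can help to pin down that the "before" and "after" code really behave the same. Here is a small check I would run, with the transformed version written by hand (it is what we want the plugin to produce later, not actual plugin output):

```javascript
// Original code
function greet(name) {
  return 'Hello ' + name;
}

// What the plugin should produce (written by hand here)
function teerg(eman) {
  return 'H' + 'e' + 'l' + 'l' + 'o' + ' ' + eman;
}

const before = greet('tanhauhau');
const after = teerg('t' + 'a' + 'n' + 'h' + 'a' + 'u' + 'h' + 'a' + 'u');

console.log(before); // Hello tanhauhau
console.log(after); // Hello tanhauhau
console.log(before === after); // true
```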

2. Know what to target on the AST

Head over to a babel AST explorer, click on different parts of the code, and see where and how each part is represented in the AST.

If this is your first time seeing an AST, play around with it for a little while to get a sense of what it looks like, and get to know the names of the nodes that correspond to your code.

So, now we know that we need to target:

Identifier for variable and function names
StringLiteral for the string.

3. Know what the transformed AST looks like

Head over to the babel AST explorer again, but this time with the output code you want to generate.

Play around and think about how you can transform the previous AST into the current one.

For example, you can see that 'H' + 'e' + 'l' + 'l' + 'o' + ' ' + eman is formed by nested BinaryExpression nodes whose leaves are StringLiteral nodes (plus the Identifier eman at the very end).
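If you don't have the explorer at hand, here is a shortened, hand-written version of that nesting for `'H' + 'e' + eman`. Again a simplified sketch with only the fields that matter here:

```javascript
// Simplified AST for: 'H' + 'e' + eman
// `+` is left-associative, so it parses as ('H' + 'e') + eman
const node = {
  type: 'BinaryExpression',
  operator: '+',
  left: {
    type: 'BinaryExpression',
    operator: '+',
    left: { type: 'StringLiteral', value: 'H' },
    right: { type: 'StringLiteral', value: 'e' },
  },
  right: { type: 'Identifier', name: 'eman' },
};

// The first character sits deepest on the left side of the tree.
console.log(node.left.left.value); // H
console.log(node.left.right.value); // e
console.log(node.right.name); // eman
```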

4. Write code

Now look at our code again:

function myCustomPlugin() {
  return {
    visitor: {
      Identifier(path) {
        // ...
      },
    },
  };
}

The transformation uses the visitor pattern.

During the traversal phase, babel does a depth-first traversal and visits each node in the AST. You can specify a callback method in the visitor, so that when babel visits a node of that type, it calls your callback with the node it is currently visiting.

In the visitor object, you specify the type of node you want a callback for:

function myCustomPlugin() {
  return {
    visitor: {
      Identifier(path) {
        console.log('identifier');
      },
      StringLiteral(path) {
        console.log('string literal');
      },
    },
  };
}

Run it and you will see that "identifier" and "string literal" are logged whenever babel encounters the corresponding node:

identifier
identifier
string literal
identifier
identifier
identifier
identifier
string literal

Before we continue, let's look at the parameter of Identifier(path) {}. It says path instead of node, so what is the difference between path and node? 🤷

In babel, a path is an abstraction above a node. It provides the links between nodes, i.e. the parent of the node, as well as information such as the scope, context, etc. Besides that, the path provides methods such as replaceWith, insertBefore, remove, etc. that update the underlying AST node.

You can read more about path in Jamie Kyle's babel handbook.
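To get a feel for why a wrapper around the node is useful, here is a toy version of the idea. makePath is my own made-up helper, not babel's implementation; the real path carries far more (scope, context, many more methods), but the "node plus where it lives" part looks roughly like this:

```javascript
// Toy stand-in for babel's path: wraps a node and remembers where it lives.
function makePath(node, parentPath = null, key = null) {
  return {
    node,
    parentPath,
    key,
    // look up a child by property name and wrap it in a path too
    get(childKey) {
      return makePath(this.node[childKey], this, childKey);
    },
    // swap this node for another one by updating the parent's property
    replaceWith(newNode) {
      if (this.parentPath) this.parentPath.node[this.key] = newNode;
      this.node = newNode;
    },
  };
}

// Simplified AST for: console.log
const member = {
  type: 'MemberExpression',
  object: { type: 'Identifier', name: 'console' },
  property: { type: 'Identifier', name: 'log' },
};

const memberPath = makePath(member);
const propertyPath = memberPath.get('property');

// A bare node cannot tell you its parent, but a path can.
console.log(propertyPath.parentPath.node.type); // MemberExpression

// Replacing through the path updates the underlying AST.
propertyPath.replaceWith({ type: 'Identifier', name: 'warn' });
console.log(member.property.name); // warn
```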

So let's continue writing our babel plugin.

Transforming variable name

As we can see from the AST explorer, the name of the Identifier is stored in the property called name, so what we will do is to reverse the name.

Identifier(path) {
  path.node.name = path.node.name
    .split('')
    .reverse()
    .join('');
}

Run it and you will see:

function teerg(eman) {
  return 'Hello ' + eman;
}

elosnoc.gol(teerg('tanhauhau')); // Hello tanhauhau

We are almost there, except we've accidentally reversed console.log as well. How can we prevent that?

Take a look at the AST again.

console.log is part of the MemberExpression, with the object as "console" and property as "log".

So let's check whether our current Identifier is inside this MemberExpression, and if it is, leave the name alone:

Identifier(path) {
  // skip identifiers that are part of `console.log`
  if (
    !(
      path.parentPath.isMemberExpression() &&
      path.parentPath
        .get('object')
        .isIdentifier({ name: 'console' }) &&
      path.parentPath.get('property').isIdentifier({ name: 'log' })
    )
  ) {
    path.node.name = path.node.name
      .split('')
      .reverse()
      .join('');
  }
}

And yes, now you get it right!

function teerg(eman) {
  return 'Hello ' + eman;
}

console.log(teerg('tanhauhau')); // Hello tanhauhau

So, why do we have to check whether the Identifier's parent is a console.log MemberExpression? Why don't we just check whether the current Identifier's name is 'console' or 'log'?

You can do that, except that it will then refuse to reverse any variable that happens to be named console or log:

const log = 1;
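Here is a small sketch of the difference, using plain objects in place of real babel nodes. renameByName and renameByParent are names I made up for the two strategies:

```javascript
const reverse = (s) => s.split('').reverse().join('');

// Naive rule: never touch anything called `console` or `log`.
function renameByName(node) {
  if (node.name === 'console' || node.name === 'log') return node.name;
  return reverse(node.name);
}

// Parent-aware rule: only skip identifiers that are part of `console.log`.
function renameByParent(node, parent) {
  const isConsoleLog =
    parent.type === 'MemberExpression' &&
    parent.object.name === 'console' &&
    parent.property.name === 'log';
  return isConsoleLog ? node.name : reverse(node.name);
}

// Simplified nodes for: const log = 1;
const id = { type: 'Identifier', name: 'log' };
const declarator = { type: 'VariableDeclarator', id };

console.log(renameByName(id)); // log (left alone by mistake)
console.log(renameByParent(id, declarator)); // gol

// Simplified nodes for: console.log
const member = {
  type: 'MemberExpression',
  object: { type: 'Identifier', name: 'console' },
  property: { type: 'Identifier', name: 'log' },
};
console.log(renameByParent(member.property, member)); // log
```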

So, how do I know about the methods isMemberExpression and isIdentifier? Well, every node type specified in @babel/types has an isXxxx validator function counterpart, e.g. the anyTypeAnnotation function has an isAnyTypeAnnotation validator. If you want the exhaustive list of validator functions, you can head over to the actual source code.

Transforming strings

The next step is to generate a nested BinaryExpression out of StringLiteral.

To create an AST node, you can use the builder functions from @babel/types. @babel/types is also available via babel.types from @babel/core.

StringLiteral(path) {
  const newNode = path.node.value
    .split('')
    .map(c => babel.types.stringLiteral(c))
    .reduce((prev, curr) => {
      return babel.types.binaryExpression('+', prev, curr);
    });
  path.replaceWith(newNode);
}

So, we split the content of the StringLiteral, which is in path.node.value, make each character a StringLiteral, and combine them with BinaryExpression. Finally, we replace the StringLiteral with the newly created node.
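To see the shape this reduce produces without running babel, here is the same logic with plain-object stand-ins for babel.types.stringLiteral and babel.types.binaryExpression. The evaluate helper and the empty-string guard are my own additions for this sketch, not part of the plugin above:

```javascript
// Plain-object stand-ins for babel.types.stringLiteral / binaryExpression
const stringLiteral = (value) => ({ type: 'StringLiteral', value });
const binaryExpression = (operator, left, right) => ({
  type: 'BinaryExpression',
  operator,
  left,
  right,
});

function splitString(value) {
  // ''.split('') is [], and reduce() without an initial value throws
  // on an empty array, so keep empty strings as they are.
  if (value === '') return stringLiteral('');
  return value
    .split('')
    .map((c) => stringLiteral(c))
    .reduce((prev, curr) => binaryExpression('+', prev, curr));
}

// Tiny evaluator, only to check that the tree still means the same string.
function evaluate(node) {
  return node.type === 'StringLiteral'
    ? node.value
    : evaluate(node.left) + evaluate(node.right);
}

const tree = splitString('Hi!');
console.log(tree.left.left.value); // H
console.log(tree.right.value); // !
console.log(evaluate(tree)); // Hi!
```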

...And that's it! Except, we ran into a stack overflow 😅:

RangeError: Maximum call stack size exceeded

Why? 🤷

Well, that's because for each StringLiteral we create more StringLiteral nodes, and for each of those we "create" more StringLiteral nodes again. Even though we are replacing a StringLiteral with another StringLiteral, babel treats it as a new node and visits it, hence the infinite recursion and the stack overflow.

So, how do we tell babel that once we have replaced the StringLiteral with newNode, it can stop and does not have to go down and visit the newly created nodes?

We can use path.skip() to skip traversing the children of the current path:

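StringLiteral(path) {
  // ''.split('') is [], and reduce() on an empty array throws,
  // so leave empty strings untouched
  if (path.node.value === '') return;
  const newNode = path.node.value
    .split('')
    .map(c => babel.types.stringLiteral(c))
    .reduce((prev, curr) => {
      return babel.types.binaryExpression('+', prev, curr);
    });
  path.replaceWith(newNode);
  // don't visit the nodes we have just created
  path.skip();
}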

...And yes, it works now, with no stack overflow!
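If you want to watch the loop happen without blowing up your real call stack, here is a toy model of it. Everything in it is made up for illustration (visit, MAX_DEPTH, the tiny path object), and babel's actual requeueing is more involved, but it captures the idea: a replaced node counts as new and gets visited again, unless the visitor calls skip().

```javascript
const stringLiteral = (value) => ({ type: 'StringLiteral', value });
const binaryExpression = (left, right) => ({
  type: 'BinaryExpression',
  operator: '+',
  left,
  right,
});

// Toy traversal: a replaced node counts as new and gets visited again,
// unless the visitor called skip(). MAX_DEPTH stands in for the call stack.
const MAX_DEPTH = 100;
function visit(node, visitor, depth = 0) {
  if (depth > MAX_DEPTH) throw new RangeError('toy call stack size exceeded');
  const path = {
    node,
    skipped: false,
    replaceWith(newNode) { this.node = newNode; },
    skip() { this.skipped = true; },
  };
  const handler = visitor[node.type];
  if (handler) handler(path);
  if (path.skipped) return path.node;
  // the replacement is treated as a brand-new node and visited from scratch
  if (path.node !== node) return visit(path.node, visitor, depth + 1);
  for (const key of ['left', 'right']) {
    if (node[key]) node[key] = visit(node[key], visitor, depth + 1);
  }
  return node;
}

const makeVisitor = (useSkip) => ({
  StringLiteral(path) {
    const newNode = path.node.value
      .split('')
      .map((c) => stringLiteral(c))
      .reduce((prev, curr) => binaryExpression(prev, curr));
    path.replaceWith(newNode);
    if (useSkip) path.skip();
  },
});

// Without skip(): every single-character literal keeps being replaced by a
// fresh single-character literal, so the toy traversal never settles.
let error = null;
try {
  visit(stringLiteral('Hi'), makeVisitor(false));
} catch (e) {
  error = e;
}
console.log(error instanceof RangeError); // true

// With skip(): the replacement is left alone and the traversal finishes.
const result = visit(stringLiteral('Hi'), makeVisitor(true));
console.log(result.type); // BinaryExpression
```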

Summary

So, here we have it, our first code transformation with babel:

const babel = require('@babel/core');
const code = `
function greet(name) {
  return 'Hello ' + name;
}
console.log(greet('tanhauhau')); // Hello tanhauhau
`;
const output = babel.transformSync(code, {
  plugins: [
    function myCustomPlugin() {
      return {
        visitor: {
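          StringLiteral(path) {
            // reduce() on an empty array throws, so leave '' untouched
            if (path.node.value === '') return;
            const concat = path.node.value
              .split('')
              .map(c => babel.types.stringLiteral(c))
              .reduce((prev, curr) => {
                return babel.types.binaryExpression('+', prev, curr);
              });
            path.replaceWith(concat);
            // don't visit the nodes we have just created
            path.skip();
          },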
          Identifier(path) {
            if (
              !(
                path.parentPath.isMemberExpression() &&
                path.parentPath
                  .get('object')
                  .isIdentifier({ name: 'console' }) &&
                path.parentPath.get('property').isIdentifier({ name: 'log' })
              )
            ) {
              path.node.name = path.node.name
                .split('')
                .reverse()
                .join('');
            }
          },
        },
      };
    },
  ],
});
console.log(output.code);

A summary of the steps that got us here:

1. Have in mind what you want to transform from and transform into
2. Know what to target on the AST
3. Know what the transformed AST looks like
4. Write code

Further resources

If you are interested in learning more, babel's GitHub repo is always the best place to find more code examples of babel transformations.

Head over to https://github.com/babel/babel and look for the babel-plugin-transform-* or babel-plugin-proposal-* folders. They are all babel transformation plugins, where you can find the code babel uses to transform the nullish coalescing operator, optional chaining and many more.

Manipulating AST with JavaScript using Babel

If you like what you've read so far and want to learn more about how to do it with Babel, I've created a video course that shows you, step by step, how to write a babel plugin and codemod.

In the video course, I share detailed tips and tricks, such as how to handle scope, how to use state, and how to do nested traversals.
