---
title: 'Understanding AST for JavaScript Developers (2026 Update)'
tags:
  - javascript
  - compiler
published: true
date: 2021-05-10 09:40:39
description: 'A comprehensive guide covering AST concepts, parsing process, key node types, and practical usage in tools like Babel and ESLint.'
---

When working on JavaScript projects these days, you'll notice a long list of tools in `devDependencies`: transpilers, minifiers, CSS pre-processors, ESLint, Prettier, and so on. None of them ship in production code, but they handle important tasks during development, and nearly all of them work by processing an AST.

## Table of Contents

## What is AST?

> In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.
>
> https://en.wikipedia.org/wiki/Abstract_syntax_tree

Simply put, parsing turns a code string into tree-structured data that programs can work with. Each element in the code (variable declarations, function calls, operators, etc.) becomes a node in the tree. An example will make this clearer.

> All examples can be verified at https://astexplorer.net/

```javascript
function square(n) {
  return n * n
}
```

When this code is converted to AST, the tree structure looks roughly like this:

```
Program
└── FunctionDeclaration (name: "square")
    ├── params
    │   └── Identifier (name: "n")
    └── body (BlockStatement)
        └── ReturnStatement
            └── BinaryExpression (operator: "*")
                ├── left: Identifier (name: "n")
                └── right: Identifier (name: "n")
```

You can see that every meaningful construct in the code maps to a node in the tree. `function square(n)` becomes a `FunctionDeclaration` node, and `return n * n` becomes a `ReturnStatement` whose argument is a `BinaryExpression`.

The actual AST JSON generated by parsers is much more verbose because it includes metadata like location information (`loc`, `range`, `start`, `end`). Here's the core structure extracted:

```json
{
  "type": "Program",
  "body": [
    {
      "type": "FunctionDeclaration",
      "id": { "type": "Identifier", "name": "square" },
      "params": [{ "type": "Identifier", "name": "n" }],
      "body": {
        "type": "BlockStatement",
        "body": [
          {
            "type": "ReturnStatement",
            "argument": {
              "type": "BinaryExpression",
              "operator": "*",
              "left": { "type": "Identifier", "name": "n" },
              "right": { "type": "Identifier", "name": "n" }
            }
          }
        ]
      }
    }
  ]
}
```
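
To make the "core structure extracted" step concrete, here is a minimal sketch of how the location metadata could be stripped from parser output. The `stripLocations` helper and the `verbose` sample node are hypothetical, written for this example; the list of keys to drop is an assumption based on the fields named above.

```javascript
// Keys that parsers commonly attach as position metadata (assumed list)
const LOCATION_KEYS = new Set(['loc', 'range', 'start', 'end'])

// Recursively copy an AST, dropping position metadata along the way
function stripLocations(node) {
  if (Array.isArray(node)) return node.map(stripLocations)
  if (node === null || typeof node !== 'object') return node

  const result = {}
  for (const [key, value] of Object.entries(node)) {
    if (!LOCATION_KEYS.has(key)) result[key] = stripLocations(value)
  }
  return result
}

// Hand-written stand-in for one node of verbose parser output
const verbose = {
  type: 'Identifier',
  name: 'n',
  start: 16,
  end: 17,
  loc: { start: { line: 1, column: 16 }, end: { line: 1, column: 17 } },
}

console.log(stripLocations(verbose))
// { type: 'Identifier', name: 'n' }
```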

> If you're curious about the complete AST JSON, paste the code above into [AST Explorer](https://astexplorer.net/) to see it immediately.

## The Process of Creating AST from Code

But how does such a tree get created from a code string? It goes through two main stages.

### Stage 1: Lexical Analysis

The lexical analyzer (also called scanner or tokenizer) breaks down the code string into **token** units. Tokens are the smallest meaningful units.

```javascript
function square(n) {
  return n * n
}
```

Tokenizing the above code produces this result:

```
[
  { type: 'keyword',    value: 'function' },
  { type: 'identifier', value: 'square' },
  { type: 'punctuator', value: '(' },
  { type: 'identifier', value: 'n' },
  { type: 'punctuator', value: ')' },
  { type: 'punctuator', value: '{' },
  { type: 'keyword',    value: 'return' },
  { type: 'identifier', value: 'n' },
  { type: 'punctuator', value: '*' },
  { type: 'identifier', value: 'n' },
  { type: 'punctuator', value: '}' },
]
```

The lexical analyzer reads the code character by character, distinguishing keywords like `function`, identifiers like `square`, and punctuation like `(`. Whitespace and line breaks are removed during this process.

Let's look at how a very simple tokenizer actually works. Below is a mini tokenizer that only handles numbers and arithmetic operations:

```javascript
function tokenize(code) {
  const tokens = []
  let i = 0

  while (i < code.length) {
    const char = code[i]

    // Skip whitespace
    if (/\s/.test(char)) {
      i++
      continue
    }

    // Numbers: treat consecutive digits as one token
    if (/[0-9]/.test(char)) {
      let value = ''
      while (i < code.length && /[0-9]/.test(code[i])) {
        value += code[i++]
      }
      tokens.push({ type: 'number', value })
      continue
    }

    // Operators
    if ('+-*/'.includes(char)) {
      tokens.push({ type: 'operator', value: char })
      i++
      continue
    }

    throw new Error(`Unknown character: ${char}`)
  }

  return tokens
}

tokenize('12 + 3 * 45')
// [
//   { type: 'number', value: '12' },
//   { type: 'operator', value: '+' },
//   { type: 'number', value: '3' },
//   { type: 'operator', value: '*' },
//   { type: 'number', value: '45' },
// ]
```

The core principle is simple. Look at the current character and determine "is this the start of a number, an operator, or whitespace?" then classify it into the appropriate token. Real JavaScript parser tokenizers have to handle much more complex cases like strings (`'...'`, `"..."`), regular expressions (`/.../`), template literals (`` `...` ``), but the basic principle is the same.
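
As a rough sketch of how the same loop extends beyond numbers, the version below also recognizes words and single-character punctuators, which is enough to reproduce the token list for `square` shown earlier. The `tokenizeJs` name and the short `KEYWORDS` list are assumptions made for this example; strings, numbers, comments, and multi-character operators are deliberately left out.

```javascript
// A small, deliberately incomplete keyword list (assumption for this sketch)
const KEYWORDS = new Set(['function', 'return', 'const', 'let', 'var', 'if', 'else'])

function tokenizeJs(code) {
  const tokens = []
  let i = 0

  while (i < code.length) {
    const char = code[i]

    // Skip whitespace and line breaks
    if (/\s/.test(char)) {
      i++
      continue
    }

    // Words: read the whole run, then decide keyword vs identifier
    if (/[A-Za-z_$]/.test(char)) {
      let value = ''
      while (i < code.length && /[A-Za-z0-9_$]/.test(code[i])) {
        value += code[i++]
      }
      tokens.push({ type: KEYWORDS.has(value) ? 'keyword' : 'identifier', value })
      continue
    }

    // Single-character punctuators
    if ('(){}*+-/,;'.includes(char)) {
      tokens.push({ type: 'punctuator', value: char })
      i++
      continue
    }

    throw new Error(`Unknown character: ${char}`)
  }

  return tokens
}

const tokens = tokenizeJs(`function square(n) {
  return n * n
}`)
console.log(tokens.map((t) => t.value).join(' '))
// function square ( n ) { return n * n }
```

Reading the whole word before classifying it is what keeps an identifier like `returnValue` from being mistaken for the keyword `return`.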

### Stage 2: Syntax Analysis

The syntax analyzer (parser) takes the token list from above and assembles it into a **tree structure** according to the language's grammar rules. It applies rules like "the `function` keyword is followed by an identifier, then parameters in parentheses..." to relate tokens to one another as a tree. If the code doesn't match the grammar, the parser throws a `SyntaxError` at this stage. The result of this process is the abstract syntax tree.

One of the most interesting aspects of what parsers do is handling **operator precedence**. Consider `1 + 2 * 3`. Simply reading left to right would give `(1 + 2) * 3 = 9`, but the mathematically correct result is `1 + (2 * 3) = 7`. The parser represents this precedence through tree structure.

```
// AST for 1 + 2 * 3
// Multiplication is at a deeper position, so it's calculated first

BinaryExpression (+)
├── left: NumericLiteral (1)
└── right: BinaryExpression (*)
    ├── left: NumericLiteral (2)
    └── right: NumericLiteral (3)
```

The `*` is positioned deeper (further down) in the tree than `+`. When evaluating the tree from bottom to top, `2 * 3` is calculated first, then `1` is added to that result. The operation order is naturally encoded in the tree structure even without explicitly writing parentheses.

The reason it's called "Abstract" is that syntactic decorations like parentheses and semicolons are implicitly contained in the tree structure itself, so they're not represented as separate nodes. In the above example, even if you write `(2 * 3)` with parentheses, the AST structure remains the same. The meaning of the parentheses (precedence) is already reflected in the tree structure.
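
One common way to get this behavior is a recursive descent parser with one function per precedence level. The sketch below is an illustration under that assumption, not how any particular JavaScript parser is implemented; `tokenizeExpr`, `parseExpression`, and `evaluate` are hypothetical names, and the tokenizer silently ignores characters it doesn't recognize.

```javascript
// Tokenizer for integers, + - * / and parentheses (ignores anything else)
function tokenizeExpr(code) {
  return code.match(/\d+|[-+*\/()]/g) ?? []
}

function parseExpression(code) {
  const tokens = tokenizeExpr(code)
  let pos = 0

  // primary := number | "(" additive ")"
  function parsePrimary() {
    const token = tokens[pos++]
    if (token === '(') {
      const inner = parseAdditive()
      if (tokens[pos++] !== ')') throw new SyntaxError('Expected ")"')
      return inner // parentheses leave no node of their own
    }
    if (!/^\d+$/.test(token ?? '')) throw new SyntaxError(`Unexpected token: ${token}`)
    return { type: 'NumericLiteral', value: Number(token) }
  }

  // multiplicative := primary (("*" | "/") primary)*
  function parseMultiplicative() {
    let left = parsePrimary()
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      const operator = tokens[pos++]
      left = { type: 'BinaryExpression', operator, left, right: parsePrimary() }
    }
    return left
  }

  // additive := multiplicative (("+" | "-") multiplicative)*
  function parseAdditive() {
    let left = parseMultiplicative()
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      const operator = tokens[pos++]
      left = { type: 'BinaryExpression', operator, left, right: parseMultiplicative() }
    }
    return left
  }

  const ast = parseAdditive()
  if (pos < tokens.length) throw new SyntaxError(`Unexpected token: ${tokens[pos]}`)
  return ast
}

// Bottom-up evaluation: children first, then the operator at this node
function evaluate(node) {
  if (node.type === 'NumericLiteral') return node.value
  const left = evaluate(node.left)
  const right = evaluate(node.right)
  switch (node.operator) {
    case '+': return left + right
    case '-': return left - right
    case '*': return left * right
    case '/': return left / right
  }
}

const ast = parseExpression('1 + 2 * 3')
console.log(ast.operator, ast.right.operator) // + *
console.log(evaluate(ast)) // 7
console.log(JSON.stringify(parseExpression('1 + (2 * 3)')) === JSON.stringify(ast)) // true
console.log(evaluate(parseExpression('(1 + 2) * 3'))) // 9
```

Because `parseAdditive` delegates to `parseMultiplicative` for its operands, multiplication always ends up deeper in the tree, and the redundant parentheses in `1 + (2 * 3)` produce exactly the same tree as `1 + 2 * 3`.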

> Note: A tree that keeps every syntactic element, including parentheses and semicolons, is called a CST (Concrete Syntax Tree). Tools that need to preserve the original formatting as much as possible, such as [recast](https://github.com/benjamn/recast), which jscodeshift builds on, keep more of this concrete detail alongside the AST.

### JavaScript Parsers

Several parsers exist in the JavaScript ecosystem. Most follow the [ESTree](https://github.com/estree/estree) AST spec, so basic node structures are compatible even across different parsers.

| Parser | Language | Features |
|--------|----------|----------|
| [acorn](https://github.com/acornjs/acorn) | JS | Lightweight and fast. Used by webpack; ESLint's default parser, espree, is built on top of it |
| [@babel/parser](https://github.com/babel/babel/tree/master/packages/babel-parser) | JS | Supports JSX, TypeScript, up to Stage 0 proposals. Provides ESTree compatibility mode |
| [typescript](https://github.com/microsoft/TypeScript) | TS | Built-in parser for TypeScript compiler. Uses its own AST format (not ESTree) |
| [SWC](https://swc.rs/) | Rust | Written in Rust. Dozens of times faster than Babel |
| [oxc](https://oxc-project.github.io/) | Rust | Written in Rust. Parser of the Oxc toolchain, which also ships the oxlint linter |

Regardless of which parser you use, the basic "code → tokens → AST" pipeline structure is the same. However, they differ in supported grammar scope, performance, error recovery capabilities, etc.

### Learn More

- If you want to learn about compilers, I recommend checking out [The-super-tiny-compiler](https://github.com/jamiebuilds/the-super-tiny-compiler). It implements the simplest compiler example written in JavaScript.
- [AST Explorer](https://astexplorer.net/) - Paste code to see AST immediately. You can also choose from multiple parsers.
- [@babel/parser](https://github.com/babel/babel/tree/master/packages/babel-parser) - The parser Babel uses, formerly known as Babylon.

## Understanding AST Node Types

To work with ASTs, you need to know the main node types. The [ESTree spec](https://github.com/estree/estree) defines many of them, but most fall into a few broad categories, and two matter most in practice: statements and expressions.

### Statement vs Expression

This is the most important distinction.

- **Statement**: Performs actions. Doesn't produce values. `if`, `for`, `return`, variable declarations, etc.
- **Expression**: Produces values. `1 + 2`, `foo()`, `a ? b : c`, etc.

```javascript
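// Statements: perform actions and produce no value,
// so they can't appear on the right-hand side of an assignment
if (true) { }
for (let i = 0; i < 10; i++) { }

// Expressions: produce values. Each right-hand side below is an expression;
// the whole `const` line is itself a statement (VariableDeclaration)
const x = 1 + 2
const y = condition ? 'a' : 'b'
const z = foo()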
```

This distinction is important because it determines "what node types to look for" when traversing AST. For example, if you want to find all function calls, target `CallExpression`; if you want to find variable declarations, target `VariableDeclaration` (Statement).

### Major Node Types

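Here are the commonly encountered node types with code examples. Literal names such as `NumericLiteral` and `StringLiteral` follow @babel/parser's default output; plain ESTree uses a single `Literal` type for both.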

```javascript
// VariableDeclaration + VariableDeclarator
const x = 1
// { type: "VariableDeclaration", kind: "const",
//   declarations: [{ type: "VariableDeclarator",
//     id: Identifier("x"), init: NumericLiteral(1) }] }

// FunctionDeclaration
function foo(a, b) { return a + b }
// { type: "FunctionDeclaration", id: Identifier("foo"),
//   params: [Identifier("a"), Identifier("b")],
//   body: BlockStatement }

// ArrowFunctionExpression
const add = (a, b) => a + b
// { type: "ArrowFunctionExpression",
//   params: [Identifier("a"), Identifier("b")],
//   body: BinaryExpression("+") }

// CallExpression
console.log('hello')
// { type: "CallExpression",
//   callee: MemberExpression(console, log),
//   arguments: [StringLiteral("hello")] }

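// MemberExpression
obj.prop
obj['prop']
// { type: "MemberExpression", object: Identifier("obj"),
//   property: Identifier("prop"), computed: false }      // obj.prop
// { type: "MemberExpression", object: Identifier("obj"),
//   property: StringLiteral("prop"), computed: true }    // obj['prop']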

// ConditionalExpression (ternary operator)
a ? b : c
// { type: "ConditionalExpression",
//   test: Identifier("a"),
//   consequent: Identifier("b"),
//   alternate: Identifier("c") }

// IfStatement
if (condition) { doA() } else { doB() }
// { type: "IfStatement",
//   test: Identifier("condition"),
//   consequent: BlockStatement,
//   alternate: BlockStatement }
```

Do you see the pattern? All nodes are distinguished by the `type` field, and each node type has predetermined properties. `BinaryExpression` has `left`, `operator`, `right`, and `IfStatement` has `test`, `consequent`, `alternate`. Understanding this structure makes working with AST-based tools much easier.
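
Because every node carries a `type` field, a generic search needs no knowledge of individual node shapes. The sketch below is a minimal illustration on a hand-written AST; `findNodes` and `callAst` are hypothetical names introduced for this example.

```javascript
// Generic walker: collects every object whose `type` matches
function findNodes(node, type, found = []) {
  if (Array.isArray(node)) {
    for (const child of node) findNodes(child, type, found)
  } else if (node !== null && typeof node === 'object') {
    if (node.type === type) found.push(node)
    for (const value of Object.values(node)) findNodes(value, type, found)
  }
  return found
}

// Hand-written AST for: console.log(foo(1))
const callAst = {
  type: 'ExpressionStatement',
  expression: {
    type: 'CallExpression',
    callee: {
      type: 'MemberExpression',
      object: { type: 'Identifier', name: 'console' },
      property: { type: 'Identifier', name: 'log' },
      computed: false,
    },
    arguments: [
      {
        type: 'CallExpression',
        callee: { type: 'Identifier', name: 'foo' },
        arguments: [{ type: 'NumericLiteral', value: 1 }],
      },
    ],
  },
}

console.log(findNodes(callAst, 'CallExpression').length) // 2
console.log(findNodes(callAst, 'Identifier').map((n) => n.name))
// [ 'console', 'log', 'foo' ]
```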

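> The spec is split by language edition in the [estree/estree](https://github.com/estree/estree) repository: `es5.md` is the base, and files such as [es2015.md](https://github.com/estree/estree/blob/master/es2015.md) add later syntax.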

## Use Case 1: Transpiling (Babel)

The most representative use case for AST is transpiling. [Babel](https://babeljs.io/) is a JavaScript compiler that works in three main stages:

1. **Parsing**: Convert code to AST
2. **Transforming**: Traverse AST and transform it to desired form
3. **Generation**: Output transformed AST back to code string
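
The Generation stage can be sketched as a recursive function that switches on `type` and prints each node from its known properties. This is a toy illustration, not how `@babel/generator` is implemented: `generateCode` and `squareProgram` are hypothetical names, only six node types are supported, indentation handles a single level, and no parentheses are inserted for nested binary expressions.

```javascript
// Turns a small subset of node types back into source text
function generateCode(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(generateCode).join('\n')
    case 'FunctionDeclaration': {
      const params = node.params.map(generateCode).join(', ')
      return `function ${generateCode(node.id)}(${params}) ${generateCode(node.body)}`
    }
    case 'BlockStatement':
      return `{\n${node.body.map((s) => '  ' + generateCode(s)).join('\n')}\n}`
    case 'ReturnStatement':
      return `return ${generateCode(node.argument)};`
    case 'BinaryExpression':
      return `${generateCode(node.left)} ${node.operator} ${generateCode(node.right)}`
    case 'Identifier':
      return node.name
    default:
      throw new Error(`Unsupported node type: ${node.type}`)
  }
}

// Hand-written AST for the `square` function from earlier
const squareProgram = {
  type: 'Program',
  body: [
    {
      type: 'FunctionDeclaration',
      id: { type: 'Identifier', name: 'square' },
      params: [{ type: 'Identifier', name: 'n' }],
      body: {
        type: 'BlockStatement',
        body: [
          {
            type: 'ReturnStatement',
            argument: {
              type: 'BinaryExpression',
              operator: '*',
              left: { type: 'Identifier', name: 'n' },
              right: { type: 'Identifier', name: 'n' },
            },
          },
        ],
      },
    },
  ],
}

console.log(generateCode(squareProgram))
// function square(n) {
//   return n * n;
// }
```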

### Parse & Generate

The most basic form is parsing and then generating code again.

```javascript
import * as parser from '@babel/parser'
import _generate from '@babel/generator'

// Under native Node ESM the function sits on `.default` of the CommonJS module
const generate = _generate.default ?? _generate

const code = `const welcome = 'hello world'`

// 1. Code → AST
const ast = parser.parse(code)

// 2. AST → Code
const output = generate(ast)
console.log(output.code) // const welcome = 'hello world';
```

Looking at this alone might make you think "so what?" The key is transforming the AST between steps 1 and 2.

### Traverse & Transform

Babel's real power lies in using `@babel/traverse` to traverse the AST and modify nodes. Let's look at a simple example. Code that changes all `const` to `let`:

```javascript
import * as parser from '@babel/parser'
import _traverse from '@babel/traverse'
import _generate from '@babel/generator'

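// Works under native Node ESM (function on `.default`) and under bundler interop
const traverse = _traverse.default ?? _traverse
const generate = _generate.default ?? _generate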

const code = `
  const a = 1
  const b = 2
`

const ast = parser.parse(code)

// Traverse AST and transform const → let
traverse(ast, {
  VariableDeclaration(path) {
    if (path.node.kind === 'const') {
      path.node.kind = 'let'
    }
  },
})

const output = generate(ast)
console.log(output.code)
// let a = 1;
// let b = 2;
```

The keys of the object passed to `traverse` are exactly the AST node types. Every time a node of type `VariableDeclaration` is encountered, the callback is executed. This structure is called the **visitor pattern**, and almost all AST-based tools use this pattern.
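
To show what the visitor pattern boils down to, here is a dependency-free sketch that dispatches on `node.type`. It is an illustration only, not Babel's implementation: `miniTraverse` and `program` are hypothetical names, and the callbacks receive the raw node rather than a `path` object.

```javascript
// Minimal traversal: call visitor[node.type] for every node, depth-first
function miniTraverse(node, visitor) {
  if (Array.isArray(node)) {
    node.forEach((child) => miniTraverse(child, visitor))
    return
  }
  if (node === null || typeof node !== 'object') return

  if (typeof node.type === 'string' && typeof visitor[node.type] === 'function') {
    visitor[node.type](node)
  }

  for (const value of Object.values(node)) miniTraverse(value, visitor)
}

// Helper that builds the AST for `const <name> = <value>` by hand
const declareConst = (name, value) => ({
  type: 'VariableDeclaration',
  kind: 'const',
  declarations: [
    {
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name },
      init: { type: 'NumericLiteral', value },
    },
  ],
})

// Hand-written AST for: const a = 1; const b = 2
const program = {
  type: 'Program',
  body: [declareConst('a', 1), declareConst('b', 2)],
}

miniTraverse(program, {
  VariableDeclaration(node) {
    if (node.kind === 'const') node.kind = 'let'
  },
})

console.log(program.body.map((declaration) => declaration.kind))
// [ 'let', 'let' ]
```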

### Understanding the path Object

The `path` that the callback receives in the above example isn't just a simple node wrapper. It's an object that contains all position and relationship information within the AST tree.

```javascript
traverse(ast, {
  Identifier(path) {
    path.node             // The current AST node itself
    path.parent           // Parent node
    path.parentPath       // Parent's path object
    path.scope            // Current scope information

    // Manipulation methods
    path.replaceWith(newNode)    // Replace current node with another node
    path.remove()                // Remove current node
    path.insertBefore(newNode)   // Insert new node before current node
    path.insertAfter(newNode)    // Insert new node after current node

    // Search methods
    path.findParent(p => p.isFunction())  // Find parent matching condition
    path.getSibling(0)                     // Access sibling nodes
  }
})
```

`path.scope` is also a powerful feature. You can track where variables are declared and where they're referenced.

```javascript
traverse(ast, {
  Identifier(path) {
    const binding = path.scope.getBinding(path.node.name)
    if (binding) {
      console.log(binding.kind)           // 'const', 'let', 'var', 'param', etc.
      console.log(binding.referenced)     // Whether it's referenced
      console.log(binding.references)     // Number of references
      console.log(binding.referencePaths) // Reference locations
    }
  }
})
```

Because of these features, tasks like "finding unused variables" and "safely renaming variables" become possible.
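
As a very rough sketch of the "unused variables" idea, the code below treats the whole program as one flat scope and compares declared names against referenced names. This is far simpler than what `path.scope` does: it ignores nested scopes and shadowing, and it would count member-expression properties as references. `findUnusedNames` and the sample AST are hypothetical.

```javascript
// Returns declared names that are never referenced (single flat scope)
function findUnusedNames(ast) {
  const declared = new Set()
  const referenced = new Set()

  function visit(node, parent, key) {
    if (Array.isArray(node)) {
      node.forEach((child) => visit(child, parent, key))
      return
    }
    if (node === null || typeof node !== 'object') return

    if (node.type === 'Identifier') {
      // An identifier in the `id` slot of a declarator is a declaration
      const isDeclaration = parent?.type === 'VariableDeclarator' && key === 'id'
      const target = isDeclaration ? declared : referenced
      target.add(node.name)
    }

    for (const [childKey, value] of Object.entries(node)) visit(value, node, childKey)
  }

  visit(ast, null, null)
  return [...declared].filter((name) => !referenced.has(name))
}

// Helper that builds the AST for `const <name> = <init>` by hand
const declare = (name, init) => ({
  type: 'VariableDeclaration',
  kind: 'const',
  declarations: [{ type: 'VariableDeclarator', id: { type: 'Identifier', name }, init }],
})

// Hand-written AST for: const a = 1; const b = a + 1
const sample = {
  type: 'Program',
  body: [
    declare('a', { type: 'NumericLiteral', value: 1 }),
    declare('b', {
      type: 'BinaryExpression',
      operator: '+',
      left: { type: 'Identifier', name: 'a' },
      right: { type: 'NumericLiteral', value: 1 },
    }),
  ],
}

console.log(findUnusedNames(sample)) // [ 'b' ]
```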

### Babel Plugins

Let's look at one more practical example. A Babel plugin that removes all `console.log`:

```javascript
// babel-plugin-remove-console.js
export default function () {
  return {
    visitor: {
      CallExpression(path) {
        const { callee } = path.node
        if (
          callee.type === 'MemberExpression' &&
          callee.object.name === 'console' &&
          callee.property.name === 'log'
        ) {
          path.remove()
        }
      },
    },
  }
}
```

Among `CallExpression` nodes, it finds `console.log` calls and removes them with `path.remove()`. Actual plugins that remove console logs from production builds work this way.

> For more on Babel, see the [Babel Handbook](https://github.com/jamiebuilds/babel-handbook).

## Use Case 2: Automated Code Refactoring (JSCodeShift)

The next use case we'll explore is [JSCodeShift](https://github.com/facebook/jscodeshift), which automatically refactors code. For example, let's say you want to perform this transformation:

```javascript
// before
load().then(function (response) {
  return response.data
})

// after
load().then((response) => response.data)
```

Since this is a structural change rather than a simple find-and-replace, it's impractical to do reliably with a text editor. `jscodeshift` makes it possible.

`jscodeshift` is a toolkit for running `codemods`. The actual AST-based transformation happens in `codemods`. The basic idea is similar to the relationship between babel and its plugins.

Writing a codemod that performs the above transformation looks like this:

```javascript
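// function-to-arrow.js
export default function transformer(file, api) {
  const j = api.jscodeshift

  return j(file.source)
    .find(j.FunctionExpression)
    // Arrow functions can't be generators and bind `this` lexically, so skip those.
    // A production codemod would also need to skip methods and uses of `arguments`.
    .filter(
      (path) =>
        !path.node.generator &&
        j(path).find(j.ThisExpression).size() === 0,
    )
    .replaceWith((path) => {
      const { params, body } = path.node
      const onlyStatement = body.body.length === 1 ? body.body[0] : null

      // If the body is a single `return <value>`, use the concise arrow form
      const arrowBody =
        onlyStatement?.type === 'ReturnStatement' && onlyStatement.argument
          ? onlyStatement.argument
          : body

      const arrow = j.arrowFunctionExpression(params, arrowBody)
      arrow.async = path.node.async // keep `async` functions async
      return arrow
    })
    .toSource()
}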
```

```bash
npx jscodeshift -t function-to-arrow.js src/
```

Running this transforms function expressions into arrow functions in every file under `src/`, whether there are a hundred files or thousands. This is where AST-based transformation shines: large-scale refactoring.

React also provides codemods for major version updates. [react-codemod](https://github.com/reactjs/react-codemod) has codemods that automatically handle transformations like createClass → ES6 class, PropTypes separation, etc.

## Use Case 3: Linting (ESLint)

ESLint also operates based on AST. It parses code into AST, then each rule uses the visitor pattern to traverse specific node types and find problems. The structure is almost identical to Babel plugins.

Let's create a simple custom rule. A rule that prohibits `var` usage:

```javascript
// no-var.js
module.exports = {
  meta: {
    type: 'suggestion',
    fixable: 'code',
  },
  create(context) {
    return {
      VariableDeclaration(node) {
        if (node.kind === 'var') {
          context.report({
            node,
            message: 'Use let or const instead of var.',
            fix(fixer) {
              return fixer.replaceTextRange(
                [node.range[0], node.range[0] + 3],
                'let',
              )
            },
          })
        }
      },
    }
  },
}
```

Compare this with the Babel plugin's visitor and the two are remarkably similar. The keys of the object returned by `create` are AST node types, and each callback runs every time a node of that type is encountered. The difference is that Babel modifies the AST directly, while ESLint reports problems with `context.report()` and, when auto-fixing is needed, applies the fix at the text level through the `fix` function.
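
To illustrate what "fixing at the text level" means, here is a minimal sketch that applies range-based replacements to a source string. It is not ESLint's actual fixer; `applyFixes` is a hypothetical helper, and the fixes are assumed to be non-overlapping objects of the form `{ range: [start, end], text }`.

```javascript
// Apply non-overlapping { range: [start, end], text } fixes to a source string
function applyFixes(source, fixes) {
  // Work from the end of the file so earlier offsets stay valid
  const ordered = [...fixes].sort((a, b) => b.range[0] - a.range[0])

  let output = source
  for (const { range: [start, end], text } of ordered) {
    output = output.slice(0, start) + text + output.slice(end)
  }
  return output
}

const source = 'var a = 1\nvar b = 2'
const fixes = [
  { range: [0, 3], text: 'let' }, // first `var`
  { range: [10, 13], text: 'let' }, // second `var`
]

console.log(applyFixes(source, fixes))
// let a = 1
// let b = 2
```

Working on text ranges rather than regenerating code from the AST means everything outside the fixed range, including comments and formatting, is left exactly as written.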

> If you want to create custom ESLint rules yourself, also check out [Creating My Own ESLint Rules](/2022/06/how-to-write-my-own-eslint-rules).

## Use Case 4: Code Formatting (Prettier)

[Prettier](https://prettier.io/) also utilizes AST. It takes code, creates an AST, and outputs it again in a consistent style based on the AST. However, prettier has one more stage:

1. Code → AST
2. AST → IR (Intermediate Representation, called `Doc`)
3. IR → Formatted code

Stage 2 is the key. While converting AST nodes to an intermediate representation called `Doc`, it includes formatting hints like "if this part fits on one line, keep it on one line; if not, split into multiple lines." A Doc printer then walks the `Doc` and decides where to break lines based on the available line width.

It's easier to understand once you see what a `Doc` looks like. The Doc for code like `foo(arg1, arg2, arg3)` conceptually has this structure:

```
group([
  "foo(",
  indent([
    softline,
    "arg1,",
    line,
    "arg2,",
    line,
    "arg3",
    ifBreak(","),
  ]),
  softline,
  ")"
])
```

Here `group` means "put on one line if possible, but split into multiple lines if not." `line` becomes a space in single-line mode and a line break in multi-line mode. `softline` inserts nothing in single-line mode and a line break in multi-line mode. `ifBreak(",")` prints the trailing comma only when the group is split across lines.

Thanks to this structure, prettier can format the same code differently depending on the situation to fit `printWidth`.

```javascript
// When it fits within printWidth → one line
foo(arg1, arg2, arg3)

// When it exceeds printWidth → multiple lines
foo(
  arg1,
  arg2,
  arg3,
)
```

This decision isn't made by simply looking at string length, but by understanding the AST structure, which produces consistent results even in nested structures.
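
The fit-or-break decision can be sketched with a tiny Doc printer. This is a simplified illustration, not Prettier's implementation: the builder names mirror the conceptual example, but the object shapes and `printDoc` are this sketch's own, each group only measures its own flat width, and `ifBreak` (which emits text only in multi-line mode) accounts for the trailing comma.

```javascript
// Doc builders (object shapes are assumptions of this sketch)
const line = { type: 'line', soft: false }
const softline = { type: 'line', soft: true }
const group = (contents) => ({ type: 'group', contents })
const indent = (contents) => ({ type: 'indent', contents })
const ifBreak = (text) => ({ type: 'if-break', text })

// Text a doc would produce if printed entirely on one line
function flat(doc) {
  if (typeof doc === 'string') return doc
  if (Array.isArray(doc)) return doc.map(flat).join('')
  if (doc.type === 'line') return doc.soft ? '' : ' '
  if (doc.type === 'if-break') return ''
  return flat(doc.contents) // group or indent
}

function printDoc(doc, printWidth) {
  let out = ''
  let column = 0

  function emit(text) {
    out += text
    column += text.length
  }

  function print(node, breaking, depth) {
    if (typeof node === 'string') return emit(node)
    if (Array.isArray(node)) return node.forEach((child) => print(child, breaking, depth))
    switch (node.type) {
      case 'group': {
        // Break this group only if its flat form does not fit on the current line
        const fits = column + flat(node).length <= printWidth
        return print(node.contents, !fits, depth)
      }
      case 'indent':
        return print(node.contents, breaking, depth + 1)
      case 'if-break':
        if (breaking) emit(node.text)
        return
      case 'line':
        if (breaking) {
          out += '\n' + '  '.repeat(depth)
          column = 2 * depth
        } else if (!node.soft) {
          emit(' ')
        }
        return
    }
  }

  print(doc, false, 0)
  return out
}

const callDoc = group([
  'foo(',
  indent([softline, 'arg1,', line, 'arg2,', line, 'arg3', ifBreak(',')]),
  softline,
  ')',
])

console.log(printDoc(callDoc, 80))
// foo(arg1, arg2, arg3)
console.log(printDoc(callDoc, 10))
// foo(
//   arg1,
//   arg2,
//   arg3,
// )
```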

> If you want to learn more about prettier's algorithm, refer to Philip Wadler's paper [A prettier printer](https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf).

## Use Case 5: Code Visualization

AST also enables visual representation of code. [js2flowchart](https://github.com/Bogdan-Lyashenko/js-code-to-svg-flowchart) is a library that converts JavaScript code to flowchart SVG.

It works along the same lines as the tools we've examined so far:

1. Code → AST
2. AST → FlowTree (simplified tree with unnecessary nodes omitted)
3. FlowTree → ShapesTree (visual type, position, relationship information for each node)
4. ShapesTree → SVG

Ultimately, the pattern of using AST as an intermediate representation to transform code into other forms is consistent.

## Summary

The common pattern of the tools we've examined so far can be summarized like this:

```
Code (string)
  ↓ Lexical Analysis (tokenization)
Token list
  ↓ Syntax Analysis (parsing)
AST
  ↓ Transform/analyze/output
Result (new code, error report, SVG, etc.)
```

Tools that handle ASTs almost without exception use the **visitor pattern**. Babel and ESLint both take an object with the node types of interest as keys and callback functions as values; jscodeshift expresses the same idea as a query (`find`) followed by a callback.

```javascript
// Babel plugin
{ visitor: { CallExpression(path) { ... } } }

// ESLint rule
{ create() { return { CallExpression(node) { ... } } } }

// jscodeshift
j(source).find(j.CallExpression).forEach(path => { ... })
```

Ultimately, there's one core principle: **When you treat code as structured data rather than strings, precise operations that are impossible with text replacement become possible.** An AST is the most widely used way to get that structured data.