Parsers turn strings of characters into meaningful data structures (like a JSON object!). nearley is a fast, feature-rich, and modern parser toolkit for JavaScript. nearley is an npm Staff Pick.
nearley 101
- Install:
$ npm install -g nearley
(or try nearley live in your browser here!)
- Write your grammar:
# Match a CSS color
# http://www.w3.org/TR/css3-color/#colorunits

@builtin "whitespace.ne" # `_` means arbitrary amount of whitespace
@builtin "number.ne"     # `int`, `decimal`, and `percentage` number primitives

csscolor -> "#" hexdigit hexdigit hexdigit hexdigit hexdigit hexdigit
          | "#" hexdigit hexdigit hexdigit
          | "rgb"  _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ ")"
          | "hsl"  _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ ")"
          | "rgba" _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ "," _ decimal _ ")"
          | "hsla" _ "(" _ colnum _ "," _ colnum _ "," _ colnum _ "," _ decimal _ ")"

hexdigit -> [a-fA-F0-9]
colnum   -> int | percentage
- Compile your grammar:
$ nearleyc csscolor.ne -o csscolor.js
- Test your grammar:
$ nearley-test -i "#00ff00" csscolor.js
Parse results:
[ [ '#', [ '0' ], [ '0' ], [ 'f' ], [ 'f' ], [ '0' ], [ '0' ] ] ]
- Turn your grammar into a generator:
$ nearley-unparse -n 3 csscolor.js
#Ab21F2
rgb ( -29.889%,7,8172)
#a40
- Create beautiful railroad diagrams to document your grammar formally. See a demo here.
$ nearley-railroad csscolor.ne -o csscolor.html
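To make the generator step above more concrete, here is a minimal sketch of the idea behind inverting a grammar: start at a rule, pick one alternative at random, and expand it recursively until only literals remain. This is not how nearley-unparse is implemented. The `Map`-based grammar format, the `generate` function, and the reduced hex-only color grammar are all assumptions made for illustration.

```javascript
// Hypothetical grammar format for this sketch (NOT nearley's compiled output):
// each nonterminal maps to a list of alternatives, and any symbol without an
// entry in the Map is treated as a literal string.
const grammar = new Map([
  ["color", [
    ["#", "hex", "hex", "hex"],
    ["#", "hex", "hex", "hex", "hex", "hex", "hex"],
  ]],
  // One single-symbol alternative per hexadecimal digit.
  ["hex", "0123456789abcdefABCDEF".split("").map((ch) => [ch])],
]);

// Expand `symbol` by choosing one alternative at random and recursing.
// `random` is injectable so that a run can be made reproducible.
function generate(grammar, symbol, random = Math.random) {
  const alternatives = grammar.get(symbol);
  if (alternatives === undefined) return symbol; // literal: emit as-is
  const choice = alternatives[Math.floor(random() * alternatives.length)];
  return choice.map((s) => generate(grammar, s, random)).join("");
}

// Three random colors, in the spirit of `-n 3` above.
const samples = Array.from({ length: 3 }, () => generate(grammar, "color"));
console.log(samples);
```

The sketch assumes a grammar without recursive rules, so expansion always terminates; a generator for recursive grammars would need some form of depth limit.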
Features
- nearley is the first JS parser to use the Earley algorithm (insert your own ‘early bird’ pun here). It also implements Joop Leo’s optimizations for right-recursion, making it effectively linear-time for LL(k) grammars.
- nearley lives happily in node, but doesn’t mind the browser.
- nearley outputs small files. And its expressive DSL comes with plenty of syntactic sugar to keep your source files short. And sweet.
- nearley’s grammar language is powerful and expressive: you can use macros, import from a large builtin library of pre-defined parser-pieces, use a tokenizer for extra performance, and more!
- nearley is built on an idiomatic streaming API. You even have access to partial parses to build predictive user interfaces.
- nearley processes left recursion without choking. In fact, nearley will parse anything you throw at it without complaining or going into an infinite loop.
- nearley handles ambiguous grammars gracefully. Ambiguous grammars can be parsed in multiple ways: instead of getting confused, nearley gives you all the parsings (in a deterministic order!).
- nearley allows for debugging with generous error detection. When it catches a parse-time error, nearley tells you exactly what went wrong and where.
- nearley is powerful enough to be bootstrapped. That means nearley uses nearley to compile parts of nearley. nearleyception!
- nearley parsers can be inverted to form generators which output random strings that match a grammar. Useful for writing test cases, fuzzers, and Mad-Libs.
- You can export nearley parsers as railroad diagrams, which provide easy-to-understand documentation of your grammar.
- nearley comes with fantastic tooling. You can find editor plug-ins for vim, Sublime Text, Atom, and VS Code; there are also plug-ins for Webpack and gulp.
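The claims above about the Earley algorithm and left recursion can be illustrated with a small, self-contained recognizer. This is a textbook-style sketch, not nearley's implementation: it omits Leo's optimization, parse-tree construction, and empty (nullable) rules. The rule format, the `recognize` function, and the toy `sum` grammar are assumptions made for this example.

```javascript
// Toy left-recursive grammar (hypothetical format, not nearley's):
//   sum   -> sum "+" digit | digit
//   digit -> "1" | "2"
// A symbol is a nonterminal if some rule has that name; otherwise it is
// treated as a single-character terminal.
const rules = [
  { name: "sum", symbols: ["sum", "+", "digit"] }, // left-recursive
  { name: "sum", symbols: ["digit"] },
  { name: "digit", symbols: ["1"] },
  { name: "digit", symbols: ["2"] },
];

// Minimal Earley recognizer. Assumes no rule has an empty right-hand side.
function recognize(rules, start, input) {
  const nonterminals = new Set(rules.map((r) => r.name));
  // chart[i] holds states {rule, dot, origin} valid after reading i characters.
  const chart = Array.from({ length: input.length + 1 }, () => []);
  const add = (i, state) => {
    const seen = chart[i].some(
      (s) => s.rule === state.rule && s.dot === state.dot && s.origin === state.origin
    );
    // Skipping duplicates is what keeps left recursion from looping forever.
    if (!seen) chart[i].push(state);
  };

  for (const rule of rules) {
    if (rule.name === start) add(0, { rule, dot: 0, origin: 0 });
  }

  for (let i = 0; i <= input.length; i++) {
    // chart[i] may grow while we iterate, so use an index-based loop.
    for (let j = 0; j < chart[i].length; j++) {
      const { rule, dot, origin } = chart[i][j];
      const next = rule.symbols[dot];
      if (next === undefined) {
        // Complete: advance every state that was waiting for this rule.
        for (const s of chart[origin]) {
          if (s.rule.symbols[s.dot] === rule.name) {
            add(i, { rule: s.rule, dot: s.dot + 1, origin: s.origin });
          }
        }
      } else if (nonterminals.has(next)) {
        // Predict: start every rule that could produce `next` here.
        for (const r of rules) {
          if (r.name === next) add(i, { rule: r, dot: 0, origin: i });
        }
      } else if (i < input.length && input[i] === next) {
        // Scan: the terminal matches the current character.
        add(i + 1, { rule, dot: dot + 1, origin });
      }
    }
  }

  // Accept if a start rule spans the whole input.
  return chart[input.length].some(
    (s) => s.rule.name === start && s.dot === s.rule.symbols.length && s.origin === 0
  );
}

console.log(recognize(rules, "sum", "1+2+1")); // true
console.log(recognize(rules, "sum", "1+"));    // false
```

A naive recursive-descent parser would recurse forever on the first `sum` rule; the chart-based approach terminates because a state that is already in the chart is never added twice.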
Projects using nearley
Artificial Intelligence, NLP, Linguistics: Shrdlite is a programming project in Artificial Intelligence, a course given at the University of Gothenburg and Chalmers University of Technology. It uses nearley for reading instructions in natural language (i.e. English). lexicon-grammars was used to parse lexicons for a project at Australian National University.
Standard formats: node-dmi is a module that reads iconstate metadata from BYOND DMI files, edtf.js is a parser for Extended Date Time Format, node-krl-parser is a KRL parser for node, bibliography is a BibTeX-to-HTML converter, biblatex-csl-converter converts between bibtex/CSL/JSON, scalpel parses CSS selectors (powering enzyme, Airbnb’s React testing tool), rfc5545-rrule helps parse iCalendar data, mangudai parses RMS scripts for Age of Empires II, tf-hcl parses and generates HCL config files, css-selector-inspector parses and tokenizes CSS3 selectors, css-property-parser validates and expands CSS shorthands, node-scad-parser parses OpenSCAD 3D models, js-sql-parse parses SQL statements, pg-mem is an in-memory Postgres database emulator, resp-parser is a parser for the RESP protocol, celio parses Celestia star catalogs, Haraka is an SMTP server that powers Craigslist (and others).
Templating and files: uPresent is a markdown-based presentation authoring system, saison is a minimal templating language, Packdown is a tool to generate human-readable archives of multiple files.
Programming languages: Carbon is a C subset that compiles to JavaScript, optimized for game development, ezlang is a simple language, tlnccuwagnf is a fun general-purpose language, nanalang is a silly esoteric language, english is a less esoteric programming language, ecmaless is an easily-extensible language, hm-parser parses Haskell-like Hindley-Milner type signatures, kozily implements the Oz language, abstract-machine inspects execution models, fbp-types provides typechecking primitives for flow-based systems, lp5562 is an assembler for the TI LP5562 LED driver, VSL is a Versatile Scripting Language, while-typescript is an implementation of the WHILE language, lo is a language for secure distributed systems, jaco is an implementation of CMU’s C0 teaching language, walt is a subset of JavaScript that targets WebAssembly, N-lang is a general-purpose language designed by a group of high school students.
Mathematics: Solvent is a powerful desktop calculator, Truth-table is a tool to visualize propositional logic in truth tables, Emunotes is a personal Wiki with inline graphing and computation, react-equation parses and renders equations in React, the mLab generates category theory papers.
Domain-specific languages: Hexant is a cellular automata simulator with a DSL for custom automata, Dicetower is an advanced dice plugin for hubot, deck.zone is a language to create board games, in-seconds is a time calculator for music applications, website-spec is a tool for functional web testing, pianola allows declarative function composition, idyll is a markup language for data-driven documents, virtsecgroup provides virtual AWS security groups, deadfad is a hex editor that lets you specify structs, bishbosh helps you create command-line interfaces, syso codifies aspects of French legal contracts, siteswap parses Siteswap notation for juggling patterns, jsgrep provides syntactic grep for JavaScript, electro-grammar parses descriptions of electronic components like resistors and capacitors, cicero helps create smart legal contracts, Eventbot is a calendar plugin for Slack used by thousands of teams, Obyte is a cryptocurrency platform, OptiCSS is a CSS optimizer built by LinkedIn, Nestup is a language for specifying nested rhythmic tuplets, htlengine parses Adobe’s HTL template language, fhir-works is an AWS-provided tool to parse FHIRPath search parameters (FHIR is an interface for healthcare data), Penrose is a language for expressing mathematical diagrams, sema is a DSL for live-coding music performances, tinsl is a DSL for creating multi-pass rendering pipelines for real-time post-processing effects using GLSL-like syntax.
Other: ProceduralPsychEpisode generates “random episodes of the hilarious but formulaic show,” parse-vbb-station-name parses names of public transit stops in Berlin.
Parsing libraries: nearley is a parser toolkit for JavaScript. It has a nearley-based DSL to specify parsers.
Give to nearley
nearley has been maintained by volunteers since 2014. If you want to help support us, contact @kach or @tjvr on GitHub. We’ll send over our PayPal information – and maybe something nice. :-).