HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.
$ npm install parse5
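To parse an HTML document string from scratch, use the parse5.parse method.
To parse an HTML fragment the way the innerHTML setter does, use the parse5.parseFragment method.
To serialize a node to an HTML string, use the parse5.serialize method.
To parse streamed HTML with support for <script> execution and document.write, use the parse5.ParserStream class.
To get basic tag and attribute information without building a document tree, use the parse5.SAXParser class.
To convert a plain text file into an HTML document as browsers do, use the parse5.PlainTextConversionStream class.
To serialize a node to HTML as a stream, use the parse5.SerializerStream class.
To get source code location information for parsed nodes, use the locationInfo options: ParserOptions.locationInfo, SAXParserOptions.locationInfo.
To switch the output tree format, use the treeAdapter options, ParserOptions.treeAdapter and SerializerOptions.treeAdapter, with one of the two built-in tree formats.
To work with your own tree format, implement the TreeAdapter interface and pass it to the parser or serializer via the treeAdapter option.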
The parse5 package includes a TypeScript definition file.
Due to multiple issues, the typings are not enabled
by default. To use the built-in parse5 typings, first install @types/node
if you don't have it installed yet, then add the following lines to your tsconfig.json file:
{
    // snip...
    "compilerOptions": {
        "baseUrl": ".",
        "paths": {
            "parse5": ["./node_modules/parse5/lib/index.d.ts"]
        }
    }
    // snip...
}
Note that since parse5 supports multiple output tree formats you need to manually cast generic node interfaces to the appropriate tree format to get access to the properties:
import * as parse5 from 'parse5';

// Using default tree adapter.
const document1 = parse5.parse('<div></div>') as parse5.AST.Default.Document;

// Using htmlparser2 tree adapter.
const document2 = parse5.parse('<div></div>', {
    treeAdapter: parse5.TreeAdapters.htmlparser2
}) as parse5.AST.HtmlParser2.Document;
You can find documentation for interfaces in API reference.
You can create a custom tree adapter, so that parse5 can work with your own DOM-tree implementation.
Then pass it to the parser or serializer via the treeAdapter option:
const parse5 = require('parse5');

const myTreeAdapter = {
    // Adapter methods...
};

const document = parse5.parse('<div></div>', { treeAdapter: myTreeAdapter });
const html = parse5.serialize(document, { treeAdapter: myTreeAdapter });
Refer to the API reference for the description of methods that should be exposed by the tree adapter, as well as links to their default implementation.
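To give a feel for what adapter methods look like, here is a minimal, partial sketch. The node shape (`kind`, `name`, `ns`, `attrs`, `children`, `parent`) is a hypothetical one chosen for illustration and is not any built-in parse5 format. Only a handful of TreeAdapter methods are shown; an adapter passed to parse5 would need to implement the whole interface described in the API reference. The methods are exercised by hand here rather than through the parser.

```javascript
// Partial, illustrative tree adapter over a hypothetical node shape.
// A real adapter must implement every method of the TreeAdapter interface;
// only a few construction and traversal methods are sketched here.
const myTreeAdapter = {
    createDocument() {
        return { kind: 'document', children: [], parent: null };
    },

    createElement(tagName, namespaceURI, attrs) {
        return {
            kind: 'element',
            name: tagName,
            ns: namespaceURI,
            attrs: attrs,
            children: [],
            parent: null
        };
    },

    appendChild(parentNode, newNode) {
        parentNode.children.push(newNode);
        newNode.parent = parentNode;
    },

    getChildNodes(node) {
        return node.children;
    },

    getParentNode(node) {
        return node.parent;
    },

    getTagName(element) {
        return element.name;
    }
};

// Exercise the methods directly, roughly the way a parser would call them.
const doc = myTreeAdapter.createDocument();
const div = myTreeAdapter.createElement('div', 'http://www.w3.org/1999/xhtml', []);

myTreeAdapter.appendChild(doc, div);

console.log(myTreeAdapter.getTagName(myTreeAdapter.getChildNodes(doc)[0])); // div
```

Keeping all knowledge of the node shape inside the adapter is the point of the design: the parser and serializer only ever touch nodes through these methods.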
To use parse5 in the browser, compile it with browserify and you're set.
I'm parsing <img src="foo"> with the SAXParser and I expect the selfClosing flag to be true for the <img> tag. But it's not. Is there something wrong with the parser?
No. A self-closing tag is a tag that has a / before the closing bracket, e.g. <br/>, <meta/>.
In the provided example, the tag simply doesn't have an end tag. The parser treats self-closing tags and tags without end tags differently: for a self-closing tag, the parser does not look for the corresponding closing tag and expects the element not to have any content.
But if a start tag is not self-closing, the parser treats everything that follows it (with a few exceptions) as the element content.
However, if the start tag is in the list of void elements, the parser expects the corresponding
element not to have content and behaves as if the element were self-closing. So, semantically, if an element is
void, self-closing tags and tags without closing tags are equivalent, but this is not true for other tags.
TL;DR: selfClosing is part of the lexical information and is set only if the tag has a / before the closing bracket in the source code.
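The distinction can be sketched without the parser at all. The snippet below is not parse5 code: `hasSelfClosingSyntax` and `describeStartTag` are hypothetical helpers, the regular expression is a deliberate simplification (it ignores cases such as an unquoted attribute value ending in `/`), and the void element list is written out by hand for illustration. It shows only that "has a / before the closing bracket" and "is a void element" are two independent properties.

```javascript
// Void elements, listed by hand for illustration; consult the HTML
// specification for the authoritative list.
const VOID_ELEMENTS = new Set([
    'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
    'link', 'meta', 'source', 'track', 'wbr'
]);

// Hypothetical helper: a purely lexical check on the raw start tag source.
// Simplified on purpose; a real tokenizer handles many more cases.
function hasSelfClosingSyntax(startTagSource) {
    return /\/>$/.test(startTagSource.trim());
}

// Hypothetical helper: reports the lexical flag and the semantic property
// side by side so the difference is visible.
function describeStartTag(tagName, startTagSource) {
    return {
        selfClosing: hasSelfClosingSyntax(startTagSource),
        isVoid: VOID_ELEMENTS.has(tagName.toLowerCase())
    };
}

const img = describeStartTag('img', '<img src="foo">');
const br = describeStartTag('br', '<br/>');

console.log(img); // { selfClosing: false, isVoid: true }
console.log(br);  // { selfClosing: true, isVoid: true }
```

Both elements are void, so neither is expected to have content, yet only `<br/>` carries the self-closing syntax. That matches the behaviour described above for `<img src="foo">`.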
If the parser output looks wrong, most likely it's not a bug. There are a lot of weird edge cases in the HTML5 parsing algorithm, e.g.:
<b>1<p>2</b>3</p>
will be parsed as
<b>1</b><p><b>2</b>3</p>
Just try it in the latest version of your browser before submitting an issue.
This is a major release that delivers a few minor (but breaking) changes to work around recent issues with TypeScript Node.js typings versioning and the use of parse5 in environments other than Node.js (see https://github.com/inikulin/parse5/issues/235 for details).
ParserStream, PlainTextConversionStream, SerializerStream, SAXParser) is now lazily loaded. That enables bundling of the basic functionality for other platforms (e.g. for browsers via webpack).
@types/node (by @gfx).
ParserStream (GH #195) (by @stevenvachon).
location.startTag is not available if end tag is missing (GH #181).
MarkupData.Location.col description in TypeScript definition file (GH #170).
document.quirksMode property was replaced with document.mode property which can have 'no-quirks', 'quirks' and 'limited-quirks' values. Tree adapter setQuirksMode and isQuirksMode methods were replaced with setDocumentMode and getDocumentMode methods (GH #83).
<!DOCTYPE html> as per spec (GH #137).
__location.endTag when the start tag contains newlines (GH #166) (by @webdesus).
LocationInfo.endOffset for implicitly closed <p> element (GH #109).
SAXParser (by @RReverser)
\n in <pre>, <textarea> and <listing>.
<image>.
Latest spec changes
Fixed: Element nesting corrections now take namespaces into consideration.
parseFragment with locationInfo regression when parsing <template> (GH #90) (by @yyx990803).
parseFragment arguments fallback (GH #84).
parseFragment arguments processing (GH #82).
start and end were renamed to startOffset and endOffset respectively.
SimpleApiParser was renamed to SAXParser.
Parser class.
Serializer class.
decodeHtmlEntities and encodeHtmlEntities options. (GH #75).
default tree adapter now stores <template> content in template.content property instead of template.childNodes[0].
DocumentType.data property rendering (GH #45).
<html> and <body> elements (GH #44).
<form> processing in <template> (GH #40).
<template> serialization problem with custom tree adapter (GH #38).
encodeHtmlEntities option.
<template> support
parseFragment now uses <template> as default contextElement. This leads to the more "forgiving" parsing manner.
TreeSerializer was renamed to Serializer. However, serializer is accessible as parse5.TreeSerializer for backward compatibility.
htmlparser2 tree format (DOM Level1 node emulation).
jsdom internal use only.
document element for fragment parsing (required by jsdom).
.travis.yml, .editorconfig) are removed from NPM package.
require().
<pre>.
SYSTEM-only DOCTYPE serialization.
DOCTYPE IDs.
appendChild in htmlparser2 tree adapter.
<menuitem> handling in <body>.