BisonGen parser

From wikidev.net


Nesting classes

Inline

  • Core: ticks, five tildes, variables, templates, nowiki, entities, inline macros
  • + in blocklevels, headers, toplevel text and ticks: links, four tildes, ISBN/RFC
  • + in image links: links
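The nesting classes above can be written down as plain sets of element names that a parser consults when it enters a context. The following is a minimal Python sketch, not a BisonGen grammar. It assumes that each "+" row adds its elements to Core (the table does not say whether the rows are cumulative), and the element and context names such as "image_link" are hypothetical.

```python
# Hypothetical encoding of the inline nesting classes. Assumption: each "+"
# row adds its elements to Core (not to the previous row).
CORE = frozenset({
    "ticks", "five_tildes", "variable", "template",
    "nowiki", "entity", "inline_macro",
})

# Extra inline elements allowed on top of CORE, keyed by containing context.
_FULL = frozenset({"link", "four_tildes", "isbn_rfc"})
EXTRA = {
    "blocklevel": _FULL,
    "header": _FULL,
    "toplevel_text": _FULL,
    "ticks": _FULL,
    "image_link": frozenset({"link"}),
}


def allowed_inline(context: str) -> frozenset:
    """Return the inline elements permitted inside the given context."""
    # Unknown contexts fall back to the Core class alone.
    return CORE | EXTRA.get(context, frozenset())


print(sorted(allowed_inline("image_link") - CORE))  # ['link']
```

Keeping the classes as data rather than as separate grammar rules makes it easy to adjust them if the "+" rows turn out to be cumulative.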

Blocklevel/toplevel

  • table, th, tbody, tr, td and wikitex eq
  • blocklevel macros, multi-line input
  • ----+ (horizontal rule: four or more dashes)
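Read as a regular expression, "----+" is "---" followed by one or more further dashes. A minimal Python sketch of the line-start check follows. It assumes the rule must begin in the first column, and the function name is hypothetical.

```python
import re

# "----+" read as a regular expression: "---" followed by one or more "-",
# i.e. four or more dashes, anchored at the start of the line.
HORIZONTAL_RULE = re.compile(r"^-{4,}")


def is_horizontal_rule(line: str) -> bool:
    """True if the line opens with a horizontal-rule token."""
    return HORIZONTAL_RULE.match(line) is not None


for sample in ["----", "--------", "---", " ----"]:
    print(repr(sample), is_horizontal_rule(sample))
```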


Items

Misc

  • templates {{something}}, {{something|param}}, {{something|param=something}}
  • tildes ~~~, ~~~~
  • nowiki: <nowiki> anything_but_close_nowiki </nowiki>
  • ISBN whitespace [0-9X-]+, RFC whitespace \d+
  • & entityspecification ;
  • variables
  • inline macros (math, ploticus, ...), which can span multiple lines of input
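Most of the Misc items are regular enough to be recognised by a single alternation of patterns. The sketch below uses Python's re module rather than a generated lexer. The token names are assumptions, templates are matched without nesting, and entities are matched by shape only, without checking the name against a list of valid entities.

```python
import re

# Token names and patterns are illustrative assumptions. Templates are matched
# without nesting, and entities by shape only (names are not validated).
MISC_TOKENS = [
    ("NOWIKI", r"<nowiki>.*?</nowiki>"),
    ("TEMPLATE", r"\{\{[^{}]+\}\}"),      # {{a}}, {{a|b}}, {{a|b=c}}
    ("TILDES", r"~{3,5}"),                 # ~~~, ~~~~, ~~~~~
    ("ISBN", r"ISBN\s+[0-9X-]+"),
    ("RFC", r"RFC\s+\d+"),
    ("ENTITY", r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);"),
]
MISC_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in MISC_TOKENS),
    re.DOTALL,  # <nowiki> content may span several lines
)


def scan_misc(text):
    """Return (token_name, matched_text) pairs for the Misc items in text."""
    # No inner capturing groups, so lastgroup is the alternative that matched.
    return [(m.lastgroup, m.group()) for m in MISC_RE.finditer(text)]


print(scan_misc("{{stub}} ~~~~ ISBN 0-13-110362-8 &amp;"))
```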

Ticks

  • ''''' ''''' inline, strong + em
  • ''' ''' inline, strong
  • '' '' inline, em
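Under MediaWiki's convention, '' marks emphasis (rendered italic) and ''' marks strong text (rendered bold). The simplified Python sketch below handles only balanced, same-length tick pairs on a single line. It tries the longest run first, because ''''' would otherwise be consumed by the shorter patterns. MediaWiki's real apostrophe handling of unbalanced runs is more involved and is not reproduced here.

```python
import re

# Simplified: balanced, same-length tick pairs on one line only.
# Longest run first, so ''''' is not split up by the shorter patterns.
TICKS = [
    (re.compile(r"'''''(.+?)'''''"), r"<strong><em>\1</em></strong>"),
    (re.compile(r"'''(.+?)'''"), r"<strong>\1</strong>"),
    (re.compile(r"''(.+?)''"), r"<em>\1</em>"),
]


def render_ticks(line: str) -> str:
    """Replace tick pairs in one line with strong/em markup."""
    for pattern, replacement in TICKS:
        line = pattern.sub(replacement, line)
    return line


print(render_ticks("''a'' '''b''' '''''c'''''"))
# <em>a</em> <strong>b</strong> <strong><em>c</em></strong>
```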

Lists

  • (\*|#)+ only at line start; an alternative MoinMoin-style syntax uses tabs and different enumeration formats
  • ; (at start of line)
  • : (at start of line, or inline as ' : ' with a space on each side)

Lists can contain: inline elements, pre
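Because list markers are stacked at the start of the line, the parser can compare the marker sequence of consecutive lines to decide how many lists to close and open. The Python sketch below assumes that "*" maps to ul, "#" maps to ol, and ";" and ":" both map to dl. It ignores the inline ' : ' form and the MoinMoin variant, and its function names are hypothetical.

```python
import re

# Assumptions: "*" -> <ul>, "#" -> <ol>, ";" and ":" -> <dl>; the inline
# " : " form and the MoinMoin variant are not handled.
LIST_KIND = {"*": "ul", "#": "ol", ";": "dl", ":": "dl"}
LIST_LINE = re.compile(r"^([*#;:]+)\s*(.*)$")


def list_item(line):
    """Return (kinds, text) for a list line, or None for any other line.

    kinds has one entry per nesting level, outermost first.
    """
    m = LIST_LINE.match(line)
    if m is None:
        return None
    return [LIST_KIND[c] for c in m.group(1)], m.group(2)


def transition(prev_kinds, cur_kinds):
    """Return (lists_to_close, lists_to_open) between two consecutive items."""
    common = 0
    for a, b in zip(prev_kinds, cur_kinds):
        if a != b:
            break
        common += 1
    return len(prev_kinds) - common, len(cur_kinds) - common


kinds, text = list_item("#* nested bullet")
print(kinds, text)                # ['ol', 'ul'] nested bullet
print(transition(["ol"], kinds))  # (0, 1)
```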

Links

inline

  • prefix[[articlenamespecification]]trail
  • prefix[[articlenamespecification|anythingbutclosingbrace]]trail
  • urlspecification
  • [urlspecification space anythingbutclosingbrace]
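The bracketed link forms map naturally onto regular expressions with named groups for prefix, target, label and trail. In this Python sketch, prefix and trail are assumed to be runs of ASCII letters glued to the brackets (MediaWiki's actual rule depends on the wiki's language), and external links are limited to http, https and ftp. Bare URLs are not handled, and all names are hypothetical.

```python
import re

# Assumptions: prefix/trail are ASCII letter runs glued to the brackets,
# and external URLs are http(s)/ftp only. Bare URLs are not handled.
INTERNAL = re.compile(
    r"(?P<prefix>[A-Za-z]*)\[\[(?P<target>[^\[\]|]+)"
    r"(?:\|(?P<label>[^\]]*))?\]\](?P<trail>[a-z]*)"
)
EXTERNAL = re.compile(r"\[(?P<url>(?:https?|ftp)://[^\s\]]+)\s+(?P<label>[^\]]*)\]")


def internal_links(text):
    """Return (target, displayed_text) for each [[...]] link."""
    result = []
    for m in INTERNAL.finditer(text):
        # Without a label, the target itself is displayed.
        label = m.group("label") or m.group("target")
        result.append((m.group("target"), m.group("prefix") + label + m.group("trail")))
    return result


def external_links(text):
    """Return (url, label) for each [url label] link."""
    return [(m.group("url"), m.group("label")) for m in EXTERNAL.finditer(text)]


print(internal_links("[[Parser]]s and [[Main Page|home]]"))
```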

Headers

  • ={1,6} (at linestart or lineend)

Can contain: ticks, links, variables, templates?
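A header line carries a run of one to six "=" at both line start and line end. The Python sketch below takes the shorter run as the level and returns any surplus "=" to the title text. That treatment of unequal runs is modelled on MediaWiki and is an assumption here, as is the function name.

```python
import re

# One to six "=" at line start and at line end. Unequal runs: the shorter
# one sets the level, the surplus stays in the title (an assumption).
HEADER = re.compile(r"^(={1,6})(.+?)(={1,6})\s*$")


def parse_header(line):
    """Return (level, title) for a header line, or None."""
    m = HEADER.match(line)
    if m is None:
        return None
    opening, body, closing = m.groups()
    level = min(len(opening), len(closing))
    # Put any unmatched "=" back into the title text.
    title = "=" * (len(opening) - level) + body + "=" * (len(closing) - level)
    return level, title.strip()


print(parse_header("== Syntax =="))  # (2, 'Syntax')
```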

Other tokens

  • Newline
    • not terminated by a newline: pre, all HTML, multi-line macros
    • a double newline closes the current block and opens a new one
  • whitespace+
  • <pre> anything_but_close_pre </pre>
  • <!-- anything_but_close_html_comment -->
  • (one token per valid HTML tag)
  • anyothercharacter
  • variables (plus extra handling to find variables inside "articlename", emulating the current multipass parser)
  • End of File
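The token list above translates into an ordered alternation: multi-line tokens first, then newlines and whitespace, then a single-character fallback, with an explicit end-of-file token appended. This Python sketch uses hypothetical token names. It leaves out the per-HTML-tag tokens and the variables, which would need their own patterns.

```python
import re

# Hypothetical token names; earlier alternatives win at each position.
TOKEN_SPEC = [
    ("PRE", r"<pre>.*?</pre>"),          # may span lines
    ("COMMENT", r"<!--.*?-->"),          # may span lines
    ("PARA_BREAK", r"\n\n+"),            # double newline: close block, open next
    ("NEWLINE", r"\n"),
    ("WHITESPACE", r"[ \t]+"),
    ("CHAR", r"."),                      # any other character
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(text):
    """Return a list of (kind, text) tokens ending with an EOF token."""
    tokens = [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
    tokens.append(("EOF", ""))
    return tokens


print([kind for kind, _ in tokenize("a <!-- x -->\n\nb")])
```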

Links

List of allowed html tags

$htmlpairs = array( # Tags that must be closed
    'b', 'del', 'i', 'ins', 'u', 'font', 'big', 'small', 'sub', 'sup', 'h1',
    'h2', 'h3', 'h4', 'h5', 'h6', 'cite', 'code', 'em', 's',
    'strike', 'strong', 'tt', 'var', 'div', 'center',
    'blockquote', 'ol', 'ul', 'dl', 'table', 'caption', 'pre',
    'ruby', 'rt', 'rb', 'rp', 'p'
);
$htmlsingle = array(
    'br', 'hr', 'li', 'dt', 'dd'
);
$htmlnest = array( # Tags that can be nested--??
    'table', 'tr', 'td', 'th', 'div', 'blockquote', 'ol', 'ul',
    'dl', 'font', 'big', 'small', 'sub', 'sup'
);
$tabletags = array( # Can only appear inside table
    'td', 'th', 'tr'
);
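To show how the tag lists above could be used, here is a hypothetical Python mirror of them with a check for an opening tag. The tag sets are copied from the PHP arrays. The rest is an illustrative assumption rather than MediaWiki's code: the rule that a tag is allowed if it appears in any list, the check order, and the reason strings.

```python
# Python mirror of the PHP tag lists; the checking logic is an assumption.
HTML_PAIRS = {
    "b", "del", "i", "ins", "u", "font", "big", "small", "sub", "sup",
    "h1", "h2", "h3", "h4", "h5", "h6", "cite", "code", "em", "s",
    "strike", "strong", "tt", "var", "div", "center", "blockquote",
    "ol", "ul", "dl", "table", "caption", "pre", "ruby", "rt", "rb", "rp", "p",
}
HTML_SINGLE = {"br", "hr", "li", "dt", "dd"}
HTML_NEST = {
    "table", "tr", "td", "th", "div", "blockquote", "ol", "ul", "dl",
    "font", "big", "small", "sub", "sup",
}
TABLE_TAGS = {"td", "th", "tr"}
# Assumption: a tag is allowed if it appears in any of the lists.
ALLOWED = HTML_PAIRS | HTML_SINGLE | HTML_NEST


def check_open_tag(name, open_stack):
    """Return None if an opening tag is acceptable, else a reason string.

    open_stack holds the names of the currently open tags, outermost first.
    """
    name = name.lower()
    if name not in ALLOWED:
        return "not allowed"
    if name in TABLE_TAGS and "table" not in open_stack:
        return "only allowed inside table"
    if name in open_stack and name not in HTML_NEST:
        return "cannot be nested"
    return None


print(check_open_tag("td", ["div"]))  # only allowed inside table
```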

Attributes:
$htmlattrs = array( # Allowed attributes--no scripting, etc.
    'title', 'align', 'lang', 'dir', 'width', 'height',
    'bgcolor', 'clear', /* BR */ 'noshade', /* HR */
    'cite', /* BLOCKQUOTE, Q */ 'size', 'face', 'color',
    /* FONT */ 'type', 'start', 'value', 'compact',
    /* For various lists, mostly deprecated but safe */
    'summary', 'width', 'border', 'frame', 'rules',
    'cellspacing', 'cellpadding', 'valign', 'char',
    'charoff', 'colgroup', 'col', 'span', 'abbr', 'axis',
    'headers', 'scope', 'rowspan', 'colspan', /* Tables */
    'id', 'class', 'name', 'style' /* For CSS */
);
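The attribute whitelist can be applied by parsing name=value pairs out of a tag and keeping only the whitelisted names. In this Python sketch, the set is copied from $htmlattrs (where the duplicate 'width' is harmless). The attribute-parsing regex and the helper names are simplifying assumptions. A real sanitizer would also have to inspect values, for example the contents of style.

```python
import re

# Copied from $htmlattrs; parsing and unquoting rules are assumptions.
HTML_ATTRS = {
    "title", "align", "lang", "dir", "width", "height", "bgcolor", "clear",
    "noshade", "cite", "size", "face", "color", "type", "start", "value",
    "compact", "summary", "border", "frame", "rules", "cellspacing",
    "cellpadding", "valign", "char", "charoff", "colgroup", "col", "span",
    "abbr", "axis", "headers", "scope", "rowspan", "colspan", "id", "class",
    "name", "style",
}
# name = "double-quoted" | 'single-quoted' | unquoted
ATTR_RE = re.compile(r"""([A-Za-z][\w-]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)""")


def _unquote(value):
    """Strip one pair of matching surrounding quotes, if present."""
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        return value[1:-1]
    return value


def filter_attrs(attr_text):
    """Return a dict of the whitelisted attributes found in attr_text."""
    kept = {}
    for name, value in ATTR_RE.findall(attr_text):
        name = name.lower()
        if name in HTML_ATTRS:
            kept[name] = _unquote(value)
    return kept


print(filter_attrs('align="center" onclick="evil()" width=50'))
```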

