BisonGen parser
From wikidev.net
Nesting classes
Inline
Core
ticks/, five tildes, variables, templates, nowiki/, entities, inline macros
+ in blocklevels, headers, toplevel text, ticks
links, four tildes, ISBN/RFC
+ in image-links
links
Blocklevel/toplevel
- table, th, tbody, tr, td and their wikitext equivalents
- blocklevel macros, multi-line input
- horizontal rule: ----+ (four or more dashes)
Items
Misc
- templates {{something}}, {{something|param}}, {{something|param=something}}
- tildes ~~~, ~~~~
- nowiki: <nowiki> anything_but_close_nowiki </nowiki>
- ISBN whitespace [0-9X-]+, RFC whitespace \d+
- & entityspecification ;
- variables
- inline macros, can be multi-line input (math, ploticus..)
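The simpler entries in the list above can be sketched as regular expressions. This is only an illustration using Python's `re` module, not the BisonGen lexer itself. The token names and the `misc_tokens` helper are hypothetical, the entity pattern is an approximation of "& entityspecification ;", and the tilde pattern also accepts the five-tilde form mentioned under Core.

```python
import re

# Hypothetical token names; the patterns follow the Misc list above.
MISC_TOKENS = [
    ("NOWIKI",   r"<nowiki>.*?</nowiki>"),             # anything but </nowiki>
    ("TEMPLATE", r"\{\{[^{}|]+(?:\|[^{}]*)?\}\}"),     # {{name}} or {{name|param...}}
    ("TILDES",   r"~{3,5}"),                           # ~~~, ~~~~ (and ~~~~~)
    ("ISBN",     r"ISBN\s+[0-9X-]+"),
    ("RFC",      r"RFC\s+\d+"),
    ("ENTITY",   r"&#?\w+;"),                          # rough approximation
]

MISC_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in MISC_TOKENS),
    re.DOTALL,  # let <nowiki> bodies span several lines
)

def misc_tokens(text):
    """Return (token_name, matched_text) pairs in order of appearance."""
    return [(m.lastgroup, m.group()) for m in MISC_RE.finditer(text)]
```

Nested templates, variables and inline macros are left out on purpose; they need more context than a single regular expression can carry.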
Ticks
- ''''' ''''' inline, strong and em
- ''' ''' inline, strong
- '' '' inline, em
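A minimal sketch of how tick runs might be split into tokens, assuming the usual MediaWiki convention that two ticks mark emphasis, three mark strong, and five mark both. The names `TICK_NAMES` and `tick_tokens` are hypothetical, and pairing the opening and closing tokens is left to the grammar.

```python
import re

# Hypothetical token names keyed by the length of the tick run.
TICK_NAMES = {2: "EM", 3: "STRONG", 5: "STRONG_EM"}

# Try the longest run first so that ''''' is not read as ''' followed by ''.
TICK_RE = re.compile(r"'{5}|'{3}|'{2}")

def tick_tokens(line):
    """Split a line into TEXT tokens and tick tokens."""
    tokens, pos = [], 0
    for m in TICK_RE.finditer(line):
        if m.start() > pos:
            tokens.append(("TEXT", line[pos:m.start()]))
        tokens.append((TICK_NAMES[len(m.group())], m.group()))
        pos = m.end()
    if pos < len(line):
        tokens.append(("TEXT", line[pos:]))
    return tokens
```

A single apostrophe stays part of the surrounding text, so words such as "don't" are not affected.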
Lists
- (\* | #)+, only at linestart; an alternative Moin syntax uses tabs and different enumeration formats
- ; (at start of line)
- : (at start of line, or inline with a space on each side: ' : ')
Lists can contain: inline, pre,
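The line-start rules above could be recognised roughly as follows. This is a sketch under simplifying assumptions: it ignores the inline ' : ' form and the tab-based Moin syntax, and the `LIST_KINDS` mapping and `list_item` helper are hypothetical names.

```python
import re

# Hypothetical mapping from list marker to the element it opens.
LIST_KINDS = {"*": "ul", "#": "ol", ";": "dt", ":": "dd"}

# Either a run of * and # markers (nesting), or a single ; or : marker.
LIST_RE = re.compile(r"([*#]+|[;:])\s*(.*)")

def list_item(line):
    """Return (nesting, body) for a list line, or None for any other line."""
    m = LIST_RE.match(line)  # match() anchors at the start of the line
    if m is None:
        return None
    prefix, body = m.groups()
    return [LIST_KINDS[c] for c in prefix], body
```

For example, `list_item("*# nested")` yields the nesting `["ul", "ol"]` together with the body `"nested"`.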
Links
inline
- prefix[ [ articlenamespecification]]trail
- prefix[ [ articlenamespecification| anythingbutclosingbrace ] ]trail
- urlspecification
- [ urlspecification space anythingbutclosingbrace]
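The internal and bracketed external link forms above might be matched like this. The sketch assumes that prefix and trail are plain word characters and that an article name contains neither `|` nor `]`; the URL schemes are taken from the links listed on this page, and all names are hypothetical.

```python
import re

# prefix[[target]]trail and prefix[[target|label]]trail
INTERNAL_LINK = re.compile(
    r"(?P<prefix>\w*)\[\[(?P<target>[^\]|]+)(?:\|(?P<label>[^\]]*))?\]\](?P<trail>\w*)"
)

# [url label]: a URL, whitespace, then anything but a closing bracket
EXTERNAL_LINK = re.compile(
    r"\[(?P<url>(?:https?|ftp|irc)://\S+)\s+(?P<label>[^\]]*)\]"
)

def internal_links(text):
    """Return one dict per internal link with prefix, target, label and trail."""
    return [m.groupdict() for m in INTERNAL_LINK.finditer(text)]

def external_links(text):
    """Return (url, label) pairs for bracketed external links."""
    return [(m.group("url"), m.group("label")) for m in EXTERNAL_LINK.finditer(text)]
```

A link without a label reports `None` for `label`, which lets a later stage fall back to the target as display text.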
Headers
- ={1,6} (at linestart or lineend)
Can contain: ticks, links, variables, templates?
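A header line could be detected with a single pattern. The sketch below assumes that the opening and closing runs of `=` must have the same length, which the notes above do not state; `HEADER_RE` and `header` are hypothetical names.

```python
import re

# One to six '=' at the start, the same run again at the end of the line.
HEADER_RE = re.compile(r"^(={1,6})\s*(.+?)\s*\1\s*$")

def header(line):
    """Return (level, title) for a header line, or None otherwise."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return len(m.group(1)), m.group(2)
```

The title is returned as raw text, so ticks, links and variables inside it still have to be parsed by the inline rules.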
Other tokens
- New Line
- not line-limited: pre, all html, multi-line macros
- double newline corresponds to close/open
- whitespace +
- <pre> anything_but_close_pre </pre>
- <!-- anything_but_close_html_comment -->
- (one token per valid HTML tag)
- anyothercharacter
- variables (+ magic to parse "articlename" for variables to emulate current multipass parser)
- End of File
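A few of the tokens above can be sketched as one ordered alternation, with the catch-all character rule last. This is an illustration only: HTML tags, variables and the end-of-file token are omitted, and the token names and the `lex` function are hypothetical.

```python
import re

# Order matters: multi-line constructs first, the catch-all character last.
TOKEN_RE = re.compile(
    r"(?P<PRE><pre>.*?</pre>)"      # <pre> anything_but_close_pre </pre>
    r"|(?P<COMMENT><!--.*?-->)"     # HTML comment
    r"|(?P<PARA>\n{2,})"            # double newline: close/open paragraph
    r"|(?P<NEWLINE>\n)"
    r"|(?P<WS>[ \t]+)"
    r"|(?P<CHAR>.)",                # any other character
    re.DOTALL,                      # <pre> and comments are not line-limited
)

def lex(text):
    """Return (token_name, matched_text) pairs covering the whole input."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
```

Because the last alternative accepts any character, every input is fully consumed and the matched texts concatenate back to the original string.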
External links
- http://www.gnu.org/software/bison/manual/html_mono/bison.html
- ftp://ftp.fourthought.com/pub/BisonGen/
- http://docs.python.org/ext/building.html
- irc://irc.freenode.net/4suite
- meta:Wikipedia lexer
List of allowed html tags
$htmlpairs = array( # Tags that must be closed
'b', 'del', 'i', 'ins', 'u', 'font', 'big', 'small', 'sub', 'sup', 'h1',
'h2', 'h3', 'h4', 'h5', 'h6', 'cite', 'code', 'em', 's',
'strike', 'strong', 'tt', 'var', 'div', 'center',
'blockquote', 'ol', 'ul', 'dl', 'table', 'caption', 'pre',
'ruby', 'rt' , 'rb' , 'rp', 'p'
);
$htmlsingle = array(
'br', 'hr', 'li', 'dt', 'dd'
);
$htmlnest = array( # Tags that can be nested--??
'table', 'tr', 'td', 'th', 'div', 'blockquote', 'ol', 'ul',
'dl', 'font', 'big', 'small', 'sub', 'sup'
);
$tabletags = array( # Can only appear inside table
'td', 'th', 'tr'
);
Attributes:
$htmlattrs = array( # Allowed attributes--no scripting, etc.
'title', 'align', 'lang', 'dir', 'width', 'height',
'bgcolor', 'clear', /* BR */ 'noshade', /* HR */
'cite', /* BLOCKQUOTE, Q */ 'size', 'face', 'color',
/* FONT */ 'type', 'start', 'value', 'compact',
/* For various lists, mostly deprecated but safe */
'summary', 'width', 'border', 'frame', 'rules',
'cellspacing', 'cellpadding', 'valign', 'char',
'charoff', 'colgroup', 'col', 'span', 'abbr', 'axis',
'headers', 'scope', 'rowspan', 'colspan', /* Tables */
'id', 'class', 'name', 'style' /* For CSS */
);
