This appendix covers the escaping rules used to represent non-ASCII characters in Haskell character and string literals. Haskell's escaping rules follow the pattern established by the C programming language, but expand considerably upon them.
A single character is surrounded by ASCII single quotes, ', and has type Char.
ghci> 'c'
'c'
ghci> :type 'c'
'c' :: Char
A string literal is surrounded by double quotes, ", and has type [Char] (more often written as String).
ghci> "a string literal"
"a string literal"
ghci> :type "a string literal"
"a string literal" :: [Char]
The double-quoted form of a string literal is just syntactic sugar for list notation.
ghci> ['a', ' ', 's', 't', 'r', 'i', 'n', 'g'] == "a string"
True
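The same equivalence can be checked in a compiled program. The following is a minimal sketch; the names sugared and explicit are illustrative only and do not come from the text.

```haskell
-- A string literal and an explicitly constructed list of Char
-- denote the same value.
main :: IO ()
main = do
  let sugared  = "a string" :: String
      explicit = 'a' : ' ' : 's' : 't' : 'r' : 'i' : 'n' : 'g' : []
  print (sugared == explicit)  -- True
  print (length sugared)       -- 8
  -- Ordinary list functions apply to strings unchanged.
  print (reverse sugared)      -- "gnirts a"
```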
Haskell uses Unicode internally for its Char data type. Since String is just an alias for [Char], a list of Chars, Unicode is also used to represent strings.
Different Haskell implementations place limitations on the character sets they can accept in source files. GHC allows source files to be written in the UTF-8 encoding of Unicode, so in a source file, you can use UTF-8 literals inside a character or string constant. Do be aware that if you use UTF-8, other Haskell implementations may not be able to parse your source files.
When you run the ghci interpreter interactively, it may not be able to deal with international characters in character or string literals that you enter at the keyboard.
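As a small illustration of Char holding Unicode code points, the sketch below uses ord and chr from Data.Char. It deliberately keeps the source file pure ASCII, so it should not depend on how an implementation decodes source files.

```haskell
import Data.Char (chr, ord)

main :: IO ()
main = do
  -- ord maps a Char to its Unicode code point; chr goes the other way.
  print (ord 'a')            -- 97
  -- Code point 955 is GREEK SMALL LETTER LAMDA, well outside ASCII.
  print (chr 955 == '\955')  -- True
  -- String is [Char], so map works directly on a string literal.
  print (map ord "Hi!")      -- [72,105,33]
```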
Some characters must be escaped to be represented inside a character or string literal. For example, a double quote character inside a string literal must be escaped, or else it will be treated as the end of the string.
Haskell uses essentially the same single-character escapes as the C language and many other popular languages.
Table B.1. Single-character escape codes
Escape | Unicode | Character |
---|---|---|
\0 | U+0000 | null character |
\a | U+0007 | alert |
\b | U+0008 | backspace |
\f | U+000C | form feed |
\n | U+000A | newline (line feed) |
\r | U+000D | carriage return |
\t | U+0009 | horizontal tab |
\v | U+000B | vertical tab |
\" | U+0022 | double quote |
\& | n/a | empty string |
\' | U+0027 | single quote |
\\ | U+005C | backslash |
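The code points in the table can be confirmed with ord from Data.Char. This is a sketch for checking the table, not part of the original text; the \& entry is left out because it does not denote a character.

```haskell
import Data.Char (ord)

main :: IO ()
main = do
  -- Each escape below denotes exactly one Char, in the order
  -- \0 \a \b \f \n \r \t \v \" \' \\
  print (map ord "\0\a\b\f\n\r\t\v\"\'\\")
  -- [0,7,8,12,10,13,9,11,34,39,92]

  -- An escaped double quote does not terminate the string literal.
  putStrLn "She said \"hello\"."

  -- In a character literal it is the single quote that must be escaped.
  print (ord '\'')  -- 39
```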
To write a string literal that spans multiple lines, terminate one line with a backslash, and resume the string with another backslash. An arbitrary amount of whitespace (of any kind) can fill the gap between the two backslashes.
"this is a \
    \long string,\
    \ spanning multiple lines"
Haskell recognises the escaped use of the standard two- and three-letter abbreviations of ASCII control codes.
Table B.2. ASCII control code abbreviations
Escape | Unicode | Meaning |
---|---|---|
\NUL | U+0000 | null character |
\SOH | U+0001 | start of heading |
\STX | U+0002 | start of text |
\ETX | U+0003 | end of text |
\EOT | U+0004 | end of transmission |
\ENQ | U+0005 | enquiry |
\ACK | U+0006 | acknowledge |
\BEL | U+0007 | bell |
\BS | U+0008 | backspace |
\HT | U+0009 | horizontal tab |
\LF | U+000A | line feed (newline) |
\VT | U+000B | vertical tab |
\FF | U+000C | form feed |
\CR | U+000D | carriage return |
\SO | U+000E | shift out |
\SI | U+000F | shift in |
\DLE | U+0010 | data link escape |
\DC1 | U+0011 | device control 1 |
\DC2 | U+0012 | device control 2 |
\DC3 | U+0013 | device control 3 |
\DC4 | U+0014 | device control 4 |
\NAK | U+0015 | negative acknowledge |
\SYN | U+0016 | synchronous idle |
\ETB | U+0017 | end of transmission block |
\CAN | U+0018 | cancel |
\EM | U+0019 | end of medium |
\SUB | U+001A | substitute |
\ESC | U+001B | escape |
\FS | U+001C | file separator |
\GS | U+001D | group separator |
\RS | U+001E | record separator |
\US | U+001F | unit separator |
\SP | U+0020 | space |
\DEL | U+007F | delete |
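A few of these abbreviations can be compared against other notations, as in the sketch below. It also shows one subtlety from the Haskell language definition: because \SOH is itself an abbreviation, the lexer reads "\SOH" as a single character, and the zero-width escape \& (described later in this appendix) is needed to write \SO followed by the letter H.

```haskell
import Data.Char (ord)

main :: IO ()
main = do
  -- The abbreviations are just other spellings of the same characters.
  print ('\ESC' == '\x1b')  -- True
  print ('\LF' == '\n')     -- True
  print (ord '\DEL')        -- 127

  -- "\SOH" is one character (start of heading) ...
  print (length "\SOH")     -- 1
  -- ... while "\SO\&H" is shift out followed by the letter H.
  print (length "\SO\&H")   -- 2
```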
Haskell recognises an alternate notation for control characters, which represents the archaic effect of pressing the control key on a keyboard and chording it with another key. These sequences begin with the characters \^, followed by a symbol or uppercase letter.
Table B.3. Control-with-character escapes
Escape | Unicode | Meaning |
---|---|---|
\^@ | U+0000 | null character |
\^A through \^Z | U+0001 through U+001A | control codes |
\^[ | U+001B | escape |
\^\ | U+001C | file separator |
\^] | U+001D | group separator |
\^^ | U+001E | record separator |
\^_ | U+001F | unit separator |
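The table follows a simple pattern: the code point of each escape is the ASCII code of the symbol or letter after \^, minus 64 (the code of @). The sketch below checks a few entries under that reading.

```haskell
import Data.Char (ord)

main :: IO ()
main = do
  -- Control-with-character escapes name the same characters as the
  -- ASCII abbreviations.
  print ('\^A' == '\SOH')  -- True
  print ('\^[' == '\ESC')  -- True
  print (ord '\^Z')        -- 26

  -- Subtracting the code of '@' from each key reproduces the table.
  print (map (\c -> ord c - ord '@') "@AZ[_")  -- [0,1,26,27,31]
```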
Haskell allows Unicode characters to be written using numeric escapes. A decimal character begins with a digit, e.g. \1234. A hexadecimal character begins with an x, e.g. \xbeef. An octal character begins with an o, e.g. \o1234.
The maximum value of a numeric escape is \1114111, which may also be written \x10ffff or \o4177777.
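The three notations are interchangeable ways of naming a code point. As a quick sketch, the same character can be written in all three bases and compared, and the upper limit can be checked against maxBound for Char.

```haskell
import Data.Char (ord)

main :: IO ()
main = do
  -- 1234 decimal, 4d2 hexadecimal and 2322 octal are the same number.
  print ('\1234' == '\x4d2' && '\x4d2' == '\o2322')  -- True
  print (ord '\xbeef')                               -- 48879

  -- The largest Char is the largest Unicode code point.
  print (maxBound == '\x10ffff')                     -- True
  print (ord (maxBound :: Char))                     -- 1114111
  print ('\1114111' == '\o4177777')                  -- True
```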
String literals can contain a zero-width escape sequence, written \&. This is not a real character, as it represents the empty string.
ghci> "\&"
""
ghci> "foo\&bar"
"foobar"
The purpose of this escape sequence is to make it possible to write a numeric escape followed immediately by a regular ASCII digit.
ghci> "\130\&11"
"\130\&11"
Because the empty escape sequence represents an empty string, it is not legal in a character literal.
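To see why the separator matters, compare the lengths of the string with and without it. In this sketch, omitting \& lets the decimal escape absorb the following digits and produces a single, different character.

```haskell
main :: IO ()
main = do
  -- With the separator: character 130, then the digits '1' and '1'.
  print (length "\130\&11")                 -- 3
  print ("\130\&11" == ['\130', '1', '1'])  -- True

  -- Without it, the escape is read as the single code point 13011.
  print (length "\13011")                   -- 1

  -- show puts the separator back so that the output reads unambiguously.
  print "\130\&11"                          -- "\130\&11"
```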