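Strings

This section explains how Zen handles strings.

First, Zen has no dedicated string type. All text is handled as a sequence of UTF-8 encoded bytes.

To build a string whose length changes at run time, use ArrayList or Buffer from the standard library.

Strings are written either as single-line string literals enclosed in "" or as multi-line string literals in which each line begins with \\.

Single-line string literals

A string literal is an array of u8. The text enclosed in "" is a single-line string literal. As in C, the array is terminated with a NULL character (\0).

A single-line string literal cannot contain a line break.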

examples/ch02-primitives/src/strings.zen:4:16

test "string literals" {
    const hello = "hello";
    ok(@TypeOf(hello) == *[5:0]u8);
    ok(hello[0] == 'h');
    ok(hello.len == 5);

    // Each hiragana character takes 3 bytes in UTF-8
    const japanese = "こんにちは";
    ok(@TypeOf(japanese) == *[15:0]u8);
    // "こ" is encoded as `{ 0xe3, 0x81, 0x93 }`
    ok(japanese[0] == 0xe3);
    ok(japanese.len == 15);
}

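In the example above, the variable hello has the type *[5:0]u8. This resembles a pointer-to-array type, but the :0 indicates that "hello" is terminated with a NULL character.

Although a string literal is NULL-terminated, the terminator is not counted in its number of elements. In the example above, hello has 5 elements, which excludes the NULL character.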
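
Because the element count is a byte count, a literal that mixes ASCII and multi-byte characters can be checked the same way. The following is a minimal sketch rather than part of the original examples: the test name and the sample string are hypothetical, and the import line assumes that ok comes from std.testing, since the excerpts above omit their imports.

```zen
const std = @import("std");
// Assumption: `ok` is the assertion helper used by the excerpts above.
const ok = std.testing.ok;

test "length counts bytes, not characters" {
    // 'Z', 'e', 'n' take 1 byte each; "禅" (U+7985) takes 3 bytes in UTF-8.
    const mixed = "Zen禅";
    ok(mixed.len == 6);
    // The terminator appears in the type (:0) but not in the length.
    ok(@TypeOf(mixed) == *[6:0]u8);
}
```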

The terminating character itself cannot be accessed by index.

The following code results in a compile error.

examples/ch02-primitives/src/strings.zen:18:20

test "termination character access" {
    const hello = "hello";
    ok(hello[5] == 0);
}

error[E04028]: index is out of bounds
    ok(hello[5] == 0);

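Some characters must be escaped with a backslash (\).

Character	Escape sequence
Single quote (`'`)	\'
Double quote (`"`)	\"
Line feed	\n
Carriage return	\r
Tab	\t
Backslash (`\`)	\\
Hexadecimal character code	\xHH
Hexadecimal Unicode code point	\u{HHHHHH}

For example, the following escape sequence embeds a Unicode character in a string literal.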

examples/ch02-primitives/src/strings.zen:24:28

test "escape sequence" {
    const japanese = "こんにちは";
    const unicode = "こ\u{3093}にちは";
    equalSlices(u8, japanese, unicode);
}
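
The other escape sequences in the table can be exercised in the same style. This is a hedged sketch, not taken from the original example file: the test name and strings are hypothetical, and the import lines assume that ok and equalSlices live in std.testing.

```zen
const std = @import("std");
// Assumption: these are the helpers used by the excerpts above.
const ok = std.testing.ok;
const equalSlices = std.testing.equalSlices;

test "escape sequences in single-line literals" {
    // \xHH inserts one byte: 0x5a, 0x65, 0x6e are 'Z', 'e', 'n'.
    equalSlices(u8, "\x5a\x65\x6e", "Zen");

    // \t and \n each produce exactly one byte.
    const text = "a\tb\n";
    ok(text.len == 4);
    ok(text[1] == '\t');
    ok(text[3] == '\n');

    // A double quote inside "..." must be escaped; it is stored as byte 0x22.
    const quoted = "say \"hi\"";
    ok(quoted.len == 8);
    ok(quoted[4] == 0x22);
}
```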

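Multi-line string literals

A multi-line string literal is written by starting each line with \\.

Each line break is stored in the string literal as \n. The line break after the final line, however, is not included.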

test "string multi-line literals" {
    const hello = 
        \\Hello
        \\World
    ;

    equalSlices(u8, hello, "Hello\nWorld");
}
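
To make the newline rule concrete, the sketch below counts the bytes of a three-line literal. The test name and contents are hypothetical, and the import lines assume that ok and equalSlices come from std.testing.

```zen
const std = @import("std");
// Assumption: these are the helpers used by the excerpts above.
const ok = std.testing.ok;
const equalSlices = std.testing.equalSlices;

test "newlines in a multi-line literal" {
    const text =
        \\ab
        \\cd
        \\ef
    ;

    // 3 lines of 2 bytes, plus 2 separating newlines; no trailing newline.
    ok(text.len == 8);
    ok(text[2] == '\n');
    equalSlices(u8, text, "ab\ncd\nef");
}
```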

Escape sequences cannot be used in multi-line string literals. A backslash sequence is stored as the literal characters that were typed.

test "no-escape sequence" {
    const text_not_newline =
        \\\n
    ;
    ok(text_not_newline[0] != '\n');

    const text_not_n =
        \\\u{3093}
    ;
    ok(text_not_n.len > 3);
    for("\u{3093}") | chara, idx | {
        ok(text_not_n[idx] != chara);
    }
}

Concatenating and repeating string literals

As with array literals, string literals that are known at compile time can be concatenated with the ++ operator.

examples/ch02-primitives/src/strings.zen:30:35

test "concatenates string literals" {
    const hello = "hello";
    const world = "world";
    const message = hello ++ " " ++ world ++ "\n";
    equalSlices(u8, message, "hello world\n");
}

Likewise, a string literal that is known at compile time can be repeated a given number of times with the ** operator.

examples/ch02-primitives/src/strings.zen:37:41

test "multiplies string literals" {
    const hyphen = "-";
    const line = hyphen ** 10;
    equalSlices(u8, line, "----------");
}
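
The two operators can be combined, since both operate on compile-time values. A small sketch under the same assumptions as the examples above (hypothetical test name; ok and equalSlices assumed to come from std.testing):

```zen
const std = @import("std");
// Assumption: these are the helpers used by the excerpts above.
const ok = std.testing.ok;
const equalSlices = std.testing.equalSlices;

test "combining ++ and **" {
    // Repeat first, then concatenate the named result.
    const rule = "=" ** 5;
    const banner = rule ++ "\n" ++ "Zen" ++ "\n" ++ rule;

    // 5 + 1 + 3 + 1 + 5 bytes
    ok(banner.len == 15);
    equalSlices(u8, banner, "=====\nZen\n=====");
}
```

Binding the repeated string to a name before concatenating avoids relying on the relative precedence of the two operators.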

String slices

Because a string is an array of u8, it can also be used as a slice.

examples/ch02-primitives/src/strings.zen:43:48

test "string slice" {
    const hello = "hello world";
    const world = hello[6..];

    equalSlices(u8, world, "world");
}
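
Slice bounds are byte offsets, not character positions, which matters for multi-byte text. The sketch below is illustrative only: the test name is hypothetical, the import lines assume that ok and equalSlices come from std.testing, and the two-sided range form hello[0..5] is assumed to be available alongside the open-ended form shown above.

```zen
const std = @import("std");
// Assumption: these are the helpers used by the excerpts above.
const ok = std.testing.ok;
const equalSlices = std.testing.equalSlices;

test "slicing by byte offsets" {
    const hello = "hello world";
    // Assumption: a range with both bounds is written start..end.
    const first = hello[0..5];
    ok(first.len == 5);
    equalSlices(u8, first, "hello");

    // Each hiragana character takes 3 bytes, so "こ" spans bytes 0 to 2.
    const japanese = "こんにちは";
    equalSlices(u8, japanese[0..3], "こ");
    equalSlices(u8, japanese[3..], "んにちは");
}
```

A bound that falls inside a multi-byte character would produce a slice that is not valid UTF-8, so offsets should be chosen on character boundaries.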
