Max Wash 2fcadf7f39 core: string: add UTF-8 and null-char support; and some new string functions
b_string now uses UTF-8 internally, and can correctly manipulate strings
that contain non-ASCII and multi-byte codepoints.

b_string now tracks the length of a string in both bytes and unicode codepoints.
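
a minimal sketch of why the two lengths differ, assuming the buffer holds valid UTF-8. `sketch_utf8_codepoints` is a hypothetical helper for illustration and not part of the b_string API; it counts every byte that is not a continuation byte (bit pattern 10xxxxxx).

```c
#include <stddef.h>

/* Hypothetical helper (not b_string API): count the codepoints in a buffer
 * of valid UTF-8. Each codepoint contributes exactly one non-continuation
 * byte, so counting those bytes gives the codepoint count. The length is
 * passed explicitly, so embedded null bytes are counted like any other. */
static size_t sketch_utf8_codepoints(const char *buf, size_t nr_bytes)
{
	size_t count = 0;

	for (size_t i = 0; i < nr_bytes; i++) {
		if (((unsigned char)buf[i] & 0xC0) != 0x80) {
			count++;
		}
	}

	return count;
}
```

for the 6-byte string "h\xC3\xA9llo" this returns 5, because the two bytes C3 A9 encode a single codepoint (U+00E9).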

string insertion functions have been updated to correctly handle strings with
multi-byte codepoints, so the index parameter of each function now refers to codepoints
rather than bytes. inserting single-byte chars into a string with no multi-byte codepoints
is still optimised to use array indexing and memmove.
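
the codepoint-indexed insertion described above can be sketched roughly as follows. this is an illustration under stated assumptions, not the actual b_string implementation: `sketch_byte_offset` and `sketch_insert` are hypothetical names, the buffer is assumed to hold valid UTF-8, and the caller is assumed to have already reserved enough capacity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte offset of the codepoint at index cp_index.
 * Returns nr_bytes when cp_index is at or past the end of the string. */
static size_t sketch_byte_offset(const char *buf, size_t nr_bytes,
				 size_t nr_codepoints, size_t cp_index)
{
	if (nr_bytes == nr_codepoints) {
		/* fast path: every codepoint is one byte, so index directly */
		return cp_index < nr_bytes ? cp_index : nr_bytes;
	}

	size_t seen = 0;
	for (size_t i = 0; i < nr_bytes; i++) {
		/* only lead bytes and ASCII bytes start a codepoint */
		if (((unsigned char)buf[i] & 0xC0) != 0x80) {
			if (seen == cp_index) {
				return i;
			}
			seen++;
		}
	}

	return nr_bytes;
}

/* Hypothetical helper: insert ins_bytes bytes before codepoint cp_index.
 * buf must have room for nr_bytes + ins_bytes. Returns the new byte length. */
static size_t sketch_insert(char *buf, size_t nr_bytes, size_t nr_codepoints,
			    size_t cp_index, const char *ins, size_t ins_bytes)
{
	size_t off = sketch_byte_offset(buf, nr_bytes, nr_codepoints, cp_index);

	/* shift the tail right, then copy the new bytes into the gap */
	memmove(buf + off + ins_bytes, buf + off, nr_bytes - off);
	memcpy(buf + off, ins, ins_bytes);

	return nr_bytes + ins_bytes;
}
```

comparing the byte length with the codepoint length is one cheap way to detect the all-single-byte case; whether b_string uses this exact check is an assumption.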

a b_string_iterator has been added to simplify iterating through a UTF-8 string, without
having to use a charAt()-style interface that would incur performance penalties.
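
to illustrate why an iterator is cheaper than a charAt()-style lookup: each step only decodes the bytes at the current position, whereas an indexed lookup has to rescan from the start of the string. the sketch below is a generic UTF-8 decoder and not the real b_string_iterator; `struct sketch_utf8_iter` and `sketch_utf8_next` are hypothetical names, and the decoder does not reject overlong encodings or surrogate codepoints.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical iterator state: a cursor and an end pointer. An explicit
 * end pointer (rather than a null terminator) lets the string contain
 * null bytes. */
struct sketch_utf8_iter {
	const unsigned char *cur;
	const unsigned char *end;
};

/* Decode the next codepoint into *out and advance the cursor.
 * Returns 1 on success, 0 at end of input or on a malformed sequence. */
static int sketch_utf8_next(struct sketch_utf8_iter *it, uint32_t *out)
{
	if (it->cur >= it->end) {
		return 0;
	}

	unsigned char lead = *it->cur;
	size_t len;
	uint32_t cp;

	/* the lead byte determines the sequence length */
	if (lead < 0x80) {
		len = 1;
		cp = lead;
	} else if ((lead & 0xE0) == 0xC0) {
		len = 2;
		cp = lead & 0x1F;
	} else if ((lead & 0xF0) == 0xE0) {
		len = 3;
		cp = lead & 0x0F;
	} else if ((lead & 0xF8) == 0xF0) {
		len = 4;
		cp = lead & 0x07;
	} else {
		return 0; /* stray continuation byte or invalid lead byte */
	}

	if ((size_t)(it->end - it->cur) < len) {
		return 0; /* truncated sequence */
	}

	/* each continuation byte contributes its low six bits */
	for (size_t i = 1; i < len; i++) {
		unsigned char c = it->cur[i];
		if ((c & 0xC0) != 0x80) {
			return 0;
		}
		cp = (cp << 6) | (uint32_t)(c & 0x3F);
	}

	it->cur += len;
	*out = cp;
	return 1;
}
```

iterating over the 4-byte string "a\xE2\x82\xAC" yields U+0061 and then U+20AC, after which the function returns 0.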

strings can now also contain null bytes.

new functions include:
  - b_string_tokenise: a b_iterator interface for iterating through tokens
    in a string. similar to strtok except that:
    * it is re-entrant, and uses no global state.
    * it supports delimiters that are longer than one character and/or contain
      multi-byte UTF-8 codepoints.
    * it doesn't modify the string that is being iterated over.
    * it correctly handles strings with multi-byte UTF-8 codepoints and null chars.
  - b_string_compare: for comparing strings. necessary to use this rather than strcmp
    as b_strings can now contain null chars.
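
the ideas behind both functions can be sketched on raw buffers with explicit lengths. everything below is hypothetical and does not show the real b_string_tokenise or b_string_compare signatures: `sketch_compare` uses memcmp plus a length check so that embedded null bytes take part in the comparison, and `sketch_next_token` keeps all of its state in a caller-owned struct, matches a multi-byte delimiter, and never writes to the input. unlike strtok, this sketch reports empty tokens between adjacent delimiters.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: compare two byte strings by explicit length. Functions in
 * the strcmp family stop at the first null byte; memcmp does not. */
static int sketch_compare(const char *a, size_t a_len,
			  const char *b, size_t b_len)
{
	size_t n = a_len < b_len ? a_len : b_len;
	int r = memcmp(a, b, n);

	if (r != 0) {
		return r;
	}

	/* equal prefix: the shorter string sorts first */
	return (a_len > b_len) - (a_len < b_len);
}

/* Hypothetical tokeniser state, owned by the caller (no global state). */
struct sketch_tokeniser {
	const char *pos;   /* start of the next token */
	const char *end;   /* one past the last byte of the input */
	const char *delim; /* delimiter bytes; may be multi-byte UTF-8 */
	size_t delim_len;
	int done;
};

/* Store the next token in *tok / *tok_len without modifying the input.
 * Returns 1 if a token was produced, 0 once the input is exhausted. */
static int sketch_next_token(struct sketch_tokeniser *t,
			     const char **tok, size_t *tok_len)
{
	if (t->done) {
		return 0;
	}

	const char *p = t->pos;
	while (t->delim_len > 0 && (size_t)(t->end - p) >= t->delim_len) {
		if (memcmp(p, t->delim, t->delim_len) == 0) {
			*tok = t->pos;
			*tok_len = (size_t)(p - t->pos);
			t->pos = p + t->delim_len;
			return 1;
		}
		p++;
	}

	/* no further delimiter: the rest of the input is the last token */
	*tok = t->pos;
	*tok_len = (size_t)(t->end - t->pos);
	t->done = 1;
	return 1;
}
```

a byte-wise search is enough here because UTF-8 is self-synchronising: a valid UTF-8 delimiter can only match at a codepoint boundary of valid UTF-8 input.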
2025-09-22 10:36:26 +01:00
Description
Cross-platform C framework
Readme BSD-3-Clause 1.1 MiB
Languages
C 98.8%
CMake 1.2%