UTF8_TO_CODEPOINTS(3f) - [M_unicode:CONVERSION] Convert UTF-8-encoded data to Unicode codepoints (LICENSE:MIT)
Synopsis
Characteristics
Description
Options
Examples
See Also
Author
License
pure subroutine utf8_to_codepoints(utf8,codepoints,nerr)
   character(len=1),intent(in)     :: utf8(:)
   ! or
   character(len=*),intent(in)     :: utf8
   !
   integer,allocatable,intent(out) :: codepoints(:)
   integer,intent(out)             :: nerr
o UTF8 is a scalar CHARACTER variable or an array of single-byte CHARACTER values
o the returned values in CODEPOINTS are of default INTEGER kind
o the error flag NERR is of default INTEGER kind
UTF8_TO_CODEPOINTS(3f) takes either a scalar CHARACTER variable or an array of CHARACTER(LEN=1) values, treats the contents as a stream of bytes holding UTF-8-encoded data, and converts that stream to an INTEGER array containing one Unicode codepoint value for each encoded character (glyph).
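To make the byte-to-codepoint conversion concrete, the following is a minimal, hand-rolled sketch of a UTF-8 decoder. It is an illustration only and is not taken from the M_unicode source: the names SKETCH_UTF8_DECODE, DECODE, and ACC are hypothetical, and the sketch omits checks a production decoder would need (overlong encodings, surrogates, values above U+10FFFF). It assumes ICHAR(3f) returns values in the range 0 to 255 for default CHARACTER kind, which holds for common compilers.

```fortran
program sketch_utf8_decode
   ! Illustrative sketch only: NOT the M_unicode implementation.
   implicit none
   integer, allocatable :: cp(:)
   integer :: nerr
   ! "hau’oli" spelled out as bytes; U+2019 is encoded as hex E2 80 99
   character(len=*), parameter :: s = 'hau'//char(226)//char(128)//char(153)//'oli'
   call decode(s, cp, nerr)
   write(*,'(*(g0,1x))') 'codepoints:', cp
   write(*,'(*(g0,1x))') 'bytes:', len(s), 'glyphs:', size(cp), 'nerr:', nerr
   ! self-check: the three-byte sequence must decode to 8217 (hex 2019)
   if (nerr /= 0 .or. size(cp) /= 7) error stop 'unexpected decode result'
   if (cp(4) /= 8217) error stop 'unexpected codepoint'
contains
   subroutine decode(bytes, codepoints, nerr)
      character(len=*), intent(in)      :: bytes
      integer, allocatable, intent(out) :: codepoints(:)
      integer, intent(out)              :: nerr
      integer :: i, k, b, n, nextra, acc
      allocate(codepoints(len(bytes)))  ! upper bound: one codepoint per byte
      n = 0
      nerr = 0
      i = 1
      do while (i <= len(bytes))
         b = ichar(bytes(i:i))
         ! the lead byte determines how many continuation bytes follow
         if (b < 128) then                        ! 0xxxxxxx (ASCII)
            nextra = 0; acc = b
         else if (b >= 192 .and. b < 224) then    ! 110xxxxx
            nextra = 1; acc = iand(b, 31)
         else if (b >= 224 .and. b < 240) then    ! 1110xxxx
            nextra = 2; acc = iand(b, 15)
         else if (b >= 240 .and. b < 248) then    ! 11110xxx
            nextra = 3; acc = iand(b, 7)
         else                                     ! stray or invalid lead byte
            nerr = nerr + 1; i = i + 1; cycle
         end if
         if (i + nextra > len(bytes)) then        ! sequence is truncated
            nerr = nerr + 1
            exit
         end if
         do k = 1, nextra
            b = ichar(bytes(i+k:i+k))
            if (iand(b, 192) /= 128) nerr = nerr + 1   ! expected 10xxxxxx
            acc = ior(ishft(acc, 6), iand(b, 63))      ! append six payload bits
         end do
         n = n + 1
         codepoints(n) = acc
         i = i + nextra + 1
      end do
      codepoints = codepoints(:n)                 ! trim to the count decoded
   end subroutine decode
end program sketch_utf8_decode
```

The sketch shows why the glyph count can be smaller than the byte count: the nine input bytes yield seven codepoints because the three bytes E2 80 99 collapse into the single value 8217.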
o UTF8 : Scalar CHARACTER string or array of single-character CHARACTER values, treated as a stream of bytes containing text encoded as UTF-8.
o CODEPOINTS : An INTEGER array of Unicode codepoint values representing the glyphs found in UTF8.
o NERR : Zero if no error occurred. If nonzero, the stream of bytes could not be completely converted to Unicode codepoints.
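A caller will usually want to test NERR before trusting CODEPOINTS. The sketch below uses only the interface shown in the synopsis. The program name CHECK_NERR and the variable TRUNCATED are hypothetical, and it is an assumption (not confirmed by this page) that a lone lead byte with no continuation bytes is reported as an error; the exact nonzero value returned and the contents of CODEPOINTS after a failure are not documented here.

```fortran
program check_nerr
   use m_unicode, only : utf8_to_codepoints
   implicit none
   integer, allocatable :: codepoints(:)
   integer :: nerr
   ! Assumption: hex E2 starts a three-byte sequence, so a string ending
   ! right after it is malformed and should produce a nonzero NERR.
   character(len=*), parameter :: truncated = 'abc'//char(226)
   call utf8_to_codepoints(truncated, codepoints, nerr)
   if (nerr /= 0) then
      write(*,'(*(g0,1x))') 'conversion incomplete, nerr =', nerr
   else
      write(*,'(*(g0,1x))') 'codepoints:', codepoints
   end if
end program check_nerr
```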
Sample program
   program demo_utf8_to_codepoints
   use m_unicode, only : utf8_to_codepoints
   implicit none
   character(len=*),parameter   :: string = "Noho me ka hau’oli" !(Be happy)
   character(len=1),allocatable :: bytes(:)
   character(len=*),parameter   :: solid='(*(g0))'
   character(len=*),parameter   :: space='(*(g0,1x))'
   character(len=*),parameter   :: z='(a,*(z0,1x))'
   integer,allocatable          :: codepoints(:)
   integer                      :: nerr
   integer                      :: i
      ! BASIC USAGE: SCALAR CHARACTER VARIABLE
      write(*,solid)'STRING:',string
      call utf8_to_codepoints(string,codepoints,nerr)
      write(*,space)'CODEPOINTS:', codepoints
      write(*,z)'HEXADECIMAL CODEPOINTS:', codepoints
      !
      write(*,space)'How long is this string in glyphs?'
      write(*,space)size(codepoints)
      write(*,space)'How long is this string in bytes?'
      write(*,space)len(string)
      !
      ! BASIC USAGE: ARRAY OF BYTES
      bytes=[(string(i:i),i=1,len(string))]
      write(*,solid)'STRING:',bytes
      call utf8_to_codepoints(bytes,codepoints,nerr)
      write(*,space)'CODEPOINTS:', codepoints
      write(*,z)'HEXADECIMAL CODEPOINTS:', codepoints
      !
      write(*,space)'How long is this string in glyphs?'
      write(*,space)size(codepoints)
      write(*,space)'How long is this string in bytes?'
      write(*,space)size(bytes)
      !
   end program demo_utf8_to_codepoints

Results:
 > STRING:Noho me ka hau’oli
 > CODEPOINTS: 78 111 104 111 32 109 101 32 107 97 32 104 97 117 ...
 > 8217 111 108 105
 > 48 4E 6F 68 6F 20 6D 65 20 6B 61 20 68 61 75 2019 6F 6C 69
 > How long is this string in glyphs?
 > 18
 > How long is this string in bytes?
 > 20
 > STRING:Noho me ka hau’oli
 > CODEPOINTS: 78 111 104 111 32 109 101 32 107 97 32 104 97 117 ...
 > 8217 111 108 105
 > 48 4E 6F 68 6F 20 6D 65 20 6B 61 20 68 61 75 2019 6F 6C 69
 > How long is this string in glyphs?
 > 18
 > How long is this string in bytes?
 > 20
functions that perform operations on character strings:
o elemental: adjustl(3), adjustr(3), index(3), scan(3), verify(3)
o non-elemental: len_trim(3), repeat(3), trim(3), codepoints_to_utf8(3)
o John S. Urban
o Francois Jacq - enhancements and optional Latin support, 2025-08
