TOKENIZE(3) - [CHARACTER:PARSE] Parse a string into tokens.
TOKEN form (returns array of strings)

    subroutine tokenize(string, set, tokens [, separator])

     character(len=*),intent(in) :: string
     character(len=*),intent(in) :: set
     character(len=:),allocatable,intent(out) :: tokens(:)
     character(len=1),allocatable,intent(out),optional :: separator(:)

ARRAY BOUNDS form (returns arrays defining token positions)

    subroutine tokenize (string, set, first, last)

     character(len=*),intent(in) :: string
     character(len=*),intent(in) :: set
     integer,allocatable,intent(out) :: first(:)
     integer,allocatable,intent(out) :: last(:)
 o STRING - a scalar of type character. It is an INTENT(IN) argument.
 o SET - a scalar of type character with the same kind type parameter as STRING. It is an INTENT(IN) argument.
 o SEPARATOR - (optional) shall be of type character with the same kind type parameter as STRING. It is an INTENT(OUT) argument. It shall not be a coarray or a coindexed object.
 o TOKENS - of type character with the same kind type parameter as STRING. It is an INTENT(OUT) argument. It shall not be a coarray or a coindexed object.
 o FIRST,LAST - allocatable arrays of type integer and rank one. Each is an INTENT(OUT) argument. Neither shall be a coarray or a coindexed object.

To reiterate, STRING, SET, TOKENS and SEPARATOR must all be of the same CHARACTER kind type parameter.
TOKENIZE(3) parses a string into tokens. There are two forms of the subroutine TOKENIZE(3).

 o The token form returns an array with one token per element, all of the same length as the longest token.
 o The array bounds form returns two integer arrays. One contains the beginning positions of the tokens and the other the end positions.

Since the token form pads all the tokens to the same length, the original number of trailing spaces of each token, except for the longest, is lost. The array bounds form retains the exact length of each token, including any trailing spaces.
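The loss of trailing-space information can be illustrated without calling TOKENIZE(3) itself. The following is a minimal sketch, not output from the intrinsic: the program name is hypothetical, and the hard-coded bounds are an assumption, namely what the array bounds form would report for the string 'a  ,bb,c ' with SET=','. The token array is built by plain intrinsic assignment to emulate the padding of the token form.

```fortran
program demo_padding
implicit none
character(len=*), parameter :: string = 'a  ,bb,c '
! assumed bounds: what the array bounds form would report for SET=','
integer, parameter :: first(*) = [1, 5, 8]
integer, parameter :: last(*)  = [3, 6, 9]
character(len=:), allocatable :: tokens(:)
integer :: i
   ! emulate the token form: one element per token, all as long as the longest
   allocate(character(len=maxval(last-first+1)) :: tokens(size(first)))
   do i = 1, size(first)
      tokens(i) = string(first(i):last(i))   ! intrinsic assignment pads with blanks
   enddo
   ! compare the padded length with the exact length kept by the bounds
   do i = 1, size(first)
      write(*,'("[",a,"] padded length ",i0,", exact length ",i0)') &
         tokens(i), len(tokens(i)), last(i)-first(i)+1
   enddo
end program demo_padding
```

Every element of TOKENS has length 3 here, so the second token prints as "[bb ]" and nothing in TOKENS alone tells whether that trailing blank was in STRING. The expression last(i)-first(i)+1 still gives the exact length of 2.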
 o STRING : The string to parse into tokens.

 o SET : Each character in SET is a token delimiter. A sequence of zero or more characters in STRING delimited by any token delimiter, or the beginning or end of STRING, comprises a token. Thus, two consecutive token delimiters in STRING, or a token delimiter in the first or last character of STRING, indicate a token with zero length.

 o TOKENS : It shall be an allocatable array of rank one with deferred length. It is allocated with the lower bound equal to one and the upper bound equal to the number of tokens in STRING, and with character length equal to the length of the longest token. The tokens in STRING are assigned in the order found, as if by intrinsic assignment, to the elements of TOKENS, in array element order.

 o FIRST : shall be an allocatable array of type integer and rank one. It is an INTENT(OUT) argument. It shall not be a coarray or a coindexed object. It is allocated with the lower bound equal to one and the upper bound equal to the number of tokens in STRING. Each element is assigned, in array element order, the starting position of each token in STRING, in the order found. If a token has zero length, the starting position is equal to one if the token is at the beginning of STRING, and one greater than the position of the preceding delimiter otherwise.

 o LAST : It is allocated with the lower bound equal to one and the upper bound equal to the number of tokens in STRING. Each element is assigned, in array element order, the ending position of each token in STRING, in the order found. If a token has zero length, the ending position is one less than the starting position.
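The rules for FIRST and LAST above can be sketched in portable Fortran for compilers that do not yet provide TOKENIZE(3). This is only an illustration of the described semantics, not the intrinsic itself: the module name tokenize_sketch and the routine name bounds_sketch are hypothetical, and only default character kind is handled.

```fortran
module tokenize_sketch
implicit none
private
public :: bounds_sketch
contains
! Hypothetical stand-in for the array bounds form of TOKENIZE:
! returns the start and end position of every token in STRING.
subroutine bounds_sketch(string, set, first, last)
character(len=*), intent(in)      :: string
character(len=*), intent(in)      :: set
integer, allocatable, intent(out) :: first(:), last(:)
integer :: i, n, start
   ! each delimiter ends one token; the end of STRING ends the last one
   n = 1
   do i = 1, len(string)
      if (scan(string(i:i), set) > 0) n = n + 1
   enddo
   allocate(first(n), last(n))
   n = 0
   start = 1
   do i = 1, len(string)
      if (scan(string(i:i), set) > 0) then
         n = n + 1
         first(n) = start
         last(n) = i - 1      ! a zero-length token gives last = first - 1
         start = i + 1
      endif
   enddo
   ! the final token runs to the end of STRING
   first(n+1) = start
   last(n+1) = len(string)
end subroutine bounds_sketch
end module tokenize_sketch

program demo_bounds_sketch
use tokenize_sketch, only : bounds_sketch
implicit none
character(len=*), parameter :: string = 'first,second,,fourth'
integer, allocatable :: first(:), last(:)
integer :: i
   call bounds_sketch(string, ' ,', first, last)
   do i = 1, size(first)
      ! slicing with the bounds recovers each token at its exact length
      write(*,'(i0,1x,i0,1x,"[",a,"]")') first(i), last(i), string(first(i):last(i))
   enddo
end program demo_bounds_sketch
```

For this input the sketch reports the bounds 1-5, 7-12, 14-13 and 15-20. The third pair has an ending position one less than its starting position, which marks the zero-length token between the two adjacent commas.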
Sample of uses

    program demo_tokenize
    !use M_strings, only : tokenize=>split2020
    implicit none
    ! some useful formats
    character(len=*),parameter :: brackets='(*("[",g0,"]":,","))'
    character(len=*),parameter :: a_commas='(a,*(g0:,","))'
    character(len=*),parameter :: space='(*(g0:,1x))'
    character(len=*),parameter :: gen='(*(g0))'

    ! Execution of TOKEN form (return array of tokens)

    block
       character (len=:), allocatable :: string
       character (len=:), allocatable :: tokens(:)
       character (len=:), allocatable :: kludge(:)
       integer                        :: i
       string = ' first,second ,third     '
       call tokenize(string, set=';,', tokens=tokens )
       write(*,brackets)tokens

       string = ' first , second ,third     '
       call tokenize(string, set=' ,', tokens=tokens )
       write(*,brackets)(trim(tokens(i)),i=1,size(tokens))
       ! remove blank tokens
       ! <<<
       !tokens=pack(tokens, tokens /= '' )
       ! gfortran 13.1.0 bug -- concatenate //'' and use scratch
       ! variable KLUDGE. JSU: 2024-08-18
       kludge=pack(tokens//'', tokens /= '' )
       ! >>>
       write(*,brackets)kludge

    endblock

    ! Execution of BOUNDS form (return position of tokens)

    block
       character (len=:), allocatable :: string
       character (len=*),parameter :: set = " ,"
       integer, allocatable        :: first(:), last(:)
       write(*,gen)repeat('1234567890',6)
       string = 'first,second,,fourth'
       write(*,gen)string
       call tokenize (string, set, first, last)
       write(*,a_commas)'FIRST=',first
       write(*,a_commas)'LAST=',last
       ! a token has length when LAST is not before FIRST
       ! (.ge. so that one-character tokens also count)
       write(*,a_commas)'HAS LENGTH=',last-first.ge.0
    endblock

    end program demo_tokenize

Results:

     > [ first    ],[second    ],[third     ]
     > [],[first],[],[],[second],[],[third],[],[],[],[],[]
     > [first ],[second],[third ]
     > 123456789012345678901234567890123456789012345678901234567890
     > first,second,,fourth
     > FIRST=1,7,14,15
     > LAST=5,12,13,20
     > HAS LENGTH=T,T,F,T
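A common follow-up to the array bounds form is to skip zero-length tokens. The sketch below is one possible way to do that; it hard-codes the FIRST and LAST values reported in the results above for 'first,second,,fourth' rather than calling TOKENIZE(3), and the program name and the KEEP array are illustrative names only.

```fortran
program demo_skip_empty
implicit none
character(len=*), parameter :: string = 'first,second,,fourth'
! bounds copied from the FIRST= and LAST= lines of the results for SET=" ,"
integer, parameter :: first(*) = [1, 7, 14, 15]
integer, parameter :: last(*)  = [5, 12, 13, 20]
integer, allocatable :: keep(:)
integer :: i
   ! indices of the tokens whose ending position is not before the start
   keep = pack([(i, i=1, size(first))], last >= first)
   do i = 1, size(keep)
      write(*,'("[",a,"]")') string(first(keep(i)):last(keep(i)))
   enddo
end program demo_skip_empty
```

This prints [first], [second] and [fourth], one per line. Filtering on the bounds avoids the padded scratch array needed when blank tokens are removed from the token form.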
Fortran 2023
Fortran intrinsic descriptions (license: MIT) @urbanjost
 o SPLIT(3) - return tokens from a string, one at a time
 o INDEX(3) - Position of a substring within a string
 o SCAN(3) - Scan a string for the presence of a set of characters
 o VERIFY(3) - Position of a character in a string of characters that does not appear in a given set of characters.
Nemo Release 3.1 | tokenize (3fortran) | November 02, 2024 |