..  Copyright (c) 2017-2023 Science and Technology Facilities Council.

    All rights reserved.

    Modifications made as part of the fparser project are distributed
    under the following license:

    Redistribution and use in source and binary forms, with or without
    modification, are permitted provided that the following conditions are
    met:

    1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

    2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.

    3. Neither the name of the copyright holder nor the names of its
    contributors may be used to endorse or promote products derived from
    this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
    HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

.. _developers:

Developer Guide
===============

Reading Fortran
---------------

A key part of the fparser package is support for reading Fortran code.
`fparser.common.readfortran.FortranFileReader` provides this functionality
for source files while `FortranStringReader` supports Fortran source
provided as a string. Both of these classes sub-class `FortranReaderBase`:

.. autoclass:: fparser.common.readfortran.FortranReaderBase

Note that the setting for `ignore_comments` provided here can be overridden
on a per-call basis by methods such as `get_single_line`.
The 'mode' of the reader is controlled by passing in a suitable instance of
the `FortranFormat` class:

.. autoclass:: fparser.common.sourceinfo.FortranFormat

Due to its origins in the f2py project, the reader contains support
for recognising `f2py` directives
(https://numpy.org/devdocs/f2py/signature-file.html). However, this
functionality is disabled by default.

A convenience script called read.py is provided in the scripts
directory which takes a filename as input and returns the file
reader's representation of that file. This could be useful for
debugging purposes.

Invalid input
-------------

The file reader uses :py:func:`open` to open a Fortran file. If
invalid input is found then Python raises a `UnicodeDecodeError`
exception by default. Since we typically wish to skip invalid
characters (on the principle that, for valid Fortran, they can only
occur in comments) while logging their presence, a bespoke error
handler named "fparser-logging" is implemented in
``fparser/__init__.py`` and registered using
:py:func:`codecs.register_error`.  This handler may be specified when
using :py:func:`open` to open a file by supplying the
``errors='fparser-logging'`` argument.
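
The mechanism can be illustrated with a minimal, self-contained sketch.
The handler name "demo-logging" and the function `log_and_skip` are
hypothetical and are not the fparser implementation; only
:py:func:`codecs.register_error` and the ``errors`` argument are standard
Python.

```python
import codecs
import logging


def log_and_skip(error):
    """Log an undecodable byte sequence and resume decoding after it."""
    if not isinstance(error, UnicodeDecodeError):
        # This sketch only deals with decoding problems.
        raise error
    invalid = error.object[error.start:error.end]
    logging.getLogger(__name__).warning(
        "skipped invalid bytes %r at position %d", invalid, error.start)
    # Replace the invalid bytes with nothing and continue at error.end.
    return ("", error.end)


# "demo-logging" is a made-up name used only for this illustration.
codecs.register_error("demo-logging", log_and_skip)

# The byte 0xe9 is not valid UTF-8 here; it sits inside a Fortran comment.
raw = b"! caf\xe9 comment\nend\n"
text = raw.decode("utf-8", errors="demo-logging")
```

A handler registered in this way can equally be named in the ``errors``
argument of :py:func:`open`.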

Fparser2
--------

Fparser2 supports Fortran2003 and is being extended to support
Fortran2008. Fparser2 is being actively developed and will fully
replace fparser1 in the future.

.. _rules:

Rules
+++++

Each version of the Fortran language is defined as a set of rules in a
specification document. The Fortran2003 rules are specified here
https://wg5-fortran.org/N1601-N1650/N1601.pdf and the Fortran2008
rules are specified here
https://j3-fortran.org/doc/year/10/10-007r1.pdf.

Each rule has a number, for example the Fortran2003 document includes
the following top level rules `R201` and `R202`
::

    R201 program is program-unit
                    [ program-unit ] ...

    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or block-data

It can be seen that the right hand side of these rules consists of more
rules. Note, `[]` means that the content is optional. At some point in
the rule hierarchy rules start to be defined by text. For example,
taking a look at the specification of a module
::

    R1104 module is module-stmt
                    [ specification-part ]
                    [ module-subprogram-part ]
                    end-module-stmt

    R1105 module-stmt is MODULE module-name

    R1106 end-module-stmt is END [ MODULE [ module-name ] ]

it can be seen that rules `R1105` and `R1106` specify the actual code to
write e.g. `MODULE`. Here `module-name` is a type of `name` which has
a rule specifying what is valid syntax (see the specification document
for more details).

Therefore Fortran is specified as rules which reference other rules,
or specify a particular syntax. The top level rule of this hierarchy
is rule `R201`, which defines a program, see above.


Classes
+++++++

In fparser2 each rule is implemented in a class with the class names
closely following the rule names. For example, `program` is
implemented by the `Program` class and `program-unit` is implemented
by the `Program_Unit` class. In general, the name of the class
corresponding to a given rule can be obtained by replacing '-' with
'_' and capitalising each word.
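
The naming convention can be expressed as a small helper. The function
`rule_to_class_name` is hypothetical (it is not part of fparser) and only
captures the general rule described above, so individual classes may
deviate from it.

```python
def rule_to_class_name(rule_name):
    """Convert a Fortran rule name such as 'program-unit' to a class name."""
    # Replace '-' with '_' and capitalise each word.
    return "_".join(word.capitalize() for word in rule_name.split("-"))


names = [rule_to_class_name(rule)
         for rule in ("program", "program-unit", "end-module-stmt")]
```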

The Fortran2003 classes exist in the ``Fortran2003.py`` file and the
Fortran2008 classes exist in the ``Fortran2008`` directory (see
:ref:`Fortran2008` section for Fortran2008-specific implementation
details).

The Fortran2003 and Fortran2008 classes can inherit from a set of
pre-existing base classes which implement certain rule patterns in a
generic way. The base classes are contained in the `utils.py` file.

The base classes and rule patterns are discussed more in the
:ref:`base-classes` section.

The primary components of classes i.e. the parts that developers
typically need to be concerned with are:

1) the `subclass_names` list
2) the `use_names` list
3) the static `match` method
4) the `tostr` method

A `subclass_names` list of classes should be provided when the rule is
a simple choice between classes. In this case the `Base` class ensures that each
child class is tested for a match and the one that matches is
returned. An example of a simple choice rule is `R202`. See the
:ref:`program-unit-class` section for a description of its
implementation.

The `use_names` list should contain any classes that are referenced by the
implementation of the current class. These lists of names are aggregated
(along with `subclass_names`) and used to ensure that all necessary `Scalar_`,
`_List` and `_Name` classes are generated (in code at the end of the
`Fortran2003` and `Fortran2008` modules - see :ref:`class-generation`).

When the rule is not a simple choice the developer needs to supply a
static `match` method. An example of this is rule `R201`. See the
:ref:`program-class` section for a description of its implementation.

.. note::

   A `tostr` description, explanation and example needs to be added.

Class Relationships
+++++++++++++++++++

When a rule is a simple choice, the class implementing this rule
provides a list of classes to be matched in the `subclass_names` list
(or potentially `use_names` list). These class names are provided as
strings, not references to the classes themselves.

In fparser2 these strings are used to create class references to allow
matching to be performed. The creation of class references is
implemented by the `create` method of the `ParserFactory` object.

The `create` method of the `ParserFactory` class also links to
appropriate classes to create parsers compliant to the specified
standard.

.. note::

   The ParserFactory implementation needs to be explained.

A parser conforming to a particular Fortran standard is created by a
ParserFactory object. For example::

    >>> from fparser.two.parser import ParserFactory
    >>> parser_f2003 = ParserFactory().create(std="f2003")

The `create` method returns a `Program` *class* (called `parser_f2003`
in the above example) which contains a `subclasses` dictionary
(declared in its base class - called `Base`) configured with *all* the
Fortran2003 class relationships specified by the `subclass_names` and
`use_names` lists in each class.
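
How string entries might be turned into such a dictionary can be
sketched with a toy model. The classes and the `build_subclasses` helper
below are hypothetical stand-ins and are much simpler than the
`ParserFactory` implementation; they only show the idea of resolving
names to class references.

```python
class Base:
    """Toy stand-in for a rule class; related classes are named as strings."""
    subclass_names = []


class Main_Program(Base):
    pass


class Module(Base):
    pass


class Program_Unit(Base):
    subclass_names = ["Main_Program", "Module"]


def build_subclasses(classes):
    """Map each class name to the classes listed in its subclass_names."""
    by_name = {cls.__name__: cls for cls in classes}
    return {name: [by_name[child] for child in cls.subclass_names]
            for name, cls in by_name.items()}


subclasses = build_subclasses([Main_Program, Module, Program_Unit])
```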

As all classes inherit from the `Base` class, the `subclasses`
dictionary is available to all classes. If, for example, we query the
dictionary for the `Program` class relationships we get an empty list
as it has no `subclass_names` or `use_names` entries specified (see
:ref:`program-class`). If however, we query the dictionary for the
`Program_Unit` relationships we get the list of classes specified in
that class's `subclass_names` list (see :ref:`program-unit-class`)::

    >>> parser_f2003.__name__
    'Program'
    >>> parser_f2003.subclasses['Program']
    []
    >>> parser_f2003.subclasses['Program_Unit']
    [<class 'fparser.two.Fortran2003.Main_Program'>, <class 'fparser.two.Fortran2003.Function_Subprogram'>, <class 'fparser.two.Fortran2003.Subroutine_Subprogram'>, <class 'fparser.two.Fortran2003.Module'>, <class 'fparser.two.Fortran2003.Block_Data'>]

Symbol Table
++++++++++++

There are many situations when it is not possible to disambiguate the
precise form of the Fortran being parsed without additional type
information (e.g. whether code of the form `a(i,j)` is an array
access or a function call).  Therefore fparser2 contains a single,
global instance of a `SymbolTables` class, accessed as
`fparser.two.symbol_table.SYMBOL_TABLES`. As its name implies, this
holds a collection of symbol tables, one for each top-level scoping
unit (e.g. module or program unit). This is implemented as a
dictionary where the keys are the names of the scoping units e.g. the
name of the associated module, program, subroutine or function. The
corresponding dictionary entries are instances of the `SymbolTable`
class:

.. autoclass:: fparser.two.symbol_table.SymbolTable
   :members:

The entries in these tables are instances of the named tuple,
`SymbolTable.Symbol` which currently has the properties:

 * name
 * primitive_type

Both of these are stored as strings. In future, support for more
properties (e.g. kind, shape, visibility) will be added and strings
replaced with enumerations where it makes sense. Similarly, support
will be added for other types of symbols (e.g. those representing
program/subroutine names or reserved Fortran keywords).

Symbols available in the scoping region of a module may be made
available in another scoping region through one or more `USE` statements.
In a `SymbolTable` such uses are captured as instances of `ModuleUse`:

.. autoclass:: fparser.two.symbol_table.ModuleUse

These instances are created by calling:

.. automethod:: fparser.two.symbol_table.SymbolTable.add_use_symbols

Fortran has support for nested scopes - e.g. variables declared within
a module are in scope within any routines defined within that
module. Therefore, when searching for the definition of a symbol, we
require the ability to search up through all symbol tables accessible
from the current scope. In order to support this functionality, each
`SymbolTable` instance therefore has a `parent` property. This holds a
reference to the table that contains the current table (if any).
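
The search through enclosing scopes can be sketched as follows. The
`ToySymbolTable` class is a hypothetical, heavily simplified model and
does not reproduce the `SymbolTable` API; it only demonstrates following
the `parent` reference until a symbol is found.

```python
class ToySymbolTable:
    """Minimal scoped table: a name, local symbols and an optional parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._symbols = {}

    def add(self, name, primitive_type):
        # Fortran is case insensitive so names are stored in lower case.
        self._symbols[name.lower()] = primitive_type

    def lookup(self, name):
        """Search this table and then each enclosing table in turn."""
        table = self
        while table is not None:
            if name.lower() in table._symbols:
                return table._symbols[name.lower()]
            table = table.parent
        raise KeyError(name)


module_table = ToySymbolTable("my_mod")
module_table.add("ncells", "integer")
routine_table = ToySymbolTable("my_sub", parent=module_table)
routine_table.add("x", "real")
```

Here a lookup of `ncells` from `routine_table` succeeds via the parent
table, whereas `x` is not visible from `module_table`.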

Since fparser2 relies heavily upon recursion, it is important that the
current scoping unit always be available from any point in the code.
Therefore, the `SymbolTables` class has the `current_scope` property
which contains a reference to the current `SymbolTable`. Obviously,
this property must be updated as the parser enters and leaves scoping
units.  This is handled for all cases bar one within the `BlockBase`
base class since this is sub-classed by all classes which represent a
block of code and that therefore includes all those which define a
scoping region. The exception is the helper class
`Fortran2003.Main_Program0` which represents Program units that do not
include the (optional) program-stmt (see R1101 in the Fortran
standard).  The creation of a scoping unit for such a program is
handled within the `Fortran2003.Main_Program0.match()` method. Since
there is no name associated with such a program, the corresponding
symbol table is given the name "fparser2:main_program", chosen so as
to prevent any clashes with other Fortran names.

Those classes which define scoping regions must subclass the
`ScopingRegionMixin` class:

.. autoclass:: fparser.two.utils.ScopingRegionMixin


.. _class-generation:

Class Generation
++++++++++++++++

Some classes that are specified as strings in the `subclass_names` or
`use_names` variables do not require class implementations. There are 3
categories of these:

1) classes of the form '\*\_Name'
2) classes of the form '\*\_List'
3) classes of the form 'Scalar\_\*'

The reason for this is that such classes can be written in a generic,
boiler-plate way, so it is simpler to generate them than to write
them by hand.

At the end of the ``Fortran2003.py`` and ``Fortran2008/__init__.py``
files there is code that is executed when the file is imported. This
code generates the required classes described above in the local file.

.. note::

   The way this is implemented needs to be described.

As a practical example, consider rule `R1106`
::

   R1106 end-module-stmt is END [ MODULE [ module-name ] ]

which is implemented in the following way
::

    class End_Module_Stmt(EndStmtBase):  # R1106
        ''' <description> '''
        subclass_names = []
        use_names = ['Module_Name']

        @staticmethod
        def match(string):
            return EndStmtBase.match('MODULE', Module_Name, string)

It can be seen that the `Module_Name` class is specified as a string
in the `use_names` variable. The `Module_Name` class has no
implementation in the ``Fortran2003.py`` file; instead, the class is
generated. This code generation is performed when the file is
imported.
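
One possible way of generating such classes is sketched below using the
built-in :py:func:`type` function. This is an assumption made for
illustration, not a description of the code in ``Fortran2003.py``; the
`Name` base class and the `generate_name_classes` helper are
hypothetical.

```python
class Name:
    """Toy base class that accepts a simple identifier."""

    def __init__(self, string):
        # A rough approximation of a valid name, sufficient for this sketch.
        if not string.isidentifier():
            raise ValueError(f"'{string}' is not a valid name")
        self.string = string


class End_Module_Stmt:
    use_names = ["Module_Name"]


def generate_name_classes(classes, namespace):
    """Create any missing '*_Name' classes referenced in use_names."""
    for cls in classes:
        for name in cls.use_names:
            if name.endswith("_Name") and name not in namespace:
                # The generated class simply inherits everything from Name.
                namespace[name] = type(name, (Name,), {})


generated = {}
generate_name_classes([End_Module_Stmt], generated)
Module_Name = generated["Module_Name"]
```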

.. note::

   At the moment the same code-generation code is replicated in both
   the ``Fortran2003.py`` and ``Fortran2008/__init__.py`` files. It would be
   better to import this code from a separate file if it is possible to do so.

.. _base-classes:

Base classes
++++++++++++

There are a number of base classes implemented to support matching
certain types of pattern in a rule. The two most commonly used are
given below. As mentioned earlier, the class `Base` supports a choice
between classes. The class `BlockBase` supports an initial and final
match with optional subclasses in between (useful for matching rules
such as programs, subroutines, if statements etc.).

.. autoclass:: fparser.two.utils.Base
               :members:
               :noindex:

.. autoclass:: fparser.two.utils.BlockBase
               :members:
               :noindex:

.. note::

   The `BlockBase` `match` method is complicated. One way to simplify this
   would be to create a `NamedBlockBase` which subclasses `BlockBase`. This
   would include the code associated with a block having a name.

.. _Fortran2008:

Fortran2008 implementation
++++++++++++++++++++++++++

As Fortran2008 is a superset of Fortran2003, the Fortran2008 classes
are implemented as extensions to the Fortran2003 classes where
possible. For example, the Fortran2003 rule for a program-unit is::
   
    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or block-data

and for Fortran2008 it is
::
   
    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or submodule
                         or block-data

Therefore to implement the Fortran2008 version of this class, the
Fortran2003 version needs to be extended with the `subclass_names`
list being extended to include a `Submodule` class as a string (of
course the `Submodule` class also needs to be implemented!)
::

    >>> from fparser.two.Fortran2003 import Program_Unit as Program_Unit_2003

    >>> class Program_Unit(Program_Unit_2003):  # R202
    ...     ''' <description> '''
    ...     subclass_names = Program_Unit_2003.subclass_names[:]
    ...     subclass_names.append("Submodule")


.. _program-class:

Program Class (rule R201)
+++++++++++++++++++++++++

As discussed earlier, Fortran rule `R201` is the 'top level' Fortran
rule. There are no other rules that reference rule `R201`. The rule
looks like this::

    R201 program is program-unit
                    [ program-unit ] ...

which specifies that a Fortran program can consist of one or more program
units. Note, the above rule does not capture the fact that it is valid
to have an arbitrary number of comments before the first program-unit,
in between program-units and after the final program-unit.

As the above rule is not a simple choice between different rules a
static `match` method is required for the associated fparser2
`Program` class.

As discussed earlier there are a number of base classes implemented to
support matching certain types of pattern in a rule. The obvious one
to use here would be `BlockBase` as it supports a compulsory first
class, an arbitrary number of optional intermediate classes (provided
as a list) and a final class. Therefore, subclassing `BlockBase` and
setting the first class to `Program_Unit`, the intermediate classes to
`[Program_Unit]`, and the final class to `None` would seem to perform
the required functionality (and this was how it was implemented in
earlier versions of fparser2).

However, there is a problem using `BlockBase`. In the case where there
is no final class (which is the situation here) it is valid for the
first class to match and for an optional class to **fail** to
match. This is not the required behaviour for the `Program` class as, if an
optional `Program_Unit` exists then it must be a valid `Program_Unit`
or the code is invalid. For example, the following code is invalid as
there is a misspelling of `subroutine`::

    program test
    end
    subroutin broken
    end

To implement the required functionality for the `Program` class, the
static `match` method is written manually. A `while` loop is used to
ensure that there is no match if any `Program_Unit` is invalid.
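
The all-or-nothing behaviour of such a loop can be sketched with a toy
matcher that works on a list of lines. The functions `match_unit` and
`match_program` are hypothetical and far simpler than the `Program`
class; they only show that a single invalid unit makes the whole match
fail.

```python
class NoMatch(Exception):
    """Raised by the toy matcher when a unit is not recognised."""


def match_unit(lines, start):
    """Match 'program|subroutine <name>' ... 'end'; return the next index."""
    words = lines[start].split()
    if len(words) != 2 or words[0] not in ("program", "subroutine"):
        raise NoMatch(lines[start])
    for index in range(start + 1, len(lines)):
        if lines[index].strip() == "end":
            return index + 1
    raise NoMatch("missing end")


def match_program(lines):
    """Return the number of units matched, or None if any unit is invalid."""
    position, count = 0, 0
    while position < len(lines):
        try:
            position = match_unit(lines, position)
        except NoMatch:
            # One invalid unit means the whole program does not match.
            return None
        count += 1
    return count or None


valid = ["program test", "end", "subroutine ok", "end"]
invalid = ["program test", "end", "subroutin broken", "end"]
```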

There are also two constraints that must be adhered to by the `Program`
class:

1) Only one program unit may be a main program
2) Any name used by a program-unit (e.g. program fred) must be
   distinct from names used in other program-units.

At the moment neither of these two constraints is enforced in
fparser2. Therefore two xfailing tests `test_one_main1` and
`test_multiple_error1` have been added to the
`tests/fortran2003/test_program_r201.py` file to demonstrate these
limitations.

Further, in Fortran the `program` declaration is actually
optional. For example, the following is a valid (minimal) Fortran
program::

    end

fparser2 does not support the above syntax in its `Program_Unit`
class. Therefore, as a workaround, a separate `Main_Program0` class has
been implemented and added as a final test to the `Program` match
method. This does make use of `BlockBase` to match and therefore
requires the `Program` class to subclass `BlockBase`.

.. note::
   
   It would be much better if `Program_Unit` was coded to support
   optional program declarations and this option should be
   investigated.

The current implementation also has a limitation in that
multiple program-units with one of them not having a program
declaration are not supported. The xfailing test
`test_missing_prog_multi` has been added to the
`tests/fortran2003/test_program_r201.py` file to demonstrate this
limitation.

A final issue is that the line numbers and line information output is
incorrect in certain cases where there is a syntax error in the code
and there are 5 spaces before a statement. The xfailing tests
`test_single2` and `test_single3` have been added to the
`tests/fortran2003/test_program_r201.py` file to demonstrate this
error.

.. _program-unit-class:

Program_Unit Class (rule R202)
++++++++++++++++++++++++++++++

Fortran2003 rule `R202` is specified as
::

    R202 program-unit is main-program
                         or external-subprogram
                         or module
                         or block-data

As the above rule is a simple choice between different rules, the
appropriate matching code is already implemented in one of the base
classes (`Base`) and therefore does not need to be written.  Instead,
the rules on the right hand side can be provided as **strings** in the
`subclass_names` list. The `use_names` list should be empty and the
`tostr` method is not required (as there is no text to output because
this rule is simply used to decide what other rules to use).

.. note::

    It is currently unclear when to use `subclass_names` and when to use
    `use_names`. At the moment the pragmatic suggestion is to follow the
    way it is currently done.

Therefore to implement rule `R202` the following needs to
be specified
::
   
    class Program_Unit(Base):  # R202
        ''' <description> '''
        subclass_names = ['Comment', 'Main_Program', 'External_Subprogram',
                          'Module', 'Block_Data']

In this way fparser2 captures the `R202` rule hierarchy in its
`Program_Unit` class.

.. _exceptions:

Exceptions
++++++++++

There are 7 types of exception raised in fparser2: `NoMatchError`,
`FortranSyntaxError`, `InternalSyntaxError`, `ValueError`,
`InternalError`, `AssertionError` and `NotImplementedError`.

A base class `FparserException` is included which `NoMatchError`,
`FortranSyntaxError` and `InternalError` subclass. The reason for this
is to allow external tools to more simply manage fparser if it is used
as a library.

Each of the exceptions is now discussed in turn.

`NoMatchError` can be raised by a class when the text it is given does
not match the pattern for the class. A class can also return an empty
return value to indicate no match. It is currently unclear when it is
appropriate to do one or the other.

`NoMatchError` (or an empty return value) does not necessarily mean that
the text is invalid, just that the text does not match this class. For
example, it may be that some text should match one of a set of
rules. In this case all rules would fail to match except one. It is
only invalid text if none of the possible rules match.
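
The following toy sketch illustrates this behaviour. All of the names
are hypothetical (`ToyNoMatchError` stands in for `NoMatchError`) and
the two matchers deliberately use the two different ways of signalling
"no match" mentioned above: one raises an exception and the other
returns an empty value.

```python
class ToyNoMatchError(Exception):
    """Stand-in for a 'this rule does not match' signal."""


def match_use(text):
    """Signal failure by raising an exception."""
    words = text.split()
    if len(words) != 2 or words[0].lower() != "use":
        raise ToyNoMatchError(text)
    return ("USE", words[1])


def match_assignment(text):
    """Signal failure by returning None."""
    lhs, sep, rhs = text.partition("=")
    if not sep or not lhs.strip() or not rhs.strip():
        return None
    return ("ASSIGN", lhs.strip(), rhs.strip())


def match_one_of(text, matchers):
    """Return the first successful match; fail only if none match."""
    for matcher in matchers:
        try:
            result = matcher(text)
        except ToyNoMatchError:
            continue
        if result is not None:
            return result
    # Only now is the text known to be invalid for this set of rules.
    raise ToyNoMatchError(text)
```

With these definitions, `use mymodule` and `a = b` are each accepted by
exactly one matcher, while `us mymodule` is rejected by both and is
therefore invalid.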

Usually `NoMatchError` is raised by a class with no textual information
(a string provided as an argument to the exception), as textual
information is not required. When textual information is provided this
is ignored.

.. note::

   `NoMatchError` is the place where we can get context-specific
   information about a syntax error. The problem is that there are
   typically many `NoMatchError` exceptions associated with invalid code. The
   reason for this is that every (relevant) rule needs to be matched
   with the associated invalid code. Each of these will return a
   `NoMatchError`. One option would be to always return
   context-specific information from `NoMatchError` and somehow
   aggregate this information until it is known that there is a syntax
   error. At this point a `FortranSyntaxError` is raised and the
   aggregated messages could be used to determine the correct
   message(s) to return. As a simple example, imagine parsing the
   following code: `us mymodule`.  This is probably meant to mean `use
   mymodule`. The associated rule might return a `NoMatchError` saying
   something like `use not found`. However, there might be a missing
   `=` and it could be that an assignment would also return a
   `NoMatchError` saying something like `invalid assignment`. It is
   unclear which was the programmer's intention. In general, it is
   probable that the further into a rule one gets the more likely it
   is a syntax error for that rule, so it may be possible to prune out
   many `NoMatchError` exceptions. There may even be some rule about this
   i.e. if a hierarchy of rules is matched to a certain depth then it
   must be a syntax error associated with this rule. However, in
   general it will not be possible to prune the `NoMatchError`
   exceptions down to one.
   The first step could be to return context information from
   `NoMatchError` for all failures to match and then look at whether
   there is an obvious way to prune these when raising a
   `FortranSyntaxError`.

.. note::

   Need to add an explanation about when `NoMatchError` exceptions are
   used and when a null return is used.

A `FortranSyntaxError` exception should be raised if the parser does
not recognise the syntax. `FortranSyntaxError` takes two
arguments. The first argument is a reader object which allows the line
number and text of the line in question to be output. The second
argument is text which can be used to give details of the error.

Currently the main use of `FortranSyntaxError` is to catch either an
`InternalSyntaxError` exception or the final `NoMatchError` exception
and re-raise it with line number and the text of the line to be
output. These exceptions are caught and re-raised by overriding the
`Base` class `__new__` method in the top level `Program` class. A
limitation of the `NoMatchError` exception (but not the
`InternalSyntaxError` exception) is that it is not able to give any
details of the error, as it knows nothing about which rules failed to
match.

`FortranSyntaxError` should also be used when it is known that there
is a match, the match has a syntax error and the line number
information is available via the reader object. One issue is that when
`FortranSyntaxError` is raised from such a location, the `fparser2.py`
script may not be able to use the reader's fifo buffer to extract
position information. In this case, position information is not
provided in the output. It is possible that if the lines were pushed
back into the buffer in the parser code then this problem would not
occur.

.. note::

   more information about the error could be determined by inspecting
   the FortranReader object. In particular, a match can be over a
   number of lines and the first line could be returned as well as the
   last. At the moment the last line and the line number are returned.

An `InternalSyntaxError` exception should be raised when it is known
that there is a match and that a syntax error has occurred but it is
not possible to use the `FortranSyntaxError` exception as the line
number information is not known (typically because the match is part
of a line rather than a full line so the input to the associated match
method is a string not a reader object). As mentioned earlier, this
exception is subsequently picked up and re-raised as a
`FortranSyntaxError` exception with line number information added.
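
The catch and re-raise pattern can be sketched with toy exceptions. The
classes `ToyInternalSyntaxError` and `ToySyntaxError` and the functions
below are hypothetical; in particular, `ToySyntaxError` takes a line
number directly, whereas `FortranSyntaxError` takes a reader object.

```python
class ToyInternalSyntaxError(Exception):
    """Raised when a string-level match finds a syntax error."""


class ToySyntaxError(Exception):
    """Syntax error that also reports the line number."""

    def __init__(self, line_number, message):
        super().__init__(f"at line {line_number}: {message}")
        self.line_number = line_number


def match_assignment(string):
    """String-level match: no line number is available here."""
    lhs, sep, rhs = string.partition("=")
    if not sep:
        return None  # not an assignment at all
    if not rhs.strip():
        raise ToyInternalSyntaxError("missing right-hand side")
    return lhs.strip(), rhs.strip()


def parse(lines):
    """Top-level driver: adds line numbers to string-level errors."""
    results = []
    for number, line in enumerate(lines, start=1):
        try:
            results.append(match_assignment(line))
        except ToyInternalSyntaxError as err:
            # Re-raise with the position information known at this level.
            raise ToySyntaxError(number, str(err)) from err
    return results
```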
   
A `ValueError` exception is raised if an invalid standard is passed to
the `create` method of the `ParserFactory` class.

An `InternalError` exception is raised when an unexpected condition is
found. Such errors currently specify where the error was, why it
happened and request that the authors are contacted.

.. note::
   
   An additional future idea would be to also wrap the whole code with
   a general exception handler which subsequently raised an
   InternalError. This would catch any additional unforeseen errors
   e.g. errors due to the wrong type of data being passed. One
   implementation would be to have this as the only place an
   InternalError is raised, however, it is considered better to check
   for exceptions where they might happen e.g. a dangling else clause,
   as appropriate contextual information can be given in the
   associated error message.

.. note::

   Information needs to be added about the use of
   `NotImplementedError` and `AssertionError` and/or the code needs to
   be modified. These exceptions come from pre-existing code and it is
   likely that we would want to remove the `AssertionError` from
   fparser. There has also been discussion about using a logger for
   messages, however, there are currently no known situations where it
   makes sense to output messages.

Object Hierarchy
++++++++++++++++

Fortran code is parsed by creating the `Program` object with a
`FortranReader` object as its argument. If the code is parsed
successfully then a hierarchy of objects is returned associated with
the structure of the original code. For example::

    >>> from fparser.common.readfortran import FortranStringReader
    >>> code = "program test\nend"
    >>> reader = FortranStringReader(code)
    >>> ast = parser_f2003(reader)
    >>> ast
    Program(Main_Program(Program_Stmt('PROGRAM', Name('test')), End_Program_Stmt('PROGRAM', None)))

Therefore the above example creates a `Program` object, which contains
a `Main_Program` object. The `Main_Program` object contains a
`Program_Stmt` object followed by an `End_Program_Stmt` object. The
`Program_Stmt` object contains the `PROGRAM` text and a `Name`
object. The `Name` object contains the name of the program
i.e. `test`. The `End_Program_Stmt` object contains the `PROGRAM` text
and a `None` for the name as it is not supplied in the original code.

As one might expect, the object hierarchy adheres to the Fortran rule
hierarchy presented in the associated Fortran specification document
(as each class implements a rule). If one were to manually follow the
rules in the specification document to confirm a code was compliant
and write down the rules visited on a piece of paper in a hierarchical
manner (i.e. also write down which rules triggered subsequent rules)
then there would be a one-to-one correspondence between the rules and
rule hierarchy written on paper and the objects and object hierarchy
returned by fparser2.

Extensions
++++++++++

Compilers often support extensions to the Fortran standard. fparser2
also does this in certain cases. The suggested way to support this in
fparser2 is to add an appropriate name to the `EXTENSIONS` list in
`utils.py` and then support this extension in the appropriate class if
the name is found in the `EXTENSIONS` list. This will allow this list
to be modified in the future (e.g. a `-std` option could force the
compiler to throw out any non-standard Fortran).
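
A minimal sketch of this pattern is given below. The extension name
"dollar-in-names", the local `EXTENSIONS` list and the `is_valid_name`
function are all hypothetical and are not taken from `utils.py`; the
sketch only shows a check being relaxed when a name is present in the
list.

```python
import re

# Hypothetical extension name, used only in this illustration.
EXTENSIONS = ["dollar-in-names"]


def is_valid_name(name, extensions=None):
    """Check a Fortran name, optionally allowing '$' as an extension.

    The length limit imposed by the standard is ignored in this sketch.
    """
    if extensions is None:
        extensions = EXTENSIONS
    if "dollar-in-names" in extensions:
        allowed = r"[A-Za-z0-9_$]"
    else:
        allowed = r"[A-Za-z0-9_]"
    # A name starts with a letter followed by any number of allowed characters.
    return re.fullmatch(r"[A-Za-z]" + allowed + r"*", name) is not None
```

Passing an empty list for `extensions` corresponds to strict,
standard-only checking.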

.. note::

   A number of extensions do not currently follow this convention and
   are always supported in fparser2 (e.g. support for `$` in
   names). At some point these need to be modified to use the new
   approach. Eventually, the concept of extensions is expected to be
   implemented as a configuration file rather than a static list.

Include files
+++++++++++++

fparser has been extended to support include files as part of the
Fortran syntax. This has been implemented in two new classes
`fparser.two.Fortran2003.Include_Stmt` and
`fparser.two.Fortran2003.Include_Filename`. This allows fparser to
parse code with unresolved include files.

The filename matching pattern implemented in fparser is that the
filename must start with a non-space character and end with a
non-space character. This is purposely a very loose restriction
because many characters can be used in filenames and different
characters may be valid in different operating systems. Note that
whilst the term filename is used here it can be a filepath.
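
The restriction described above can be written as a regular expression.
The pattern and the `looks_like_filename` helper below are an
illustrative assumption and are not copied from the
`Include_Filename` class.

```python
import re

# The first and last characters must not be whitespace; anything
# (including spaces) is permitted in between.
FILENAME_PATTERN = re.compile(r"\S(.*\S)?")


def looks_like_filename(string):
    """Return True if the string satisfies the loose filename pattern."""
    return FILENAME_PATTERN.fullmatch(string) is not None
```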

The include statement rule is added to the start of the `BlockBase`
match method by integrating it with the `comments` rule in the
`add_c_and_i()` function. This means that any includes before a
BlockBase will be matched.

The include statement rule is also added to the subclasses to match in
the `BlockBase` match method by simply appending it to the existing
subclasses (the valid classes between the start and end classes) in
the same way that the Comments class is added. This means that any
includes within a `BlockBase` will be matched.

All Fortran rules that are responsible for matching whole line
statements (apart from the top level Program rule R201) make use of
the `BlockBase` match method. Therefore by adding support for includes
at the beginning and within a BlockBase class we support includes at
all possible locations (apart from after the very last statement).

The top level Program rule R201 supports includes at the level of
multiple program units by again making use of the `add_c_and_i()`
function before any 'program units', between 'program units' and after
any 'program units'. This covers all valid locations for include
statements, including the location after the very last statement that
was missing in the previous paragraph.

Preprocessing Directives
++++++++++++++++++++++++

fparser2 retains preprocessing directives as nodes in the parse tree
but does not interpret them. This has been implemented in
`C99Preprocessor.py` as a number of classes that have names with the
prefix `Cpp_`. This allows fparser2 to successfully parse code that
contains preprocessing directives, provided that the code reduces to
valid Fortran when the directives are omitted.

Similarly to comments, the readers represent preprocessing directives
by a dedicated class `CppDirective`, which is a subclass of `Line`.
This allows directives to be detected early and matches to be limited
to source lines that are instances of `CppDirective`. Matching of directives
is performed in the same place as include statements to make sure that they
are recognized at all locations in a source file.
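
The benefit of a dedicated line class can be sketched as follows. The two
classes below are minimal stand-ins written for this example (they are not
the reader classes themselves), and both the `classify()` and
`match_directive()` helpers and the test for a leading `#` are assumptions::

    class Line:
        """Minimal stand-in for a source line produced by a reader."""

        def __init__(self, text):
            self.text = text


    class CppDirective(Line):
        """Stand-in for a line that holds a preprocessing directive."""


    def classify(text):
        """Wrap `text` in the class appropriate to its content."""
        if text.lstrip().startswith("#"):
            return CppDirective(text)
        return Line(text)


    def match_directive(item):
        """Return the directive keyword, or None for other source lines."""
        # Ordinary lines are rejected by a cheap type check, so no
        # directive pattern is ever applied to them.
        if not isinstance(item, CppDirective):
            return None
        words = item.text.lstrip()[1:].split()
        return words[0] if words else ""

Here `match_directive(classify("#ifdef DEBUG"))` gives `"ifdef"`, while an
ordinary statement such as `x = 1` is rejected without any further
matching being attempted.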

Most directives are implemented as subclasses of `WORDClsBase` or
`StringBase` (with the only exceptions being macro definition and
null directive).

Conditional inclusion directives (`#if...[#elif...]...#endif` or their
variants `#ifdef`/`#ifndef`) are represented as individual nodes by
classes `fparser.two.C99Preprocessor.Cpp_If_Stmt`,
`fparser.two.C99Preprocessor.Cpp_Elif_Stmt`,
`fparser.two.C99Preprocessor.Cpp_Else_Stmt`, and
`fparser.two.C99Preprocessor.Cpp_Endif_Stmt`. These nodes are
currently not grouped together in any way, since directives can appear
at any point in a file and thus the span of a conditional inclusion
need not coincide with a Fortran block. In `#ifdef` and `#ifndef`
directives the identifier is matched using
`fparser.two.C99Preprocessor.Cpp_Macro_Identifier`
and may contain only letters and underscores. In `#if` or `#elif`
directives the constant expression is matched very loosely by
`fparser.two.C99Preprocessor.Cpp_Pp_Tokens`,
which accepts any non-empty string.

Include directives (`#include`) are handled similarly to Fortran
include statements with the matching of filenames being done by the
same class and therefore with the same (loose) restrictions.

Directives that define macro replacements (`#define`) contain a
macro identifier that is matched using `Cpp_Macro_Identifier`.
This is followed by an optional identifier list in parentheses
(and without white space separating identifier and opening
parenthesis) that defines parameters to the macro for use in the
replacement expression. The identifier list is matched by
`fparser.two.C99Preprocessor.Cpp_Macro_Identifier_List`
which, however, does not treat individual identifiers as separate
names but matches the entire list as a single string.
The replacement expression is matched and represented as
`Cpp_Pp_Tokens`.
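
The structure of a `#define` directive described above can be sketched
with a single regular expression. The pattern and the `split_define()`
helper are assumptions made for this example and are not the fparser
implementation. In particular, the sketch requires white space before the
replacement text::

    import re

    _DEFINE = re.compile(
        r"#\s*define\s+"     # the directive itself
        r"([A-Za-z_]+)"      # macro identifier (letters and underscores)
        r"(\([^)]*\))?"      # optional identifier list, with no space before '('
        r"(?:\s+(.*))?"      # optional replacement text
    )


    def split_define(line):
        """Return (identifier, identifier list, replacement) or None."""
        found = _DEFINE.fullmatch(line.strip())
        if not found:
            return None
        name, params, replacement = found.groups()
        # The identifier list is kept as one string, parentheses included.
        return name, params, replacement or ""

For `#define MAX(a,b) ((a)>(b)?(a):(b))` the identifier list is `(a,b)`.
For `#define MAX (a,b)` there is no identifier list because of the white
space after the identifier, so `(a,b)` becomes the replacement text.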

The matching of `#undef` statements is implemented in class
`fparser.two.C99Preprocessor.Cpp_Undef_Stmt` with the identifier again
matched by `Cpp_Macro_Identifier`.

Directives `#line`, `#error`, and `#warning` are implemented in classes
`fparser.two.C99Preprocessor.Cpp_Line_Stmt`,
`fparser.two.C99Preprocessor.Cpp_Error_Stmt`, and
`fparser.two.C99Preprocessor.Cpp_Warning_Stmt` with the corresponding
right hand sides matched by `Cpp_Pp_Tokens`.

A single preprocessing directive token `#` without any directive is
a null statement and is matched by
`fparser.two.C99Preprocessor.Cpp_Null_Stmt`.

Utils
+++++

fparser2 includes a `utils.py` file. This file contains the base
classes (discussed in the :ref:`base-classes` section), the
fparser2-specific exceptions (discussed in the :ref:`exceptions`
section), a list of extensions (see the earlier discussion of
extensions) and a tree-walk utility that can be used to traverse the
AST produced by fparser2 for a valid Fortran program.

.. note::

   The tree-walk utility currently fails if the parent node of the
   tree is provided. The workaround is to provide the parent's
   children instead. This should be fixed at some point.


.. skip
   # Constraints
   # +++++++++++
   # TBD
   # Comment Class
   # +++++++++++++
   # TBD

.. _tokenisation:

Tokenisation
++++++++++++

In order to simplify the problem of parsing code containing
potentially complex expressions, fparser2 performs some limited
tokenisation of a string before proceeding to attempt to match it.
Currently, this tokenisation replaces three different types of quantity with
simple names:

 1. the content of strings;
 2. expressions in parentheses;
 3. literal constants involving exponents (e.g. ``1.0d-3``)

This tokenisation is performed by the `string_replace_map` function:

.. autofunction:: fparser.common.splitline.string_replace_map

In turn, this function uses `splitquote` and `splitparen` (in the same
module) to split a supplied string into quantities within quotes or
parentheses, respectively. The matching for literal constants involving
exponents is implemented using a regular expression.

`string_replace_map` is used in the `match()` method of many of the classes
that implement the various language rules. Note that the tokenisation must
be undone before passing a given string on to a child class (or returning
it). This is performed using the reverse-map that `string_replace_map`
returns, e.g.::

    line, repmap = string_replace_map(string)
    ...
    type_spec = Declaration_Type_Spec(repmap(line[:i].rstrip()))

(The reverse map is an instance of `fparser.common.splitline.StringReplaceDict`
which subclasses `dict` and makes it callable.)
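
To make the idea of a callable reverse map concrete, the following is a
much-simplified, self-contained sketch of this kind of tokenisation. It is
not the fparser implementation: `ReplaceDict`, `simple_replace_map()`, the
regular expressions and the placeholder names (`STR_1_` etc.) are all
assumptions made for this example, and corner cases such as doubled quotes
inside strings are ignored::

    import re


    class ReplaceDict(dict):
        """Dictionary that can be called to undo the replacements it holds."""

        def __call__(self, line):
            # Undo the most recent replacements first, because their stored
            # text may itself contain earlier placeholders.
            for key in reversed(list(self)):
                line = line.replace(key, self[key])
            return line


    _QUOTED = re.compile(r"'[^']*'|\"[^\"]*\"")
    _EXPONENT = re.compile(r"\b\d+(?:\.\d*)?[edED][+-]?\d+\b")


    def simple_replace_map(line):
        """Return a simplified copy of `line` and the map that restores it."""
        repmap = ReplaceDict()

        def store(prefix, text):
            key = f"{prefix}_{len(repmap) + 1}_"
            repmap[key] = text
            return key

        def quoted(match):
            text = match.group(0)
            return text[0] + store("STR", text[1:-1]) + text[-1]

        # Replace the content of strings (the quotes themselves are kept).
        line = _QUOTED.sub(quoted, line)
        # Replace literal constants involving exponents.
        line = _EXPONENT.sub(lambda match: store("REAL", match.group(0)), line)
        # Replace the content of each outermost pair of parentheses.
        pieces, depth, start = [], 0, 0
        for pos, char in enumerate(line):
            if char == "(":
                if depth == 0:
                    pieces.append(line[start:pos + 1])
                    start = pos + 1
                depth += 1
            elif char == ")" and depth > 0:
                depth -= 1
                if depth == 0:
                    pieces.append(store("PAREN", line[start:pos]))
                    start = pos
        pieces.append(line[start:])
        return "".join(pieces), repmap

Calling the returned map on the simplified string (or on any part of it)
restores the original text, which is the step that must be performed
before a string is handed on to a child class.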

   
Expression matching
+++++++++++++++++++

The Fortran2003 rules specify a hierarchy of expressions (specified in
levels). In summary::

    R722 expr is [ expr defined-binary-op ] level-5-expr
    R717 level-5-expr is [ level-5-expr equiv-op ] equiv-operand
    R716 equiv-operand is [ equiv-operand or-op ] or-operand
    R715 or-operand is [ or-operand and-op ] and-operand
    R714 and-operand is [ not-op ] level-4-expr
    R712 level-4-expr is [ level-3-expr rel-op ] level-3-expr
    R710 level-3-expr is [ level-3-expr concat-op ] level-2-expr
    R706 level-2-expr is [ [ level-2-expr ] add-op ] add-operand
    R705 add-operand is [ add-operand mult-op ] mult-operand
    R704 mult-operand is level-1-expr [ power-op mult-operand ]
    R702 level-1-expr is [ defined-unary-op ] primary

As can be seen, the "top level" rule is `expr`. This depends on a
`level-5-expr`, which depends on an `equiv-operand`, and so on in
a hierarchy in the order listed.

fparser2 naturally follows this hierarchy, attempting to match in the
order specified. This works well apart from one case, which is the
matching of a level-2 expression::

    R706 level-2-expr is [ [ level-2-expr ] add-op ] add-operand

The problem is to do with falsely matching an exponent in a
literal. Take the following example::

    a - 1.0e-1

When searching for a match, the following pattern is a valid candidate
and will be the candidate used in fparser2 as fparser2 matches from the
right hand side of a string by default::

    level-2-expr = "a - 1.0e"
    add-op = "-"
    add-operand = "1"

As expected, this would fail to match, due to the level-2 expression
("a - 1.0e") being invalid. However, once R706 failed to match it
would not be called again as fparser2 follows the rule hierarchy
mentioned earlier. Therefore fparser2 would fail to match this string.

To solve this problem, fparser2 performs limited tokenisation of a string
before attempting to perform a match. Amongst other things, this tokenisation
replaces any numerical constants containing exponents with simple symbols
(see :ref:`tokenisation` for more details). For the example above this means
that the code being matched would now look like::

    a - F2PY_REAL_CONSTANT_1_

which is readily matched as a level-2 expression.
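
The effect of this replacement on a right-to-left split can be sketched as
follows. The two helpers and the regular expression are assumptions made
for this example and only mimic the behaviour described above; they are
not the code used by fparser2::

    import re

    # Simplified pattern for a literal constant containing an exponent.
    _EXPONENT = re.compile(r"\b\d+(?:\.\d*)?[edED][+-]?\d+\b")


    def split_at_last_add_op(string):
        """Split `string` at its right-most '+' or '-' character."""
        pos = max(string.rfind("+"), string.rfind("-"))
        if pos < 0:
            return None
        return string[:pos].rstrip(), string[pos], string[pos + 1:].lstrip()


    def hide_exponents(string):
        """Replace each exponent literal with a simple, numbered name."""
        count = 0

        def _name(_match):
            nonlocal count
            count += 1
            return f"F2PY_REAL_CONSTANT_{count}_"

        return _EXPONENT.sub(_name, string)

Splitting `a - 1.0e-1` directly gives the invalid candidate
`("a - 1.0e", "-", "1")`, whereas splitting the result of
`hide_exponents()` gives `("a", "-", "F2PY_REAL_CONSTANT_1_")`.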

Continuous Integration
----------------------

GitHub Actions are used to run the test suite for a number of different
Python versions and the coverage reports are uploaded automatically to CodeCov
(https://codecov.io/gh/stfc/fparser). The configuration for this is in the
`.github/workflows/unit-tests.yml` file.

Black Formatting
++++++++++++++++

A second job within the GitHub Action is used to check that all of the
code conforms to Black (https://black.readthedocs.io) formatting. It
is up to the developer to ensure that this passes (e.g. by running
`black` locally and committing the results).

The formatting choices made by Black are influenced by the version of Python
being used. Therefore it is recommended that a developer use the version of
Python that is specified for the `Black` job within the yml configuration
file mentioned above. (This will normally be the most recent, stable version
of Python.)

Note that while it is technically possible to have the Action
actually make the changes and commit them, this was found to break
the GitHub review process since the automated commit is not permitted to
trigger further Actions. This then leaves GitHub thinking that the
various checks have not run.

Automatic Packaging
-------------------

A GitHub Action (https://github.com/pypa/gh-action-pypi-publish)
is also used to automate the process of uploading a new
release of fparser to the Python Package Index (PyPI). This action is
configured in the `.github/workflows/python_publish.yml` file and is
triggered by the creation of a new release on GitHub.

Test Fixtures
-------------

Various pytest fixtures
(https://docs.pytest.org/en/stable/fixture.html) are provided to
help mock up a suitable environment in which to run
tests. These are defined in `two/tests/conftest.py`:

=================== ======================= ===================================
Name                Returns                 Purpose
=================== ======================= ===================================
f2003_create        --                      Sets up the class hierarchy for the
                                            Fortran2003 parser.
f2003_parser        `Fortran2003.Program`   Sets up the class hierarchy for the
                                            Fortran2003 parser and returns the
                                            top-level Program object.
clear_symbol_table  --                      Removes all stored symbol tables.
fake_symbol_table   --                      Creates a fake scoping region and
                                            associated symbol table.
=================== ======================= ===================================


Performance Benchmark
---------------------

The fparser scripts folder contains a benchmarking script to assess the
performance of the parser by generating a synthetic Fortran file with
multiple subroutines and the associated subroutine calls. It can be executed
with the following command::

    ./src/fparser/scripts/fparser2_bench.py