
The design principles used in the functional-perl library

1. General

1.1. Be properly functional first.

1.2. Try to limit dependencies if sensible.

1.3. Generally provide functionality both as functions and methods.

1.4. Use of *foo vs \&foo

1.5. Naming conventions

1.6. Name spaces

1.7. Error handling

2. Purity

2.1. Use FP::Abstract::Pure as base class for (in principle) immutable objects

3. Laziness

4. Various / "Hacking"

1. General

1.1. Be properly functional first.

As already mentioned in the introduction on the howto page, the modules are built using the functional paradigm from the ground up (as much as makes sense; e.g. iterations in simple functions are often written as loops instead of tail recursion¹). A sequences API to build alternative implementations (like iterator based, or optimizing away intermediate results) might be added in the future.

¹ But this is mainly done because it's (currently) faster, and because Perl currently does not offer first-class continuations. Avoiding loop syntax and using function calls everywhere makes it possible to suspend and resume execution arbitrarily in a language like Scheme, without mutation getting in the way; but this doesn't apply to current Perl 5.

1.2. Try to limit dependencies if sensible.

For example, the core modules avoid the use of Sub::Call::Tail, Method::Signatures, MooseX::MultiMethods and autobox. (Some tests, examples and Htmlgen use them.) Dependencies are declared in FunctionalPerl::Dependencies so that tests can skip modules with such dependencies.

1.3. Generally provide functionality both as functions and methods.

NOTE: since this was written, the method-call style has become the primary way to provide functionality, and function-based access is now spotty. TODO: rewrite this section. Providing functions would still be good, but as generic wrappers (which also work for builtin types that can't have methods) (TODO).

The sequence processing functions use the argument order conventions from functional programming languages (Scheme, OCaml, Haskell). The methods move the sequence argument to the object position.

For example, both

list_map *inc, list (1,3,4)

and

list (1,3,4)->map (\&inc)

result in the same choice of algorithm. The shorter method name is possible thanks to the dispatch on the type of the object. Compare to:

stream_map *inc, array_to_stream ([1,3,4])

or the corresponding

array_to_stream ([1,3,4])->map (\&inc)

which shows that there's no need to specify the kind of sequence when using method syntax.

This actually needed an implementation trick: streams are just lazily computed linked lists, hence the object on which the map method is being called is just a generic promise. The promise could return anything upon evaluation, not just a list pair. Thus it can't be known what map implementation to call without evaluating the promise. After evaluation, it's just a pair, though, at which point it can't be known whether to call the list_map or stream_map implementation. So how it works is that promises have a catch-all (AUTOLOAD), which forces evaluation, and then looks for a method with a stream_ prefix first (which will find the stream_map method in this example). If that fails, it will call the original method name on the forced value.

So the way to make it work both for lazily and eagerly computed pairs is to put both a map and a stream_map method into the FP::List::List namespace (which is the parent class of FP::List::Pair and FP::List::Null). When the pair was provided lazily, the above implementation will dispatch to stream_map, which normally makes sense since the user will want a lazy result from a lazy input.
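The catch-all dispatch described above can be illustrated with a minimal sketch in core Perl. This is not the FP::Lazy or FP::List code: the `Sketch::Promise` and `Sketch::List` packages and their methods are hypothetical stand-ins, and the two map methods only report which variant was chosen.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the list base class: it carries both an eager
# `map` and a `stream_map` (here they only report their own name).
package Sketch::List {
    sub new        { my ($class, @items) = @_; bless [@items], $class }
    sub map        { "eager map" }
    sub stream_map { "lazy stream_map" }
    sub length     { scalar @{ $_[0] } }
}

# Hypothetical promise: wraps a thunk and has no real methods besides
# `new`, `force` and the AUTOLOAD catch-all.
package Sketch::Promise {
    our $AUTOLOAD;

    sub new { my ($class, $thunk) = @_; bless { thunk => $thunk }, $class }

    sub force {
        my ($self) = @_;
        if (my $thunk = delete $self->{thunk}) {
            $self->{value} = $thunk->();    # evaluate at most once
        }
        $self->{value};
    }

    sub AUTOLOAD {
        my $self = shift;
        (my $name = $AUTOLOAD) =~ s/.*:://;
        return if $name eq 'DESTROY';       # never dispatch destruction
        my $value = $self->force;
        # Prefer the stream_ variant, otherwise fall back to the plain name.
        my $method = $value->can("stream_$name") || $value->can($name)
            or die "no method '$name' on forced value";
        $value->$method(@_);
    }
}

my $eager = Sketch::List->new(1, 3, 4);
my $lazy  = Sketch::Promise->new(sub { Sketch::List->new(1, 3, 4) });

print $eager->map(),   "\n";    # called directly on the list
print $lazy->map(),    "\n";    # goes through AUTOLOAD, finds stream_map
print $lazy->length(), "\n";    # no stream_length, falls back to length
```

The sketch dispatches once, on the object the method was called on, which mirrors the "first pair only" behaviour discussed next.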

Note that this dispatch mechanism is only run for the first pair of the list; afterwards, the code stays in either list_map or stream_map(*). This means that prepending a value to a stream makes the non-lazy map implementation be used:

cons (0, array_to_stream [1,3,4])->map (\&inc)

returns an eagerly evaluated list, not a stream. If that's not what you want, you can still prefix the method name with stream_ yourself to force the lazy variant:

cons (0, array_to_stream [1,3,4])->stream_map (\&inc)

returns a stream.

(*) Question: should the dispatch really happen for each cell? Then the eager part of a mixed list/stream would still be mapped eagerly, and the lazy part lazily. (TODO: measure the overhead.)

NOTE: providing both functions and methods makes things more complicated. That it was done so far is rather accidental, as originally only functions were provided. Some functions like car and cons are now wrappers that actually do method calls if they can. cons still needs to remain a function because it doesn't necessarily receive an object as its rest argument. TODO: figure out whether to continue providing functions, perhaps reducing the offer to those strictly needed and otherwise asking the user to build them on the fly using the_method? Or figure out a way to generate them easily for whole packages. A second reason to keep functions, besides sparing the user the need for the_method, is that they can take arguments in the same order as in traditional functional programming languages (the object does not need to come first, and with multiple objects it can be unclear which one to dispatch on).

Idea: use Class::Multimethods, Class::Multimethods::Pure or MooseX::MultiMethods to provide multimethods as an alternative to methods; this would allow retaining the traditional argument positions while still using short names. (Perhaps look at Clojure as an example?)

1.4. Use of `*foo` vs `\&foo`

Both of these work for passing a subroutine as a value, with the following differences:

The code reference (\&foo):

- is the same type of data as what the expression sub { .. } returns, and hence what's most often taught.
- clearly only ever represents a subroutine, whereas *foo is ambiguous and can point to any type: the named package entries for subroutines, IO handles, scalars, arrays, hashes, plus any other kind of object by way of scalars.
- serialization to bytes is problematic (can only be done using complex modules and only for a limited range of Perl code, and includes serializing the whole code of the subroutine)
- can be used as a value in lexical variables as arguments to goto even without using a & prefix, as in my $f = \&foo; goto $f

The glob (*foo):

- looks arguably visually cleaner, and may be easier to type
- later redefinitions of the subroutine it names are reflected (as it points to the subroutine indirectly by name)
- can be serialized easily (as it's just a name)
- nicer for debugging when not using show from FP::Show or other ways using introspection, as one can directly see the subroutine package and name, not just an anonymous code ref
- this code fails: my $f = *foo; goto $f. But this still works: my $f = *foo; goto &$f. (Sub::Call::Tail's tail is fine.)
- there are no builtin perl checks for the wrong type, i.e. passing *foo where an array reference is expected will silently access the @foo package variable, even if it was never declared (empty in this case), while passing \&foo would have the interpreter point out the error.
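Three of the differences listed above (redefinition, printing, and the missing type check) can be observed with a small core-Perl sketch. The subroutine name `greet` is made up for the illustration.

```perl
use strict;
use warnings;
no warnings 'redefine';    # greet is redefined on purpose below

sub greet { "hello" }

my $by_ref  = \&greet;     # refers to the subroutine itself
my $by_glob = *greet;      # refers to the name main::greet

my @before = ($by_ref->(), $by_glob->());

*greet = sub { "goodbye" };    # redefine at run time

my @after = ($by_ref->(), $by_glob->());

print "before: @before\n";     # before: hello hello
print "after:  @after\n";      # after:  hello goodbye
print "glob prints as $by_glob\n";    # shows the package-qualified name

# Misuse as an array: the glob silently yields the (empty) @main::greet,
# while the code reference raises an error.
my $count = scalar @{$by_glob};
my $ok    = eval { my @copy = @{$by_ref}; 1 };
print "glob as array: $count elements; code ref as array: ",
    ($ok ? "no error" : "error"), "\n";
```

The code reference keeps calling the original subroutine after the redefinition, whereas the glob follows the name.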

Quick benchmarking of subroutine calls of the two variants did not detect a performance difference.

FP::Predicates' is_procedure accepts globs if they contain a value in the CODE slot, i.e. it adapts its meaning to "can represent a subroutine". (But Todo: should it return true for any other callable (overloaded object) as well? How can the latter be implemented, by way of checking for a '(&' method?)
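A check along the lines of "can represent a subroutine" could look roughly like the following. This is only a sketch, not the FP::Predicates source; the function name `can_represent_subroutine` is hypothetical, and overloaded objects are deliberately left out since that question is still open above.

```perl
use strict;
use warnings;

# Hypothetical predicate: true for code references and for globs whose
# CODE slot is filled, false for everything else.
sub can_represent_subroutine {
    my ($v) = @_;
    return 1 if ref($v) eq 'CODE';
    return 1 if ref(\$v) eq 'GLOB' && defined *{$v}{CODE};
    0;
}

sub inc { $_[0] + 1 }

for my $case ([ 'code ref'         => \&inc ],
              [ 'glob with sub'    => *inc ],
              [ 'glob without sub' => *STDOUT ],
              [ 'string'           => "inc" ]) {
    my ($label, $value) = @$case;
    print "$label: ", can_represent_subroutine($value), "\n";
}
```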

Earlier versions of FunctionalPerl suggested using globs. Because globs are a rarely used feature in Perl, and because they can be mutated at a distance, the project no longer recommends them, and all code and examples have been changed to use \& instead.

1.5. Naming conventions

Function names are generally chosen to prefer the naming in "established" functional languages. For example the filtering methods on FP::Abstract::Sequence are called filter, not grep (likewise for the type-prefixed functions like list_filter in FP::List, array_filter in FP::Array, etc.). As another example, FP::Abstract::Sequence also defines fold and fold_right (see Fold (higher-order function)) according to traditional functional languages, in addition to reduce, which has a slightly different API and is close to Clojure's reduce.
Function names start with the data type that they are made for; for example array_map versus list_map. (This follows the conventions in Scheme (and some other functional languages?).) Of course method (and multimethod) names don't need to, and shouldn't, carry the name of the data type. (The stream_ prefix in method names already mentioned above is an exception: it's there to choose explicitly, and it is not really a type choice but an evaluation strategy choice.)
Predicates (functions that check whether a value fulfills a type or other requirement (or in general return a boolean?)) start with is_; but if they only work for a particular data type, they put the is after the type name (something like array_is_pure).
Data conversion functions are now named with _to_ (previously with 2), e.g. array_to_list. This follows the convention in Scheme (except -> is used there instead of the _to_), but not that of OCaml, where such functions are called e.g. list_of_array. Method names for the same omit both the source type name and the _to_ (e.g. ->array).
The maybe_ prefix is used for variables and functions which bind or return undef as indication for the absence of a value. The perhaps_ prefix is used for functions which return () as indication for the absence of a value. possibly_ is used for functions which might return an argument unchanged (i.e. not do anything). Also, functions always return exactly one value unless they have a perhaps_ prefix or a name that indicates plural and there's a good reason that the values are not returned in an array or linked list instead; this is to reduce the risk of accidental argument misalignment in function calls that have function calls as subexpressions.
See FP::Optional for more on this.
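The difference between the maybe_ and perhaps_ conventions, and the argument misalignment risk mentioned above, can be seen in a short sketch. The lookup table and the two functions are invented for the illustration and are not part of the library.

```perl
use strict;
use warnings;

my %port_for = (http => 80, https => 443);    # hypothetical lookup table

# maybe_: always returns exactly one value, undef when absent.
sub maybe_port {
    my ($service) = @_;
    $port_for{$service};
}

# perhaps_: returns the empty list when absent.
sub perhaps_port {
    my ($service) = @_;
    exists $port_for{$service} ? $port_for{$service} : ();
}

# With maybe_, the following arguments keep their positions:
my @aligned = ("gopher", maybe_port("gopher"), "fallback");
# With perhaps_, the missing value vanishes and later arguments shift:
my @shifted = ("gopher", perhaps_port("gopher"), "fallback");

print scalar(@aligned), " elements vs ", scalar(@shifted), " elements\n";
```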
Since handling arrays and hashes by reference is the normal way of working functionally (see howto (References and mutation..) for why), naming things array and hash is preferred over arrayref and hashref. (_ref is used in names of functions/methods to access fields in data structures (e.g. a function that takes an array and an index and returns $array->[$index] would be called array_ref).)
Functions that combine several other functions into one for efficiency are named after the functions they could be composed of, with "__" as separator:
```
array_reverse (array_map ($f2, array_filter ($f1, $a)))
```
becomes
```
array_reverse__map__filter ($f2, $f1, $a)
```
and
```
$a->filter ($f1)->map ($f2)->reverse
```
becomes
```
$a->reverse__map__filter ($f2, $f1)
```
(Todo: should the order in the method case be reversed? (i.e. $a->filter__map__reverse ($f1,$f2)) A reason against it is that searching for the base function name will find both cases.)
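To make the efficiency argument concrete, here is a sketch of what such a fused function might do compared to the composition it is named after. The helper definitions are plain-Perl stand-ins written for this example, not the FP::Array implementations.

```perl
use strict;
use warnings;

# Hypothetical plain-Perl stand-ins for the individual functions.
sub array_filter  { my ($pred, $arr) = @_; [ grep { $pred->($_) } @$arr ] }
sub array_map     { my ($f, $arr)    = @_; [ map  { $f->($_) } @$arr ] }
sub array_reverse { my ($arr)        = @_; [ reverse @$arr ] }

# Fused variant: a single backwards pass, no intermediate arrays.
sub array_reverse__map__filter {
    my ($f2, $f1, $arr) = @_;
    my @out;
    for (my $i = $#$arr; $i >= 0; $i--) {
        my $x = $arr->[$i];
        push @out, $f2->($x) if $f1->($x);
    }
    \@out;
}

my $is_odd = sub { $_[0] % 2 };
my $square = sub { $_[0] * $_[0] };
my $input  = [1 .. 5];

my $composed = array_reverse(array_map($square, array_filter($is_odd, $input)));
my $fused    = array_reverse__map__filter($square, $is_odd, $input);
print "@$composed\n@$fused\n";    # both lines print: 25 9 1
```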
Functional setters (those which leave their arguments unmodified, i.e. for persistent data structures) end with _set instead of starting with set_ as is common in the imperative world. (This is consistent with the Scheme naming conventions (first the type, then the field name, then the operation), and hints that it's different from imperative code.)
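As a minimal sketch of the _set convention, the following hypothetical class (hand-written here rather than generated by FP::Struct) has a width_set method that returns an updated copy and leaves the original object alone.

```perl
use strict;
use warnings;

package Sketch::Box {
    sub new {
        my ($class, $width, $height) = @_;
        bless { width => $width, height => $height }, $class;
    }
    sub width  { $_[0]{width} }
    sub height { $_[0]{height} }

    # Functional setter: returns a modified copy, $self stays unchanged.
    sub width_set {
        my ($self, $width) = @_;
        bless { %$self, width => $width }, ref $self;
    }
}

my $box  = Sketch::Box->new(1, 2);
my $wide = $box->width_set(10);
print $box->width(), " ", $wide->width(), " ", $wide->height(), "\n";
```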
Functions and methods ending in an underscore are used to indicate those taking key => value parameter pairs (as in the generated constructor methods and functions from FP::Struct), or which are curried, i.e. will return a parametrized function (e.g. left_associate_ and right_associate_ from FP::Combinators2) (this latter naming is experimental, is there a better idea?).
Procedures and methods which are not safe, i.e. can lead to delayed failures instead of reporting an exception right away or lead to other violations of the intended behaviour, are prefixed with (or contain the string) "unsafe_". This makes it easy to find their usages with grep or similar. Examples are functions that access array or hash fields in their arguments without verifying their type, or constructors that reuse a mutable data structure passed as argument and return it as an ostensibly pure object.

1.6. Name spaces

That Perl uses namespaces both to name types (classes) and to hold functions is somewhat unfortunate. The resulting confusion is well known to Perl programmers (imported or locally defined functions may accidentally shadow methods in parent classes). The conflict is sharper here because of two aims of the Functional Perl project:

Functions in class scopes show up in tab completion in FP::Repl, since there's no way to know whether they are functions or methods. One solution to this, which is used in this project, is namespace cleaning: deleting functions from the package after the package definition has been read. FP::Struct automatically does this, and Chj::NamespaceCleanAbove can be used in classes not using FP::Struct. (There are other packages on CPAN which do the same.)
Constructor functions fit well with functional programming (they can easily be passed as function values), and provide better ergonomics in many cases (for example the list or purearray functions). But the natural way to request a type including its constructor is, say, FP::PureArray, and if FP::PureArray is used as the class name for the objects then where should the objects live? use FP::PureArray calls the import method on this package, so it must be the one holding the constructor functions. Namespace cleaning doesn't work here as exporting would fail. The solution that this project has started to use (TODO: do it consistently) is to always use nested namespaces for the actual classes; in this example, FP::_::PureArray. The underscore is used in cases where such a sub-namespace is "artificially" introduced. Basically it's "the type" defined by that package. (Some packages use subpackages anyway because they define multiple classes, e.g. FP::List uses FP::List::Pair, FP::List::Null, but also a common base class FP::List::List.)
Where it is beneficial, unit tests written using Chj::TEST are usually within the modules implementing a feature; but where they aren't, because it's too many test cases and they clutter up more than they document, they are in a separate module file, with ::t appended to the name of the module they are testing. E.g. tests for Foo/Bar.pm are in Foo/Bar/t.pm in such a case. (Integration tests are scripts in t/ as usual.)
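The namespace cleaning mentioned above can be illustrated by hand in core Perl. This sketch does not use FP::Struct or Chj::NamespaceCleanAbove; it simply deletes one imported function from a hypothetical package's symbol table and shows that the function no longer looks like a method while code compiled earlier still works.

```perl
use strict;
use warnings;

package Sketch::Bag {
    use List::Util qw(sum);    # helper wanted only inside this package

    sub new   { my ($class, @items) = @_; bless [@items], $class }
    sub total { my ($self) = @_; sum(@$self) }

    # Drop the imported helper from the package's symbol table. The call
    # in total() was bound at compile time and keeps working.
    delete $Sketch::Bag::{sum};
}

my $bag = Sketch::Bag->new(1, 2, 3);
print "total: ", $bag->total(), "\n";
print "sum visible as method: ",
    (Sketch::Bag->can('sum') ? "yes" : "no"), "\n";
```

Real cleaning modules do this for all functions defined or imported up to a certain point instead of naming each one.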

1.7. Error handling

Type safety is important and helpful for constructing correct programs. For example ref methods that take an index do not accept negative numbers (to mean "from the end"). (Offer a circular_ref method instead.)

Early error reporting is useful because it makes debugging easier. For example ref methods do not return undef for invalid indices, instead they throw exceptions.
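The two rules above (no negative indices, exceptions instead of undef) might translate into something like the following sketch. These are stand-alone functions written for illustration, not the library's ref methods; array_circular_ref is a hypothetical name for the wrapping variant.

```perl
use strict;
use warnings;

# Strict access: rejects negative and out-of-range indices with an exception.
sub array_ref {
    my ($arr, $i) = @_;
    die "array_ref: index must be a non-negative integer\n"
        unless defined $i && $i =~ /\A[0-9]+\z/;
    die "array_ref: index $i out of range\n" unless $i < @$arr;
    $arr->[$i];
}

# Wrapping access is a separate, explicitly named operation.
sub array_circular_ref {
    my ($arr, $i) = @_;
    die "array_circular_ref: empty array\n" unless @$arr;
    $arr->[ $i % @$arr ];
}

my $values = [10, 20, 30];
print array_ref($values, 1), "\n";
print array_circular_ref($values, -1), "\n";
print "error: $@" unless eval { array_ref($values, -1); 1 };
```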

There's also the approach of using error values; FP::Failure is a start (TODO: perhaps provide FP::Result, FP::Maybe). In general, though, using exceptions is fine since they are native to Perl and fast; there are no plans to replace them.

2. Purity

NOTE: current Perl versions support immutability, and using it has been enabled in some of the modules; TODO: make this support complete, and rewrite this section.

Perl does not have a compile time type checker to guarantee (sub-)programs to be purely functional like e.g. Haskell does, but programs could still enforce checks at run time.

The FP libraries do not currently enforce purity anywhere; they just do not offer mutators (except for array or hash assignment to the object fields). They help the user write pure programs, but do not enforce it. This works well for projects written by single developers or perhaps also small teams, where you know which subroutines and methods are pure by way of remembering or naming convention, or where checking is quick. But in bigger teams it might be useful to be able to get guarantees by machine instead of just by trust. Thus it is an aim of this project to try to provide for optional runtime enforcement of purity (in the future).

2.1. Use `FP::Abstract::Pure` as base class for (in principle) immutable objects

And let is_pure from FP::Predicates return true for all immutable data types (even if they are not blessed references). (is_pure_object will only return true for actual objects.)

The idea is to be able to assert easily that an algorithm can rely on some piece of data not changing.

(Currently) the rule is that a data structure is considered immutable if it doesn't provide an exported function, method, or tie interface to mutate it. For example mistreating list pairs by mutating them by way of relying on their implementation as arrays with two elements and mutating the array slots does not make them a(n officially) mutable object.
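The marker-class idea can be sketched as follows. The package names and the two predicates are hypothetical stand-ins, not the FP::Abstract::Pure or FP::Predicates code. Treating plain non-reference values as pure and unblessed references as impure is an assumption made for this sketch only.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical marker base class, standing in for FP::Abstract::Pure.
package Sketch::Pure { }

# Immutable by interface: accessors only, no mutators.
package Sketch::Pair {
    our @ISA = ('Sketch::Pure');
    sub new   { my ($class, $first, $rest) = @_; bless [$first, $rest], $class }
    sub first { $_[0][0] }
    sub rest  { $_[0][1] }
}

# Offers a mutator, so it does not inherit from the marker class.
package Sketch::Counter {
    sub new { my ($class) = @_; bless { n => 0 }, $class }
    sub inc { $_[0]{n}++ }
}

# True only for blessed objects that inherit from the marker class.
sub is_pure_object_sketch {
    my ($v) = @_;
    (blessed($v) && $v->isa('Sketch::Pure')) ? 1 : 0;
}

# Assumption of this sketch: non-reference values (plain strings and
# numbers) count as pure, unblessed references do not.
sub is_pure_sketch {
    my ($v) = @_;
    return is_pure_object_sketch($v) if blessed($v);
    ref($v) ? 0 : 1;
}

print is_pure_sketch(Sketch::Pair->new(1, undef)), "\n";
print is_pure_sketch(Sketch::Counter->new()), "\n";
```

Note that the pair is still an array underneath; as the rule above says, reaching into its slots directly does not make it an officially mutable object.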

The libraries inheriting from FP::Abstract::Pure should try to disable such mutations from Perl code; they might be useful in some situations for debugging, though, so leaving open a back door that still allows for mutation (like using a mutator that issues a warning when run, or a global that allows to turn off mutability protection) may be a good idea. In general, mutations that are purely debugging aids (like attaching descriptive names to objects or similar) are excluded from the rule.

Algorithms that want to use mutation, even if rarely (like creating a circular linked list without going through a promise, or copying a list without using stack space or reversing twice (but copying a pure list doesn't make sense!)) must rely on mutable objects instead (like mutable pairs (todo)).

Closures can't be treated as immutable in general since their environment (lexicals visible to them) can be mutated. (Todo: provide syntax (e.g. 'purefun' keyword) that blesses closures (if manually deemed pure)? Note that should this ever be implemented, purity checks shouldn't be added too often, as e.g. passing an impure function to map is ok if the user knows what he is doing. But offering a guaranteed pure variant of map that does restrict its function argument to be pure might be useful. Instead of creating a mess of variants, something smarter like a pragma should be implemented though.)

3. Laziness

NOTE: There is FP::TransparentLazy now. TODO: rewrite this section.

Promises created with FP::Lazy are not automatically forced when used by perl builtins (todo: should they be?). Also, type predicates usually don't force them either; the current exception is is_null, so that FP::List does not need to care about lazy code. (Perhaps this should be changed? But it can't be fully transparent anyway since e.g. ref will always return the promise namespace.)

OTOH, method calls on promises are always forcing the promise and are then delegated to the value the promise returns.

Some functions like car and cdr (first and rest) are forcing them, too (TODO: actually this is coded explicitly, but instead those functions should probably simply be defined as the_method ("car") etc., which would still force them, and be properly OO).

The current mix seems to work well, but details are still open for change.

4. Various / "Hacking"

The project is now using spaces for indentation (no tabs).

Wed, 6 Jan 2021 17:14:30 +0000
