Tony Marston's Blog About software development, PHP and OOP

Attributes are atrocious

Posted on 24th June 2024 by Tony Marston
Introduction
Annotations
Attributes
The problem with attributes
A simpler solution without the problems
Automatic data validation
Dealing with database changes
Conclusion
References
Comments

Introduction

I have never used attributes, or their previous incarnation called annotations, for one good reason - they are an over-complicated solution to a problem which I solved with simple code over 20 years ago. In this article I will explain how my simple solution is far superior to its modern day counterpart.

Attributes were added into PHP8 with the following description:

Attributes offer the ability to add structured, machine-readable metadata information on declarations in code: Classes, methods, functions, parameters, properties and class constants can be the target of an attribute. The metadata defined by attributes can then be inspected at runtime using the Reflection APIs. Attributes could therefore be thought of as a configuration language embedded directly into code.

So what exactly is metadata and how is it useful? Wikipedia provides the following description:

Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself.

What is an example of this metadata? When building software to manipulate the contents of a database table the software must be aware of the table's structure, how it is configured, so that it can build the queries to both read and write that data. While each table can be queried directly for its application data, its structure data, the metadata, can be obtained from the INFORMATION_SCHEMA within the database.

When I developed my method of dealing with metadata I started with the premise that metadata is still data, and there are standard ways of dealing with data that can be accomplished with simple code, not the complex methods that are dreamt up by some architecture astronauts.


Annotations

This metadata was originally implemented under the name annotations which extended the docblock syntax to provide information about a variable such as the following:

/**
 * @Column(type="string", length=32, unique=true, nullable=false)
 */
protected $username;

It could also supply information regarding functions, such as:

/**
 * Function to add 2 numbers.
 * @param int $a default 0
 * @param int $b default 0
 * @return int
 */
public function add($a = 0, $b = 0)
{
    return $a + $b;
}

It could also be used inside a class to describe associations (relationships) with other tables, such as:

/**
 * @Entity
 * @Table(name="ecommerce_products",uniqueConstraints={@UniqueConstraint(name="search_idx", columns={"name", "email"})})
 * @ManyToMany(targetEntity="Phonenumber")
 * @JoinTable(name="users_phonenumbers",
 *      joinColumns={@JoinColumn(name="user_id", referencedColumnName="id")},
 *      inverseJoinColumns={@JoinColumn(name="phonenumber_id", referencedColumnName="id", unique=true)}
 * )
 */

These annotations were treated by PHP as simple comments, so had absolutely no effect at runtime. The use of a third-party library was required in order to extract this information from the source code into a format which enabled it to be used. It then needed some userland code in order to do something with this extracted information.


Attributes

After some discussion it was decided to add this capability into the PHP language, but under the name attributes and with a completely different syntax. Instead of @varname inside a docblock it was changed to #[attribute] where the name "attribute" is user defined. Because these attributes now had an official syntax it was possible to extract the information using the Reflection APIs which are built into PHP instead of having to use a third-party library. However, the developer still has to write some userland code in order to do something with this extracted information.
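For readers who have not seen the mechanism in use, the following is a minimal sketch of an attribute being declared, attached to a property, and then extracted with the Reflection API. The `Column` class, its parameters, the `User` class and the `readColumnSpecs()` function are all hypothetical names chosen for illustration; they do not come from any particular library.

```php
<?php
// Hypothetical attribute class: the name "Column" and its parameters are assumptions.
#[Attribute(Attribute::TARGET_PROPERTY)]
final class Column
{
    public function __construct(
        public string $type,
        public int $length = 0,
        public bool $nullable = true,
    ) {}
}

class User
{
    #[Column(type: 'string', length: 32, nullable: false)]
    protected $username;
}

// Userland code which extracts the metadata using the built-in Reflection API.
function readColumnSpecs(string $class): array
{
    $specs = [];
    foreach ((new ReflectionClass($class))->getProperties() as $property) {
        foreach ($property->getAttributes(Column::class) as $attribute) {
            // newInstance() builds the Column object from the declared arguments.
            $column = $attribute->newInstance();
            $specs[$property->getName()] = [
                'type'     => $column->type,
                'size'     => $column->length,
                'required' => !$column->nullable,
            ];
        }
    }
    return $specs;
}

$specs = readColumnSpecs(User::class);
print_r($specs);
```

Note that the attribute itself does nothing. The metadata only becomes usable after the reflection code has run, and something else still has to act upon the resulting array.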


The problem with attributes

Considering the fact that I had already found a way to deal with metadata using bog standard code in the days of PHP4 I considered this approach to be w-a-a-a-y more complicated than it need be. My list of complaints against this feature is as follows:

  1. It completely violates the KISS principle and instead follows the KICK principle. I rephrased this into the following expression:
    A good programmer should try to achieve complex tasks using simple code, not simple tasks using complex code.

    This "solution" looks more like a Heath Robinson contraption or a Rube Goldberg machine than something which a competent programmer would create.

  2. It uses comments to affect runtime behaviour. It has been a standard feature of every programming language that comments are discarded during the transformation from source code to machine code, simply because they cannot be transformed into instructions that the CPU can recognise. The idea that you can use comments to affect the program's behaviour at runtime can only have been dreamt up by someone who lives in Cloud cuckoo land.
  3. It has to be written into the source code. In a database application which deals with dozens, or even hundreds, of database tables it should be obvious that the metadata for each of those tables is totally unique, which would mean that each table class would have to be built by hand. Although it would be possible to share common code using that standard OO mechanism called inheritance it would then be impossible to insert custom comments into that inherited code. It is not possible to provide comments in a subclass that override comments in a superclass.
  4. It requires additional code to extract the metadata information. None of these comments can be accessed directly where they are defined, they have to be extracted using a separate process. For annotations this means using a third-party library, but for attributes this means using the Reflection APIs.
  5. It requires additional code to process the extracted metadata. Once the metadata has been extracted you will still need to write code in order to convert this information into some sort of action. All the examples which I have seen require separate code in each table class.
  6. You cannot make temporary amendments to the metadata at runtime. Having configuration parameters permanently fixed may seem like a good idea to novice programmers, but those of us who have years of experience writing large and complex applications can sometimes encounter situations where, for the duration of a particular task, you want to change that configuration. This is impossible with comments as there is no mechanism to change them except by making a permanent change to the source code.

A simpler solution without the problems

When I encountered the need to insert metadata into my table classes my solution was as follows:

  1. As I was using PHP4 the solution had to be simple, using nothing more than bog standard PHP code.
  2. Since metadata is just another form of data, I store it in variables, not comments.
  3. I do not include this metadata in the class definition, I store it in an external structure file which is loaded into the object's common table properties via code in the constructor. This means that the class file and the structure file can be maintained separately. This also means that I can make changes to a table's structure without having to amend the class file.
  4. I do not need any extra code to extract this data and make it accessible as it already exists in the object as a group of variables, either as strings or arrays.
  5. I do not need any extra code to process this metadata as that functionality has been built into the framework and is performed automatically.
  6. During the processing of particular tasks I can make temporary adjustments to the metadata, and these will be reset simply by reloading the contents of the structure file. This is done automatically by the framework so requires no action by the developer.
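The points above can be illustrated with a minimal sketch. It is not the framework's actual code: the `GenericTable` class, the `loadStructure()` method, the file contents and the field names are all assumptions made for the purpose of the example, and it uses current PHP syntax rather than PHP4. The structure file is written to a temporary location only so that the sketch can run on its own.

```php
<?php
// A hypothetical structure file, created here only to keep the sketch self-contained.
$structureFile = tempnam(sys_get_temp_dir(), 'dict');
file_put_contents($structureFile, <<<'EOT'
<?php
$fieldspec['person_id']  = ['type' => 'string', 'size' => 8,  'required' => true];
$fieldspec['first_name'] = ['type' => 'string', 'size' => 20, 'required' => true];
$fieldspec['start_date'] = ['type' => 'date'];
$primary_key = ['person_id'];
EOT
);

class GenericTable
{
    public $tablename;
    public $fieldspec = [];     // metadata held as an ordinary array
    public $primary_key = [];
    protected $structureFile;

    public function __construct($tablename, $structureFile)
    {
        $this->tablename     = $tablename;
        $this->structureFile = $structureFile;
        $this->loadStructure();
    }

    // (Re)load the metadata, discarding any temporary amendments.
    public function loadStructure()
    {
        $fieldspec   = [];
        $primary_key = [];
        require $this->structureFile;   // sets $fieldspec and $primary_key in local scope
        $this->fieldspec   = $fieldspec;
        $this->primary_key = $primary_key;
    }
}

$person = new GenericTable('person', $structureFile);

// A temporary amendment for the duration of one task...
$person->fieldspec['start_date']['required'] = true;
$amended = $person->fieldspec['start_date'];

// ...which is reset simply by reloading the structure file.
$person->loadStructure();
$restored = $person->fieldspec['start_date'];
```

Because the metadata is an ordinary array inside the object, reading it needs no extraction step, and changing it for one task needs nothing more than an assignment.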

I came up with this simple procedure by using the knowledge which I had gained in the previous 20 years:

Using this knowledge I started to build a sample application which dealt with a small number of database tables in various common relationships. I wrote a class for the first database table which encapsulated all its properties and all its methods, then I wrote a family of forms to view and maintain its contents. I then duplicated this code to deal with the next database table. This resulted in duplicated code, which is a bad idea according to the Don't Repeat Yourself (DRY) principle.

This is when I decided to use inheritance to define the duplicated code in one place so that I could share it as many times as I liked. I did not make the mistake of inheriting from one concrete class to create a different concrete class, thus avoiding the problems which resulted in the principle favour composition over inheritance. I knew that the first concrete class would contain some properties or methods that I did not want to inherit in any other concrete classes, so I moved only those properties and methods which were safe to inherit to a separate generic table class. Note that I used the word "generic" as I was using PHP4 which did not support the "abstract" keyword. The PHP manual at the time said the following:

extends

Often you need classes with similar variables and functions to another existing class. In fact, it is good practice to define a generic class which can be used in all your projects and adapt this class for the needs of each of your specific projects.

It was not until over a decade later that I discovered that the practice which I had employed instinctively was the same one proposed by Ralph E. Johnson & Brian Foote in their paper Designing Reusable Classes which was published in 1988. In it they identified a practice which they called programming-by-difference which can be summarised as follows:

Abstraction is the act of separating the abstract from the concrete, the similar from the different. An abstraction is usually discovered by generalizing from a number of concrete examples. This then permits a style of programming called programming-by-difference in which the similar protocols can be moved to an abstract class and the differences can be isolated in concrete subclasses.

This means that you have to create several objects before you can look for similarities. It is simply not possible to perform an abstraction before you have written a single line of code. By separating the similar from the different you should then be able to put the similar into some sort of reusable/sharable module while isolating the different in unique modules. Following this simple idea I placed all the common table methods into an abstract table class so that I could inherit these methods into every concrete table class. Each of these classes then required two additions in order to provide the differences: metadata which describes that table's structure, and custom methods which interrupt the standard processing flow where necessary.

Inserting metadata was to me as easy as falling off a log. Metadata is just data, and can be stored in variables/properties just like any other data. While building my first table class I identified the following pieces of additional information that were required:

All this data was held in a set of common table properties which, in my first table class, I populated using code within the constructor. Initially I did this by reading the CREATE TABLE script and copying the details manually into the class file, but after doing this repetitive and laborious task for the first few tables I decided to automate it by replicating what I had done 20 years earlier in COBOL: I wrote a routine which extracted that information directly from the database's INFORMATION_SCHEMA and wrote it to a disk file so that it could be imported into a class file. As well as saving time this also removed the possibility of mistakes creeping in when typing in that information manually. I then made it a little more sophisticated by extracting the data into an intermediate database known as a Data Dictionary (which is my equivalent to the Application Model in UNIFACE) so that I could add more information should the need arise. This metadata is now stored in a table structure file which is separate from the table class file.
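A rough sketch of that kind of export routine is shown below. It is an illustration under stated assumptions, not the Data Dictionary's actual code: the function name `buildStructureFile()`, the output layout and the sample rows are all made up, and the intermediate Data Dictionary step is skipped entirely. The column names in the query are those used by MySQL's INFORMATION_SCHEMA.COLUMNS table; the query is shown but not executed so that the sketch needs no database.

```php
<?php
// The query which would supply the rows (MySQL column names); not executed here.
$sql = "SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE, COLUMN_KEY
          FROM INFORMATION_SCHEMA.COLUMNS
         WHERE TABLE_SCHEMA = :dbname AND TABLE_NAME = :tablename
         ORDER BY ORDINAL_POSITION";

// Sample rows in the shape that query returns (made-up data for illustration).
$rows = [
    ['COLUMN_NAME' => 'person_id',  'DATA_TYPE' => 'varchar', 'CHARACTER_MAXIMUM_LENGTH' => 8,
     'IS_NULLABLE' => 'NO',  'COLUMN_KEY' => 'PRI'],
    ['COLUMN_NAME' => 'first_name', 'DATA_TYPE' => 'varchar', 'CHARACTER_MAXIMUM_LENGTH' => 20,
     'IS_NULLABLE' => 'NO',  'COLUMN_KEY' => ''],
    ['COLUMN_NAME' => 'start_date', 'DATA_TYPE' => 'date',    'CHARACTER_MAXIMUM_LENGTH' => null,
     'IS_NULLABLE' => 'YES', 'COLUMN_KEY' => ''],
];

// Convert the rows into the text of a structure file (hypothetical layout).
function buildStructureFile(array $rows): string
{
    $fieldspec  = [];
    $primaryKey = [];
    foreach ($rows as $row) {
        $spec = ['type' => $row['DATA_TYPE']];
        if ($row['CHARACTER_MAXIMUM_LENGTH'] !== null) {
            $spec['size'] = (int) $row['CHARACTER_MAXIMUM_LENGTH'];
        }
        if ($row['IS_NULLABLE'] === 'NO') {
            $spec['required'] = true;
        }
        if ($row['COLUMN_KEY'] === 'PRI') {
            $primaryKey[] = $row['COLUMN_NAME'];
        }
        $fieldspec[$row['COLUMN_NAME']] = $spec;
    }
    // var_export() produces valid PHP source, so the file can be loaded with require.
    return "<?php\n"
        . '$fieldspec = ' . var_export($fieldspec, true) . ";\n"
        . '$primary_key = ' . var_export($primaryKey, true) . ";\n";
}

$contents = buildStructureFile($rows);
echo $contents;
```

The generated text is plain PHP, so regenerating it after a change to the table's structure replaces the metadata without touching the class file.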

Inserting custom methods was a problem I solved by falling back yet again on my previous experience. In both UNIFACE and Visual Basic you could have code executed at certain points in the processing flow by inserting that code into certain functions known as event handlers or triggers. Each trigger with a particular name would be executed at a particular event, such as a mouse movement or a function key being pressed. I deduced from this that when the program was running it would at certain times look for a trigger with a certain name, and if one was found it would be executed. So how could I emulate this in PHP? As the main processing flow was already defined within the common table methods I found places where I wanted to interrupt this flow and inserted a call to a custom method which was also defined within the abstract class but which did absolutely nothing. By reading the PHP manual it was clear to me that if a method existed in the subclass then it overrode that method in the superclass, so all I had to do was copy the correct method from the superclass to the subclass and insert whatever code I wanted. An example of one of these customisable methods looks as follows:

function _cm_validateInsert ($fieldarray)
// perform custom validation before an insert.
// if anything is placed in $this->errors the insert will be terminated.
{
    // insert custom code here

    return $fieldarray;
}

Note that all table data is passed around in a single $fieldarray variable and not as separate properties for each column. This promotes loose coupling instead of tight coupling.

It was not until years later that I discovered that my practice of inheriting from an abstract superclass which contained customisable methods which could be overridden in a concrete subclass had been documented in the Design Patterns book as the Template Method Pattern, and that my customisable methods were known as "hook" methods.
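The arrangement can be sketched as follows. Only the `_cm_validateInsert` method name and the use of `$this->errors` come from the example above; the class names, the `insertRecord()` method and the `$stored` property (which stands in for the database) are assumptions made to keep the sketch self-contained.

```php
<?php
abstract class AbstractTable
{
    public $errors = [];
    public $stored = [];    // stands in for the database in this sketch

    // The invariant processing flow is defined once, in the superclass.
    public function insertRecord(array $fieldarray)
    {
        $this->errors = [];
        $fieldarray = $this->_cm_validateInsert($fieldarray);   // call the "hook"
        if (empty($this->errors)) {
            $this->stored[] = $fieldarray;
        }
        return $fieldarray;
    }

    // Default "hook" method which does absolutely nothing.
    function _cm_validateInsert($fieldarray)
    {
        return $fieldarray;
    }
}

class Person extends AbstractTable
{
    // Copied into the subclass and filled with custom code.
    function _cm_validateInsert($fieldarray)
    {
        if ($fieldarray['start_date'] > $fieldarray['end_date']) {
            $this->errors['start_date'] = 'Start date cannot be later than end date';
        }
        return $fieldarray;
    }
}

class Country extends AbstractTable {}   // no custom rules, so no overrides

$person = new Person();
$person->insertRecord(['start_date' => '2024-06-24', 'end_date' => '2024-01-01']);
print_r($person->errors);
```

The `Country` class shows the other half of the idea: a table with no custom rules inherits the whole processing flow without containing a single line of its own.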

Automatic data validation

Every programmer should know that all input from a user should be validated before it is processed. As well as checking for simple things, such as not supplying a value for a required field/column, it should check that each field contains a value which is consistent with its datatype, such as "date", "time" or "number". The consequence of not performing this check could be the construction of an INSERT or UPDATE query which is rejected by the DBMS, thus causing the program to immediately terminate. The correct course of action is to validate all user input before it is processed so that any errors can be detected as early as possible and returned to the user with a suitable error message, thus giving the user the opportunity to correct their error before trying the operation again.

Note that some novice programmers have this misguided belief that it is not "proper OOP" to allow invalid data to exist within an object, which means that the data must be validated BEFORE it is given to the object. There is no such rule. The ONLY golden rule is that the data must be validated before it is sent to the database.

How should this data validation be performed? I have already written about a very inefficient approach in How NOT to Validate Data which shows a common method where every piece of validation requires the developer to add code to the table class. This is a common approach used by novices which an experienced developer should be able to avoid. All you need is to have available in your code a list of specifications for that table's structure so you can then write a reusable subroutine which can match each field's value in the user input with that field's specifications. It helps a great deal if all the specifications you require have been supplied as metadata, such as the $fieldspec array.

However, there is a slight problem if all your table's data is being held in separate variables, but that is a problem which I do not have. I noticed in all the code samples which I saw when I was evaluating PHP that although all the input from an HTML form was presented to the script in the $_POST array, every sample, without exception, unpicked the array into its component parts so that each part could be inserted into the object one at a time using named variables. Having become aware of the power and flexibility of PHP's arrays I asked myself a simple question - Is there a good reason why I cannot pass the data around in its original array instead of breaking it into pieces? The answer is No, there is not, which is why I move data into and out of each table class using a single $fieldarray variable.

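This then means that within my table class I have two arrays: $fieldarray, which holds the data, and $fieldspec, which holds the specifications for each field in that data.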

An experienced programmer should instantly see how easy it would be to write a routine to iterate through these two arrays to ensure that a value found in the $fieldarray array matches its specifications in the $fieldspec array. It must be easy as I managed to do it in 2002 when I created my validation object. If you look at the common table methods which are inherited from the abstract table class you should see how each of those methods can be applied to any table class, with its different metadata, within the application, proving that the same piece of code can be used to load(), validate() and store() any data within any database table.
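A minimal sketch of such a routine is shown below. It is not the validation object mentioned above: the function name `validateFields()`, the keys used in `$fieldspec` (`type`, `size`, `required`), the three datatypes handled and the error messages are all assumptions chosen for illustration.

```php
<?php
// Check every value in $fieldarray against its specification in $fieldspec.
// Returns an array of error messages indexed by field name (empty if all is well).
function validateFields(array $fieldarray, array $fieldspec): array
{
    $errors = [];
    foreach ($fieldspec as $field => $spec) {
        $value = isset($fieldarray[$field]) ? trim((string) $fieldarray[$field]) : '';
        if ($value === '') {
            if (!empty($spec['required'])) {
                $errors[$field] = 'This field is required';
            }
            continue;   // nothing further to check on an empty value
        }
        switch ($spec['type']) {
            case 'string':
                if (isset($spec['size']) && strlen($value) > $spec['size']) {
                    $errors[$field] = "Cannot be longer than {$spec['size']} characters";
                }
                break;
            case 'integer':
                if (filter_var($value, FILTER_VALIDATE_INT) === false) {
                    $errors[$field] = 'Must be an integer';
                }
                break;
            case 'date':
                // Reject both unparsable input and impossible dates such as 30th February.
                $date = DateTime::createFromFormat('!Y-m-d', $value);
                if ($date === false || $date->format('Y-m-d') !== $value) {
                    $errors[$field] = 'Must be a date in the format YYYY-MM-DD';
                }
                break;
        }
    }
    return $errors;
}

$fieldspec = [
    'person_id'  => ['type' => 'string', 'size' => 8, 'required' => true],
    'age'        => ['type' => 'integer'],
    'start_date' => ['type' => 'date', 'required' => true],
];
$fieldarray = ['person_id' => 'ABCDEFGHIJ', 'age' => 'forty', 'start_date' => '2024-02-30'];

$errors = validateFields($fieldarray, $fieldspec);
print_r($errors);
```

Nothing in the function refers to a particular table. Give it a different `$fieldspec` and it validates a different table, which is the whole point of holding the metadata as data.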

Dealing with database changes

Anybody who has ever had to maintain or enhance a database application should be well aware that it is sometimes necessary to make changes to a table's structure, either by adding or removing columns, or even changing a column's datatype or size. A lot of novice programmers struggle with this simply because of the effort required to make their table class aware of these changes. If they use annotations which are embedded into the source code as comments then they have to go through that source code, then find and modify each of the relevant comments.

This is where having the metadata stored outside of the class file in a separate structure file pays dividends. All I have to do is amend that structure file and Bob's your uncle.

There is also one thing better than having to amend this structure file manually, and that is being able to amend it automatically at the touch of a button (well, several buttons actually). Because I invested my time and effort in creating a Data Dictionary to automate common functions for me this process is as easy as 1-2-3:

  1. Make changes to the table's structure in the DBMS.
  2. Import the updated structure into the Data Dictionary.
  3. Export the updated structure to the file system.

Note that the export function will replace the structure file but NOT the class file if it already exists as it may have been amended to include one or more "hook" methods.


Conclusion

If I could devise such a simple and effective solution in 2002 using PHP4 why do so many programmers today insist on devising such elaborate, esoteric and convoluted solutions? It would seem that far too many of them do not have the brain power to devise simple solutions, so they go in the opposite direction and devise complex solutions just to prove how clever they are. This, to me, is a clear violation of the KISS principle and is something which novice programmers should be taught to avoid. They need to try some lateral thinking, to think outside of the box, otherwise they will keep inventing more and more complex and over-engineered solutions which are the complete opposite of what they are supposed to do.

The idea that simplicity is better than complexity is one that is shared by others, as I have documented in Quotations in support of Simplicity.



