Attributes are atrocious

Posted on 24th June 2024 by Tony Marston

Amended on 17th January 2025

Introduction
Annotations
Reading DOCBLOCK data via the Reflection API
Insert DOCBLOCK data into classes
Extract DOCBLOCK data
Process DOCBLOCK metadata at run time
Attributes
The problem with attributes
A simpler solution without the problems
Automatic data extraction
Automatic data validation
Dealing with database changes
Conclusion
References
Amendment History
Comments

Introduction

I have never used attributes, or their previous incarnation called annotations, for one good reason - they are an over-complicated solution to a problem which I solved with simple code over 20 years ago. In this article I will explain how my simple solution is far superior to its modern day counterpart.

Attributes were added into PHP8 with the following description:

Attributes offer the ability to add structured, machine-readable metadata information on declarations in code: Classes, methods, functions, parameters, properties and class constants can be the target of an attribute. The metadata defined by attributes can then be inspected at runtime using the Reflection APIs. Attributes could therefore be thought of as a configuration language embedded directly into code.

So what exactly is metadata and how is it useful? Wikipedia provides the following description:

Metadata (or metainformation) is "data that provides information about other data", but not the content of the data itself, such as the text of a message or the image itself.

What is an example of this metadata? In my RADICORE framework I have a separate class for each database table. As each of these classes has to support the same set of CRUD operations I have defined these operations as common table methods in an abstract table class which is then inherited by every concrete table class. This is the approach recommended in Designing Reusable Classes which was published in 1988 by Ralph E. Johnson & Brian Foote. They identified a technique which they called programming-by-difference in which the similarities are maintained in a reusable module, such as an abstract class, while the differences are maintained in unique modules, such as concrete classes.

While each table is subject to the same set of methods which can be inherited from the abstract class, each concrete table class need only contain its differences. These can be identified using metadata, which is stored in a set of common table properties (see below), or behaviour which can be defined in any of the available "hook" methods.

Common Table Properties
$this->dbname	This value is defined in the class constructor. This allows the application to access tables in more than one database. It is standard practice in the RADICORE framework to have a separate database for each subsystem.
$this->tablename	This value is defined in the class constructor.
$this->fieldspec	This identifies the columns (fields) which exist in this table and their specifications (type, size, etc).
$this->primary_key	This identifies the column(s) which form the primary key. Note that this may be a compound key with more than one column. Although some modern databases allow it, it is standard practice within the RADICORE framework to disallow changes to the primary key. This is why surrogate or technical keys were invented.
$this->unique_keys	A table may have zero or more additional unique keys. These are also known as candidate keys as they could be considered as candidates for the role of primary key. Unlike the primary key these candidate keys may contain nullable columns and their values may be changed at runtime.
$this->parent_relations	This has a separate entry for each table which is the parent in a parent-child relationship with this table. This also maps foreign keys on this table to the primary key of the parent table. This array can have zero or more entries.
$this->child_relations	This has a separate entry for each table which is the child in a parent-child relationship with this table. This also maps the primary key on this table to the foreign key of the child table. This array can have zero or more entries.

Each table has its own set of values in these properties which are loaded into the table object by code within the class constructor. These values are then used by various parts of the framework at run time to assist in the way that they operate, thus avoiding the need for developers to insert custom code into any table class. Below are two different ways in which this metadata can be made available in your table classes:

Reading DOCBLOCK data via the Reflection API - a convoluted way of obtaining this metadata using annotations
A simpler solution without the problems - a simpler solution which I built in 2003 using bog standard PHP4 code.
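To make the common table properties described above more concrete, here is a minimal sketch of a concrete table class which sets them in its constructor. It is an illustration only: the class names Abstract_Table_Sketch and Person_Sketch, the database name and the column names are hypothetical and are not taken from the RADICORE source.

```php
<?php
// Hypothetical abstract class which declares the common table properties.
abstract class Abstract_Table_Sketch
{
    public $dbname = null;
    public $tablename = null;
    public $fieldspec = array();
    public $primary_key = array();
    public $unique_keys = array();
    public $parent_relations = array();
    public $child_relations = array();
}

// Hypothetical concrete class: it contains only what is different for this table.
class Person_Sketch extends Abstract_Table_Sketch
{
    public function __construct()
    {
        $this->dbname    = 'example_db';
        $this->tablename = 'person';

        // one entry per column, keyed by column name
        $this->fieldspec['person_id'] = array('type' => 'string', 'nullable' => 'n', 'size' => 8);
        $this->fieldspec['email']     = array('type' => 'string', 'nullable' => 'y', 'size' => 50);

        $this->primary_key   = array('person_id');
        $this->unique_keys[] = array('email');

        // this table is the parent of person_address
        $this->child_relations[] = array('child'  => 'person_address',
                                         'type'   => 'RES',
                                         'fields' => array('person_id' => 'person_id'));
    }
}

$person = new Person_Sketch();
echo $person->dbname . '.' . $person->tablename . PHP_EOL;
```

Everything in the constructor is plain data, so any code in the abstract class can read it without needing to know which concrete table it is dealing with.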

Annotations

This metadata was originally implemented under the name annotations which extended the DOCBLOCK syntax to provide information about a variable such as the following:

/**
 * @Column(type="string", length=32, unique=true, nullable=false)
 */
protected $username;

It could also supply information regarding functions, such as:

/**
* function to add 2 numbers
* @param int default 0
* @param int default 0
* @return int default 0
*/
public function add($a = 0, $b = 0){
return $a + $b;
}

It could also be used inside a class to describe associations (relationships) with other tables, such as:

/**
 * @Entity
 * @Table(name="ecommerce_products",uniqueConstraints={@UniqueConstraint(name="search_idx", columns={"name", "email"})})
 * @ManyToMany(targetEntity="Phonenumber")
 * @JoinTable(name="users_phonenumbers",
 *      joinColumns={@JoinColumn(name="user_id", referencedColumnName="id")},
 *      inverseJoinColumns={@JoinColumn(name="phonenumber_id", referencedColumnName="id", unique=true)}
 * )
 */

These annotations were treated by PHP as simple comments, so had absolutely no effect at runtime. However, it was possible to use the Reflection API, particularly by using getDocComment(), to extract these comments so that they could be analysed and processed in some way. As I never use docblocks I was not aware of this approach. Even if I had been aware I would never have used it as I had already produced my own automated solution.

Reading DOCBLOCK data via the Reflection API

Even though I created my own solution to insert metadata into each of my table classes as long ago as 2003 I decided to test the use of annotations so that I could perform a comparison of the two techniques. The use of annotations turns out to be a four-step process:

Extract the structure of each table from the database schema.
Manually INSERT that DOCBLOCK data into your table's class file.
Create a function to EXTRACT that DOCBLOCK data using the Reflection API so that it can be inserted into proper variables where it can be processed using regular PHP code.
Have code in your framework to PROCESS that metadata as and when necessary.

Insert DOCBLOCK data into classes

As I do not use any third-party libraries to process my DOCBLOCK data I am not forced to use any particular format, so I can use whatever format suits my needs. In order to replicate the data that already exists in my common table properties I started by creating a class file with the following docblock data:

<?php

class foobar
{
    /**
     * @dbname(dict)
     */
    var $dbname=null;

    /**
     * @tablename(dict_table)
     */
    var $tablename=null;

    /**
     * @fieldname1(keyword1=value1, keyword2=value2)
     * @fieldname2(keyword1=value1, keyword2=value2)
     * @three(type=string, nullable=n, size=40)
     * @four(type=integer, nullable=y, minvalue=0, maxvalue=99)
     */
    var $fieldspec=array();

    /**
     * @primary_key(field1, field2)
     */
    var $primary_key=array();

    /**
     * @unique_keys[](fieldname1)
     * @unique_keys[](fieldname2)
     */
    var $unique_keys=array();

    /**
     * @parent_relations[](parent=tblparent1, fields=[fldchild1=fldparent1, fldchild2=fldparent2])
     * @parent_relations[](parent=tblparent2, fields=[fldchild1=fldparent1, fldchild2=fldparent2])
     */
    var $parent_relations=array();

    /**
     * @child_relations[](child=tblchild1, type=RES, fields=[fldparent1=fldchild1, fldparent2=fldchild2])
     * @child_relations[](child=tblchild2, type=RES, fields=[fldparent1=fldchild1, fldparent2=fldchild2])
     */
    var $child_relations=array();

    // ****************************************************************************
    function __construct ()
    {
        $this->loadCommonProperties();

    } // __construct
} // end class
?>

Note that I do not have a separate property for each table column as this promotes tight coupling which is a Bad Thing ™. The specifications for each column are maintained in the $fieldspec array while all the values are passed around in the $fieldarray array. This enables validation to be performed automatically using my standard validation object.

Extract DOCBLOCK data

In this example I am extracting the DOCBLOCK data using the loadCommonProperties() method which is called from the class constructor.

function loadCommonProperties ()
// extract docblock data for all properties within this class
{
    $class = new ReflectionClass($this);

    echo "<pre>\n";
    foreach ($class->getProperties() as $property) {
        // display the docblock data for each property
        echo $property->getName() . ': ' . var_export($property->getDocComment(), true) . PHP_EOL;
    } // foreach
    echo "</pre>\n";

    return;
} // loadCommonProperties

This will produce the following output:

dbname: '/**
     * @dbname(dict)
     */'
tablename: '/**
     * @tablename(dict_table)
     */'
fieldspec: '/**
     * @fieldname1(keyword1=value1, keyword2=value2)
     * @fieldname2(keyword1=value1, keyword2=value2)
     * @three(type=string, nullable=n, size=40)
     * @four(type=integer, nullable=y, minvalue=0, maxvalue=99)
     */'
primary_key: '/**
     * @primary_key(field1, field2)
     */'
unique_keys: '/**
     * @unique_keys[](fieldname1)
     * @unique_keys[](fieldname2)
     */'
parent_relations: '/**
     * @parent_relations[](parent=tblparent1, fields=[fldchild1=fldparent1, fldchild2=fldparent2])
     * @parent_relations[](parent=tblparent2, fields=[fldchild1=fldparent1, fldchild2=fldparent2])
     */'
child_relations: '/**
     * @child_relations[](child=tblchild1, type=RES, fields=[fldparent1=fldchild1, fldparent2=fldchild2])
     * @child_relations[](child=tblchild2, type=RES, fields=[fldparent1=fldchild1, fldparent2=fldchild2])
     */'

In order to convert the DOCBLOCK data into proper variables which can be accessed by the normal PHP code I need to extract this data separately for each property so that I can use the relevant regular expression. I can do this with the following code:

function loadCommonProperties ()
// load common table properties with data obtained from docblocks
{
    $class = new ReflectionClass($this);

    $this->dbname = $this->refl_get_string($class, 'dbname');

    $this->tablename = $this->refl_get_string($class, 'tablename');

    $this->fieldspec = $this->refl_get_fieldspec($class);

    $this->primary_key = $this->refl_get_primary_key($class);

    $this->unique_keys = $this->refl_get_unique_keys($class);

    $this->parent_relations = $this->refl_get_relationship($class, 'parent');

    $this->child_relations = $this->refl_get_relationship($class, 'child');

    return;

} // loadCommonProperties

// ****************************************************************************
function refl_get_string($class, $name)
// extract 'value' from '@name(value)'
{
    $property = $class->getProperty($name)->getDocComment();

    $pattern = "/@$name\((?<value>\w+)\)/imsx";
    if (preg_match($pattern, $property, $regs)) {
        return trim($regs['value']);
    } // if

    return '';

} // refl_get_string

// ****************************************************************************
function refl_get_fieldspec($class)
// get the array of specifications for each column in this table.
{
    $property = $class->getProperty('fieldspec')->getDocComment();

    $pattern = "/@(?<fieldname>\w+)\((?<specs>[^\)]+)\)/imsx";

    $output = array();
    $offset = 0;
    while (preg_match($pattern, $property, $regs, PREG_OFFSET_CAPTURE, $offset)) {
        $fieldname = $regs['fieldname'][0];
        $array = $this->string2array($regs['specs'][0]);
        $output[$fieldname] = $array;
        $offset = $regs['specs'][1] + strlen($regs['specs'][0]);
    } // while

    return $output;

} // refl_get_fieldspec

// ****************************************************************************
function string2array($string)
// convert a string of 'field=value' pairs into an associative array.
// note that 'field=[value]' indicates that 'value' is a nested array.
{
    $output = array();

    $pattern = "/(?<field>\w+) = (?<value>[^,]+)/imsx";

    $offset = 0;
    while (preg_match($pattern, $string, $regs, PREG_OFFSET_CAPTURE, $offset)) {
        $field = trim($regs['field'][0]);
        $value = $regs['value'][0];
        if (substr($value, 0, 1) == '[') {
            // value starts with '[', so look for closing ']'
            $start_pos = $regs['value'][1];
            $end_pos   = stripos($string, ']', $start_pos);
            $string2   = substr($string, $start_pos, $end_pos - $start_pos);
            $array = $this->string2array($string2);
            $output[$field] = $array;
            $offset = $end_pos;
        } else {
            $output[$field] = trim($value);
            $offset = $regs['value'][1] + strlen($value);
        } // if
    } // while

    return $output;

} // string2array

// ****************************************************************************
function refl_get_primary_key($class)
// extract 'array' from @primary_key(field1, field2) - only one entry
{
    $property = $class->getProperty('primary_key')->getDocComment();

    $pattern = "/@primary_key\((?<value>\w+[, \w+]*)\)/imsx";
		
    if (preg_match($pattern, $property, $regs)) {
        $array = explode(',', $regs['value']);
        $array = array_map('trim', $array);
        return $array;
    } // if

    return array();

} // refl_get_primary_key

// ****************************************************************************
function refl_get_unique_keys($class)
// extract 'array' from @unique_keys[](field1, field2) - may be zero or more entries
{
    $property = $class->getProperty('unique_keys')->getDocComment();

    $pattern = "/@unique_keys\[\]\((?<value>\w+[, \w+]*)\)/imsx";

    $output = array();
    $offset = 0;
    while (preg_match($pattern, $property, $regs, PREG_OFFSET_CAPTURE, $offset)) {
        $array = explode(',', $regs['value'][0]);
        $array = array_map('trim', $array);
        $output[] = $array;
        // captured offsets are relative to the start of the subject, not to $offset
        $offset = $regs['value'][1] + strlen($regs['value'][0]);
    } // while

    return $output;

} // refl_get_unique_keys

// ****************************************************************************
function refl_get_relationship($class, $relation)
// get zero or more relationship details, each being an associative array
{
    if (!preg_match('/^(parent|child)$/', $relation)) {
        throw new Exception("Relation must be either 'parent' or 'child'");
    } // if

    $relation = $relation.'_relations';

    $property = $class->getProperty($relation)->getDocComment();

    $pattern = "/$relation\[\]\((?<value>[^\)]+)/imsx";

    $output = array();
    $offset = 0;
    while (preg_match($pattern, $property, $regs, PREG_OFFSET_CAPTURE, $offset)) {
        $array = $this->string2array($regs['value'][0]);
        $output[] = $array;
        // captured offsets are relative to the start of the subject, not to $offset
        $offset = $regs['value'][1] + strlen($regs['value'][0]);
    } // while

    return $output;

} // refl_get_relationship

Process DOCBLOCK metadata at run time

Remember that this is metadata, or "data about data", which means that it is data and not code and therefore cannot "do" anything on its own. However, there may be pieces of regular code which perform certain actions depending on what values they find in this data. In the RADICORE framework each of these variables is processed as follows:

$this->dbname	As each DBMS server can access multiple databases this identifies the default database for this connection. Without it all references to tables would have to be qualified with the database name.
$this->tablename	This identifies the default table for the generated SQL query, thus allowing "SELECT * FROM <tablename>" to retrieve data from the table which is specific to the current concrete table class.
$this->fieldspec	This will contain the specifications for each field/column in this table. This information will be used during the primary validation of each INSERT or UPDATE operation.
$this->primary_key	On an INSERT operation the framework will perform a lookup, unless the auto_increment property has been set, using the current primary key and generate an error if it already exists in the database. Note that a primary key cannot be updated.
$this->unique_keys	On an INSERT or UPDATE operation the framework will perform a lookup on each candidate key and generate an error if it already exists in the database. Note that for compound keys if any column is NULL then this lookup will not be performed as the index entry will not exist.
$this->parent_relations	This may be used to add JOIN clauses to a SELECT query, as discussed in Using Parent Relations to construct sql JOINs.
$this->child_relations	In a DELETE operation the framework will perform a different function depending on the contents of type: RES (RESTRICTED) - do not allow the parent to be deleted if any associated records on this child table exist. CAS (CASCADE) - before the parent is deleted then also delete all associated records on this child table. NUL (NULLIFY) - when the parent is deleted then all matching records on this child table will have their foreign key field(s) set to NULL.
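As an illustration of how regular code can act on this metadata, the following sketch decides what a DELETE operation should do for a single entry in $child_relations. The function name delete_action(), its return values and the $child_count parameter are assumptions made for this example and are not part of the framework.

```php
<?php
// Hypothetical helper: decide what a DELETE should do for one child relation.
// $child_count is the number of matching rows found on the child table.
function delete_action(array $relation, $child_count)
{
    if ($child_count == 0) {
        return 'none';                     // no rows on the child table are affected
    }

    switch ($relation['type']) {
        case 'RES':
            return 'reject';               // RESTRICTED: the parent cannot be deleted
        case 'CAS':
            return 'delete_children';      // CASCADE: delete the child rows as well
        case 'NUL':
            return 'nullify_foreign_key';  // NULLIFY: set the foreign key(s) to NULL
        default:
            throw new InvalidArgumentException("Unknown relation type: {$relation['type']}");
    }
}

$relation = array('child'  => 'tblchild1',
                  'type'   => 'RES',
                  'fields' => array('fldparent1' => 'fldchild1'));

echo delete_action($relation, 3) . PHP_EOL;
```

The metadata itself does nothing. It is the switch statement, which only has to be written once, that turns the value of type into a decision.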

Attributes

Even though the ability to use ReflectionClass::getDocComment() to obtain the contents of a DOCBLOCK had existed since PHP 5, the core developers decided to enhance this capability, which meant changing the syntax and giving it a new name called attributes. Instead of @varname inside a docblock it was changed to #[attribute] where the name "attribute" is user defined. However, unlike annotations where I can see how to define an array of different values for each property I cannot see how to achieve the same thing with attributes. I can create an attribute with a user-defined name, but where are the values? If I cannot see how to make the transition from annotations to attributes then I cannot see the point of using attributes at all.

When I read what the PHP manual has to say about attributes I was immediately turned off because the example provided uses an interface and I do not use object interfaces anywhere in my code. I get more mileage out of using abstract classes. When I looked at the description for ReflectionClass::getAttributes() I could not see how attributes were any better than annotations. They are both supposed to provide metadata which can be used for configuration purposes, but I cannot see how to replicate the use of DOCBLOCK metadata with attributes.
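For comparison with the docblock syntax shown earlier, the values of an attribute are supplied as arguments inside its parentheses and can be read back with ReflectionAttribute::getArguments(). The following is a minimal sketch which assumes PHP 8.0 or later. The attribute name Column is hypothetical, and no class called Column needs to be declared unless newInstance() is called on the attribute.

```php
<?php
// Requires PHP 8.0 or later. 'Column' is a hypothetical attribute name.
class foobar
{
    #[Column(type: 'string', nullable: 'n', size: 40)]
    public $three;
}

$property = new ReflectionProperty('foobar', 'three');

$metadata = array();
foreach ($property->getAttributes() as $attribute) {
    // getName() returns the attribute name, getArguments() returns its values
    $metadata[$attribute->getName()] = $attribute->getArguments();
}

var_export($metadata);
```

Named arguments come back as an associative array, so the result has the same shape as one entry in the $fieldspec array. The extraction step is still required, which is the same overhead that applies to annotations.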

I have looked at other online articles which show how attributes could be used, but I do not understand what they are trying to do. This means that I cannot replicate their behaviour with bog standard PHP code, and without the ability to compare how they achieve whatever it is they are trying to achieve with bog standard PHP code that achieves the same thing I cannot compare the two approaches to see which is better.

Attributes are supposed to do nothing other than provide metadata, just like the earlier version called annotations which uses DOCBLOCKS, yet the manual at Declaring Attribute Classes actually states that it is recommended to create an actual class for every attribute. Why on earth would I want to create a class just to hold metadata? That would be even worse than creating that anti-pattern called an anemic class.

I am reminded of this quote made in 1984 by H. Abelson and G. Sussman in The Structure and Interpretation of Computer Programs:

Programs must be written for people to read, and only incidentally for machines to execute.

When I look at a piece of code I expect to be able to quickly understand what its purpose is and how it achieves that purpose otherwise I regard it as being unmaintainable. If I have to run a piece of code in order to determine what it does then that is a bad sign, what some people call a code smell.

The problem with attributes

Considering the fact that I had already found a way to deal with metadata using bog standard code in the days of PHP4 I considered this approach to be w-a-a-a-y more complicated than it need be. The list of complaints I have against this feature is as follows:

I cannot see how to replicate the functionality of annotations, which uses DOCBLOCKs, with attributes. If I cannot see how to make the transition then I would be wasting my time trying to use them.
It completely violates the KISS principle and instead follows the KICK principle. I rephrased this into the following expression:
A good programmer should try to achieve complex tasks using simple code, not simple tasks using complex code.

This "solution" looks more like a Heath Robinson contraption or a Rube Goldberg machine than something which a competent programmer would create.
It uses comments to affect runtime behaviour. It has been a standard feature of every programming language that comments are discarded during the transformation from source code to machine code, simply because they cannot be transformed into instructions that the CPU can recognise. The idea that you can use comments to affect the program's behaviour at runtime can only have been dreamt up by someone who lives in Cloud cuckoo land.
It has to be written into the source code for each table class. In a database application which deals with dozens, or even hundreds, of database tables it should be obvious that the metadata for each of those tables is totally unique, which would mean that each table class would have to be built by hand. Although it would be possible to share common code using that standard OO mechanism called inheritance it would then be impossible to insert custom comments into that inherited code. It is not possible to provide comments in a subclass that override comments in a superclass.
It requires additional code to extract the metadata information. None of these comments can be accessed directly where they are defined, they have to be extracted using a separate process. For annotations this means using a third-party library, but for attributes this means using the Reflection APIs.
It requires additional code to process the extracted metadata. Once the metadata has been extracted you will still need to write code in order to convert this information into some sort of action. All the examples which I have seen require separate code in each table class.
You cannot make temporary amendments to the metadata at runtime. Having configuration parameters permanently fixed may seem like a good idea to novice programmers, but those of us who have years of experience writing large and complex applications can sometimes encounter situations where, for the duration of a particular task, you want to change that configuration. This is impossible with comments as there is no mechanism to change them except by making a permanent change to the source code.

A simpler solution without the problems

When I was building my early prototype I encountered the need to have metadata available in each of my table classes, and my initial solution was to declare the variables with their hard-coded values inside the class. This meant reading the values of each table's structure with the Mark 1 Eyeball and manually transposing them into the table class. This was a slow and laborious process which was prone to error. Then I remembered a technique I had designed in the 1980s when working on my COBOL framework which involved creating a program called COPYGEN which would read the table's structure directly from the database and create a text file which could then be loaded into the program as a COPYLIB entry when it was compiled. I knew that replacing the manual process with a piece of software would make the operation much quicker and more reliable, so I decided to provide similar functionality in my PHP framework.

Automatic data extraction

I knew that it was possible to access the INFORMATION SCHEMA in any modern database to obtain the same information, so I proceeded as follows:

As I was using PHP4 the solution had to follow the KISS principle by using nothing more than bog standard PHP code, thus making it easy to understand and easy to maintain.
Instead of reading the metadata from the database and writing it directly into the table's class file I decided to follow my original COBOL design and write it to a text file which I could then import into the table's object when it was instantiated. This then meant that I could maintain the table's class file separately from the table's structure metadata.
In fact I went one stage further by storing it first in a database which I designed as part of my Data Dictionary subsystem as I knew that I would want to customise this metadata before I loaded it into the table's object. Being able to view and edit this metadata using GUI forms would be much more user-friendly than trying to edit a file using a text editor.
Within my Data Dictionary subsystem I then had separate functions for the following:
- Import the table's structure into the Data Dictionary.
- Edit the table data, the column data and the relationship data as required.
- Export the table's structure to produce a class file and a structure file.
Whenever a table class file is instantiated into an object the loadFieldSpec() method will be called to load the contents of the table structure file into the relevant object properties.

Note that the import and export functions are both initiated by pressing a button on a web page. This then replaces the first three steps in Reading DOCBLOCK Data, which saves a great deal of time and avoids the possibility of transcription errors.
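The last step in the list above, loading an exported structure file into the object's properties, can be sketched as follows. This is a minimal illustration under stated assumptions: the class name Structure_Loading_Table, the file name parameter passed to loadFieldSpec() and the contents of the structure file are all hypothetical, and the real RADICORE file format may differ.

```php
<?php
// Hypothetical table class; the $structure_file parameter is an assumption.
class Structure_Loading_Table
{
    public $fieldspec = array();
    public $primary_key = array();
    public $unique_keys = array();

    public function __construct($structure_file)
    {
        $this->loadFieldSpec($structure_file);
    }

    public function loadFieldSpec($structure_file)
    {
        // the structure file is plain PHP which populates these local variables
        $fieldspec   = array();
        $primary_key = array();
        $unique_keys = array();

        require $structure_file;

        $this->fieldspec   = $fieldspec;
        $this->primary_key = $primary_key;
        $this->unique_keys = $unique_keys;
    }
}

// Simulate an exported structure file so that the sketch can be run as-is.
$file = tempnam(sys_get_temp_dir(), 'dict');
file_put_contents($file, <<<'EOT'
<?php
$fieldspec['three'] = array('type' => 'string', 'nullable' => 'n', 'size' => 40);
$fieldspec['four']  = array('type' => 'integer', 'nullable' => 'y', 'minvalue' => 0, 'maxvalue' => 99);
$primary_key = array('three');
$unique_keys[] = array('four');
EOT
);

$table = new Structure_Loading_Table($file);
unlink($file);

print_r(array_keys($table->fieldspec));
```

Because the structure file contains nothing but assignments, no parsing or regular expressions are needed to turn it into usable variables.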

Automatic data validation

Every programmer should know that all input from a user should be validated before it is processed. As well as checking for simple things, such as not supplying a value for a required field/column, it should check that each field contains a value which is consistent with its datatype, such as "date", "time" or "number". The consequence of not performing this check could mean constructing an INSERT or UPDATE query which is rejected by the DBMS, thus causing the program to immediately terminate. The correct course of action is to validate all user input before it is processed so that any errors can be detected as early as possible and returned to the user with a suitable error message, thus giving the user the opportunity to correct their error before trying the operation again.

Note that some novice programmers have this misguided belief that it is not "proper OOP" to allow invalid data to exist within an object, which means that the data must be validated BEFORE it is given to the object. There is no such rule. The ONLY golden rule is that the data must be validated before it is sent to the database.

How should this data validation be performed? I have already written about a very inefficient approach in How NOT to Validate Data which shows a common method where every piece of validation requires the developer to add code to the table class. This is a common approach used by novices which an experienced developer should be able to avoid. All you need is to have available in your code a list of specifications for that table's structure so you can then write a reusable subroutine which can match each field's value in the user input with that field's specifications. It helps a great deal if all the specifications you require have been supplied as metadata, such as the $fieldspec array.

However, there is a slight problem if all your table's data is being held in separate variables, but that is a problem which I do not have. I noticed in all the code samples which I saw when I was evaluating PHP that although all the input from an HTML form was presented to the script in the $_POST array, every sample, without exception, unpicked the array into its component parts so that each part could be inserted into the object one at a time using named variables. This to me is a shining example of tight coupling which is a practice I prefer to avoid. Having become aware of the power and flexibility of PHP's arrays I asked myself a simple question - Is there a good reason why I cannot pass the data around in its original array instead of breaking it into pieces? The answer is No, there is not, which is why I move data into and out of each table class using a single $fieldarray variable. This promotes loose coupling which is much preferred.

This then means that within my table class I have two arrays:

$fieldarray - an associative array of name => value pairs.
$fieldspec - an associative array of name => specifications (array) pairs.

An experienced programmer should instantly see how easy it would be to write a routine to iterate through these two arrays to ensure that a value found in the $fieldarray array matches its specifications in the $fieldspec array. It must be easy as I managed to do it in 2002 when I created my validation object. If you look at the common table methods which are inherited from the abstract table class you should see how each of those methods can be applied to any table class, with its different metadata, within the application, proving that the same piece of code can be used to load(), validate() and store() any data within any database table.
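A routine of this kind can be sketched in a few lines. The following is a minimal illustration and not the framework's validation object: the function name validate_fields() and the error messages are hypothetical, and the keywords in $fieldspec (type, nullable, size, minvalue, maxvalue) are the ones used in the docblock example earlier in this article.

```php
<?php
// Hypothetical routine: compare each value in $fieldarray with its entry in $fieldspec.
// Returns an array of error messages keyed by field name (empty if all is well).
function validate_fields(array $fieldarray, array $fieldspec)
{
    $errors = array();

    foreach ($fieldspec as $name => $spec) {
        $value = isset($fieldarray[$name]) ? trim((string)$fieldarray[$name]) : '';

        if ($value === '') {
            // an empty value is only an error if the column is not nullable
            if (isset($spec['nullable']) && $spec['nullable'] == 'n') {
                $errors[$name] = 'This field is required';
            }
            continue;
        }

        if ($spec['type'] == 'string') {
            if (isset($spec['size']) && strlen($value) > $spec['size']) {
                $errors[$name] = "Value is longer than {$spec['size']} characters";
            }
        } elseif ($spec['type'] == 'integer') {
            if (!preg_match('/^-?\d+$/', $value)) {
                $errors[$name] = 'Value is not an integer';
            } elseif (isset($spec['minvalue']) && (int)$value < $spec['minvalue']) {
                $errors[$name] = "Value is below {$spec['minvalue']}";
            } elseif (isset($spec['maxvalue']) && (int)$value > $spec['maxvalue']) {
                $errors[$name] = "Value is above {$spec['maxvalue']}";
            }
        }
    }

    return $errors;
}

$fieldspec = array(
    'three' => array('type' => 'string',  'nullable' => 'n', 'size' => 40),
    'four'  => array('type' => 'integer', 'nullable' => 'y', 'minvalue' => 0, 'maxvalue' => 99),
);
$fieldarray = array('three' => '', 'four' => '123');

$errors = validate_fields($fieldarray, $fieldspec);
print_r($errors);
```

The routine contains no table names or column names of its own, so the same code can be reused for any table simply by passing in a different $fieldspec array.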

Dealing with database changes

Anybody who has ever had to maintain or enhance a database application should be well aware that it is sometimes necessary to make changes to a table's structure, either by adding or removing columns, or even changing a column's datatype or size. A lot of novice programmers struggle with this simply because of the effort required to make their table class aware of these changes. If they use annotations which are embedded into the source code as comments then they have to go through that source code, then find and modify each of the relevant comments.

This is where having the metadata stored outside of the class file in a separate structure file pays dividends. All I have to do is amend that structure file and Bob's your uncle. Easy peasy lemon squeezy.

There is also one thing better than having to amend this structure file manually, and that is being able to amend it automatically at the touch of a button (well, several buttons actually). Because I invested my time and effort in creating a Data Dictionary to automate common functions for me this process is as easy as 1-2-3:

Make changes to the table's structure in the DBMS.
Import the updated structure into the Data Dictionary.
Export the updated structure to the file system.

Note that the export function will replace the structure file but NOT the class file if it already exists as it may have been amended to include one or more "hook" methods.
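The rule described in this note can be sketched as follows. The function name export_table(), its parameters and the file names are assumptions made purely for illustration and do not describe the actual Data Dictionary code.

```php
<?php
// Hypothetical export step: always regenerate the structure file, but never
// overwrite a class file which may already contain hand-written "hook" methods.
function export_table($dir, $tablename, $structure_php, $class_php)
{
    $structure_file = "$dir/$tablename.dict.inc";
    $class_file     = "$dir/$tablename.class.inc";

    // the structure file is generated, so it is replaced on every export
    file_put_contents($structure_file, $structure_php);

    // the class file is only created if it does not already exist
    if (!file_exists($class_file)) {
        file_put_contents($class_file, $class_php);
    }
}

// Export the same table twice into a temporary directory.
$dir = sys_get_temp_dir() . '/export_' . uniqid();
mkdir($dir);

export_table($dir, 'person', "<?php // structure v1\n", "<?php // class v1\n");
export_table($dir, 'person', "<?php // structure v2\n", "<?php // class v2\n");

$structure = file_get_contents("$dir/person.dict.inc");
$class     = file_get_contents("$dir/person.class.inc");

echo $structure;  // the second export replaced the structure file
echo $class;      // the class file from the first export was kept

// tidy up the temporary files
unlink("$dir/person.dict.inc");
unlink("$dir/person.class.inc");
rmdir($dir);
```

Keeping the generated metadata and the hand-written code in separate files is what makes it safe to repeat the export as often as the table's structure changes.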

Conclusion

If I could devise such a simple and effective solution in 2002 using PHP4 why do so many programmers today insist on devising such elaborate, esoteric and convoluted solutions? It would seem that far too many of them do not have the brain power to devise simple solutions, so they go in the opposite direction and devise complex solutions just to prove how clever they are. This, to me, is a clear violation of the KISS principle and is something which novice programmers should be taught to avoid. They need to try some lateral thinking, to think outside of the box, otherwise they will keep inventing more and more complex and over engineered solutions which are the complete opposite of what they are supposed to do.

The idea that simplicity is better than complexity is one that is shared by others, as I have documented in Quotations in support of Simplicity.