Imagine you are a developer writing a function which can be used in multiple places within your application. You have decided on a meaningful function name and a list of arguments, but when you come to write the code within the function you realise that one or more of the arguments needs to be qualified before the function can proceed with its assigned task. What do you do? There are two possible solutions to this scenario:
1. Qualify the arguments within the function itself.
2. Qualify the arguments before each call to the function.
Solution #1 puts the argument checking in a single place while solution #2 forces the argument checking to be duplicated in every place where that function is called. Which one of these solutions violates the DRY principle? Which one of these solutions would require the most work if the rules regarding the argument checking were to change?
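The two solutions can be contrasted in a short sketch. The function `formatQuantity()` and its rules are hypothetical, chosen purely for illustration:

```php
<?php
// Solution #1: the function qualifies its own argument, so the rule lives in one place.
// formatQuantity() is a hypothetical example, not taken from any library.
function formatQuantity($quantity)
{
    // NULL and empty strings become zero, numeric strings become integers.
    $quantity = (int) $quantity;
    return $quantity . ' item(s)';
}

// Callers can pass whatever they have without any preparation.
echo formatQuantity('3'), "\n";   // a string from an HTML form
echo formatQuantity(null), "\n";  // a NULL from a database column

// Solution #2: every caller has to repeat the same qualification before the call.
$fromForm = '3';
echo formatQuantity(is_numeric($fromForm) ? (int) $fromForm : 0), "\n";
```

If the qualification rule changes, solution #1 needs one edit inside the function while solution #2 needs an edit at every call site.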
This is not an obscure situation which very few developers are liable to encounter. Since the release of PHP version 8.1, which changed strict typing (refer to Type Declarations and the declare(strict_types=1); directive) to be a requirement instead of an option (refer to this RFC and my response in this blog post), it is now something which every developer will encounter all over their code and which will require enormous amounts of effort to fix.
This change violates the principles laid out in RFC: Strict and weak parameter type checking, published in June 2009, which states the following:
PHP's type system was designed from the ground up so that scalars auto-convert depending on the context. That feature became an inherent property of the language.
...
strict type checking is an alien concept to PHP. It goes against PHP's type system by making the implementation detail (zval.type) become much more of a front-stage actor
...
In addition, strict type checking puts the burden of validating input on the callers of an API, instead of the API itself. Since typically functions are designed so that they're called numerous times - requiring the user to do necessary conversions on the input before calling the function is counterintuitive and inefficient. It makes much more sense, and it's also much more efficient - to move the conversions to be the responsibility of the called function instead. It's also more likely that the author of the function, the one choosing to use scalar type hints in the first place - would be more knowledgeable about PHP's types than those using his API.
...
Finally, strict type checking is inconsistent with the way internal (C-based) functions typically behave. For example, strlen(123) returns 3, exactly like strlen('123'). sqrt('9') also return 3, exactly like sqrt(9). Why would userland functions (PHP-based) behave any different?
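The calls mentioned in that last extract can be tried directly. This is a minimal sketch which assumes the script runs in PHP's default coercive mode, that is without declare(strict_types=1); at the top of the file:

```php
<?php
// No declare(strict_types=1) here, so the default coercive mode applies.
var_dump(strlen(123));    // int(3)   - the integer is converted to the string "123"
var_dump(strlen('123'));  // int(3)
var_dump(sqrt('9'));      // float(3) - the numeric string is converted to a number
var_dump(sqrt(9));        // float(3)
```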
Before PHP came along, earlier languages such as COBOL were strictly typed for the simple reason that they used predefined structs (also known as records, composite data types or data buffers). All input/output operations required a pre-defined and pre-compiled record structure which identified precisely the type and size of every piece of data that was passed in that operation. Each form/screen had to be built and compiled in order to define all the data items on that screen. Each database access had to define every column used in that access. The application code that dealt with that I/O operation had to use the exact same data structure otherwise the wrong value would be retrieved and chaos would ensue. The process of defining the record structure for each operation also required that each data item be listed in the correct order. Trying to access a string as an integer, or an integer as a string, would result in corrupt data. If the sending and receiving structures had the data items defined in different orders this would result in corrupt data. Arguments passed into and out of subroutine calls were also structures, so again the structures used by the calling and the called components had to match precisely otherwise this would result in corrupt data.
In my COBOL days I helped ensure that the structures used within each program were always synchronised with the structures used by the forms and the database by creating a program called COPYGEN which would read the structures directly from the forms file or database and write them as text files which could be imported into a copy library. When a program required one of these structures it would read it from the copy library instead of having to be hard-coded by the developer. Thus when any structure changed all that was necessary was to rerun the COPYGEN program and recompile all those programs which referenced that structure. This simple piece of automation cut out a lot of developer-induced program bugs and helped with programmer productivity.
PHP does not use pre-defined and pre-compiled structures which are typed; it uses dynamic arrays which are untyped. These arrays can be regarded as dynamic structures as the contents of an array do not need to be defined before it is filled with data. At runtime values are coerced into the relevant types as and when necessary. When the contents of an HTML form are submitted to a PHP script they appear in the $_POST array. This is an associative array which contains a list of name => value pairs where the value is always a string. This is because the HTML document does not contain any type information for each of its fields. When data is retrieved from the database the result appears as an indexed array of associative arrays. The first level is indexed by row number, which always starts at zero. Each row is an associative array with a separate field for each item specified in the SELECT query. Again each column of data appears as a string as the SQL output is designed to be sent in human-readable format. Again it is not necessary to define the structure of the array before running the query as the array is built dynamically by the DBMS when the query is executed.
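A minimal sketch of how such string data behaves in calculations. The field names and values are hypothetical, and the result set assumes a driver configuration which returns every column as a string:

```php
<?php
// A hypothetical $_POST payload: every value arrives as a string.
$post = ['quantity' => '3', 'unit_price' => '10.50'];

// A hypothetical result set: an indexed array of associative arrays,
// with every column value held as a string.
$rows = [
    0 => ['product_id' => '17', 'stock' => '40'],
    1 => ['product_id' => '18', 'stock' => '0'],
];

// Arithmetic operators convert numeric strings automatically.
$total = $post['quantity'] * $post['unit_price'];
var_dump($total);       // float(31.5)

$remaining = $rows[0]['stock'] - $post['quantity'];
var_dump($remaining);   // int(37)
```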
The only difference between HTML data and SQL data is for empty values - in the $_POST array they appear as empty strings while in the SQL array they appear as NULLs. This did not matter in PHP 4 as any variable containing an empty string, NULL or FALSE was regarded as being empty() and was always coerced successfully into an empty value for the relevant type. This standard behavior has now been removed by those idiot core developers, thus proving that they do not understand the roots of PHP and are seeking to change it to suit their own perverse interpretations of how a "proper" language should behave.
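The way empty() treats these values can be checked with a few lines. The variable names are hypothetical:

```php
<?php
// Empty values as they might arrive from an HTML form ('') or a database (null).
$fromForm = '';
$fromDatabase = null;

var_dump(empty($fromForm));      // bool(true)
var_dump(empty($fromDatabase));  // bool(true)
var_dump(empty(false));          // bool(true)
```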
Because the structure of an array does not have to be defined before it can be filled with data it provides a dynamic mechanism which is both powerful and flexible for the sending and receiving of data. I can construct methods which send and receive data as single array arguments and these arrays can contain any number of columns from any number of sources - either HTML or SQL - without having to be amended. This is why I use a standard $fieldarray variable as the input and output arguments in my common table methods which are defined in my abstract table class and inherited by every concrete table class. This provides the polymorphism which allows me to access any Model class from any Controller, using that mechanism called Dependency Injection, which contributes directly to the power of my Transaction Patterns and my high rates of productivity.
It would appear that many developers learned about OOP in one of those early languages which used structs and which required strict typing. They assumed that this was the way it should be done, and they cannot adjust their thinking to deal with the new breed of languages which are not compiled and which do not require strict typing. PHP does not use structs which have to be defined for each input/output operation, it uses untyped arrays which are infinitely more flexible. Being dynamic it means that I can define a function with an array argument, such as my $fieldarray, and at runtime I can fill it with any amount of data from any source. I can change the contents of this array at any time without having to modify any method signatures, thus proving that my software is as loosely coupled as it could possibly be. This directly contributes to the increase in polymorphism which is available in my framework, which leads to an increase in the volume of reusable code, which leads to a decrease in the code which I have to write, and this contributes to my high rate of productivity.
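The idea of a single array argument can be sketched as follows. The class `ExampleTable`, its method body and the column names are illustrative assumptions and not the framework's actual code:

```php
<?php
// A hypothetical table class; only the use of one array argument is the point here.
class ExampleTable
{
    // One array argument carries any number of columns from any source.
    public function insertRecord(array $fieldarray): array
    {
        // Values may be strings from HTML or NULLs from SQL, so qualify them here.
        $fieldarray['name'] = trim((string) ($fieldarray['name'] ?? ''));
        return $fieldarray;
    }
}

$table = new ExampleTable();

// The same method signature copes with different sets of columns.
$fromForm  = $table->insertRecord(['name' => '  Widget ', 'price' => '9.99']);
$withExtra = $table->insertRecord(['name' => 'Gadget', 'price' => '5', 'colour' => 'red']);

print_r($fromForm);
print_r($withExtra);
```

Adding the `colour` column required no change to the method signature, which is the loose coupling described above.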
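PHP was designed from the ground up to be dynamically typed instead of statically typed so that scalars would auto-convert depending on the context. The difference can be explained as follows:
- Statically typed languages are compiled, and their type checking is performed at compile time.
- Dynamically typed languages are interpreted, and their type checking is performed at run time.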
You should see from those statements that the time at which type checking is performed depends on whether the language is compiled or interpreted. This was because in the early days of computing the hardware was very slow, so leaving type checking to run time was bad for performance. If all type checking is done in the compilation phase then it does not have to be redone at run time. From the mid-1990s the speed of the hardware went up while the cost came down, and this allowed interpreted languages, with dynamic type checking, to become a viable alternative.
I can hear you asking the question "If PHP does its type checking at runtime, what is wrong with rejecting any argument which is not of the expected type?" The answer is simple - it was designed to accept arguments of mixed types and silently cast them into the expected type instead of only accepting a single type, as indicated in the scenario which I identified earlier. All type checking used to be performed within the function itself after it had been called. However, with the introduction of the declare(strict_types=1); directive this type checking is now performed by PHP itself when your code calls the function but before the function is actually activated. This means that any type juggling now has to be performed in all those places where a function is called instead of within the function when it is called. This means that code which has run quite happily for the last 20+ years will now complain bitterly and demand to be "fixed". This is a break in Backwards Compatibility (BC) which has been forced upon the legions of application developers for a completely bogus reason. The offending RFC makes the following statement:
Internal functions (defined by PHP or PHP extensions) currently silently accept null values for non-nullable arguments in coercive typing mode. This is contrary to the behavior of user-defined functions, which only accept null for nullable arguments. This RFC aims to resolve this inconsistency.
The statement "This is contrary to the behavior of user-defined functions" should have been qualified with "since the introduction of the declare(strict_types=1); directive in version 7.0", as it is only since then that the behaviour of type checking for user-defined functions has been able to be switched from being relaxed to strict. Note also that this directive was supposed to have absolutely no effect on the calling of internal functions.
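The relaxed checking of a user-defined function with a scalar type declaration can be sketched as follows. The function `twice()` is hypothetical, and the script assumes the default coercive mode:

```php
<?php
// This file does not contain declare(strict_types=1), so calls made from it
// use the default coercive mode. twice() is a hypothetical example.
function twice(int $number): int
{
    return $number * 2;
}

// The numeric string is coerced to an integer before the function body runs.
// With declare(strict_types=1) in the calling file this call would throw a TypeError.
var_dump(twice('21'));   // int(42)

// A value which cannot be coerced is rejected even in coercive mode.
try {
    twice('abc');
} catch (TypeError $e) {
    echo 'Rejected: ', get_class($e), "\n";
}
```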
To say that the treatment of non-nullable arguments has been incorrect for the past 20 years is blatantly incorrect. Before it was changed in PHP5 any function given a non-nullable argument containing NULL would treat that as an empty version of that argument's type. While the function signatures in the documentation seemed to indicate that each argument had to be of a single specific type the actual behaviour was much more relaxed, as described in the following sections of the manual:
Converting to string
String conversion is automatically done in the scope of an expression where a string is needed.
null is always converted to an empty string.
Converting to integer
To explicitly convert a value to int, use either the (int) or (integer) casts. However, in most cases the cast is not needed, since a value will be automatically converted if an operator, function or control structure requires an int argument.
null is always converted to zero (0).
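Those conversion rules can be seen in ordinary expressions. A minimal sketch:

```php
<?php
$value = null;

// String context: NULL becomes an empty string.
var_dump('[' . $value . ']');   // string(2) "[]"

// Integer context: NULL becomes zero.
var_dump($value + 5);           // int(5)

// The explicit casts give the same results.
var_dump((string) $value === '');   // bool(true)
var_dump((int) $value === 0);       // bool(true)
```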
This difference between the function description and its actual behaviour was not significant until this RFC was implemented in version 7.1 which introduced Nullable type syntactic sugar which stated the following:
A single base type declaration can be marked nullable by prefixing the type with a question mark (?). Thus ?T and T|null are identical.
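For user-defined functions the effect of the question mark can be sketched as follows. Both function names are hypothetical:

```php
<?php
// Hypothetical functions showing a nullable and a non-nullable declaration.
function lengthOrZero(?string $text): int
{
    return $text === null ? 0 : strlen($text);
}

function lengthOf(string $text): int
{
    return strlen($text);
}

var_dump(lengthOrZero(null));    // int(0)
var_dump(lengthOrZero('abc'));   // int(3)

// Without the question mark a user-defined function rejects NULL.
try {
    lengthOf(null);
} catch (TypeError $e) {
    echo "TypeError\n";
}
```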
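While this was specifically allowed for user-defined functions it should also have been allowed for internal functions so that the documentation for each function would then be consistent with its behaviour, but this never happened. It was not until 6 years later that the core developers decided to deal with this "inconsistency", and they had two choices:
1. Change the documentation of each internal function to show that its arguments are nullable, so that it matches the function's behaviour.
2. Change the behaviour of each internal function to reject NULL, so that it matches the function's documentation.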
The first option would have involved a small change to each function's documentation but not its behaviour, and no work at all by any application developer as it would have no effect on backwards compatibility and therefore would not break anything that then had to be fixed. The second option was the complete opposite - minimal effort by a core developer but maximum effort by legions of application developers who now have to hunt down and fix every function call in their code.
It should be pointed out that although in the PHP manual each documented function would contain its name, its input arguments and its return type, such as in
Get string length (PHP 4, PHP 5, PHP 7) strlen ( string $string ) : int
this was merely a hint as it would use PHP's type juggling capabilities to accept variables of different types provided that they could be coerced into the expected type. In the case of strlen() it would accept both "12.34" and 12.34, and NULL would be interpreted as an empty string and its length would therefore be zero.
Because of the changes to the type system which have taken place over quite a few releases it is now possible to turn type hinting into type enforcement by using the declare(strict_types=1); directive. However, while this new feature was supposed to be entirely optional and only applicable for user-defined functions it has been made mandatory for all internal functions in version 8.1 and cannot be turned off. This means that arguments for any internal function which do not have the same type as is specified in the manual will be rejected. This is a change in behaviour and a massive BC break.
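The kind of change which then has to be made at each call can be sketched as follows. This is only one possible approach, and `safeLength()` is a hypothetical wrapper rather than a built-in function:

```php
<?php
$value = null;   // for example, an empty column read from the database

// Call-site fix: supply an empty string whenever the value is NULL.
var_dump(strlen($value ?? ''));   // int(0)

// Alternative: a hypothetical wrapper which keeps the conversion in one place.
function safeLength($value): int
{
    return strlen((string) $value);
}

var_dump(safeLength(null));     // int(0)
var_dump(safeLength(12.34));    // int(5)
var_dump(safeLength('12.34'));  // int(5)
```

The first form has to be repeated wherever the internal function is called, while the wrapper moves the conversion back into a single function.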
Why has PHP always allowed multiple types for its internal functions? This can be best explained in RFC: Strict and weak parameter type checking which quite clearly states:
PHP's type system was designed from the ground up so that scalars auto-convert depending on the context. That feature became an inherent property of the language. In the vast majority of scenarios in PHP, scalar types auto-convert to the necessary type depending on the context.
Why was it done this way? Contrary to the belief of some OO purists who insist that "everything is an object", when we are writing applications in the real world which have HTML at the front end and SQL at the back end reality makes an appearance and tells us that "everything is a string". Take a look at the following facts if you don't believe me:
- Data submitted from an HTML form arrives in the $_POST array, where every value is a string.
- Data retrieved from an SQL database arrives as an array in which every column value is a string.
This meant that while both the front end HTML and back end SQL handle nothing but strings, the code that the developer writes in the middle to handle all the business rules and calculations might need variables of particular types. Rather than force the developer to change a variable's type manually PHP's automatic type juggling feature would take care of everything behind the scenes. This behaviour was always deemed to be better for the application developers as the alternative required extra effort for zero benefit, as explained in the RFC mentioned earlier:
Strict type checking puts the burden of validating input on the callers of an API, instead of the API itself. Since typically functions are designed so that they're called numerous times - requiring the user to do necessary conversions on the input before calling the function is counterintuitive and inefficient. It makes much more sense, and it's also much more efficient - to move the conversions to be the responsibility of the called function instead.
You should be able to see that this philosophy is based on common sense, is efficient, and in keeping with good programming practices. To reverse this philosophy based on nothing more than the perverse and dogmatic interpretation of an artificial rule shows that the abilities of the core developers have slipped from "cream of the crop" to the "bottom of the barrel".
There are a surprising number of programmers who think that strict typing is better than dynamic typing, or that strict typing was developed to avoid the problems with dynamic typing. Both arguments are wrong. Strict typing is not better than dynamic typing, it is just different. Each has its own set of pros and cons, and it is up to the individual developer to decide which is best for them.
The first programming languages were compiled and statically typed simply because of the poor performance of the hardware at the time. It was deemed to be more acceptable to perform the type checking just once at compile time instead of every time the code was run. Later improvements in CPU speeds reduced the differences in performance to insignificant levels, and this gave rise to interpreted languages which, because there was no compilation phase, meant that any type checking had to be delayed until run time. As an added bonus this allowed for languages such as PHP to add a feature called Type Juggling which determines a variable's value at run time and attempts to coerce it into the expected type if it is different. Thus the statement sqrt('9') will return 3, exactly like sqrt(9). Many programmers have discovered that using dynamically typed languages makes them more productive.
The differences between the two methods of type checking can be compared in the following chart:
Statically typed languages | Dynamically typed languages |
---|---|
Type checking is completed at compile time | Type checking is completed at run time |
Compiled code can only run on a single platform | Can have different interpreters for different platforms |
Explicit type declarations are usually required | Explicit type declarations are not required |
Type juggling must be performed manually | Type juggling can be performed automatically |
Errors are detected earlier | Type errors are detected at runtime, preferably during testing |
Variable assignments are static and cannot be changed | Variable assignments are dynamic and can be changed |
Produces more optimised code | Produces less optimised code, runtime errors are possible |
Offers operational stability and clean code | Offers agility and development flexibility |
Can be more verbose and slower to write | Can be less verbose and quicker to write |
While it is still true that code which has been compiled ahead of time runs faster than code which is interpreted at run time, modern hardware is so fast that the difference is becoming less and less noticeable.
I used statically typed languages for 20 years before I switched to PHP, and I found them to be too restrictive. I chose to switch to PHP as it was well suited to the building of enterprise applications, my specialty, with HTML forms in the front end, an SQL database at the back end, and business rules in the middle. Once I started using PHP I was amazed at how much more productive I could become. I didn't have to write code to define variables and their types ahead of time, and I didn't have to write code to deal with type conversions. I could write code and run it immediately without having to wait for it to compile, and thorough testing would reveal any errors that needed to be fixed. PHP suited the programming style which I had developed over the previous 20 years even though I had to switch from a procedural to the object oriented paradigm. I had become used to writing code that was reusable, but the OO capabilities of PHP, with its support for encapsulation, inheritance and polymorphism, enabled me to write code that was even more reusable. The fact that I could rebuild my previous framework in PHP with even more capabilities convinced me that it was the best language for me. I love using PHP as it enables me to get the job done with the minimum of effort. Programmers who don't like PHP's dynamically-typed nature should not be using it in the first place. That's as stupid as going into a Chinese restaurant and complaining that they don't serve Italian food.
While it is true that novice programmers can make more mistakes with dynamically typed languages, not all their mistakes will be down to simple type errors. As they gain experience and learn how to avoid those errors they may actually find that the verbosity of statically typed languages becomes more of a hindrance than a help. I put statically typed languages in the same category as training wheels on a child's bicycle - they are OK when you are a child, but get in the way when you are an adult.
To say that statically-typed languages are better and more popular than dynamically-typed languages is not supported by the fact that 60% of the world's programming languages are dynamically-typed and only 40% are statically-typed. This would indicate to me that those who support statically-typed languages are in the minority.
In his article Are Dynamic Languages Going to Replace Static Languages? Robert C Martin (aka Uncle Bob) says the following:
I thought an experiment was in order. So I tried writing some applications in Python, and then Ruby (well known dynamically typed languages). I was not entirely surprised when I found that type issues simply never arose. My unit tests kept my code on the straight and narrow. I simply didn't need the static type checking that I had depended upon for so many years.
I also realised that the flexibility of dynamically typed languages makes writing code significantly easier. Modules are easier to write, and easier to change. There are no build time issues at all. Life in a dynamically typed world is fundamentally simpler.
In a response to a comment in his weblog he also states the following:
However, I don't really think I spend a lot of time in java to make there is no build error... Could you share more about how dynamically typed language help you in development?
Have you ever added an exception to a method and then had to change many other methods with either 'throws' clauses or try/catch blocks? Have you ever changed the type of a function argument and then had to find all usages and change them? Have you ever changed a class and then had to recompile, and redeploy, all the classes that used that class? These are just three of the issues that don't occur nearly as often in dynamically typed languages.
In his article Strong Typing vs. Strong Testing Bruce Eckel writes:
Instead of putting the strongest possible constraints upon the type of objects, as early as possible (as C++ and Java do), languages like Ruby, Smalltalk and Python put the loosest possible constraints on types, and evaluate types only if they have to. That is, you can send any message to any object, and the language only cares that the object can accept the message - it doesn't require that the object be a particular type, as Java and C++ do.
...
the compiler was just one (incomplete) form of testing ... a weakly-typed language could be much more productive but create programs that are just as robust as those written in strongly-typed languages, by providing adequate testing.
In his article Diminishing returns of static typing TheMerovius writes:
The costs of static typing again come in many forms. It requires more up front investment in thinking about the correct types. It increases compile times and thus the change-compile-test-repeat cycle. It makes for a steeper learning curve. And more often than we like to admit, the error messages a compiler will give us will decline in usefulness as the power of a type system increases. Again, we can oversimplify and subsume these effects in saying that it reduces our speed.
It may come as a shock to some programmers, but the fact that strict typing forces the developer to define a variable and its type before it can be used, then forbids the developer from changing its type during the program's execution actually caused a problem which required a solution. The problem was that once you had defined a variable with a particular type you could not use that variable as an argument of a different type. The solution was function overloading which can be described as follows:
function overloading or method overloading is the ability to create multiple functions of the same name with different signatures (arguments and types). Calls to an overloaded function will run a specific implementation of that function appropriate to the context of the call, allowing one function call to perform different tasks depending on context.
Take the following examples in Java:

    public int FunctionName(int x, int y) {
        return (x + y);
    }

    public double FunctionName(double x, int y) {
        return (x + y);
    }

    public double FunctionName(int x, double y) {
        return (x + y);
    }

    public double FunctionName(double x, double y) {
        return (x + y);
    }

As you can see this is just a collection of variations of the same function where each variation does exactly the same thing but uses a different combination of types for its input and output arguments. Function and method overloading is impossible in PHP as once a function name has been defined it cannot be defined again. Once a method name has been defined in a class it cannot be defined again in the same class.
Note that the implementation has to be duplicated in each variation, and this is a clear violation of the DRY principle. With PHP, as in all other dynamically-typed languages, the implementation is NOT duplicated, and the code to deal with variations in argument types is less complicated, as shown in the following section.
Some programmers describe function/method overloading as a form of polymorphism, but as far as I am concerned this is a misuse of the terminology. Polymorphism requires the same method signature to be available in more than one object. It has absolutely nothing to do with the types of the arguments in a method signature.
Function overloading like the above is not required in PHP as its rules for type juggling are based on common sense. If all the arguments in an expression which involves arithmetic are of type int then the result will be an int. If any is a double then the result will be a double. Easy peasy lemon squeezy. It's not rocket science.
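A PHP equivalent of the four Java overloads can be sketched as a single function. The name `add()` is hypothetical:

```php
<?php
// One function covers every combination handled by the four Java overloads.
function add($x, $y)
{
    return $x + $y;
}

var_dump(add(1, 2));       // int(3)
var_dump(add(1.5, 2));     // float(3.5)
var_dump(add(1, 2.5));     // float(3.5)
var_dump(add(1.5, 2.5));   // float(4)
```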
As I have already stated in Static typing is NOT better than dynamic typing the two different approaches to type checking were developed at different times when the hardware performed at different speeds. Compiled languages performed better when the CPU speeds were low, but as they increased the performance difference with interpreted languages gradually diminished, and their ability to make programmers more productive became a significant factor in their rise in popularity. PHP was designed to be dynamically-typed from the outset, which meant that it had all the advantages of being dynamically typed and none of the disadvantages of being statically typed.
The problem with back-porting static type checking into PHP is that you start to lose some of the advantages and instead gain some of the disadvantages. As I have said elsewhere some people know only what they have been taught while others know what they have learned. There are far too many people who have started their programming careers with statically typed languages who later make the switch to one which is dynamically typed, and their first response is "This is different from what I have been taught, therefore it must be wrong". These people also complain that PHP is not a "proper" object oriented language, but I have dismissed this argument in OOP - the bare minimum and also What OOP is. These people then push for the language to be changed using the argument "Language X has so-and-so feature, so PHP should have it too". What they fail to realise is that a feature which was added to a statically typed language simply to get around a problem in that language has no place in being added to a language which does not have that deficiency.
A case in point is object interfaces which were only added to the early languages to allow polymorphism to exist without inheritance. This is highlighted in Polymorphism and Inheritance are Independent of Each Other which contains the following code samples:

    // C++ polymorphism through inheritance
    class Car {
        // declare signature as pure virtual function
        public virtual boolean start() = 0;
    }

    class VolkswagenBeetle : Car {
        public boolean start() {
            // implementation code
        }
    }

    class SportsCar : Car {
        public boolean start() {
            // implementation code
        }
    }

    // Invocation of polymorphism
    Car cars[] = { new VolkswagenBeetle(), new SportsCar() };
    for (i = 0; i < 2; i++)
        cars[i].start();

The cars array is of type Car and can only hold objects that derive from Car (VolkswagenBeetle and SportsCar) and polymorphism works as expected. However, suppose I had the following additional class in my C++ program:

    // C++ lack of polymorphism with no inheritance
    class Jalopy {
        public boolean start() {
            // implementation code
        }
    }

    // Jalopy does not inherit from Car, the following is illegal
    Car cars[] = { new VolkswagenBeetle(), new Jalopy() };
    for (i = 0; i < 2; i++)
        cars[i].start();

At compile time this will generate an error because the Jalopy type is not derived from Car. Even though they both implement the start() method with an identical signature, the compiler will stop me because there is a static type error.
This problem has never existed with PHP as it has always been possible to define several classes with the same method signatures without the use of the keywords "extends" or "implements" and have those methods called in a polymorphic manner. This is precisely what I did when I created the Data Access Objects to deal with the different DBMS engines which I support. When you call a method on an object PHP does not care how the method got there, just that it exists.
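The same three classes can be sketched in PHP without any inheritance or interface. The return values are hypothetical and exist only to give the loop something to show:

```php
<?php
// Three unrelated classes: no shared parent ("extends") and no interface ("implements").
class VolkswagenBeetle
{
    public function start(): bool
    {
        return true;
    }
}

class SportsCar
{
    public function start(): bool
    {
        return true;
    }
}

class Jalopy
{
    public function start(): bool
    {
        return false;   // hypothetical: this one refuses to start
    }
}

// The array can hold any mixture of objects.
$cars = [new VolkswagenBeetle(), new SportsCar(), new Jalopy()];

// PHP only requires that each object has a start() method when it is called.
foreach ($cars as $car) {
    echo get_class($car), ': ', var_export($car->start(), true), "\n";
}
```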
This topic is discussed in more detail in Object Interfaces.
When looking at other people's code which is supposed to demonstrate the "right way" to do OOP I often see deep inheritance hierarchies, and I wonder where this idea came from. When I came across Pragmatic OOP written by Ricki Sickenger I found this observation:
A Car and a Train and a Truck can all inherit behavior from a Vehicle object, adding their subtle differences. A Firetruck can inherit from the Truck object, and so on. Wait.. and so on? The thing about inheritance is that is so easy to create massive trees of objects. But what OO-bigots won't tell you is that these trees will mess you up big time if you let them grow too deep, or grow for the wrong reasons.
This example of a class hierarchy is, in my experience, totally inappropriate for a database application. I would never start with an abstract class called Vehicle then create a hierarchy of subclasses and sub-subclasses for each vehicle type and sub-type. I would only have two tables - Vehicle and Vehicle-Type - where the Vehicle table contains a foreign key to the Vehicle-Type table. Each vehicle, regardless of its type, would have a row on the Vehicle table and a pointer to a row on the Vehicle-Type table. In this way I could create as many vehicle types as I want without having to create a new subclass, and I would not have to change any code to refer to this new subclass.
Another example of this mis-application of the "IS-A" test can be found in Inheritance in Java, Part 1: The extends keyword. This shows that Java programmers are taught to do it this way, and I'm pretty sure that there will be similar code samples in other "proper" OO languages. Having been taught the benefits of proper database design following the rules of Data Normalisation I have learned to implement a different and more pragmatic interpretation of the "IS-A" and "HAS-A" tests.
From the very first moment I started to develop my framework for building enterprise applications the idea of producing deep inheritance hierarchies, or inheriting from one concrete class to create another concrete class, never entered my head. I saw immediately that this type of application could be comprised of thousands of user transactions where each transaction performs one or more operations on one or more database tables, and that regardless of what data a table may contain the only operations that can be performed on it are Create, Read, Update and Delete (CRUD). For this reason it seemed obvious to me to create an abstract table class to contain the common processing, and then extend this into a separate concrete table class for each physical table to hold the unique business rules for that table. Despite the fact that this arrangement works, and works very well, my critics take great delight in telling me that "Having a separate class for each database table is not good OO". I ignored such criticisms as they could not explain exactly why it was a bad idea or what problems it caused.
It was not until I came across a document which was published in 1988 by Ralph E. Johnson & Brian Foote called Designing Reusable Classes that I found something which supported my approach. I discuss this further in The meaning of "abstraction". In their article they describe a style of programming called programming-by-difference whereby you examine a collection of entities and look for similarities and differences. If they have similar protocols (operations) then these can be put into an abstract class, while each concrete subclass need only specify the differences. In the case of database tables each one is subject to the same standard set of CRUD operations while the differences boil down to different data structures and business rules. Their article states that standard protocols are best provided by an abstract class, and that it is better to inherit from an abstract class than from a concrete class, that the root of each class hierarchy should be an abstract class. This abstract class defines methods with standard implementations as well as empty methods which may be overridden in subclasses, which sounds to me like an implementation of the Template Method Pattern. My implementation matches this description perfectly - I have an abstract table class which contains all the methods that can be performed on any database table, and a separate concrete table class for each of my 400+ database tables. Each table class has its own data structures, and customisable "hook" methods for the unique business rules.
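The arrangement described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the framework's actual code: the class names, the method names insertRecord(), doInsert(), preInsert() and postInsert(), and the '_inserted_into' marker are all hypothetical, and the database call is replaced by a stand-in.

```php
<?php
// Hypothetical abstract table class: holds the processing common to every table.
abstract class AbstractTable
{
    public $tablename;
    public $errors = array();

    // Template method: the fixed sequence of steps for an insert.
    public function insertRecord($fieldarray)
    {
        $this->errors = array();
        $fieldarray = $this->preInsert($fieldarray);      // hook
        if (empty($this->errors)) {
            $fieldarray = $this->doInsert($fieldarray);   // standard step
            $fieldarray = $this->postInsert($fieldarray); // hook
        }
        return $fieldarray;
    }

    // Standard implementation shared by all tables (a stand-in for the real database call).
    protected function doInsert($fieldarray)
    {
        $fieldarray['_inserted_into'] = $this->tablename;
        return $fieldarray;
    }

    // Empty "hook" methods which a concrete subclass may override.
    protected function preInsert($fieldarray)
    {
        return $fieldarray;
    }

    protected function postInsert($fieldarray)
    {
        return $fieldarray;
    }
}

// Concrete table class: specifies only the differences.
class Product extends AbstractTable
{
    public $tablename = 'product';

    protected function preInsert($fieldarray)
    {
        if (($fieldarray['price'] ?? 0) <= 0) {
            $this->errors['price'] = 'Price must be greater than zero';
        }
        return $fieldarray;
    }
}

$product = new Product();
$result  = $product->insertRecord(array('name' => 'Widget', 'price' => '9.99'));
```

The calling code only ever invokes insertRecord(); whether a hook does anything depends entirely on the concrete subclass.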
While this activity is carried out behind the scenes for internal functions, how is it possible to duplicate this in user defined functions? Type checking can be performed using any of the functions identified in Type Detection, such as is_array, is_bool, is_double, is_int, is_null, is_numeric, is_string. Transforming a variable into another type is covered in Type Casting. Note that similar functionality does not exist in strictly typed languages as a variable's type is fixed once it has been declared, and once declared it cannot be changed. All you can do to change type is move a variable of one type to another variable of a different type.
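As a small sketch of how a user defined function might combine type detection with type casting, consider the helper below. The function name toInteger() and its rule of returning null for unacceptable input are assumptions made for this illustration.

```php
<?php
// Hypothetical helper: accept an integer or an integer-like string, reject anything else.
function toInteger($value)
{
    if (is_int($value)) {
        return $value;
    }
    // The round trip through (int) must reproduce the original string exactly.
    if (is_string($value) && is_numeric($value) && (string)(int)$value === $value) {
        return (int)$value;
    }
    return null;
}

// The same variable can change its type simply by being cast.
$value = '42';
$typeBefore = gettype($value); // 'string'
$value = (int)$value;
$typeAfter = gettype($value);  // 'integer'

var_dump(toInteger('42'));   // int(42)
var_dump(toInteger('four')); // NULL
```

The checking lives inside the function, so every caller gets the same behaviour without repeating the test.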
Note that type checking of a user's input from an HTML form before it is added to an SQL query is an absolutely vital step. Although both the HTML data and the SQL query are strings the SQL query will fail if a value destined for a particular column is not compatible with that column's data type. If you try to insert the value "four" into an integer column the query will be rejected. It is therefore necessary to validate all user input before it is sent to the database. If this validation process finds any errors it can be sent back to the user with an appropriate error message so that the faulty value can be corrected before being resubmitted.
How is it possible to perform this validation? One method is to use the filter extension, but this did not become available until several years after I had completed my framework, so I had to invent my own technique. Here is a sample of my original code:
$fieldarray['column1'] = '...';

// test for a missing value
if (strlen(trim($fieldarray['column1'])) < 1) {
    $errors['column1'] = 'This must have a value';
} // if

// check that column does not contain any non-numeric characters
if (!is_numeric($fieldarray['column1'])) {
    $errors['column1'] = 'This is not a number';
} // if

// test for an integer
$test = (integer)$fieldarray['column1'];
if ((string)$test != (string)$fieldarray['column1']) {
    $errors['column1'] = 'This is not a whole number';
} // if

// test for a positive integer
$test = abs((integer)$fieldarray['column1']);
if ((string)$test != (string)$fieldarray['column1']) {
    $errors['column1'] = 'This is not a positive whole number';
} // if

// test for a positive decimal
$test = abs((float)$fieldarray['column1']);
if ((string)$test != (string)$fieldarray['column1']) {
    $errors['column1'] = 'This is not a positive decimal number';
} // if

// test for a positive number with no more than 2 decimal places
$test = abs((float)$fieldarray['column1']);
$test = round($test, 2);
if ((string)$test != (string)$fieldarray['column1']) {
    $errors['column1'] = 'This cannot have more than 2 decimal places';
} // if
Please note the following:
$fieldarray['column1'] is never converted to the desired type, it is left just as it was.
(integer)$fieldarray['column1'] - no error will be thrown if the input cannot be converted, the result will always be zero.
if ($test != $fieldarray['column1']) - I cast both values to strings before testing they are equal.
Note also that unlike virtually every other programmer on the planet I do not have a separate class variable for each table column as I have found it far easier to leave the $_POST array intact on its journey from the HTML form to the database. This saves having to waste time dealing with those superfluous setters and getters which would cause tight coupling instead of the preferred loose coupling. As I pass this entire array through every method as both an input and output argument I can just as easily reference each column with $fieldarray['column1'] as I could with $this->column1. I never saw any rule which said that I MUST have a separate variable for each column, so I did what worked with less code. None of my previous languages had ever used individual variables to pass data around, they had all used structs or records. As soon as I discovered the power and flexibility of PHP's arrays I saw them as being superior to anything I had ever encountered previously, and nothing I have seen in the past 20 years has swayed me from that view.
Leaving all the input data in an array instead of splitting it into separate variables made it easier for me to automate the data validation process. Some 20 years earlier in COBOL I had developed a COPYGEN program which read the database and produced a COPYLIB entry for each table's structure, so I created a similar version in PHP which read a table's structure and put the results in a structure file which is loaded into memory for use by each table's class file. If you have one array of field names and values, and a second array of field names and their specifications, how difficult would it be to write a standard routine to check that each field's value matches its specifications? If I can do it then anyone can. It's not rocket science, it's easy peasy lemon squeezy.
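A standard routine of that kind might look something like the sketch below. It is a simplified illustration, not the framework's validation object: the function name validateFields(), the specification keys 'type', 'size' and 'required', and the three supported types are all assumptions.

```php
<?php
// Hypothetical validator: compares each value in $fieldarray with its entry in $fieldspec.
function validateFields($fieldarray, $fieldspec)
{
    $errors = array();
    foreach ($fieldspec as $name => $spec) {
        $value = isset($fieldarray[$name]) ? trim((string)$fieldarray[$name]) : '';
        if ($value === '') {
            if (!empty($spec['required'])) {
                $errors[$name] = 'This must have a value';
            }
            continue; // nothing more to check for an empty value
        }
        switch ($spec['type']) {
            case 'integer':
                if ((string)(int)$value !== $value) {
                    $errors[$name] = 'This is not a whole number';
                }
                break;
            case 'numeric':
                if (!is_numeric($value)) {
                    $errors[$name] = 'This is not a number';
                }
                break;
            case 'string':
                if (isset($spec['size']) && strlen($value) > $spec['size']) {
                    $errors[$name] = 'This is too long';
                }
                break;
        }
    }
    return $errors;
}

// One array of specifications, one array of values.
$fieldspec = array(
    'name'     => array('type' => 'string', 'size' => 10, 'required' => true),
    'quantity' => array('type' => 'integer', 'required' => true),
    'price'    => array('type' => 'numeric'),
);
$fieldarray = array('name' => 'Widget', 'quantity' => 'four', 'price' => '9.99');

$errors = validateFields($fieldarray, $fieldspec);
print_r($errors);
```

Because the rules live in the $fieldspec array, adding a column or changing its type means changing data rather than writing more validation code.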
Constructing an SQL insert using $fieldarray is extremely simple:
$this->query = "INSERT INTO $tablename SET ";
foreach ($fieldarray as $item => $value) {
    if (is_null($value) OR strlen($value) == 0) {
        $this->query .= "`$item`=NULL, ";
    } else {
        $value = mysqli_real_escape_string($this->dbconnect, $value);
        $this->query .= "`$item`='$value', ";
    } // if
} // foreach
$this->query = rtrim($this->query, ', '); // remove trailing comma
$result = mysqli_query($this->dbconnect, $this->query) OR trigger_error($this, E_USER_ERROR);
Note here that simply including a variable name in a double-quoted string will replace that name with a string representation of its value instead of inserting just the name. Note also that it is legitimate in SQL to supply a numeric value as a quoted string. So even SQL is dynamically typed!
The problem with strict typing is that it requires values to be set to the correct type otherwise a runtime error will be thrown. Writing code which forces each value to be of the correct type before it is used is supposed to fix such errors before they occur, and that's a good thing, right? But here's the kicker - you have to write extra code to achieve this. When you consider the fact that PHP was designed to make it easy to write programs which use HTML at the front end and SQL at the back end you should be aware that those two sources of data do not supply typed values, as explained in everything is a string. PHP does not require explicit type definition, and, as explained in Type Juggling, it will attempt to convert the type of a value to another automatically in certain contexts. This is explained further in RFC: Strict and weak parameter type checking where it says:
PHP's type system was designed from the ground up so that scalars auto-convert depending on the context. That feature became an inherent property of the language.
This is because the functions in PHP were originally designed as wrappers for the equivalent functions in the C language which also has a weak type system with implicit conversions. This means that it will accept string values into functions that expect different types, and attempt to juggle or coerce the value into the correct type. In order to avoid the possibility of type conversion errors it is standard practice that all input from an untrusted source (i.e. from a user-generated HTML form) be sanitised before being used in any function or being written to the database. This means checking that a numeric field contains nothing but a number, a date field contains nothing but a date, et cetera. This sanitisation can be done in one of the following ways:
fieldname = value, such as the $_POST array, all you need to automate the sanitisation process is a method to identify the expected type for each of those values. This information is known as metadata as it is "data about data".
Method #1 above is for clueless newbies who don't know any better. Method #2 above is also for clueless newbies who don't know any better. My previous experience as a database programmer elevated me beyond the status of a clueless newbie and led me to create a solution in PHP which is entirely automated, which means the following:
Because PHP 4 did not allow me to store metadata in comments I chose to do it in the old fashioned way, by using a $fieldspec array inside each table class. Initially I entered all these details by hand, but later I invested the time and effort in extracting this data from the INFORMATION SCHEMA which is available within every DBMS. This was similar to the COPYGEN utility which I built in the 1980s. For PHP I created a Data Dictionary which extracts this data from the DBMS and stores it in a DICT database from which it can be exported to a separate table structure file for each database table. The contents of this file are then loaded into the object using standard code within the class constructor. This information is then compared with the current set of input data during an INSERT or UPDATE operation which calls the standard validation object which is built into the framework. If this object finds any field whose value is inconsistent with its datatype it will generate an error message which will cause the operation to be aborted and give the user the opportunity to correct the error and retry the operation.
Note that the table structure file is automatically created when the table class file is constructed. If the table's structure is ever amended all that is required is that the latest structure be re-imported into the Data Dictionary so that it can be re-exported to the file system. Note that this process will overwrite the table structure file but *NOT* the table class file which may have been amended to include some customisable "hook" methods.
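The export step can be pictured with the sketch below, which is only an approximation of the idea. The function name buildFieldspec(), the layout of the resulting array and the hard-coded sample rows are assumptions; the column names COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH and IS_NULLABLE are those of the standard INFORMATION_SCHEMA.COLUMNS view.

```php
<?php
// Hypothetical converter: turns INFORMATION_SCHEMA.COLUMNS rows into a $fieldspec array.
function buildFieldspec($columns)
{
    $fieldspec = array();
    foreach ($columns as $col) {
        $spec = array('type' => strtolower($col['DATA_TYPE']));
        if ($col['CHARACTER_MAXIMUM_LENGTH'] !== null) {
            $spec['size'] = (int)$col['CHARACTER_MAXIMUM_LENGTH'];
        }
        if ($col['IS_NULLABLE'] === 'NO') {
            $spec['required'] = true;
        }
        $fieldspec[$col['COLUMN_NAME']] = $spec;
    }
    return $fieldspec;
}

// Sample rows, hard-coded here in place of a real query such as:
//   SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH, IS_NULLABLE
//     FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'product'
$columns = array(
    array('COLUMN_NAME' => 'product_id', 'DATA_TYPE' => 'int',
          'CHARACTER_MAXIMUM_LENGTH' => null, 'IS_NULLABLE' => 'NO'),
    array('COLUMN_NAME' => 'name', 'DATA_TYPE' => 'varchar',
          'CHARACTER_MAXIMUM_LENGTH' => '40', 'IS_NULLABLE' => 'NO'),
    array('COLUMN_NAME' => 'notes', 'DATA_TYPE' => 'text',
          'CHARACTER_MAXIMUM_LENGTH' => '65535', 'IS_NULLABLE' => 'YES'),
);

$fieldspec = buildFieldspec($columns);

// Text which could be written to a structure file and later loaded with include.
$fileContents = "<?php\n\$fieldspec = " . var_export($fieldspec, true) . ";\n";
echo $fileContents;
```

Regenerating this text after a table has been altered replaces the structure file only, which is why any hand-written code in the class file is left untouched.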
Using metadata to provide the rules for data validation is a good idea, but the use of Attributes and the Reflection API for this metadata was obviously created by someone who knows nothing about how databases work. I used my experience to create a solution which allows me to create each table's metadata at the touch of a button, and to use this metadata to validate a user's input without having to write any code at all. This follows my general philosophy regarding computer programming which is:
The job of a good programmer is to solve complex problems with simple code, not to solve simple problems with complex code.
My solution uses bog standard code which is built into the framework and not some peculiar hocus-pocus using annotations which requires additional effort by the developer to both define and use. My solution follows the KISS principle (Keep It Simple, Stupid) instead of the KICK principle (Keep It Complex, Knucklehead) or, as I like to call it, "Let's Make It More Complicated Than It Really Is Just To Prove How Clever We Are". By using bog standard code my solution is much easier for a novice to read, follow and understand. Which method would YOU prefer?
Whether you are an application developer or a language developer a programmer's job is to solve problems for his users, not create them. The core developers for PHP write the code that other developers use to write their applications, and there are thousands more application developers than there are core developers. When you consider that 70% of all internet sites are powered by PHP any changes to the language could potentially have a huge impact. Releasing a version of the language which breaks code that has run quite happily for years is something that should really, really be avoided. If a new version breaks code that used to work this is known as a Backwards Compatibility (BC) break. If a new release contains too many of these then it can slow down the adoption of that release as the application developers will be forced to change their code before they can upgrade. The effect of this is a slow adoption rate for each new version. If you look at these usage statistics you should see that 39% of websites are still using PHP 7 even though PHP 8 has been available since November 2020, so that is a lot of code that was broken by the core developers but has to be fixed by the application developers.
While some breaks have been done to solve security issues, and others for stability issues, it was generally accepted that nothing was allowed to be broken unless there was a really good reason, that the benefits of the break would be greater than the damage it caused to working code. However, far too many of today's crop of core developers fail to understand that their job is to fix bugs and make improvements for the benefit of the legions of application developers. Their brains have switched from "maintenance" to "arrogance" as they think that they are the sole arbiters of what is the "right" way that things should be done, and anything that does not conform to their way of thinking can be disregarded as automatically being the "wrong" way. I follow what the core developers are discussing in the internals newsgroup, and I regularly see statements such as "it shouldn't be done that way, it should be done this way". They believe that as they have total control of the language if there is any part of it that they don't like then they can simply change it and to hell with the impact it has on the community of developers. Most of them haven't a clue about the history of PHP, how it was designed and how it has evolved, and they regard backwards compatibility as an obstacle and not a feature.
The early leaders of the core developers, the likes of Zeev Suraski and Andi Gutmans, were real giants, were real pioneers. However, the current crop are more like pygmies with big ideas yet small brains. They are more like parasites as instead of injecting new ideas into the language they are sucking the blood out of it. Proper programmers are supposed to be pragmatists while most of today's code monkeys are nothing but dogmatists. As explained in Pragmatic vs. Dogmatic: What Are The Differences? a pragmatist is results-oriented while a dogmatist is rules-oriented. A pragmatist strives to achieve a result in the most practical and cost-effective way, and knows which rules to follow, which rules to ignore, and when to make new rules. A dogmatist, on the other hand, latches on to a single set of rules and follows them religiously and without question and assumes that the result will automatically be correct even though it may not be the most efficient or cost-effective. Dogmatists also have a tendency, in an attempt to demonstrate their purity of thought, to create new interpretations of the rules which can become more and more extreme with the passing of time leading to attitudes such as holier than thou and more Catholic than the Pope. Those dogmatists who can do nothing but follow rules blindly without understanding why they were created or when they should be used are in great danger of becoming nothing more than Cargo Cult Programmers.
Which of the following rules is correct:
The answer is that they are both correct as they depend on how the argument is marked. However, it should be remembered that it was not until version 7.1 which implemented RFC: Nullable Types that it was possible to mark an argument as nullable for user-defined functions, but this was not enforced unless the declare(strict_types=1); directive was used. None of the signatures for internal functions in the PHP manual was altered to incorporate this change in version 7.1 as this change was stated as being applicable ONLY to user-defined functions. The behaviour of all these internal functions continued as before by accepting nulls in arguments in line with the documentation which said:
for string arguments null is always converted to an empty string
for numeric arguments null is always converted to zero (0)
When it was decided to retro-fit the rules for nullable types for all internal functions there were two ways in which this could be done:
One of these options would have involved a small amount of work by a core developer while the other will involve thousands of application developers trawling through millions of lines of code and fixing every call to an internal function. The "justification" for the choice which they made was given as:
Historically, the reason for this discrepancy is that internal functions have supported a concept of scalar types (bool, int, float, string) long before they were introduced for user-defined functions in PHP 7.0, and the existing implementation silently accepted null values. For the new scalar type declarations introduced in PHP 7.0 an explicit choice was made to not accept null values to non-nullable arguments, but changing the existing behavior of internal functions would have been too disruptive at the time.
Firstly, the ability to not accept null values to non-nullable arguments was only supposed to be enforced when strict typing was turned on, but it is now turned on for internal functions whether you like it or not.
Secondly, the statement "changing the existing behavior of internal functions" is totally wrong as all arguments had been treated as nullable since the language was created.
Thirdly, the statement "would have been too disruptive at the time" begs the question: Too disruptive for whom? For a single core developer or thousands of application developers?
This is just another way of saying "We could have done it ourselves, but we just couldn't be *rsed."
Currently the only people who can vote on changes to the language are the core developers, and you can only become a core developer if you know the C language because this is the language in which PHP was written. It is interesting to note that C has a static type system but it is weakly enforced and supports implicit conversions. In its original implementation all PHP's internal functions were just wrappers for the equivalent C functions, so all the function names and the order of arguments in C were maintained in order "to be consistent". When programmers who are new to PHP start complaining that its functions are not consistent they should be forced to answer the question "Consistent with what?" The fact that it is not consistent with languages X, Y and Z is irrelevant as it was designed to be consistent with C, so all their complaints should be directed at the creators of C and not the creators of PHP.
Knowledge of the C language on its own does not qualify someone to become a core developer - they have to submit a change to the language which has to be accepted by the existing core developers. In theory this is supposed to also demonstrate their knowledge of PHP and their ability to improve it, but judging by the number of proposed changes which break backwards compatibility this is not the case. How many of the core developers have actually used the language for any length of time? How many of the core developers have proved their competence by releasing their software for peer review? Unless they can produce evidence that they have actually used the language how can they offer suggestions on improving the language? Improving a language means adding new features while maintaining its current behaviour. Every new feature should be entirely optional so that application developers can upgrade to a new version and be sure that their existing code will continue working. The use of a new feature should be at the application developer's discretion, not a diktat which is unilaterally imposed by the core developers.
The only people who can vote on changes to the language are the core developers, and the only way to become a core developer is to actually submit a change to the language. This means that the only people who can vote on changes are the very people who want to make those changes - the millions of application developers who have been using PHP for years, whose livelihood depends on writing software with a reasonable life expectancy, don't have any say at all.
Nobody should be able to change the language to match their own personal preferences as it then inflicts those preferences on everybody else. There are hundreds of different languages out there which offer different ways to get the job done, so if you don't like the way that PHP works then you should switch to one of those others and quit complaining. There is also no "correct" way to use the features of a language. There are ways that work and ways that don't work. There are ways that are easy to write and ways which are cumbersome. There are ways which are neat and slick, and there are ways which are ugly and slow. Transforming a language which has traditionally been dynamically typed since its inception into one that is statically typed just to satisfy the opinion of a small number of narrow-minded, dogmatic and pedantic coders is not the way to increase the popularity and use of the language.
The objective of the offending RFC was stated as follows:
Internal functions (defined by PHP or PHP extensions) currently silently accept null values for non-nullable arguments in coercive typing mode. This is contrary to the behavior of user-defined functions, which only accept null for nullable arguments. This RFC aims to resolve this inconsistency.
Instead of changing the arguments on all internal functions to be nullable, just as they can be marked as nullable on all user-defined functions, and so maintain the way in which they have behaved for the past 20+ years, the core developers made a poor decision and resolved this inconsistency in a way which is guaranteed to cause grief to unknown numbers of application developers. What those chumps have failed to realise, which can only be down to the fact that they don't actually write any applications in PHP themselves, is that in failing to deal properly and intelligently with that inconsistency they have actually created a new inconsistency in the way that PHP handles NULL in arithmetic and string expressions. Take the following examples:
$number = null;
$number++;              // result = 1

$number = null;
$number = $number + 1;  // result = 1
Here you can clearly see that in both cases the variable $number started with a NULL value but was incremented by 1 without any problem. This means that the value NULL was correctly interpreted as zero and not thrown out as a type error.
$foo = 'foo';
$bar = 'bar';
$null = null;
$foobar = $foo.$null.$bar; // result = 'foobar'
In the above example you can see that I have concatenated three variables into a fourth variable. The value NULL was correctly interpreted as an empty string and not thrown out as a type error.
This is exactly the same behaviour that all internal functions exhibited up until version 8.1 was released. It clearly shows that the way in which nulls are treated in internal functions is now at odds with the way they are treated in expressions. Is this or is this not an inconsistency?
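The difference can be observed side by side with a sketch such as the following, which collects any notices instead of displaying them. The variable names are hypothetical. On PHP 8.1 and later the call to strlen() is expected to raise a deprecation notice while still returning zero, whereas earlier versions raise nothing, so the number of notices collected depends on the version in use.

```php
<?php
// Collect any deprecation notices instead of printing them.
$notices = array();
set_error_handler(function ($errno, $errstr) use (&$notices) {
    $notices[] = $errstr;
    return true; // suppress the default handler
}, E_DEPRECATED);

$null = null;

// Expressions: NULL is coerced silently.
$sum    = $null + 1;             // integer 1
$joined = 'foo' . $null . 'bar'; // 'foobar'

// Internal function: NULL is still treated as an empty string,
// but PHP 8.1 and later also record a deprecation notice.
$length = strlen($null);

restore_error_handler();

printf("sum=%d joined=%s length=%d notices=%d\n", $sum, $joined, $length, count($notices));
```

The two expressions never add anything to $notices; only the function call can.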
These are reasons why I consider some ideas on how to do OOP "properly" to be complete rubbish:
24 Nov 2024 | Added Typed structures have been superseded by untyped arrays |
31 Mar 2024 | Added Type juggling using metadata |
14 Jul 2023 | Added Perpetuating bad/redundant practices |