PHP Coding Horrors and Excuses for Poor Decisions

Having coded in PHP for 7 years, I feel I can give balanced feedback on it. Today I mainly focus on Python & .NET because these languages have stood the test of time and allow me to attract great talent. I find it amusing that engineering leaders in established companies still make the backward decision to use PHP to power their business/core sites. Not to mention software engineering newbies falling prey to it as their 1st language for experiencing software development & putting theory into practice. So let’s explore this in more detail.

A quick story

A few years back, while attending a Python class, a young chap put up his hand, introduced himself as a long-time PHP developer and asked the lecturer a question: “What is the difference between Python’s dictionaries & lists and PHP’s arrays?” Bang. This is exactly why I do not want newbies to go down that route. Data structures are fundamental to any software design. PHP will NOT force you to think about data structures when coding. Instead it will just stick a boot in your face and say walk.

As a leader

As a smart, fast-paced technology leader, you should NOT be suggesting or advising PHP as the company’s “language of choice”. If a company is using optimized WordPress hosting, it’s typically for its blog (yes, WordPress rocks), for legacy reasons (we all learn, right) or a variant of these. PHP is not even a great presentation language (what it was famous for years ago), lacking good support for a real templating engine. Going LAMP stack, as in Linux stack, is not about moving to PHP. As a matter of fact, “LAMP stack” is old, beaten, used & abused lingo which means little today given the range of open source stacks that run on the Linux OS.

Let’s first look at what makes a good language. And if you are a leader looking at starting with or moving to a new language, this post should be enough to tell you what to avoid. Learn from others’ mistakes so you don’t have to make them yourself.

What makes a good language

  • Predictable
  • Consistent
  • Concise
  • Reliable
  • Debuggable

Check out the philosophies behind Python in Zen of Python on what a good language encourages.

PHP fails miserably here.

  • PHP is full of surprises: mysql_real_escape_string, E_ALL
  • PHP is inconsistent: strpos, str_rot13
  • PHP requires boilerplate: error-checking around C API calls, ===
  • PHP is flaky: ==, foreach ($foo as &$bar)
  • PHP is opaque: no stack traces by default or for fatals, complex error reporting.

PHP is NOT an enterprise language

An enterprise language is one that has good corporate support. Best example is Microsoft and their .NET platform.

Look at the support behind the PHP language. No corporation supports PHP’s growth & maturity the way Sun & Google do for Java, Google (Guido van Rossum) does for Python (inc. the Django framework), 37signals does for Ruby (inc. RoR) etc…

PHP is not supported by Yahoo. They failed to launch a version with Unicode support in the hyped-up PHP6. And the father of PHP, Rasmus Lerdorf, is no longer based at Yahoo. Nor is PHP supported by Facebook. Facebook has been trying hard to move away from its aged roots and now compiles PHP into C++ via HipHop – more on that below.

The mess that is PHP

There are plenty of websites covering the mess that is PHP. Just go and read them if you are still doubtful.

Some of those nasty PHP horrors

  • Unsatisfactory and inconsistent documentation at php.net.
  • PHP is exceptionally slow unless you install a bytecode cache such as APC or eAccelerator, or use FastCGI. Otherwise, it compiles the script on each request. It’s the reason Facebook invented HipHop (PHP compiler) to increase speed by around 80% and offer a just-in-time (JIT) compilation engine.
  • Unicode: Support for international characters (mbstring and iconv modules) is a hackish add-on and may or may not be installed. An afterthought.
  • Arrays and hashes treated as the same type. Ref my short story above.
  • No closures or first-class functions until PHP 5.3. No functional constructs such as collect, find, each, grep, inject. No macros (but complaining about that is like the starving demanding caviar). Iterators are present but inconsistently used. No decorators, generators or list comprehensions.
  • The fact that == doesn’t always work as you’d expect, so they invented a triple-equals === operator that tests for true equality.
  • include() can generate circular references and yield many unwanted and hard to debug problems. Not to mention its abuse to execute code that gets included.
  • Designed to be run in the context of Apache. Any back-end scripts have to be written in a different language. Long-running background processes in PHP have to override the global php.ini settings.
  • PHP lacks standards and conventions.
  • There’s no standard for processing background tasks, such as Python’s Celery.

PHP presents 4 challenges for Facebook.

  • High CPU utilization.
  • High memory usage.
  • Difficult to use PHP logic in other systems.
  • Extensions are hard to write for most PHP developers.

Don’t use Facebook as an excuse to have PHP as your core language.

Excuses for poor decision to use PHP

“But Facebook is all PHP.”

Boo hoo. Is that what your decision was based on? Seriously? It is well documented that Facebook uses PHP for legacy reasons. It is what Mark Zuckerberg used in his dorm nearly a decade ago and somehow it stuck around. Later a top FB engineer called Haiping Zhao released HipHop, literally rewriting the entire PHP language and thus avoiding its worst attributes. Haiping named four failed attempts since 2007 alone to move away from PHP: to Python (twice), to Java and to C++. The reason these did not work is incumbent inertia (it’s what’s there).

So you see, it is not the same PHP you are coding in but a far superior subset of it, customized for Facebook’s process & development efforts. PHP at Facebook was a mistake that has been corrected to some degree. Today the preferred strategy at Facebook is to write new components in a de-coupled manner using a better language of choice (C++, Python, Erlang, Java, etc); this is easily facilitated by Facebook’s early development of Thrift, an efficient multi-language RPC framework.

“But Yahoo is all PHP.”

Seriously? Shall we even go into this. A sinking Titanic that started its life as a manually maintained directory site. Today’s online apps are more advanced, demand high concurrency and dynamic nature – something more advanced languages are capable of delivering.

 “But Zynga (a large gaming company) uses PHP.”

At the time Zynga started developing for the platform, there was no other official Facebook SDK available except for the PHP one. Naturally Zynga started its life on Facebook. The rest is history.

Looking for a better language? Guess! ~ Yes I drew that by hand 🙂 Hope you like it!

Technology breeds culture

Bring a bunch of core PHP developers (those that only know this language) on board and you get what you pay for. Someone that can hack a script and not really understand the fundamentals of software design & engineering.

Think about this. Your valued assets are the staff (the people in your company). And the staff will naturally come from companies and/or backgrounds/experiences that align with the technology decisions you made.

How about rewriting your code base in another language?

There is also a lot of industry precedent (the Netscape case or Startup Suicide) indicating that re-writing an entire codebase in another language is usually one of the worst things you can do. Either don’t make the mistake of going down the PHP route in today’s era, or start thinking about introducing a new language into the stack for new projects. Having a hybrid setup is OK and actually allows you to iterate fast, gives your engineering crew something new to play with, and should you ever need to switch stacks you are already half way there. Don’t make the same mistakes Facebook did.

The only bits I like in PHP are its “save file, refresh page and there are your changes”. The language is “easy to use”, yes. It’s hard to figure out what the fuck it’s doing, though.

Happy coding!

~ Ernest

Gentle Introduction to Python

Right, let’s dig into my favorite language. Python. It’s super easy to read & learn, it’s concise and one of the hot languages in Silicon Valley. In fact, Python is also one of the easiest languages to grasp if you want to learn to code on mobile.

The following assumes you understand basic software engineering concepts.

A bit about Python

  • Design philosophy emphasizes code readability. Important because software engineers spend most of their time trying to understand code. (Ref Coding Horror)
  • Has a nice MVT open-source web framework called Django. Django emphasizes reusability and pluggability of components, rapid development, and the principle of DRY (Don’t Repeat Yourself).
  • It features a fully dynamic type system with late binding (duck typing) and automatic memory management, similar to that of Scheme, Ruby, Perl, and Tcl. More here.
  • Runs on LAMP, where the P = Python. Here’s how to set it up.
  • Currently one of the hottest languages (alongside Ruby/Ruby on Rails) in Silicon Valley, especially among startups.

Sample of popular sites built in Python

Google, Dropbox, Reddit, Disqus, FriendFeed (Sold to Facebook to drive their News Feeds), YouTube, Quora (rising star), Douban. Comprehensive list here.

Python

  • Uses whitespace indentation, rather than curly braces or keywords, to show & delimit block structure. I prefer 4 spaces.
  • Everything is an object (first class) and everything has a namespace accessed by dot-notation.
  • Naming convention: UpperCamelCase for class names, CAPITALIZED_WITH_UNDERSCORES for constants, and lowercase_separated_by_underscores for other names. See the Python style guide and The Zen of Python, which distills the guiding principles of Python’s design into 20 aphorisms (only 19 of which were ever written down). Basically, write self-documenting code by choosing explicit names.
  • A comment starts with a hash character (#). For text longer than a line (and for doc strings) use a triple-quoted string: ''' xyz '''.
  • Variable names have to start with a letter or underscore, and can contain numbers but no spaces or other symbols.
  • File extension is always .py. If you see .pyc this is source code compiled into bytecode for execution by a Python VM (virtual machine).
  • Use command line python shell to test assumptions by getting immediate results.
  • No case/switch statements. Switch is better solved with polymorphism (object that has more than one form) instead. Good example here.
  • Data types
    • Immutable (can’t be updated or changed): strings, tuple, int, float, complex, bool
    • Mutable (can be updated or changed): list, dictionary (dict). A dictionary is mutable, but its keys must be immutable.
  • Editors I use: IDLE (for basic shell work & comes with Python.org install), PyCharm (with Django support) and Sublime Text 2 (lightweight TextMate replacement).
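The mutable vs. immutable split in the Data types bullet above is easier to see in action. A minimal sketch, written in Python 3 syntax; all variable names here are made up for illustration:

```python
# Lists are mutable: the same object is changed in place.
codes = [1, 2, 3]
alias = codes              # a second name for the same list object
alias.append(4)
print(codes)               # the change is visible through both names

# Strings are immutable: methods return a new string instead.
name = "ernest"
upper_name = name.upper()
print(name, upper_name)    # the original string is unchanged

# Tuples are immutable: item assignment raises TypeError.
point = (1, 2)
try:
    point[0] = 99
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Dictionary values can change, but keys must be hashable (immutable).
ages = {("Ernest", "Semerda"): 21}   # a tuple works as a key
ages[("Ernest", "Semerda")] = 22     # updating a value is fine
try:
    ages[["a", "list"]] = 1          # a list cannot be used as a key
    list_key_ok = True
except TypeError:
    list_key_ok = False
```

The aliasing behaviour in the first part is the one that tends to surprise people coming from PHP, where arrays are copied on assignment.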

Basics

Arithmetic Boolean
2 > 3 → False
2 == 3 → False
The opposite of == is != (“not equals”):
2 != 3 → True
You can chain together comparison operators:
2 < 3 < 4 → True
Equality works on things besides numbers:
'moose' == 'squirrel' → False
True and True → True
True and False → False
True or False → True
not False → True
(2 < 3) and (6 > 2) → True
Under the hood,
True is equal to 1,
and False is equal to 0.
Booleans are a subtype of integers.
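
Since booleans are a subtype of integers, they can be used in arithmetic. A small sketch in Python 3 syntax (the scores list is a made-up example):

```python
# bool is a subclass of int, so True and False behave like 1 and 0.
print(issubclass(bool, int))   # True
print(True == 1, False == 0)   # True True

# Because True counts as 1, summing booleans counts the matches.
scores = [4, 9, 7, 10]
high_scores = sum(score > 6 for score in scores)
print(high_scores)             # 3

# A chained comparison is shorthand for two comparisons joined by "and".
x = 3
chained = 2 < x < 4
spelled_out = (2 < x) and (x < 4)
```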

Operators

== Equal to
!= Not Equal to
is Identical
and Boolean and
or Boolean or
& Bitwise and
| Bitwise or
not Boolean not (not the !)
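
The difference between == (equal) and is (identical) trips up many newcomers, so here is a short sketch in Python 3 syntax. The variable names are illustrative only:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)   # True: same contents
print(a is b)   # False: two separate list objects
print(a is c)   # True: two names for one object

# "and" / "or" short-circuit and return one of their operands.
fallback = 0 or "default"
print(fallback)           # default

# & and | work on the bits of integers: 6 is 110, 3 is 011.
print(6 & 3, 6 | 3)       # 2 7
```

As a rule of thumb, use == for values and keep is for checks against None.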

Built in functions that are always available

len(s) Return the length of an object. The object can be a sequence (string, tuple or list) or a mapping (dictionary).
print(obj) Print object(s) to standard output (or to another file-like stream).
help(list) See basic help on any object.
dir(list) Return a list of valid attributes for that object.
type(list) Return the type of an object

More built in functions here: http://docs.python.org/library/functions.html

Functions

A function definition always starts with “def” and its header line ends with “:”.

# define a new function with 1 default argument. Can also have no arguments.
def function_purpose(arg1=1):
    ''' This is a doc string '''
    print 'Python code'
    # returns 2 values as a tuple; a function without a return statement returns None
    return (arg1, arg1 + 7)

# call the function, returns a tuple that we unpack into 2 variables
item1, item2 = function_purpose(1)

If you want to assign, from within a function, a value to a variable that lives outside the function, you must first declare that variable inside the function with the “global” statement.
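
A minimal sketch of the “global” rule, in Python 3 syntax. The names counter, increment and shadow are hypothetical, picked only for this illustration:

```python
counter = 0

def increment():
    # Without this declaration, "counter += 1" would raise UnboundLocalError.
    global counter
    counter += 1

def shadow():
    # Plain assignment creates a new local variable; the global is untouched.
    counter = 100
    return counter

increment()
increment()
shadow_value = shadow()
print(counter, shadow_value)   # 2 100
```

Reading a global from inside a function needs no declaration; only assignment does.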

Calling methods on objects

Just like calling functions, but put the name of the object first, with a dot

words = 'some monkeys here'
e = words.count('e')
# returns 4

Strings

Are a sequence of characters.

# creation
name = 'Ernest Semerda'

# accessing, returns 's'
name[4]

# splitting, returns a list ['Ernest', 'Semerda']
name.split(' ')

Strings can be subscripted/sliced like the list (see lists in Data Structures below).

# selected range (index 2 up to, but not including, 5) returns 'nes'
name[2:5]

# get first two characters returns 'Er'
name[:2]

# get everything except the first two characters returns 'nest Semerda'
name[2:]

Sample of some string methods. They work on both 8-bit & Unicode strings, and always return a new string since strings are immutable.

name.capitalize() # returns 'Ernest semerda' (use name.upper() for 'ERNEST SEMERDA')
name.find(sub[, start[, end]])
name.lower()
name.split([sep[, maxsplit]]) and separator.join(list)

More string methods: http://docs.python.org/library/stdtypes.html#string-methods
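
To make the method signatures above concrete, here is a short sketch in Python 3 syntax using the same name string:

```python
name = 'Ernest Semerda'

# find() returns the index of the first match, or -1 when there is none.
found_at = name.find('Sem')
not_found = name.find('xyz')
print(found_at, not_found)        # 7 -1

# split() breaks a string into a list; join() glues a list back together.
parts = name.split(' ')
joined = '-'.join(parts)
print(parts, joined)

# Methods never modify the original string, they return a new one.
shouted = name.upper()
lowered = name.lower()
```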

Data Typing

Python is strongly typed, which means it won’t automatically convert a value from one type to another.

Python also has a strong tradition of duck typing (a flavor of dynamic typing), in which an object’s current set of methods and properties determines the valid semantics. You trust that those methods will be there and let an exception be raised if they aren’t. Be judicious in checking for ABCs (abstract base classes) and only do it where it’s absolutely necessary.

An important feature of Python is dynamic name resolution (late binding), which binds method and variable names during program execution.

# fails with a TypeError because str and int cannot be concatenated
'There are ' + 8 + ' aliens.'
# perfect, str() = type conversion
'There are ' + str(8) + ' aliens.'
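
Duck typing, mentioned above, can be sketched like this in Python 3 syntax. The classes Duck and Robot and the function greet are invented for the example:

```python
class Duck:
    def speak(self):
        return "Quack"

class Robot:
    def speak(self):
        return "Beep"

def greet(thing):
    # No type check: anything with a speak() method is accepted.
    return thing.speak()

results = [greet(Duck()), greet(Robot())]
print(results)

# An object without speak() fails at call time with AttributeError.
failure = None
try:
    greet(42)
except AttributeError as error:
    failure = str(error)
print(failure)
```

Duck and Robot share no base class; what matters is only that the method exists when it is called.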

To achieve Reflection, a process by which a computer program can observe and modify its own structure and behavior, use built-in functions such as getattr.

On the “sys” module’s attribute “path”:

import sys
path = getattr(sys, "path")

On a function named function1 (assumed to be defined in the current module), called with sample input:

result = getattr(sys.modules[__name__], "function1")("abc")

And/or use the inspect module or, from C extensions, the Reflection utilities of the C API for deeper execution frame, execution model, class/obj inspection for methods & attributes etc… See: http://docs.python.org/c-api/reflection.html
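
A small sketch of reflection on an object, in Python 3 syntax, using getattr together with the standard inspect module. The Greeter class is a made-up example, not something from the Python docs:

```python
import inspect

class Greeter:
    def hello(self, name):
        return "Hello, " + name

    def bye(self, name):
        return "Bye, " + name

greeter = Greeter()

# Look a method up by its name (a string) and call it.
method_name = "hello"
result = getattr(greeter, method_name)("Ernest")

# A third argument gives a default instead of raising AttributeError.
missing = getattr(greeter, "shout", None)

# inspect lists the methods of an object and the parameters of a callable.
public_methods = [
    member_name
    for member_name, _ in inspect.getmembers(greeter, inspect.ismethod)
    if not member_name.startswith("_")
]
params = list(inspect.signature(greeter.hello).parameters)
print(result, missing, public_methods, params)
```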

Data Structures

Dictionary

A set of key:value pairs. Keys in a dictionary must be unique (and immutable). Values are mutable.

# creation, empty dictionary
peopleDict = {}

# creation, with defaults
aliensDict = {'a':'ET', 'b':'Paul', 'c':42}

# accessing, returns 'ET'
aliensDict['a']

# deleting, 'Paul' is removed from dictionary
del aliensDict['b']

# finding, returns False (note capital F). has_key() is Python 2 only; prefer the "in" operator shown below
aliensDict.has_key('e')

# finding, returns ['a', 'c']
aliensDict.keys()

# finding, returns [('a', 'ET'), ('c', 42)]
aliensDict.items()

# finding, returns True
'c' in aliensDict

Lists

Lists can carry any items ordered by an index. Lists are Mutable.

# creation, empty list
peopleList = []

# creation, with defaults of any type
codesList = [5, 3, 'p', 9, 'e']

# accessing, returns 5
codesList[0]

# slicing, returns [3, 'p']
codesList[1:3]

# slicing from index 2 to the end, returns ['p', 9, 'e']
codesList[2:]

# slicing from the start up to (not including) index 2, returns [5, 3]
codesList[:2]

# returns ['p', 9]
codesList[2:-1]

# length, returns 5
len(codesList)

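# sort in place, no return value. List is now [3, 5, 9, 'e', 'p'] (mixing ints & strings only sorts in Python 2)
codesList.sort()

# add to the end
codesList.append(37)

# remove the last item, returns 37
codesList.pop()

# remove by index, returns 5
codesList.pop(1)

# insert 'z' at index 2
codesList.insert(2, 'z')

# remove by value
codesList.remove('e')

# delete by index
del codesList[0]

# concatenation, returns [9, 'z', 'p', 0]
codesList + [0]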

# finding, returns True
9 in codesList

Apply set(list) and it becomes a set – an unordered collection with no duplicate elements. Sets also support mathematical operations like union, intersection, difference, and symmetric difference.
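
The set operations named above map directly onto operators. A quick sketch in Python 3 syntax with made-up values:

```python
# Converting a list to a set drops the duplicates.
codes = [5, 3, 'p', 3, 5]
unique = set(codes)

a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b                  # in either set
intersection = a & b           # in both sets
difference = a - b             # in a but not in b
symmetric_difference = a ^ b   # in exactly one of the two sets
print(union, intersection, difference, symmetric_difference)
```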

Tuples

Tuples are similar to lists: they can carry items of any type and are useful for ordered pairs and for returning several values from a function. Tuples are Immutable.

# creation, empty tuple
emptyTuple = ()

# note the comma! = tuple identifier
singleItemTuple = ('spam',)

# creation, with defaults of any type (the parentheses are optional)
codesTuple = 12, 89, 'a'
codesTuple = (12, 89, 'a')

# accessing, returns 12
codesTuple[0]
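
Returning several values from a function relies on tuple packing and unpacking. A minimal sketch in Python 3 syntax; min_max is a hypothetical helper written for this example:

```python
def min_max(values):
    # The comma builds a tuple; the caller can unpack it again.
    return min(values), max(values)

low, high = min_max([4, 9, 1])
print(low, high)   # 1 9

# Unpacking also gives a neat way to swap two variables.
first, second = 'a', 'b'
first, second = second, first
print(first, second)   # b a
```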

More on data structures here: http://docs.python.org/tutorial/datastructures.html

Control & Flow

For loop

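# Collection iterator over dictionary w/ tuple string formatting.
# items() yields (key, value) pairs; looping over the dictionary itself yields only the keys.
people = {"Ernest Semerda":21, "Urszula Semerda":20}
for name, age in people.items():
    print "%s is %d years young" % (name, age)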

To loop over two or more sequences at the same time, the entries can be paired with the zip() function.
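
A short sketch of zip() in Python 3 syntax, pairing two made-up lists:

```python
names = ["Ernest", "Urszula"]
ages = [21, 20]

# zip() pairs the first items, then the second items, and so on.
lines = []
for name, age in zip(names, ages):
    lines.append("%s is %d years young" % (name, age))
print(lines)

# The pairs can also be fed straight into dict().
people = dict(zip(names, ages))
print(people)
```

zip() stops at the end of the shortest sequence, so uneven inputs are silently truncated.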

More on string formatting operations here: http://docs.python.org/library/stdtypes.html#string-formatting-operations

For loop with if else

# Iterate over a sequence (list) of numbers (1 to 10) with if/else conditionals.
# The range function makes lists of integers; the end value is excluded, hence 11.
for x in range(1, 11):
    if x == 8:
        print "Bingo!"
    elif x == 10:
        print "The End"
    else:
        print x

While loop

# using request to ask user for input from interactive mode
request = "Gimme cookie please: "
while raw_input(request) != "cookie":
    print "But me want cookie!"

Switch-statements do not exist. In OO they are irrelevant & better solved with polymorphism instead. Examples here.
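
Here is one way the polymorphism idea can look, sketched in Python 3 syntax. The shape classes are invented for the example. The second half shows a dictionary of functions, which is a different but common substitute for switch and not something this post prescribes:

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius

    def area(self):
        return 3.14159 * self.radius ** 2

class Square:
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2

# No switch on the type: each object knows how to compute its own area.
shapes = [Circle(1), Square(2)]
areas = [shape.area() for shape in shapes]

# Alternative: look the handler up in a dictionary keyed by the "case" value.
def start():
    return "starting"

def stop():
    return "stopping"

def unknown():
    return "unknown command"

handlers = {"start": start, "stop": stop}

def run(command):
    # .get() supplies the "default" branch of the would-be switch.
    return handlers.get(command, unknown)()

print(areas, run("start"), run("dance"))
```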

More control flow tools here: http://docs.python.org/tutorial/controlflow.html

Golfing!

Chaining logic into a few lines.

[x * x for x in [1, 2, 3, 4, 5]]
# returns [1, 4, 9, 16, 25]

Can get messy & complicated to read.

print [x * x for x in range(50) if (x % 2 == 0)]

import re

def is_palindrome(word):
    # strip '!', '?' & spaces, then compare the word with its reverse
    word = re.compile(r'[!? ]').sub("", word.lower())
    return word == word[::-1]
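
When a one-liner gets hard to read, it helps to remember that a list comprehension is just a compact loop. A sketch in Python 3 syntax showing both forms side by side:

```python
# The golfed version: squares of the even numbers below 50.
squares = [x * x for x in range(50) if x % 2 == 0]

# The same logic spelled out as a plain loop.
loop_squares = []
for x in range(50):
    if x % 2 == 0:
        loop_squares.append(x * x)

print(squares == loop_squares)   # True
print(squares[:4])               # [0, 4, 16, 36]
```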

Files

# open, defaults to read-only + note single forward slash
contents = open('data/file.txt')

# accessing, reads entire file into one string
contents.read()

# accessing, reads one line of a file
contents.readline()

# accessing, reads entire file into a list of strings, one per line
contents.readlines()

# accessing, steps through lines in a file
for line in contents:
    print line
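
The samples above never close the file. A common way to handle that is the with statement, sketched here in Python 3 syntax. The temporary directory and file contents are assumptions made so the example can run anywhere:

```python
import os
import tempfile

# Create a small file to read back (hypothetical location and content).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as handle:
    handle.write("first line\nsecond line\n")

# "with" closes the file automatically, even if an error occurs inside.
with open(path) as contents:
    lines = [line.rstrip("\n") for line in contents]

print(lines)
print(contents.closed)   # True
```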

More on IO: http://docs.python.org/tutorial/inputoutput.html

Classes

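Bound methods behave like closures: a bound method carries its instance along as “self” (see below). A closure is data attached to code. Plain functions can be closures too, when they are nested inside another function. All variables are public; private variables are established by convention only (a leading underscore).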

# SuperHero inherits from Person class - also supports multiple inheritance using comma
class SuperHero(Person):
    # constructor
    def __init__(self, name):
        self._name = name

    # method
    def shout(self):
        print "I'm %s!" % self._name

The __name__ below allows Python files to act as either reusable modules, or as standalone programs. Also think Unit Tests benefits!

if __name__ == '__main__':
    # instantiate the class
    batman = SuperHero('Batman')

    # call to method in class, prints "I'm Batman!"
    batman.shout()

    # same call through the class, prints "I'm Batman!"
    SuperHero.shout(batman)
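
The sample above leaves the Person base class undefined. A self-contained sketch in Python 3 syntax might look like this; the Person body and the power attribute are assumptions added for illustration:

```python
class Person:
    def __init__(self, name):
        # Leading underscore: "private" by convention only.
        self._name = name

    def shout(self):
        return "I'm %s." % self._name

class SuperHero(Person):
    def __init__(self, name, power):
        super().__init__(name)   # run the parent constructor first
        self.power = power

    def shout(self):
        # Overrides Person.shout
        return "I'm %s and I can %s!" % (self._name, self.power)

batman = SuperHero("Batman", "glide")

# A bound method remembers its instance, so it can be called later.
bound = batman.shout
print(bound())
print(SuperHero.shout(batman) == bound())   # True
print(isinstance(batman, Person))           # True
```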

More on Classes in Python here: http://docs.python.org/tutorial/classes.html

Modules

Modules are Libraries that hold common definitions and statements. They can be combined into an importable module.
More on modules here: http://docs.python.org/tutorial/modules.html

To use a module, use the import statement:

import math

# returns 1.0
math.sin(math.pi / 2)

Some commonly used modules

  • math – trigonometry, the constants e and pi, logarithms, powers, and the like.
  • random – random number generation and probability distribution functions.
  • os – tools for talking to your OS, including filesystem tools in os.path.
  • sys – various system information, as well as the handy sys.exit() for exiting the program.
  • urllib2 – tools for accessing Web resources.
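
A quick taste of a few of the modules listed above, sketched in Python 3 syntax. The file path is a made-up example and nothing is read from disk:

```python
import math
import os
import random
import sys

# math: powers, roots, constants
hypotenuse = math.sqrt(3 ** 2 + 4 ** 2)

# random: seed it to get repeatable results while testing
random.seed(42)
roll = random.randint(1, 6)   # both ends are included

# os.path: build and take apart file paths in a portable way
joined = os.path.join("data", "file.txt")
extension = os.path.splitext(joined)[1]

# sys: information about the running interpreter
major_version = sys.version_info[0]

print(hypotenuse, roll, joined, extension, major_version)
```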

Useful modules: http://wiki.python.org/moin/UsefulModules

Error & exception handling

import sys
try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except IOError as e:
    print "I/O error({0}): {1}".format(e.errno, e.strerror)
except ValueError:
    print "Could not convert data to an integer."
except:
    print "Unexpected error:", sys.exc_info()[0]
    raise
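
Two more pieces worth knowing are the else/finally clauses and custom exceptions. A sketch in Python 3 syntax; parse_int, CookieError and eat are hypothetical names invented for this example:

```python
attempts = []

def parse_int(text):
    """Return text as an int, or None when it is not a valid integer."""
    try:
        value = int(text.strip())
    except ValueError:
        return None
    else:
        # Runs only when the try block raised nothing.
        return value
    finally:
        # Always runs, whether or not an exception occurred.
        attempts.append(text)

class CookieError(Exception):
    """Custom exception: subclass Exception and raise it like any other."""

def eat(snack):
    if snack != "cookie":
        raise CookieError("But me want cookie, not %s!" % snack)
    return "yum"

first = parse_int(" 42\n")
second = parse_int("forty-two")

message = None
try:
    eat("carrot")
except CookieError as error:
    message = str(error)
print(first, second, message)
```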

More: http://docs.python.org/tutorial/errors.html

Fun – easter egg; The antigravity module

Released in Google App Engine on April 7, 2008. The antigravity module (http://xkcd.com/353/) can be enabled like this:

import antigravity

def main():
    antigravity.fly()

if __name__ == '__main__':
    main()

Speed – always a common topic

Classic computer programs had two modes of runtime operation: interpreted (translated as the code runs) or static (ahead-of-time) compilation.

Just-In-Time compilation (JIT), also known as dynamic translation, is a hybrid approach. It caches translated code (bytecode into native machine code) to minimize performance degradation. Used in .NET, Java & Python via PyPy.

PyPy is a fast, compliant alternative implementation of the Python language. It has several advantages and distinct features like Speed (Just-in-Time JIT compiler), Memory usage (better than CPython), Compatibility (works with the Twisted & Django frameworks), Sandboxing (run untrusted code), Stackless (providing micro-threads for massive concurrency). Check it out: http://pypy.org/

Recommended

Books

Websites/following

Finally, it is important that you have a network of like-minded people around you with whom you can regularly work on Python, bounce ideas & questions around and support (help) each other.

Happy Learning and if you have any questions please contact me. Always happy to help.

~ Ernest

G’day mate – Aussie & American vocabulary comparison

It’s always a funny experience when I run into a language barrier with my American friends and work colleagues. The problem is most commonly with my helping verbs. Today, over lunch, the 3 of us Aussies shared some terms which our American friends have / had trouble recognizing. In that spirit, I compiled a list to get us Aussies accustomed to the choice of words to use when speaking with our American friends.

The list – vocabulary comparison

The following list of words is written from the angle of an Aussie wanting to convey a message. Use the American column as a guide to see what an American will understand, and adjust accordingly.

Word | American | Australian
Boot | Something that goes on your foot | The trunk of a car
Texta | { confusion } Say: marker | A marker & also a brand
Thong | G-String (the underwear) & something Borat wears | A sandal held on the foot by a strip. For reference, a La Tribe Sandal.
Fortnight | { confusion } Say: 2 weeks | A period of fourteen consecutive days
Soft drink | Say: soda / pop / soda-pop | Nonalcoholic beverage (usually carbonated)
Takeaway | { confusion } Say: to-go | Prepared food that is intended to be eaten off of the premises
Lemonade | Drink made from lemon juice, sugar, and water – not carbonated | Fizzy lemon drink
Arvo | { confusion } Say: Afternoon | The hours after 12pm
Pissed | Very angry. Say: drunk | Someone who is drunk
Chemist | { confusion } Say: Pharmacy / Drug store | The place you buy medicine, shampoo, cough syrup and lotion
Ute | { confusion } Say: Truck | An automotive vehicle suitable for hauling
Napkin | A piece of paper or cloth you use to wipe your face and hands when you eat | A women’s sanitary product
Barbie | It’s an anatomically incorrect female doll that comes in a pink box | What you put beef, shrimp and chicken on to grill it outside
Rubber | A contraceptive device | An eraser

Employment tip

If you are using British spelling/terms (non-American spelling) in your LinkedIn resume, fix it up immediately or crowdsource it to an American to correct it for you. Recruiters in Silicon Valley use LinkedIn to “keyword search” for potential candidates using American spelling. So you may miss out on potential offers if you are in the market for a new opportunity.

Keep these differences in mind next time chatting with an American

An American (SFFD) with an Aussie (me)

If you have other words which you ran into please share them below in the comments section and I will add to the list above.

Happy conversing!

~ Ernest