Some Security Issues with PHP: Selected English-Language Material

Discussion in 'Hacking & Security Tutorials' started by admin, Feb 21, 2018.

    Overview of Handling Data with Best Practices in Mind
    Handling user-supplied data is a big part of many web applications, and it's critical that this is done properly to prevent security holes. There are a number of best practices and principles we can follow when handling data, though I'll just be covering the ones I feel to be the most important:
    1. Treat all user data as tainted until it has been validated; never assume the integrity of such data (guilty until proven innocent).
    2. Make your users abide by your validation rules. That is to say, do not attempt to correct invalid data, because doing so creates the potential for security vulnerabilities.
    3. Keep track of data as it enters and exits parts of your application. This is critical in order to be able to tell what data is potentially tainted, and what data has been validated and is safe to use.
    4. Minimise exposure of sensitive data. This covers not storing passwords in cookies, not using the HTTP GET method as a way of requesting passwords, not storing configuration files in the document root, and so on.
    5. Defence in Depth - the practice of using redundant safeguards. This can help to improve the security of a web application through having additional levels of safeguards in place (that should never have to be used, but are there just in case).
    Filtering Input
    Filtering input should be done whenever applicable to prevent junk data from entering a web application. It is performed upon the data coming into an application, where its validity is inspected. There are a number of ways we can filter our users' input, though the method you choose will depend upon the input data you're inspecting. As such, I'll be running through just a few commonly used functions and libraries to give you more of an idea of how this inspection process works. I'll (try to) explicitly reference the practices and principles stated above when I use them.
    The Character Type Functions (ctype_)
    The character type functions are from the Ctype extension, which is full of handy functions that can be used to validate user input. It does this by checking the characters of a string to see if they're of an appropriate type, much like a simplistic regular expression. All of the Ctype functions are known as predicate functions because they only return a boolean value (TRUE or FALSE). Here's a list of the Ctype functions:
    ctype_alnum() — Checks for alphanumeric character(s)
    ctype_alpha() — Checks for alphabetic character(s)
    ctype_cntrl() — Checks for control character(s)
    ctype_digit() — Checks for numeric character(s)
    ctype_graph() — Checks for any printable character(s) except space
    ctype_lower() — Checks for lowercase character(s)
    ctype_print() — Checks for printable character(s)
    ctype_punct() — Checks for any printable character which is not whitespace or an alphanumeric character
    ctype_space() — Checks for whitespace character(s)
    ctype_upper() — Checks for uppercase character(s)
    ctype_xdigit() — Checks for character(s) representing a hexadecimal digit

    Tip: Ensure that you're always passing in strings to these functions, even if the values are numeric. This is because the PHP manual states:
    "If an integer between -128 and 255 inclusive is provided, it is interpreted as the ASCII value of a single character (negative values have 256 added in order to allow characters in the Extended ASCII range). Any other integer is interpreted as a string containing the decimal digits of the integer."
    Further Reading: An Introduction to Ctype Functions
    PHP Manual - Ctype
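    The tip above can be sketched as a small guard around ctype_digit(). The helper name isDigitString() is hypothetical (it is not from the original post), and the return type declaration assumes PHP 7 or later:

```php
<?php
// Hypothetical helper (not from the post): accept only non-empty strings of decimal digits.
function isDigitString($value): bool
{
    // Reject anything that is not already a string, so an integer such as 53
    // can never be interpreted as the ASCII code of a single character.
    return is_string($value) && ctype_digit($value);
}

var_dump(isDigitString('12345')); // bool(true)
var_dump(isDigitString('12.5'));  // bool(false): "." is not a digit
var_dump(isDigitString(''));      // bool(false): empty strings never pass
var_dump(isDigitString(12345));   // bool(false): not a string
```

    Rejecting non-strings outright is one possible design; casting to string before the check would be another, at the cost of accepting values such as TRUE (which casts to "1").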
    filter_var
    The filter_var function accepts three arguments: the variable to validate, the filter to apply (a constant), and any optional flags to be set on the filter used. Two simple scenarios where you're going to want to use this function are validating URLs and e-mail addresses. Validating them with regular expressions is not a good idea, even if you know your way around them.

    The function has two primary types of filtering: validation and sanitisation. Validation filtering checks the data for invalidity: FALSE is returned if the data is invalid, and the data itself is returned upon success. Sanitisation filtering will attempt to remove or replace any invalid characters and return the sanitised string (according to the filtering type used - this does not mean the result is safe to output from your application without further sanitising).

    Here are a few simple and common use-cases:

    Validation filtering:
    PHP Code:

    $email = 'user@example.com'; // placeholder address
    if(filter_var($email, FILTER_VALIDATE_EMAIL) !== FALSE) {
        // valid email
    }

    $url = 'http://domain.tld';
    if(filter_var($url, FILTER_VALIDATE_URL) !== FALSE) {
        // valid URL
    }

    $age = 20;
    $options = array('options' => array('min_range' => 18, 'max_range' => 100));
    if(filter_var($age, FILTER_VALIDATE_INT, $options) !== FALSE) {
        // valid age
    }
    (More examples on PHP.net)

    Sanitisation filtering:
    PHP Code:

    $output = 'Protecting against XSS: <script>alert(0)</script>';
    echo filter_var($output, FILTER_SANITIZE_FULL_SPECIAL_CHARS);

    $int = 3.3;
    echo filter_var($int, FILTER_SANITIZE_NUMBER_INT); // 33
    // Note that it omits invalid characters, rather than truncating the input like other integer-validating functions
    (More examples on PHP.net)

    Further Reading: Filters
    It's all about Type
    It's a well-known fact that PHP is a loosely-typed language. Data types do not need to be explicitly stated before variable definitions or function parameters, and method signature types do not need to be specified either. That's not to say variable type is unimportant, though.

    Tip: It's always best practice to perform strict comparisons because of the loosely-typed nature of PHP.
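    Type-Checking
    Type-checking in PHP can be done with the is_ functions - a set of predicate functions that return TRUE if the type is correct, or FALSE otherwise. They include is_array(), is_bool(), is_callable(), is_float(), is_int(), is_null(), is_numeric(), is_object(), is_resource(), is_scalar(), and is_string().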
    Type-hinting
    Support for type-hinting was first introduced in PHP 5, and has been a much-loved feature in the PHP community. Method parameters should take advantage of type hinting when possible because of the improved maintainability it provides, along with the less error-prone (and partially self-documenting) code it produces. PHP supports the following types: objects, arrays (as of PHP 5.1), callables (as of PHP 5.4), and iterators. If a value of the incorrect type is passed as an argument to a function, then a fatal error is produced (a catchable fatal error in PHP 5; a TypeError exception as of PHP 7).
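    As a minimal sketch of the behaviour described above, assuming PHP 7 or later (for the return type and the TypeError class): the function applyToAll() is hypothetical and only exists to show the array and callable hints being enforced at call time.

```php
<?php
// Hypothetical function: the array and callable hints are checked by PHP at call time.
function applyToAll(array $items, callable $fn): array
{
    return array_map($fn, $items);
}

print_r(applyToAll(array(-1, 2, -3), 'abs')); // every element passed through abs()

try {
    applyToAll('not an array', 'abs'); // wrong type for $items
} catch (TypeError $e) {               // PHP 7+; PHP 5 raised a catchable fatal error instead
    echo 'Rejected: ', $e->getMessage(), PHP_EOL;
}
```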

    Type hints are used like so:
    PHP Code:
    comments section on PHP.net, though beware that some may slow down the performance of your PHP applications.

    Type Casting
    When we perform a type cast operation in PHP, we convert a value from its current type to another. PHP supports the $var = (type) $var; syntax (similar to C and Java), where (type) can be any one of the following:
    (int), (integer) - cast to integer
    (bool), (boolean) - cast to boolean
    (float), (double), (real) - cast to float (the (real) alias was removed in PHP 8.0)
    (string) - cast to string
    (array) - cast to array
    (object) - cast to object
    (unset) - cast to NULL (removed in PHP 8.0)

    Type casting is commonly done as a method of validation for integers from user input:

    PHP Code:

    if(isset($_GET['id'])) {
        $id = (int) $_GET['id']; // ensure that the id from the HTTP GET method is of an integer type
    }
    We can also use the settype() function to force a variable to a particular type.
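    One thing worth seeing in practice is that a cast forces a type rather than checking the input. The sketch below (the sample value is an assumption standing in for request data) contrasts a cast with FILTER_VALIDATE_INT, and shows settype() converting a variable in place:

```php
<?php
$raw = '12abc'; // stands in for a value such as $_GET['id']

$cast = (int) $raw;                               // trailing junk is silently discarded
$checked = filter_var($raw, FILTER_VALIDATE_INT); // the whole string must be an integer

var_dump($cast);    // int(12)
var_dump($checked); // bool(false)

$value = '42';
settype($value, 'integer'); // converts $value in place
var_dump($value);   // int(42)
```

    If invalid input should be rejected rather than coerced (principle 2 above), the validation filter is probably the better fit; the cast is a reasonable last line of defence.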
    The Whitelist Approach
    Whitelisting assumes that there will be a limited scope of validity in the data (such as an image uploader, where the file type is limited to that of images). We provide the only possibilities that the data can be, and anything else is discarded as invalid. This is commonly done with an array, where the in_array() function checks that a value exists within the array, and is therefore valid.

    PHP Code:

    $languages = array('PHP', 'JavaScript', 'Ruby', 'Elixir');
    $inputLanguage = 'VB.net';

    if(in_array($inputLanguage, $languages, TRUE)) {
        // valid language
    } else {
        // invalid language
    }
    We could also use an if/elseif/else or switch statement - though these are more commonly used for flow control logic with simple comparisons, rather than for whitelisting potential values.

    Tip: Always pass TRUE as the third argument to in_array() to perform a strict comparison, unless a loose comparison is absolutely necessary. A strict comparison (equivalent to the three-character operators === and !==) checks type as well as value, which is important to prevent strange things from happening (check out the "The Mystery of Value Appearance" section of this article).
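    To illustrate why the strict flag matters, here is a small sketch with an assumed whitelist of numeric strings. With a loose comparison, two different numeric strings can be considered equal because PHP compares them as numbers:

```php
<?php
$allowed = array('10', '20', '30'); // hypothetical whitelist

// Loose comparison: both operands are numeric strings, so they are compared as numbers.
var_dump(in_array('1e1', $allowed));       // bool(true)

// Strict comparison: the strings must be identical.
var_dump(in_array('1e1', $allowed, true)); // bool(false)
var_dump(in_array('10', $allowed, true));  // bool(true)
```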

    The opposite to the whitelist approach is to provide a blacklist of all unwanted values. This is done only if you know what possibilities aren't allowed, such as an IP address blacklist.
    Regular Expressions
    Regular expressions, or regex, are used for checking the format of input data and matching complex patterns. They should be used sparingly since they come at a cost of performance, but are a powerful and concise DSL (Domain-Specific Language) when used. They do require good knowledge of PCRE regex, and the patterns used should always be extensively tested before being deployed since their complexity can make it easy to slip up.
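    As a small taste of format checking with PCRE, the sketch below validates a username. The rule itself (3 to 16 lowercase letters, digits, or underscores) and the function name are assumptions made for illustration:

```php
<?php
// Hypothetical rule: usernames are 3-16 characters of lowercase letters, digits, or underscores.
function isValidUsername(string $username): bool
{
    // \A and \z anchor the whole string; unlike $, \z does not allow a trailing newline.
    return preg_match('/\A[a-z0-9_]{3,16}\z/', $username) === 1;
}

var_dump(isValidUsername('john_doe'));  // bool(true)
var_dump(isValidUsername("john_doe\n")); // bool(false): trailing newline rejected
var_dump(isValidUsername('ab'));        // bool(false): too short
```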

    Due to the amount of content there is to cover when teaching regex, it will have to be done in another tutorial. But for now, if you'd like to check out how to use regular expressions, then I'd recommend the following websites:
    Escaping Output
    Escaping output to prevent interpretation of it is a method of preservation that is carried out upon data exiting an application. There are two primary exits of data from an application: to the browser as client-side code, and to the database inside queries.

    Password Security
    Passwords are probably the most sensitive piece of user information you'll be storing within your web application. When people register on your website, they place a certain amount of trust in the security methods used within your web application, and expect their sensitive information to be securely protected.
    Every now and then, you'll hear of large websites becoming victims of attacks that have led to their database being compromised. This causes not only user account problems for that website, but also for other websites where the affected users have reused the same password. There's also a loss of trust from users and bad publicity to deal with, amongst other things. For this reason, it is critically important for you to ensure your website is secure. But mistakes happen, and so you need to ensure that in the event your database is compromised, your users' passwords are securely stored (i.e. hashed) within your database so that they can't be revealed easily.
    I will therefore be talking about how to properly hash passwords to ensure security in case of a database breach. But first, let's review the hashing algorithms that you should not use within your web applications.
    Definition List
    Hashing: A one-way process of turning a string of characters into a digest according to the hashing algorithm used.
    Digest: The string of characters output by a hashing process.
    Collision: Where different string inputs have the same digest. This occurs because of the potentially infinite input of characters, with only a finite output (which is dependent upon the digest size of the hashing algorithm).
    Collision Rate: The frequency of collisions in a hashing algorithm. The smaller the digest, the higher the collision rate (and vice-versa).
    Salt: A randomly generated string of characters that is hashed along with a password to prevent dictionary and rainbow table attacks.
    Hashing Algorithms You Should Not Use
    There are a few cryptographic hashing algorithms you should eschew when building your web application. They are either considered 'broken' or don't have a sufficient amount of computations to be considered 'secure' anymore.
    MD5
    MD5 is a now-antiquated hashing algorithm that produces a 128-bit (32 character) digest of hexadecimal characters. Security flaws have been found in the algorithm, and because of its small digest, it is vulnerable to higher collision rates than other modern-day hashing algorithms. This algorithm is also very quick at generating a digest, and so whilst it's good to use for integrity checks (such as hashing files and comparing the digest), it is not suitable for usage upon passwords.
    SHA1
    SHA1, like MD5, is also considered outdated. It produces a 160-bit (40 character) digest of hexadecimal characters. Security flaws have also been found in this algorithm, and whilst its digest is larger than that of MD5 (giving it a slightly lower collision rate), it still has an overall high collision rate. This algorithm is also very quick at computing digests, and for these reasons it should also be eschewed if you're looking to protect sensitive information via hashing.
    Hashing Algorithms You Could Use
    The following is a list of hashing algorithms that you could use in your web applications. They're more preferred than the aforementioned (since they are not considered 'broken' and have a sizeable digest), but less preferred than the next sub section of algorithms (due to their speed).
    SHA2 Family
    The SHA2 family supersedes SHA1 by creating longer digests (that are therefore more computationally expensive). These algorithms are slower to compute than the other two aforementioned hashing algorithms, and because their digests are much longer (SHA-256, SHA-384, and SHA-512 generate a 64, 96, and 128 character digest respectively), they have a much lower collision rate.
    Here's an example of using PHP's built-in hash_hmac() function (please read the "enforce security with a salt" section for more information on 'salting' your password):
    PHP Code:
    $digest = hash_hmac('sha512', 'MyPassword', 'salt_here');
    Unfortunately, the SHA2 family are still very fast to compute (and speed is one area a hashing algorithm doesn't want). This problem will only ever get worse as computing power increases, and so these still aren't the preferred hashing algorithms to use.
    Hashing Algorithms You Should Use
    These algorithms are the preferred way to hash sensitive information. The reason is that they let us specify a work factor, where we can say how expensive we'd like our hashing to be. This is important because as computing power increases yearly (according to Moore's Law), we want to ensure that our hashing algorithm takes longer to compute (i.e. be scalable with hardware) - otherwise it will make generating rainbow tables a lot easier with time. This is something none of the aforementioned hashing algorithms allow for, which is why the following algorithms are the most preferred.
    bcrypt
    Bcrypt is something every security-conscious developer should look into. The API for using bcrypt prior to PHP 5.5 was something that confused many people new to the password hashing scene. Fortunately, this changed in PHP 5.5 with the advent of the password_ functions (see this tutorial covering them too). Since then, a couple of libraries have been released that expose the same API as the new password_ functions to make using bcrypt easier (see here for more information about this).
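    A minimal sketch of the password_ functions mentioned above, assuming PHP 5.5 or later. The cost values are illustrative assumptions rather than recommendations; the salt is generated and embedded in the hash by password_hash() itself:

```php
<?php
// Registration: hash the password with bcrypt. The cost of 10 is an illustrative assumption.
$options = array('cost' => 10);
$hash = password_hash('MyPassword', PASSWORD_BCRYPT, $options);

// Login: compare the submitted password against the stored hash.
var_dump(password_verify('MyPassword', $hash));    // bool(true)
var_dump(password_verify('WrongPassword', $hash)); // bool(false)

// Later: check whether the stored hash should be regenerated with a higher cost.
var_dump(password_needs_rehash($hash, PASSWORD_BCRYPT, array('cost' => 12))); // bool(true)
```

    The work factor discussed above is the cost option: each increment roughly doubles the time needed to compute a hash, so it can be raised as hardware gets faster.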
    There are also numerous other posts on both StackOverflow and Security.StackExchange that are well worth reading through for those of you who have an interest in cryptography and cryptanalysis (like myself):
    Should I Nest Hashing Algorithms?
    Nesting hashing algorithms is where the output of one hashing algorithm (the digest) acts as the input of another hashing algorithm, and so we end up rehashing generated digests multiple times. This alone is not good enough for security purposes, and will in fact make your passwords less secure. This is because the input going into a hashing algorithm has an infinite number of possibilities, whereas the digest coming out of the hashing algorithm is finite, depending upon the digest size and characters used. Thus, we increase the collision rate through lack of entropy when chaining simple hashes together.
    Tip: Cryptography is a complex and very involved topic. It can be very easy to fool yourself into thinking you have created a cryptographically strong hashing algorithm, when in fact you've only weakened the original hashing algorithm used. A good rule of thumb is to stick to the widely adopted hashing algorithms and to avoid creating your own unless you really know what you're doing.
    Nesting hashing algorithms does, however, have the advantage of increasing the computational time, which makes brute force attacks longer (to the point where they may no longer be a cost-effective approach). When done right, nesting hashing algorithms can also mean the collision rates do not increase either. One good example is the PBKDF2 key derivation algorithm. This is where the password is injected into each round of the hashing chain, therefore keeping the entropy there to prevent increasing collision rates, but also making the finishing digest more computationally expensive to generate:
    hash(hash(hash(hash(hash(hash(password+salt) + password+salt) + password+salt) + password+salt) + password+salt) + password+salt)
    PHP has a function for this: hash_pbkdf2(). Unfortunately it is only available to those running PHP 5.5 or higher - but again, using bcrypt is still the preferred method of hashing passwords.
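    For illustration only, here is a sketch of calling hash_pbkdf2(). The helper name, the choice of SHA-256, and the iteration count are assumptions, and random_bytes() requires PHP 7 or later:

```php
<?php
// Hypothetical helper: derive a hex digest from a password and a per-user salt.
function deriveDigest(string $password, string $salt, int $iterations = 100000): string
{
    // 64 hex characters = the full 256-bit output of SHA-256.
    return hash_pbkdf2('sha256', $password, $salt, $iterations, 64);
}

$salt = random_bytes(16); // store this alongside the digest
$stored = deriveDigest('MyPassword', $salt);

// Verification repeats the derivation with the stored salt and compares in constant time.
var_dump(hash_equals($stored, deriveDigest('MyPassword', $salt)));    // bool(true)
var_dump(hash_equals($stored, deriveDigest('NotMyPassword', $salt))); // bool(false)
```

    The iteration count plays the same role here as the work factor does for bcrypt: raising it makes every guess more expensive for an attacker.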
    Enforce Security with a Salt
    The main purpose of a salt is to prevent both precomputed attacks (such as rainbow tables, where a table of digests can be used to perform a lookup upon a particular digest), and dictionary attacks (see Section B, Part 4 - Brute Force and Dictionary Attacks).
    When generating a salt, we need to ensure that the salt itself is considered cryptographically secure. This means there needs to be plenty of entropy in the generation process to ensure that a sufficiently random string of characters is generated. For generating these salts, this means that you should opt to use the likes of openssl_random_pseudo_bytes() over mt_rand() (read here for more information about randomness issues).
    Salts are commonly generated on a per-user basis. This means that when storing a user's details, you will need to store their unique salt alongside their hashed password. Whilst this does not slow down attempts of cracking individual passwords, it greatly slows down trying to crack a whole table of passwords. This is because of the added inconvenience of using different salts for each password, which only enables the attacker to compute digests for a single password at a time when attempting to crack them (rather than directing the attack at the whole table at once).
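    The per-user salt idea can be sketched as follows. The helper generateSalt() is hypothetical, and it assumes PHP 7 or later for random_bytes() (on PHP 5, openssl_random_pseudo_bytes() would take its place). The hash_hmac() call mirrors the earlier SHA-512 example:

```php
<?php
// Hypothetical helper: a fresh 16-byte salt, hex-encoded for storage (32 characters).
function generateSalt(): string
{
    return bin2hex(random_bytes(16)); // random_bytes() requires PHP 7+
}

$saltA = generateSalt(); // stored with user A's record
$saltB = generateSalt(); // stored with user B's record

// Same password, different salts: the digests differ, so one precomputed table
// cannot be reused across users.
$digestA = hash_hmac('sha512', 'MyPassword', $saltA);
$digestB = hash_hmac('sha512', 'MyPassword', $saltB);

var_dump($digestA === $digestB); // bool(false)
```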
    Further Reading: Risks and Challenges of Password Hashing
    PKCS #5 v2.1: Password-Based Cryptography Standard
    Remember to still force your users to use good passwords within your application logic. Making your users use a minimum of X characters in their password with at least one non-alphabetical character is good practice for ensuring security on their behalf.

    Minimising Exposure and Minimalist Privileges
    Minimising Exposure
    Exposure of data that need not be seen (especially sensitive data) can come in many forms. When building your Web applications, it is important to be conscious (if not, then outright paranoid) of these potential exposure points and to ensure that you're properly preventing them from leaking such data.
    HTTP GET Method Exposure
    The HTTP GET method passes data via the query string of a URI. The openness of this makes it unsuitable for passing sensitive information, such as using it for login forms where passwords are involved. This type of exposure threat may seem insignificant, but given that your web browser most probably logs your history, your password will be kept in plain text there until you decide to clear your browsing history. Given how easy it is to avoid this (by simply using the right HTTP method - POST in this case), it's something you should definitely be aware of when requesting sensitive data from your users.

    Cookie Exposure
    Exposure of data in cookies usually happens through either browser vulnerabilities (very rare) or cross-site scripting (XSS) attacks (much more common). Given how common XSS attacks are on the Internet today, it is important that you ensure the cookies you're creating for your users cannot be accessed by client-side languages (namely JavaScript). The setcookie() function's seventh parameter specifies whether the cookie can only be accessed via the HTTP protocol - this should always be set to true unless you specifically need to access your users' cookies through JavaScript (which is unlikely). This still leaves the potential problem of browser vulnerabilities, and so you should still avoid storing sensitive data (like a user's password) inside of their cookie. There's still the unavoidable storage of the session identifier, which can lead to session hijacking if a user's cookie is compromised, but that's a far lower risk once client-side access to cookies is blocked.
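    A sketch of setting such a cookie follows. The cookie name, the helper function, and the decision to also set the secure flag are assumptions for illustration; the argument order is that of setcookie(), with the seventh argument being the HTTP-only flag:

```php
<?php
// Hypothetical helper: build the positional arguments for setcookie().
function sessionTokenCookieArgs(string $token): array
{
    return array(
        'auth_token', // name (hypothetical)
        $token,       // value: an opaque token, never a password
        0,            // expires: 0 = until the browser closes
        '/',          // path
        '',           // domain: current host
        true,         // secure: only send over HTTPS
        true,         // httponly: hide the cookie from JavaScript
    );
}

// Cookies are sent as headers, so this must happen before any output.
if (!headers_sent()) {
    setcookie(...sessionTokenCookieArgs(bin2hex(random_bytes(16))));
}
```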

    Session Exposure
    Sessions are stored on the server-side (by default in the file system, though that can be easily overridden for database storage). This means that we don't typically have to worry about the exposure of their contents - but that doesn't mean you can simply ignore it completely. You can minimise exposure of session data as it is sent between the client and server by enabling SSL, which protects HTTP requests and responses. You can also store your sessions in a database and encrypt that database for additional security (though that is perhaps getting a little too paranoid).
    Database Credentials
    In order to access your database, you must store the database credentials somewhere in your file system. This is particularly worrying because of the plaintext, out-in-the-open nature of these sensitive details. There are a couple of ways to minimise this type of exposure, depending upon the authority you have over your environment at hand.
    If you only have a public document root (common on shared hosting environments), then setting up some Apache directives to block HTTP access to such files is a good start. If you are allowed to place content outside of the web root, then moving your configuration (or in general, any included) files outside of direct URI access would be a good step to take. Both of these methods, however, are still vulnerable to Local File Inclusion.
    This is where using Apache's environment variables comes in. We can use the SetEnv Apache directive to set up variables to hold our database's access credentials (the username and password), and then access these variables through PHP's superglobal array, $_SERVER. (For more information about this, look on the official Apache website.)
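    Roughly, the PHP side of that setup could look like the sketch below. The variable names DB_USER and DB_PASS, the helper readCredential(), and the DSN are all hypothetical; the Apache lines are shown only as comments:

```php
<?php
// Assumed Apache configuration, with hypothetical names and values:
//   SetEnv DB_USER "app_user"
//   SetEnv DB_PASS "secret"

// Hypothetical helper: fetch a credential from the server environment or fail loudly.
function readCredential(array $server, string $name): string
{
    if (!isset($server[$name]) || $server[$name] === '') {
        throw new RuntimeException("Missing environment variable: {$name}");
    }
    return $server[$name];
}

// Typical use inside a web request (left commented out so the sketch runs anywhere):
// $db = new PDO('mysql:host=localhost;dbname=app',
//               readCredential($_SERVER, 'DB_USER'),
//               readCredential($_SERVER, 'DB_PASS'));
```

    Passing the array in as a parameter, rather than reading $_SERVER directly inside the helper, is just a design choice that makes the function easy to exercise outside of Apache.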
    (I wrote the above with the assumption that an Apache web server is being used. I don't, unfortunately, have enough knowledge on other web server setups (like ones using Nginx) to be able to write about them too. If anyone would like to contribute a paragraph with respect to the above that covers a different server environment, then let me know and I'll update this post and give credits where they're due.)

    Application Errors
    Outputting application errors to userland both degrades user experience and gives out unnecessary information about your website. These can (and most probably will) be used against you by web application exploiters when they're pen-testing parts of your website for breakages. It is therefore important that you log all errors in a production environment, and ensure that you fix all warnings and errors in your development environment (by ensuring that you have the highest level of error reporting turned on when building your web application).
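    One way to express this at runtime is sketched below. The $isProduction flag is hypothetical (how an application detects its environment is not covered here), and the same settings could equally live in php.ini:

```php
<?php
$isProduction = true; // hypothetical flag

error_reporting(E_ALL);                               // report everything in every environment
ini_set('display_errors', $isProduction ? '0' : '1'); // never show errors to visitors in production
ini_set('log_errors', '1');                           // always write errors to the log
```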
    Minimalist Privileges
    You should always give out the minimum required privileges to perform actions you are looking to carry out. This is something that is often overlooked by web developers, which makes it all the more important to stay aware of. If you don't require a file to have read, write, and execute permissions for all three groups (owner, group, and world) then don't give the file those permissions.
    It's a lazy habit, just like giving a database user full privileges to access your database. If those details are compromised, then anything can be done to your database. Many applications commonly only require INSERT, UPDATE (with the WHERE clause specified), and SELECT statements (things aren't usually DELETEd from your database via an application request, and we rarely use DDL statements either). As such, the user account accessing your database should only be given these (minimal) privileges. (Take a look at the "Not All Users Are Created Equally" section from this article.)

    Structured Query Language Injection (SQLi)
    What is it?
    SQLi is the injection of user-supplied SQL code into an application's queries, where it is then executed. It can cause damage to the data store involved, damage to and/or theft of the data being persisted, and can also give unauthorised access to admin and user accounts.
    You'll hear of first- and second-order injection attacks. When we prepare our SQL statements to insert user-supplied data, we're preventing first-order injection attacks. A second-order injection attack is just where malicious data that is lying dormant inside of a database is queried and then directly reused inside of an unprepared query. This may happen because the developer forgets that the data being stored inside of his database actually originates from the user - and if no validation was applied upon initial insertion, then that data is still just as lethal to the database as it was upon first insertion.
    Good application design can help to prevent this, as well as being aware of where the data inside of your application has come from (good practice #3 from section A). So provided you're preparing your queries from this first- and second-hand data, you should have no problems at all.
    How do I Prevent it?
    Prevention of SQL injection comes in two forms. The first is optimistic escaping, where input data is simply escaped before being sent to the data store. The second is pessimistic sanitising, where the data is first validated through integrity checks to ensure it contains expected and valid values. It is then inserted into the data store (with or without protection from escaping - depending upon the strictness of the previous validation rules).
    Optimistic Escaping
    How we escape the data before being sent to the database will depend upon the API we're working with. For the purpose of this section, we will be focusing on using PDO, but will also occasionally reference the MySQLi API since it is also commonly used.
    The typical method of escaping user input to be sent to a database is to use prepared statements (otherwise known as parametrised queries). These work by safely binding values (either in the form of literals or variables) to a query before it is executed, so that the bound values are always treated as data rather than as SQL, leaving far less room for human error.
    Caveat: Be aware that the mysqli_real_escape_string() function only escapes characters that are special inside a quoted string, such as quotes (single or double) and backslashes; it does not escape grave accents (backticks, commonly used in MySQL to quote identifiers that clash with reserved words). If it is used upon a parameter that is not encased inside quotes in a query (because the value is expected to be an integer), then it provides no protection. It's human error like this that can leave your web application open to attackers.
    PDO supports named and unnamed (also known as positional) placeholders, unlike the MySQLi API which only supports unnamed. When preparing our queries, PDO also lets us explicitly or implicitly bind parameters to our queries (again, unlike the MySQLi API, which supports explicit binding only). Let's start by looking at how we can explicitly bind both variables and literals (the MySQLi API cannot bind literals either) to our prepared queries through an example:

    PHP Code:
    // binding variables
    $insertQuery = $db->prepare('INSERT INTO table_name VALUES (:columnA, :columnB)');
    $insertQuery->bindParam(':columnA', $valueA, PDO::PARAM_STR);
    $insertQuery->bindParam(':columnB', $valueB, PDO::PARAM_INT);

    // binding literals
    $selectQuery = $db->prepare('SELECT columnB FROM table_name WHERE columnA LIKE :value');
    $selectQuery->bindValue(':value', "%{$value}%", PDO::PARAM_STR);
    $result = $selectQuery->execute();
    Tip: Named placeholders must always begin with a colon, and then simply follow the same naming conventions as variables in PHP. (This means named placeholders are also case-sensitive.)
    The bindParam() and bindValue() methods above are used to bind parameters to a prepared query, and they require at least two arguments, along with an optional third.
    The first argument is the name of the placeholder, the second argument is the variable or value we want to bind to the query, and the optional third argument is the type to bind the variable/value as (the default is PDO::PARAM_STR, however I always specify the type for clarity). The bindParam() method also enables us to specify an optional fourth and fifth parameter, the data type length and any additional driver options respectively.
    Tip: The bindParam() and bindValue() methods are orthogonal to one another; either or both of them can be used upon one query when binding values to it. This, however, does not work with named and unnamed placeholders; only one or the other may be used upon an individual prepared query.
    Lastly, the execute() method is invoked to execute the prepared query once all parameters have been bound to it.
    Implicit binding is effectively the short-hand version of the above, where we simply prepare our query and head straight to the execute() method, passing to it all parameters to be bound to the query in an array format. The following is an example of using positional placeholders in an implicitly bound parametrised query:

    PHP Code:

    $insertQuery = $db->prepare('INSERT INTO table_name VALUES (NULL, ?, ?)');
    $insertQuery->execute(array('valueA', 'valueB'));

    if($insertQuery->rowCount() !== 0) {
        echo 'Success';
    }
    Tip: If named placeholders are being used, then an associative array will need to be passed to the execute() method (keys being placeholder names, and their respective values being those that need binding); if positional placeholders are used, then an indexed array is passed to execute().

    We begin by preparing our query and putting the unnamed placeholders (the question marks) in position, and then invoke the execute() method with an array containing the values to be bound to our prepared query. This array being passed can contain either (or both) variables and strings. The downside to this short-hand method is that we aren’t able to specify the type of parameters being bound to our prepared query. Next we question if any rows were updated by using the return value from the rowCount() method, which will contain the number of rows affected from the previous operation. Provided the number of rows does not equal zero, then we consider it a success.
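    To see implicit binding with named placeholders end to end, here is a runnable sketch. It uses an in-memory SQLite database purely as a stand-in (this assumes the pdo_sqlite extension is enabled), and the users table and findUsersByName() function are hypothetical:

```php
<?php
// In-memory SQLite stands in for a real database (assumption: pdo_sqlite is enabled).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->exec("INSERT INTO users (name) VALUES ('alice')");
$db->exec("INSERT INTO users (name) VALUES ('bob')");

// Hypothetical lookup: the associative array's keys match the named placeholder.
function findUsersByName(PDO $db, string $name): array
{
    $query = $db->prepare('SELECT name FROM users WHERE name = :name');
    $query->execute(array(':name' => $name));
    return $query->fetchAll(PDO::FETCH_COLUMN);
}

print_r(findUsersByName($db, 'alice'));       // one match
print_r(findUsersByName($db, "' OR '1'='1")); // no matches: the payload is compared as plain text
```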
    Tip: Always be sure to disable emulated prepares when using the PDO API, since it emulates prepares by default (which can allow edge-case security vulnerabilities). The MySQLi API always does true prepared statements, and so this isn't a worry if you're using it.
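    A sketch of what disabling emulation might look like for a MySQL connection follows. The helper name, the DSN, and the extra error-mode option are assumptions; only the options array is built here so that the example runs without a database server:

```php
<?php
// Hypothetical helper: connection options for a MySQL PDO handle.
function mysqlPdoOptions(): array
{
    return array(
        PDO::ATTR_EMULATE_PREPARES => false,                  // use real server-side prepares
        PDO::ATTR_ERRMODE          => PDO::ERRMODE_EXCEPTION, // surface failures as exceptions
    );
}

// Typical use (left commented out; the DSN and credentials are placeholders):
// $db = new PDO('mysql:host=localhost;dbname=app;charset=utf8mb4', $user, $pass, mysqlPdoOptions());
```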
    Pessimistic Sanitising
    As was seen in section A of this series, data can be validated by comparing the submitted data against a set of predefined values (e.g. using the in_array() function), by forcing the input data to the correct type (e.g. typecasting), or by returning an error to the end user when the submitted data is invalid. We will therefore not be covering this form of sanitising again; please refer back to section A, part 1 of this series for more information.

    Cross-Site Scripting (XSS)
    What is it? XSS is the injection of client-side code into web pages, where it is then parsed by the browser. This attack occurs when your web application outputs anything provided by your users (whether it's from your data persistence store or from a recently submitted form) onto the webpage without escaping the data beforehand.
    How do I prevent it? There are two functions you can use to mitigate XSS attacks: htmlspecialchars() and htmlentities(). Both preserve the text while preventing the web browser from interpreting any client-side code it may contain. They therefore serve a different purpose to the strip_tags() function (see later on for why).
    The only difference between htmlspecialchars() and htmlentities() is that the former translates only the special characters (&, ', ", <, >), with the single quote being translated only when the ENT_QUOTES flag is in effect. The latter, on the other hand, translates all characters that have HTML character entity equivalents into those entities.
    Caveat: If you're running a version of PHP prior to PHP 5.4, then you must provide the encoding type for these functions. This is because the default encoding type was ISO-8859-1, and so outputting characters such as the pound (£) and euro (€) signs would produce output that differs from the original input. It is considered good practice to always specify the encoding type with these functions.
    Tip: It is considered good practice not to apply these functions to data as it is being stored in your database. This is because you may later wish to change the way you output your data, such as by using the strip_tags() function upon output to show just the raw text (see below for the strip_tags() function).
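    To make the escaping step concrete, here is a minimal sketch of a wrapper around htmlspecialchars() that always passes the ENT_QUOTES flag and an explicit encoding. The function name escapeHtml() and the sample string are made up for illustration; UTF-8 is assumed to be the encoding your pages are served in.

```php
<?php
// Hypothetical helper: escape a value for output inside HTML text or a
// quoted attribute. ENT_QUOTES converts both double and single quotes.
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Example user-supplied value (made up for this sketch).
$comment = '<script>alert("XSS")</script> Tom & Jerry\'s';

echo escapeHtml($comment), PHP_EOL;
// prints: &lt;script&gt;alert(&quot;XSS&quot;)&lt;/script&gt; Tom &amp; Jerry&#039;s
```

    The browser then displays the original text rather than executing it, and the stored data itself is left untouched.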
    Another function that some people use to mitigate XSS attacks is strip_tags(), but they really shouldn't. It accepts two arguments: the first is the string to sanitise (strip of HTML tags) and the (optional) second argument is the whitelist of HTML tags not to strip. The function works by looking for an opening < sign and then a closing > sign, and then deleting everything in between the two signs (including the signs themselves) from the string, regardless of whether it was an actual HTML tag or not. This function may seem helpful because it gets rid of any unwanted HTML code, but there are serious drawbacks to using it as a prevention method against XSS.
    The first problem is that the function relies on the tags being correctly entered, i.e. having both an opening and a closing angle bracket (<, >). If this is not the case, then the poster may find large amounts of their data/post (if not all of it) being deleted. Here is a quick demonstration of this:

    PHP Code:
    $string = 'To initiate the execution of php code, we must start our PHP script with the opening tag, <?php. At the end of the PHP code, we can close off the script with a closing tag, ?>. This is <<em>basic</em> PHP knowledge.';

    echo strip_tags($string);
    The above would output:
    To initiate the execution of php code, we must start our PHP script with the opening tag, . This is
    So as you can see, we've lost most of the final sentence because of a misplaced opening angle bracket. Another thing you should have picked up on from the example above is that we have also lost everything between the <?php and ?> tags, even though it was ordinary text. This is where the second problem of using strip_tags() arises: accidental usage, where legitimate content that merely looks like a tag is stripped.
    There is, however, a time and a place to use strip_tags(), and this is when we'd like to output raw content and we know that the HTML is valid and well-formed. The scenarios above derive from using the function upon the input of data, before its special characters have been converted into their respective entities (with either htmlspecialchars() or htmlentities()). One valid place to use the strip_tags() function would be when wanting to get or output plain text from a bulletin board (such as from this thread). This is because all of the special characters in the post would already have been converted to their respective entities, and any valid BBCode used would have been converted to valid, well-formed HTML that can be safely stripped.
    Cross-Site Request Forgeries (CSRF)
    What is it? CSRF is a method of attack where a victim unknowingly sends forged requests that were set up by an attacker. These attacks have the potential to occur upon any action that isn't verified through a validation process. Without taking specific measures to intentionally prevent CSRF, users of a web application can be directed to another web page and unintentionally load an image or a JavaScript-submitted form that executes a specific action in the background.
    How do I prevent it? The solution to this attack type is to use a security feature known as a nonce, where a unique token is passed through the request URI (via the HTTP GET method), which is then validated by the requested script on the other end against a session variable.
    Here's a quick example to demonstrate:
    PHP Code:


    <?php
    session_start();
    $_SESSION['nonce'] = bin2hex(openssl_random_pseudo_bytes(10));
    ?>
    <!DOCTYPE html>

    <a href="action.php?do=delete&id=1&tok=<?php echo $_SESSION['nonce']; ?>">Delete Something</a>


    PHP Code:



    <?php
    session_start();

    if(isset($_GET['tok'], $_SESSION['nonce']) && $_GET['tok'] === $_SESSION['nonce']) {
        // valid request
    }

    The above gives an example of an HTTP GET request URI used to perform an action. The unique token (held in the session variable) is echoed into the link on the index.php page, so the link is only valid when the user clicks it on that page (when they legitimately want to perform that action). If the link is requested without the token, or with a token that does not match, then the request is deemed invalid and the action is not carried out.
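    The same idea can be sketched as a pair of small functions. This is a variation on the example above rather than a drop-in replacement: it swaps in random_bytes() for token generation and hash_equals() for a constant-time comparison (both built into PHP 7 and later). The function names csrfToken() and csrfTokenIsValid() are hypothetical, and the session data is passed in as an array so that the logic can be exercised without a live session.

```php
<?php
// Hypothetical helper: return the session's token, creating it on first use.
function csrfToken(array &$session): string
{
    if (empty($session['nonce'])) {
        $session['nonce'] = bin2hex(random_bytes(16)); // 32 hex characters
    }
    return $session['nonce'];
}

// Hypothetical helper: compare a submitted token with the stored one.
// Non-string input (e.g. ?tok[]=x) and a missing stored token are rejected.
function csrfTokenIsValid(array $session, $submitted): bool
{
    return is_string($submitted)
        && isset($session['nonce'])
        && hash_equals($session['nonce'], $submitted);
}
```

    In a page, csrfToken($_SESSION) would be echoed into the link or into a hidden form field, and the receiving script would call csrfTokenIsValid($_SESSION, $_GET['tok'] ?? null) before carrying out the action.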

    Brute Force and Dictionary Attacks upon Forms
    What is it? A brute force attack is where all combinations of characters are used in an attempt to find a user's password. It is more formally known as an enumeration attack.
    A dictionary attack, on the other hand, is where a list of words is used. It can be used in conjunction with a brute force attack to form a hybrid attack.
    How do I prevent it? There are a number of techniques we can implement in login pages to prevent such attacks, as shown below.
    Page Timing
    Sessions are used to maintain state within our web applications over a number of HTTP requests. We can make use of them to time form submissions, ensuring that our form is not spammed or left vulnerable to brute force/dictionary attacks. The following shows how we can implement such a system in PHP:
    PHP Code:
    <?php session_start();

    if(!isset($_SESSION['mintime'])) {
        $_SESSION['mintime'] = array(0, time());
    }

    if(isset($_POST['submit'])) {
        if(!empty($_POST['password']) && !empty($_POST['username'])) {
            // form validation here

            array_push($_SESSION['mintime'], time());
            array_shift($_SESSION['mintime']); // drop the oldest timestamp

            if($_SESSION['mintime'][1] - $_SESSION['mintime'][0] < 2) { // minimum time (in seconds) between valid form submissions
                // form submitted too quickly
            } else {
                // successful form submission
            }
        } else {
            // incomplete form data
        }
    }
    ?>
    <!DOCTYPE html>

    <form method="POST">
        Username: <input type="text" name="username" />
        Password: <input autocomplete="off" type="password" name="password" value="" />
        <input type="submit" name="submit" value="Login!" />
    </form>

    Upon first landing on the above page, a session variable called mintime is created. It holds two values (time of initial page load and time of form submission), and is responsible for keeping track of the time between form submissions. With each subsequent form submission, a new value of the current time is pushed to the end of the $_SESSION['mintime'] array and the old time is removed from the beginning. Using these two times, we can find the difference between them and check to see if it is greater than the minimum amount of time between valid form submissions (which is 2 seconds in the above example).
    One thing to note with this method is that the password field, at the very least, must not have a prefilled value (a feature offered by most modern-day browsers). This is because we need the end user to type in their password manually, so that they aren't able to submit the form immediately upon landing on the page (which would be rejected as being too quick). We do this by giving the password field an empty value attribute, and by setting the (HTML5) autocomplete attribute to off.
    reCAPTCHA
    reCAPTCHA is becoming increasingly popular as a method to prevent bots from performing form submissions. For those of you who do not know what reCAPTCHA is, see this web page for more information. For this example, we will be using Google's reCAPTCHA, where we only process the submitted form data once the reCAPTCHA has been correctly solved.
    First things first, you will need to sign up for Google's reCAPTCHA. Having done that, you will have your public and private keys at hand to be used in the following example. The next thing to do is to download the PHP library, which can be found here. This will need to be included both where the reCAPTCHA is generated within the form you're looking to protect, and within the validation logic of said form.
    Here's what our form page may look like with Google's reCAPTCHA in place:
    PHP Code:

    <?php
    require_once 'recaptcha/recaptchalib.php';

    if(isset($_POST['submit'])) {
        $privatekey = 'private_key_here';
        // the remaining arguments follow the library's documented usage
        $resp = recaptcha_check_answer($privatekey,
                                       $_SERVER['REMOTE_ADDR'],
                                       $_POST['recaptcha_challenge_field'],
                                       $_POST['recaptcha_response_field']);

        if(!$resp->is_valid) {
            // What happens when the CAPTCHA was entered incorrectly
            die ('The reCAPTCHA wasn\'t entered correctly. Go back and try it again. (reCAPTCHA said: '.$resp->error.')');
        } else {
            // Your code here to handle a successful verification
        }
    }
    ?>
    <!DOCTYPE html>
    <html lang="en">

    <form method="POST">
        Username: <input type="text" name="username" /><br />
        Password: <input type="password" name="password" /><br />
        <?php
        $publickey = 'public_key_here';
        echo recaptcha_get_html($publickey);
        ?>
        <input type="submit" name="submit" value="Log in" />
    </form>
    </html>

    The PHP code at the top of the page is the validation part, where an error is output if the input for the reCAPTCHA fails validation. If all goes smoothly, then we can continue to add our own validatory code within the else statement. For our form, we only need to echo out the reCAPTCHA form with the corresponding public key as the parameter for the reCAPTCHA generation function, recaptcha_get_html().
    That's about as basic as it gets to adding reCAPTCHA to a form. You can visit Google's documentation on using its reCAPTCHA with PHP and other languages here.
    Lengthening Successful Form Submissions
    The usleep() function in PHP causes a script to pause for a given number of microseconds (there are 1 million microseconds in a second). It can be used after a successful form submission to slow down the page ever so slightly, so that real users won't notice its effect but brute force/dictionary attacks will. Even setting the value of usleep() to 200000 microseconds (0.2 seconds) will noticeably impede such attacks, since each request then takes at least a fifth of a second to complete. Here's a quick example usage:

    PHP Code:


    <?php

    if(isset($_POST['submit'])) {
        if(!empty($_POST['password']) && !empty($_POST['username'])) {
            // form validation here

            usleep(200000); // pause for 0.2 seconds

            // successful form submission
        } else {
            // incomplete form data
        }
    }
    This method is more of a quick hack to prevent this type of attack, and it's one that I personally would not use because there are better alternatives (as discussed above).
    The techniques above are common ways to mitigate such attacks. There are other techniques that were not discussed above which can be deployed, such as account locking based on a certain number of attempts per IP address or requiring an answer to a random, pre-computed question. Using a computationally-slow hashing algorithm can also be a somewhat effective measure - though alone it will still allow for abuse on your Web form (which is something we want to mitigate).
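    The attempt-based locking mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the limit of 5 attempts and the 900-second window are all made up, and the attempts are kept in a plain array, whereas a real application would persist them (in a database table, for example). The key can be an IP address, a username, or a combination of the two.

```php
<?php
// Hypothetical helper: timestamps of failures for $key that are still
// inside the lockout window.
function recentFailures(array $attempts, string $key, int $now, int $window = 900): array
{
    $times = $attempts[$key] ?? [];
    return array_values(array_filter($times, function (int $t) use ($now, $window): bool {
        return $now - $t < $window;
    }));
}

// Hypothetical helper: record one failed login attempt for $key.
function recordFailure(array &$attempts, string $key, int $now, int $window = 900): void
{
    $attempts[$key] = recentFailures($attempts, $key, $now, $window);
    $attempts[$key][] = $now;
}

// Hypothetical helper: has $key reached the maximum number of recent failures?
function isLockedOut(array $attempts, string $key, int $now, int $max = 5, int $window = 900): bool
{
    return count(recentFailures($attempts, $key, $now, $window)) >= $max;
}
```

    The login script would call isLockedOut() before checking any credentials, and recordFailure() whenever a login attempt fails. Passing the current time in as an argument, rather than calling time() inside the functions, keeps the logic easy to test.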

    Local File Inclusion (LFI) and Remote File Inclusion (RFI)
    What is it? Local file inclusion is where the file system of a Web application is traversed and a file is included where it should not have been (a common exposure attack).
    Remote file inclusion is where a file from another website is included into a Web application (commonly to execute malicious code).
    How do I prevent it? Both of these vulnerabilities only arise when dynamic paths are used (specifically, where user input is made a part of that dynamic path). They both require serious oversights on the part of the Web developer, and yet they're both so simple to prevent.
    For LFI prevention, if you know the names of the files you're looking to include, then the whitelist approach (as described in section A) can be used to ensure that only verified files are included. If, on the other hand, you don't have a list of valid file names to include, then you can perform some basic sanitisation to prevent directory traversal. The basename() function can do just this:
    PHP Code:

    if(basename($_GET['file_name']) !== $_GET['file_name']) {
        // invalid file specified
    }
    The basename() function will evaluate its parameter and return only the trailing name component from it. Therefore, if a path is given (perhaps for an LFI test), the function will return only the file name or the last directory in the specified path. In either case the return value no longer matches the original input, so the check above flags the request as invalid.
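    For the whitelist approach to file inclusion, a minimal sketch might map the values a user is allowed to request onto fixed paths, so that user input never becomes part of the path itself. The function name resolveInclude(), the page names and the paths below are all hypothetical.

```php
<?php
// Hypothetical helper: look the requested page up in a fixed map of
// allowed names => file paths. Anything not in the map yields null.
function resolveInclude($requested, array $pages): ?string
{
    if (!is_string($requested) || !isset($pages[$requested])) {
        return null;
    }
    return $pages[$requested];
}

// Example whitelist (names and paths made up for this sketch).
$pages = [
    'home'  => 'pages/home.php',
    'about' => 'pages/about.php',
];
```

    The calling script would then only include the file when resolveInclude($_GET['file_name'] ?? null, $pages) returns a path, and show an error page otherwise.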
    You can prevent RFI in much the same way as LFI. As an extra preventative measure against RFI, make sure that the allow_url_include directive is disabled (it is off by default); you can also disable the allow_url_fopen directive to remove the ability to reference remote resources altogether.

    Session Identifier Acquirement and Session Hijacking
    An Overview of Sessions
    How are They Used? Sessions are a solution to the stateless nature of the HTTP protocol. They enable requests made by the same web user to be linked to one another in the form of transactions, allowing for sequential tasks to be performed. This could be anything, like logging into an account (privilege change), adding items to a shopping cart, proceeding through a checkout system, etc. These are all tasks that require a state to be maintained.
    So, What Actually are They? Sessions are, by default, stored in temporary files on the web server. Each of these temporary session files holds all of the information being stored as session data by your web application. Every session has a unique ID (often referred to as the session identifier) that is randomly generated. This session identifier is, by default, stored in a cookie held on the web user's computer. The cookie is created upon initiating the session (hence why session initiation must come before any output to the screen - because cookies are a part of the HTTP headers). Upon subsequent page requests, the cookie is read by the web server, and the session ID is looked up against the temporary files to see if the (re-)visiting user can resume their session.
    Session Identifier Acquirement
    The secrecy of a session identifier is critical to a session's security. An attacker typically has three options when it comes to acquiring a valid session identifier:
    Prediction: Prediction of the session identifier will be the least of your worries. The generation of session IDs is sufficiently random that you need not worry about this attack vector (and so no preventative measures need to be taken).
    Capture: Capturing a session identifier involves attacking the places where the session ID is kept. Because cookies (by default) propagate the session identifier, they have become a common target when attempting to capture a valid session identifier. Browser vulnerabilities can help attackers expose cookie information, though because of the rarity of these types of attacks, you need not worry too much about them.
    Fixation: Fixation of a session identifier is where the session ID is set by an attacker in the query string of a URI (usually in the form of ?PHPSESSID=...), forcing the user who clicks that link to use the given session identifier. This attack vector used to be a lot easier when session IDs could be passed via the URI. On newer versions of PHP, however, the php.ini file has the session.use_trans_sid directive turned off (set to zero) and the session.use_only_cookies directive turned on by default. (If your session.use_trans_sid is enabled for some reason or another, then you may want to think about turning it off.) Thus, this security problem is not as much of a concern for us anymore - but that's not to say we should just ignore it.
    It's considered good practice to invoke the session_regenerate_id() function when there has been a privilege change in your application (be sure to set the optional parameter to true to delete the old session data). This ensures that a new session identifier is generated to keep your user sessions secure when a change in privilege has occurred, and it is another preventative measure against session fixation attacks.
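    A privilege change such as a login could be handled along these lines. This is a minimal sketch: the function name markAsLoggedIn() and the user_id session key are hypothetical, and the function assumes that session_start() has already been called and that the user's credentials have already been verified.

```php
<?php
// Hypothetical login hook, to be called once credentials are verified.
// Assumes a session is already active.
function markAsLoggedIn(int $userId): void
{
    // true = delete the old session data along with the old identifier
    session_regenerate_id(true);

    // Existing session data is carried over to the new identifier.
    $_SESSION['user_id'] = $userId;
}
```

    Any session identifier that an attacker managed to fix or capture before the login no longer refers to a valid session afterwards.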
    Session Hijacking
    Session hijacking is an attempt to gain access to a user's session in order to impersonate them. For an attacker to hijack a user's session, they must first obtain that user's session identifier (through one of the aforementioned methods). This in itself is not a trivial process, and we can aim to make session hijacking even more difficult by looking for consistency in each user's behaviour (and therefore requiring authentication, such as a re-login, for inconsistent behaviour).
    One method of preventing this is to look for consistency in the user agent header sent with each request (accessible through the $_SERVER['HTTP_USER_AGENT'] variable). We can hash the user's browser agent (such as with md5) and store that as a part of the session. This can then be checked upon subsequent requests:

    PHP Code:


    <?php
    session_start();

    if(!isset($_SESSION['user_agent'])) {
        $_SESSION['user_agent'] = md5($_SERVER['HTTP_USER_AGENT']);
    }

    if($_SESSION['user_agent'] !== md5($_SERVER['HTTP_USER_AGENT'])) {
        // potentially an imposter - authenticate user with a login page
    }
    A second method of preventing session hijacking has already been described earlier, in the section on cross-site request forgery attacks. The idea is exactly the same: a nonce (a uniquely generated token) is passed via the URI to ensure that each request is submitted by that user and that user alone.

    There are a few PHP-centric, security-related books that I have read and have found to be useful:
    If you have any additions, then please PM me or post below so that I can update the list.

    Appendix B - Further Online Reading Information
    There are plenty of websites that cover Web application security and make for good reads. Here's my short list (feel free to either PM me or comment below if you have any suggestions to add):

    Last edited: Dec 14, 2017
    My PGP key: https://goo.gl/triziq
