Filtering Out Unwanted XHTML/HTML Tags

July 19th, 2005

For a project I am working on right now, I wanted to let users add a little bit of HTML in a description field, but not too much: only a few tags and a few attributes. Never one to reinvent the wheel, I headed to Google (a programmer’s best friend) on a code hunt. I tried several PHP filter functions and classes and was left wanting. I was just about to give up and write something myself when I stumbled across the PHP Input Filter class on PHP Classes.org (you have to be a member to download code, but membership is free).

It took a matter of minutes to figure out the class and write the piece of code below:

  <?php
  // assumes the InputFilter class file has already been included

  // array of allowed tags
  $allowed_tags = array('br', 'p', 'div', 'strong', 'em', 'ul', 'ol', 'li', 'dl', 'dd', 'dt', 'a');
  // array of allowed attributes
  $allowed_attr = array('href', 'title');
  // create the new InputFilter instance
  // (plain assignment: "=& new" is deprecated in PHP 5 and a parse error in PHP 7+)
  $filter = new InputFilter($allowed_tags, $allowed_attr);
  // use the new filter to process the string
  $description = $filter->process($description);
  ?>

Bada Bing Bang Boom. Problem solved. Since my search was long and nearly unproductive, I thought I would post a bit about my findings here so that others wouldn’t have to google for 45 minutes. The class has other options for filtering, but I will let you discover those for yourself. I just wanted to point you in the right direction.
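
One caveat when 'href' is on the allowed list: a link can carry a javascript: URL, which runs script when clicked. Whether the InputFilter class catches every variant is not something this post verifies, so the following is only a minimal sketch of an extra check. The helper name is_safe_href and its list of allowed schemes are hypothetical, and the sketch assumes the attribute value has already been entity-decoded (for example by an HTML parser) before it is checked.

```php
<?php
// Hypothetical helper (not part of the InputFilter class): returns true when
// an already-decoded href value is relative or uses one of the allowed schemes.
function is_safe_href(string $href, array $allowed = ['http', 'https', 'mailto']): bool
{
    // Browsers ignore whitespace and control characters inside a scheme, so
    // "java\tscript:" must be treated like "javascript:". Strip them first.
    $cleaned = preg_replace('/[\x00-\x20\x7f]+/', '', $href);

    // No colon before the first "/", "?" or "#" means there is no scheme,
    // i.e. the value is a relative URL.
    if (!preg_match('/^([^\/?#:]*):/', $cleaned, $match)) {
        return true;
    }

    // Otherwise accept only schemes on the allow list (case-insensitive).
    return in_array(strtolower($match[1]), $allowed, true);
}
```

A filter could drop the href attribute (or the whole tag) whenever this returns false. An allow list is used rather than a block list because new dangerous schemes and obfuscations are easier to miss than to enumerate.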

10 Responses to “Filtering Out Unwanted XHTML/HTML Tags”

  1. I am not a programmer, just a web designer who uses a lot of complete PHP scripts.
    My sites are for real estate agents, so I would like to prevent any tags.
    Could you show me how to do that and then …
    How and where would this code be used?

    Mike

  2. @Mike - To remove all tags, I would use strip_tags or htmlspecialchars.

    strip_tags removes all the tags from a string, while htmlspecialchars encodes special characters such as < and >. With the latter, any HTML entered shows up as text rather than being rendered as HTML.

    As far as where to use them, I would use them before inserting into the database. Another option would be to use them when displaying the fields on the front side.

  3. shareen April 12th, 2006 6:09 pm

    Hi,

    I need to actually demonstrate and implement how XSS can be prevented for my dissertation. Can you help? I’ve read up on it a bit and I know that you have to filter the information added to your web page, but I have not got a clue how to do that. Can you please help and point me in the right direction, bearing in mind I am not very good at programming? I will be awaiting your response.

    Thank you

  4. I would search for ‘XSS’ on Google. That should give you a wealth of articles. The main idea is that someone inserts <script> tags into a comment form or similar input, and the script then runs in the browser of anyone who loads that page on your site. Browsers do not reliably prevent this on their own, so the input has to be filtered or escaped on the server. Hope that helps.

  5. You should check out http://ha.ckers.org/xss.html if you want the lowdown on XSS. That guy put together a list of all the known XSS vectors. I don’t think your list solves the problem. For instance, you don’t even mention the JavaScript directive (javascript: URLs), but you allow the href attribute on a tags, so someone could put JavaScript in a link. Anyway, it’s worth a read.

  6. @Teddy - Good find. I get nervous just looking at that list.

  7. I think I am just entering the XSS realm… but I don’t need, or completely comprehend, input filtering. I am just looking for a way to secure posted data in a URL after a form entry, such as index.php?name=user, and to hide the variables in the URL.
    It has something to do with SESSIONS or cookies, but I can’t find some easy examples to start off with. As a PHP newbie these classes and input filters are very daunting.

  8. Rex December 22nd, 2007 12:30 am

    Where is the InputFilter class????

  9. It’s linked in the post.

  10. Santosh Patnaik February 2nd, 2008 5:37 pm

    Try htmLawed. Besides filtering admin-specified HTML tags, attributes, etc., it can also balance and properly nest HTML tags, transform deprecated tags and attributes, and so on.
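
For the simpler case raised in the comments, where no tags should get through at all, the difference between strip_tags and htmlspecialchars is easiest to see side by side. This is a minimal sketch; the $input string is made up for illustration, and the ENT_QUOTES flag and 'UTF-8' charset are assumptions about how the page is served.

```php
<?php
// Hypothetical user input, for illustration only.
$input = '<b>Nice</b> & "cozy"';

// Option 1: remove the tags; the text between them is kept.
$stripped = strip_tags($input);

// Option 2: keep everything, but encode the special characters so the
// browser displays the markup as text instead of rendering it.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // Nice & "cozy"
echo $escaped, "\n";  // &lt;b&gt;Nice&lt;/b&gt; &amp; &quot;cozy&quot;
```

Note that strip_tags only removes the tags themselves, so the text inside a removed element (including a script element) stays in the string. If the value is going to be printed into a page, escaping with htmlspecialchars at output time is the safer habit.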

About This Site

Addicted to New is the personal website of John Nunemaker (Noo-neh-maker), a Web Developer enamored of Ruby on Rails and a wide-eyed fan of all things new and cool.