Filtering Out Unwanted XHTML/HTML Tags
July 19th, 2005
For a project I am working on right now, I wanted to let users add a little bit of HTML in a description field, but not too much: only a few tags and a few attributes. Never one to reinvent the wheel, I headed to Google (a programmer’s best friend) on a code hunt. I tried several PHP filter functions and classes, and each one left me wanting. I was just about to give up and write something myself when I stumbled across the PHP Input Filter class on PHP Classes.org (you have to be a member to download code, but membership is free).
It took a matter of minutes to figure out the class and write the piece of code below:
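<?php
// load the class (adjust the path and file name to match your download)
require_once 'class.inputfilter.php';

// array of allowed tags
$allowed_tags = array('br', 'p', 'div', 'strong', 'em', 'ul', 'ol', 'li', 'dl', 'dd', 'dt', 'a');

// array of allowed attributes
$allowed_attr = array('href', 'title');

// create the new InputFilter instance
// (plain "=" here: "=& new" is deprecated in PHP 5 and a syntax error in PHP 7)
$filter = new InputFilter($allowed_tags, $allowed_attr);

// use the new filter to process the string
$description = $filter->process($description);
?>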
Bada Bing Bang Boom. Problem solved. Since my search was long and nearly unproductive, I thought I would post a bit about my findings here so that others wouldn’t have to google for 45 minutes. The class has other options for filtering, but I will let you discover those for yourself. I just wanted to point you in the right direction.
I'm not a programmer, just a web designer who uses a lot of complete PHP scripts.
My sites are for real estate agents, so I would like to prevent any tags.
Could you show me how to do that and then …
How and where would this code be used?
Mike
@Mike - To remove all tags, I would use strip_tags or htmlspecialchars.
strip_tags removes all the tags from a string, while htmlspecialchars encodes special characters such as < and >, so any HTML entered shows up as text rather than being rendered as actual HTML.
As far as where to use them, I would use them before inserting into the database. Another option would be to use them when displaying the fields on the front end.
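To make the difference concrete, here is a small sketch of both functions applied to the same string. The sample input and variable names are made up for illustration; the two functions are standard PHP.

```php
<?php
// Hypothetical input, as a user might type it into a description field.
$input = '<b>3 bed</b> house <script>alert(1)</script>';

// strip_tags removes the tags themselves but keeps the text between them.
$stripped = strip_tags($input);

// htmlspecialchars keeps everything, but encodes the special characters
// so the browser displays the markup as text instead of interpreting it.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n"; // 3 bed house alert(1)
echo $escaped, "\n";  // &lt;b&gt;3 bed&lt;/b&gt; house &lt;script&gt;alert(1)&lt;/script&gt;
```

Note that strip_tags leaves the text inside the script tag behind, so it is harmless but still visible. Which one fits depends on whether you would rather hide the markup or show it.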
Hi,
I need to actually demonstrate and implement how XSS can be prevented for my dissertation. Can you help? I’ve read up on it a bit and I know that you have to filter the information added to your web page, but I haven’t got a clue how to do that. Can you please help and point me in the right direction, bearing in mind I am not very good at programming? I will be awaiting your response.
Thank you
I would search ‘XSS’ at Google. That should give you a wealth of articles. The main idea is that someone inserts <script> tags in a comment form or something similar, and the script then runs in the browser of every visitor who loads that page on your site. Browsers cannot reliably stop this on their own, so the input has to be filtered or escaped on the server. Hope that helps.
You should check out http://ha.ckers.org/xss.html if you want the lowdown on XSS. That guy put together a list of all the known XSS vectors. I don’t think your list solves the problem. For instance, you don’t even mention the javascript: directive, but you allow the href attribute on a tags, so someone could put JavaScript in a link. Anyway, it’s worth a read.
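One way to close the hole described above is to check the scheme of every href before accepting it. The sketch below is only an illustration: is_safe_href is a hypothetical helper, not part of the InputFilter class, and the list of allowed schemes is an assumption you would adjust. It does not cover every trick on that list.

```php
<?php
// Hypothetical helper: returns true only for relative URLs or URLs whose
// scheme is on a short whitelist (the whitelist itself is an assumption).
function is_safe_href(string $href): bool
{
    // Decode HTML entities first, since "&#x09;" and friends can hide a scheme.
    $decoded = html_entity_decode($href, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Browsers ignore whitespace and control characters inside the scheme,
    // so strip them before inspecting the value.
    $clean = preg_replace('/[\x00-\x20\x7F]+/', '', $decoded);

    // No "scheme:" prefix means a relative URL, which is allowed here.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $clean, $m)) {
        return true;
    }

    return in_array(strtolower($m[1]), array('http', 'https', 'mailto'), true);
}

var_dump(is_safe_href('http://example.com/'));       // bool(true)
var_dump(is_safe_href('/listings/42'));              // bool(true)
var_dump(is_safe_href('JavaScript:alert(1)'));       // bool(false)
var_dump(is_safe_href("java\tscript:alert(1)"));     // bool(false)
```

Whitelisting the schemes you want is safer than trying to blacklist javascript: in all of its disguises.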
@Teddy - Good find. I get nervous just looking at that list.
I think I am just entering the XSS realm… but I don’t need, or completely comprehend, input filtering. I am just looking for a way to secure data that ends up in a URL after a form entry, such as index.php?name=user, and to hide the variables in the URL.
It has something to do with SESSIONS or cookies, but I can’t find some easy examples to start off with. As a PHP newbie these classes and input filters are very daunting.
Where is the InputFilter class????
It’s linked in the post.
Try htmLawed. Besides filtering admin-specified HTML tags, attributes, etc., it can also balance and properly nest HTML tags, transform deprecated tags and attributes, and so on.