Hiding email addresses

Two methods of hiding email addresses in Spip’s text fields from spam robots

On our site we need to publish some contact lists which include email addresses. In order to avoid the “harvesting” of these addresses by spam robots they need to be hidden in some way. Here are two ways of doing this, which (I hope) are adequately secure.

Both methods make use of the apres_propre “entry point”, which is provided in Spip from version 1.7 onward, to intercept the text stream.

I should say that this whole contribution comes with a Health Warning attached: I do not know a lot about PHP, and the Regular Expressions here are among the first I have written (they may well be the last, too...).

Method 1

In the file ecrire/mes_options.php3 (create this file if it does not already exist), place these lines (making the change noted at point 3):

/*
 *   +----------------------------------+
 *    Filter name : AntiSpam1                                               
 *   +----------------------------------+
 *    Date : 26 May 2004
 *    Author : Paolo                                       
 *   +-------------------------------------+
 *    Hides email addresses in Spip's 
 *    text fields 
 *   +-------------------------------------+ 
 *  
 *    Please post suggestions and comments in
 *    the forum associated with the article:
 * http://www.uzine.net/spip_contrib/article.php3?id_article=537
*/

function apres_propre($string) {
$replacement = "M";
$tip = "Antispam measure: you need to replace ".$replacement." by @ in order to send mail.";

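// 1. force the link to lower-case letters and append the tip
// (strtr() makes all the replacements in a single pass, so an address
// that appears several times still gets the tip only once;
// rawurlencode() encodes the spaces and punctuation in the tip)
preg_match_all("/mailto:[^\"\s>]*/",$string,$found);
$map = array();
foreach ($found[0] as $link) {
  $map[$link] = strtolower($link)."?body=".rawurlencode($tip);
  }
if ($map) $string = strtr($string, $map);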

// 2. replace the @ in the links  
preg_match_all("/mailto:[^\?]*/",$string,$found);
$total = count($found[0]);
for($i=0; $i < $total; $i++) {
  $string = str_replace($found[0][$i],str_replace("@", $replacement, $found[0][$i]),$string);
  }

// 3. 'Encodes' the word "mailto:" 
// N.B.: it seems to be impossible to stop Spip 
// transforming the line below.
// For each of the 7 characters, change "&&" to "&#"!
$string = str_replace("mailto:", "&&109;&&x61;&&105;&&x6c;&&116;&&x6f;&&58;", $string);
    
// 4. Takes out the "@" of the text if an email has been given as the text of a link
preg_match_all("/>\S+@{1}\S+\s?<\/a/",$string,$found);
$total = count($found[0]);
for($i=0; $i < $total; $i++) {
  $string = str_replace($found[0][$i],str_replace("@", $replacement, $found[0][$i]),$string);
  }  
return $string;
} 

How does it work?

1. First, we look for every instance of mailto: and match the characters that follow, up to the next double quotation mark, space or closing angle bracket. That counts as an email link.

Next the link (which may of course contain capital letters) is forced to lower-case. At the same time a “tip” is added to the link. When the visitor to the page clicks on the link, this text is inserted into the body of the new email, telling them what to do to make the email address valid.

2. The @ is replaced with the replacement string, which is defined at the beginning of the function and which you can change according to taste. Here, I’ve chosen a capital M. As the link contains only lower-case letters, the M will be easy for the visitor to spot and replace, but hopefully incomprehensible to robots.

3. Mail robots apparently usually look for the text “mailto:”, so it makes sense to disguise it. To make it a bit more confusing, the string of entities uses a mixture of hex and decimal encoding.

4. This next regular expression checks whether there is an @ character anywhere between a closing angle bracket and a </a. This usually indicates that an email address has been given as the text of a link, so this text is converted in the same way. It would also be possible to change this text to something like “Send email”, as is done in the second method.
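
To experiment with these four steps without touching a Spip installation, they can be sketched in JavaScript (the language of the small script used in the second method) and run in a browser console or in Node. This is only a rough port under stated assumptions, not the filter itself: the names hideEmails, REPLACEMENT, TIP and ENCODED_MAILTO are made up for the illustration, steps 1 and 2 are merged into a single callback, and the tip is percent-encoded with encodeURIComponent so that the href contains no spaces.

```javascript
// Rough JavaScript port of the four steps, for experimenting only.
// hideEmails, REPLACEMENT, TIP and ENCODED_MAILTO are hypothetical names.
const REPLACEMENT = "M";
const TIP = "Antispam measure: you need to replace " + REPLACEMENT +
  " by @ in order to send mail.";

// Step 3 helper: "mailto:" as character entities, alternating
// decimal and hexadecimal notation.
const ENCODED_MAILTO = Array.from("mailto:", function (ch, i) {
  const code = ch.charCodeAt(0);
  return i % 2 === 0 ? "&#" + code + ";" : "&#x" + code.toString(16) + ";";
}).join("");

function hideEmails(html) {
  // Steps 1 and 2: lower-case each link, swap "@" for the replacement
  // character and append the (percent-encoded) tip as the mail body.
  html = html.replace(/mailto:[^"\s>]*/g, function (link) {
    return link.toLowerCase().replace(/@/g, REPLACEMENT) +
      "?body=" + encodeURIComponent(TIP);
  });

  // Step 3: disguise the word "mailto:" itself.
  html = html.replace(/mailto:/g, ENCODED_MAILTO);

  // Step 4: an address used as the visible text of a link.
  return html.replace(/>\S+@\S+\s?<\/a/g, function (text) {
    return text.replace(/@/g, REPLACEMENT);
  });
}

console.log(hideEmails('<a href="mailto:Jane.Doe@Example.org">jane.doe@example.org</a>'));
console.log(hideEmails("<p>C U @ 9</p>")); // no link involved, so the @ is left alone
```

Because the replacement is done inside a callback, each link is rewritten exactly once, even when the same address appears several times on the page.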

Advantages of this method

-  It will be (I think) good at hiding the addresses from robots.

-  Unlike Spip’s default |antispam filter, it will not convert every @ sign in the text, but just the ones in email links (so you can still write sentences like: “C U @ 9”, she texted to her friend - ok, no great advantage!)

-  The spaces that the default |antispam filter puts into the email address produce quirky effects in some email software when the email links are clicked. This method avoids that.

Disadvantage of this method

-  It’s tiresome for the person clicking on the link to have to correct it manually.

Method 2

In the file ecrire/mes_options.php3 (create this file if it does not already exist), place these lines:

/*
 *   +----------------------------------+
 *    Filter name : AntiSpam2                                               
 *   +----------------------------------+
 *    Date : 26 May 2004
 *    Author : Paolo                                       
 *   +-------------------------------------+
 *    Hides email addresses in Spip's 
 *    text fields 
 *   +-------------------------------------+ 
 *  
 *    Please post suggestions and comments in
 *    the forum associated with the article:
 * http://www.uzine.net/spip_contrib/article.php3?id_article=537
*/

function apres_propre($string) {
preg_match_all("/mailto:[^\"]*/",$string,$found);
$total = count($found[0]);
for($i=0; $i < $total; $i++) {
  $comat = strpos($found[0][$i],"@");
  if ($comat === false) continue; // no "@" in this match: leave it untouched
  $part1 = substr($found[0][$i],7,($comat-7));
  $part2 = substr($found[0][$i],($comat+1));
  $newstr ='#" name="'. $part2 . '" title="' . $part1 . '" onClick="location.href = dolink(this.title, this.name); return false;';
  $string = str_replace($found[0][$i], $newstr, $string);
  }
// if an email address is given as the text of a link, change it
preg_match_all("/>\S+@{1}\S+\s?<\/a/",$string,$found);
$total = count($found[0]);
for($i=0; $i < $total; $i++) {
  $string = str_replace($found[0][$i],">[Email]</a",$string);
  }  
return $string;
} 

Then, in the <head> section of the templates where text with emails may appear, place these lines:

<script type="text/javascript">
<!--
// Rebuilds the address from its two halves when the link is clicked.
function dolink(part1, part2){
  var link = 'mailto:' + part1 + '@' + part2;
  return link;
}
//-->
</script>

Alternatively, you can of course put this function in a separate .js file and link your templates to it using a line like this:

<script type="text/javascript" src="mes_scripts.js"></script>

How does it work?

The function matches strings beginning with mailto: up to the next double quotation mark. So it is important that the email links be well formed, with the href attribute enclosed in double quotation marks (email links made with Spip’s shortcut are like this).
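
The effect of the quoting can be checked with a small JavaScript snippet that applies the same pattern to two sample links. It is only an illustration: firstMailto and the sample markup (including the class attribute) are invented for the example.

```javascript
// Same pattern as in the PHP function, written as a JavaScript regex.
// firstMailto is a hypothetical helper name.
function firstMailto(html) {
  const found = html.match(/mailto:[^"]*/);
  return found ? found[0] : null;
}

const doubleQuoted = '<a href="mailto:me@nowhere.net" class="contact">Write to me</a>';
const singleQuoted = "<a href='mailto:me@nowhere.net' class=\"contact\">Write to me</a>";

console.log(firstMailto(doubleQuoted)); // mailto:me@nowhere.net
console.log(firstMailto(singleQuoted)); // mailto:me@nowhere.net' class=
```

With single quotes the match runs on past the address until it meets the first double quotation mark, so the filter would then cut the link in the wrong place.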

Then the email link is jumbled up by assigning bits of it to different attributes.
So a link that contains
<a href="mailto:me@nowhere.net" ...
is transformed into
<a href="#" name="nowhere.net" title="me" onClick="location.href = dolink(this.title, this.name); return false;" ...

The email is only decoded when a visitor clicks on the link.

If the text of a link contains an @, the whole text is replaced, in this case by the word [Email].
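
The whole round trip can be tried out in JavaScript as well. What follows is a rough sketch rather than the filter itself: scrambleLinks is an invented name, and it uses a single regular expression with two capture groups where the PHP function uses strpos and substr. dolink is the same helper as the one placed in the <head>.

```javascript
// Rough JavaScript port of the second filter, for experimenting only.
// scrambleLinks is a hypothetical name; dolink is the helper from the article.
function scrambleLinks(html) {
  // Split each address at the first "@" and park the halves in other attributes.
  html = html.replace(/mailto:([^"@]*)@([^"]*)/g, function (whole, local, domain) {
    return '#" name="' + domain + '" title="' + local +
      '" onClick="location.href = dolink(this.title, this.name); return false;';
  });
  // An address used as the visible text of a link becomes [Email].
  return html.replace(/>\S+@\S+\s?<\/a/g, ">[Email]</a");
}

function dolink(part1, part2) {
  return "mailto:" + part1 + "@" + part2;
}

console.log(scrambleLinks('<a href="mailto:me@nowhere.net">me@nowhere.net</a>'));
console.log(dolink("me", "nowhere.net")); // mailto:me@nowhere.net
```

The first line printed no longer contains the address in one piece, and the second shows what the browser rebuilds when the link is clicked.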

Advantages of this method

-  All the advantages of the first method, plus

-  The link works when it is clicked and doesn’t need correcting manually.

Disadvantage of this method

-  The link will only work if the visitor’s browser has JavaScript enabled. Otherwise they will not be able to get at the email address at all.

Note (June 2005): This contrib has now been superseded by “Un système antispam”, published in French.

Discussion

  • I also created my own hiding method. It’s a mix of your two methods, with some salt added.

    The code is designed for French sites, but you can adapt it. If you are interested in the code, ask me.

    • That looks good to me! I also developed my ideas a little further after writing this contrib.

      Paolo
