PhD-FSTC-2015-30 Ecole Doctorale IAEM Lorraine
Faculté des Sciences, de la Technologie et de la Communication
DISSERTATION
Defense held on 22/06/2015 in Luxembourg
To obtain the degree of
DOCTEUR DE L’UNIVERSITÉ DU LUXEMBOURG
DOCTEUR DE L’UNIVERSITÉ DE LORRAINE
SPECIALITE: INFORMATIQUE
by Samuel Marchal
Born on May 5, 1987 in Pont-à-Mousson (France)
DNS AND SEMANTIC ANALYSIS FOR PHISHING DETECTION
Dissertation defense committee:
Prof. Dr. Thomas Engel, supervisor, University of Luxembourg (Luxembourg)
Prof. Dr. Olivier Festor, co-supervisor, TELECOM Nancy – University of Lorraine (France)
Prof. Dr. Ulrich Sorger, chairman, University of Luxembourg (Luxembourg)
Prof. Dr. Claude Godart, vice-chairman, University of Lorraine (France)
Prof. Dr. Eric Filiol, member, ESEIA (France)
Prof. Dr. Eric Totel, member, Supélec Rennes (France)
Dr. Vijay Gurbani, expert, Bell Laboratories (USA)
Dr. Habil. Radu State, expert, SnT (Luxembourg)

École doctorale IAEM Lorraine
Analyse du DNS et Analyse Sémantique pour la Détection de l'Hameçonnage
(DNS and Semantic Analysis for Phishing Detection)
THESIS presented and publicly defended on 22 June 2015 to obtain the Doctorate of the Université de Lorraine (specialization: computer science)
by Samuel MARCHAL
Jury composition
Reviewers: Prof. Dr. Eric FILIOL, ESEIA; Prof. Dr. Eric TOTEL, Supélec Rennes
Examiners: Prof.
Laboratoire Lorrain de Recherche en Informatique et ses Applications, UMR 7503
Acknowledgments
My first thanks go to the reviewers of this document and to the jury members who agreed to evaluate it. I thank them for the time they spent reading it, for the interest they showed in my work and for the constructive reviews and comments I received from their evaluation. These helped me to improve this manuscript and to identify improvements that can be brought to this work and new research perspectives that can be explored.
I sincerely thank my two co-supervisors, Thomas Engel and Olivier Festor, for welcoming me into their teams at SnT and LORIA during the four years of my Ph.D. They both gave me very good support and wise advice throughout my research activities. I thank them for listening, for their help and for the constructive feedback they gave me. Their supervision has been a key element in the achievement of this Ph.D.
I also want to thank Radu State and Jérôme François. I met Radu State while he was my professor at TELECOM Nancy. I discovered research and started doing it under his supervision. He gave me the taste and the motivation for research by sharing his work and passion with me. I thank him for the opportunity he gave me to work with him, and for the support and help he provided during the past four years. I thank Jérôme François for the supervision and help he provided when I started my Ph.D. He put me on the right track from the beginning and we collaborated on many research activities afterwards. It has been a pleasure to work with both of them, and they were of great help in producing the results presented in this document.
I want to thank all the people from the SecanLab team (SnT) and the MADYNES team (LORIA). These teams offer a great environment in which to carry out research, exchange ideas and produce high-quality work. It has been a pleasure for me to work in both during my Ph.D. and I am happy to have spent time there. I address special thanks to my office mates for the good working environment they provided. I also thank the people from LORIA and SnT: I interacted with many different people over these years and I am glad that some of them became good friends.
I thank CETREL, the industrial partner for my Ph.D., and more specifically Sam Gabbaï and Jean-Yves Decker. It has been a pleasure to work with them and to carry out research activities to solve concrete problems. Sam and Jean-Yves have always been of great help, and I thank them for the precious time they gave me and their availability during our collaboration.
Finally, I address special thanks to my family, and first of all my parents, who supported me throughout my studies. I also thank my friends, with whom I spent good times out of the office, which helped me to relax and work more efficiently.
To you all, thank you.
1 Context
The power of persuasion has been used for thousands of years to convince people to act as the persuader wishes. This ancestral art is used by politicians, salesmen and lawyers, for instance, to spread ideas, sell products or convince a jury, respectively. Even though these examples are legal practices, one may find the balance of power unfair between those who master this technique and their gullible victims. The power of persuasion has also been used for illegal activities such as swindling. In a swindle, a crook uses his skills to abuse people's credulity and make them act for his own benefit. This can consist of lending money without guarantee, providing services or products without being paid, giving advance payment for fake sales, etc. These practices have been used by unscrupulous people for centuries in order to make easy money. These tricks were initially performed through direct interaction with victims, using convincing speeches. However, times change, and so do the ways swindles are perpetrated and their targets. Nowadays, means of communication other than direct talk are available through electronic communications such as phone calls, emails and instant messaging. Moreover, directly obtaining money is not necessarily the first objective of modern swindling: the acquisition of other valuable immaterial assets, such as data that can be sold or used to steal money, has become more common.
Phishing is an example of a modern swindle that targets users of electronic communications, such as phone and computer users. E-crooks, who are named phishers, pursue the same objective as traditional swindlers, namely to persuade their victims to perform some actions, but they do so through electronic communication means. Phishers use their power of persuasion to tailor convincing socially engineered emails or websites to manipulate their victims. They use carefully chosen words and sentences to establish an atmosphere of trust with their victims in order to push them to perform some actions. Rather than directly stealing money or obtaining products for free, phishing mostly aims to steal the victim's confidential electronic data, which has become valuable.
The Internet has made it easy to use services that in the past required more intimate contact between the people conducting the transaction. Some general services such as news providers, education services or science libraries are now available on the Internet. Personalized services such as payment services, banking management services or retail services are also offered. These personalized services are sensitive because they usually deal with money management and users' confidential information. Hence, access to these services is valuable for stealing the information and/or the money stored. For instance, enough personal information about a victim can be used to impersonate him through identity theft. A stolen identity can be used to pose as that person in other swindles, in order to hide and protect the identity of the real crook, or to access personal electronic services and act in the victim's name. This is the main goal sought by phishers: to steal the information required to access sensitive services.
Figure 1: Phishing attacks and phishing domain names recorded every year (source: APWG)

Phishing appeared almost 20 years ago. Its first victims were ISP users, from whom phishers tried to steal account access information using spoofed emails purporting to come from administrators. Phishing attacks usually target users of a given sensitive service related to a brand. Phishers lure the brand's clients by claiming to be representatives of the brand and asking for information related to their use of the service. This information mostly consists of credentials for a given website or credit card numbers. Several vectors are used for phishing; the most common are emails and websites that mimic those of legitimate services and claim to be related to them. Despite this diversity, many vectors share a common point: the use of links that misdirect victims to phishing content. The use of obfuscated URLs and domain names is widespread in phishing attacks, and the use of malicious domain names as a support for attacks is increasing, as depicted in Figure 1, which shows the relevance of identifying such URLs and domain names to fight phishing. This figure shows the evolution of the number of phishing attacks and phishing domain names in use every year between 2008 and 2014. The count of recorded phishing attacks fluctuates between 100,000 and 250,000 globally over the period. However, the count of domain names used as a support for phishing attacks increases steadily, from around 50,000 in 2008 to almost 170,000 in 2014.
Over the years, phishing activities have increased dramatically in terms of attacks and number of targeted brands [apw04, AR14]. This growth is reflected in ever-increasing financial damage, which reached US $5.9 billion in 2013 [rsa14]. The increase has been ongoing since phishing appeared and, according to the current trend, it will continue. We identified four main reasons explaining this increase and the installation of phishing as a continual
the Internet, as highlighted by the rise in the number of websites, which has reached almost one billion [net15]. Hence, many new potential victims, physical vectors and targets become available, leaving room for new kinds of phishing attacks to be perpetrated.
• The second reason is the variety of attacks used to perpetrate phishing. Regular phone calls, SMS, emails and websites are examples of communication technologies used to perform phishing. Protecting against this variety of vectors is difficult, and existing phishing prevention and detection techniques only cope with a few of them. Detection techniques exist for phishing emails [FST07] or phishing websites [MKK08, CDM10, CSDM14], for instance, but their application is limited to a few attacks compared to the tens that exist. Hence, global protection implies the use of several independent techniques, as we can see today with email filtering, web browser warnings and website authentication techniques, which are jointly used to protect against phishing. However, some phishing attacks still succeed in bypassing these combined protections, so that the impact of phishing keeps growing.
• The third reason is the increasing number of phishers and of attacks perpetrated. The former is explained by the fact that phishing is easy to perpetrate and requires few technical skills. The main effort in building phishing attacks is invested in the social engineering tricks used [HCNK+14]. This can easily be done by technically unqualified crooks thanks to the availability of ready-to-use phishing kits [CKV08] and of cheap infrastructures to deploy the attacks. The increase in attacks performed is explained by the decrease in gain per attack, which forces phishers to launch more campaigns to keep a constant revenue from their crime [HF08]. Phishing can be described as the cybercrime equivalent of pickpocketing, since many people perpetrate it for low revenue. Hence, targeted countermeasures against specific phishers do not cope with this cybercrime, since many other phishers would continue their activities.
• The fourth and predominant reason is the lack of user awareness about the risk associated with electronic communications and the value of the information stored in their various website accounts. Most people do not understand and are not concerned about the impact of credential theft, credit card number theft or identity theft [pon14]. This lack of concern does not motivate them to protect their data from theft. Security is a secondary purpose for most users, and their limited technical knowledge does not allow them to enhance the security level of their electronic communications [WT99]. New users of modern electronic communication means are gullible and easy targets for phishers, who can easily lure them. This widespread unawareness is the main reason for the effectiveness of phishing attacks.
Phishing is an ever-growing activity that has become a major concern. Many factors explain its expansion and the rise of its financial damage to several billion dollars every year. The variety of phishing attacks, the growing number of potential victims and physical vectors, the ease of perpetrating this modern swindle and the widespread unawareness of victims make it a troublesome cybercrime activity. Besides its financial impact, phishing also raises concerns about the use of electronic means to communicate. People see the theft and misuse of personal information as an event that is very likely to occur in their life [pon14]. This perception of phishing as inevitable, and not as a problem that can be prevented, erodes trust among users of electronic communications. A direct risk of this loss of trust is a decreasing use of electronic means such as email as a way of communicating [HF08]. This makes the fight against phishing paramount to preserve the widespread use of this useful technology.