TECHNICAL REPORT
CMU/SEI-2013-TR-010
ESC-TR-2013-010
CERT® Division
Passive Detection of Misbehaving Name Servers
Leigh B. Metcalf
Jonathan M. Spring
Copyright 2013 Carnegie Mellon University
This material is based upon work funded and supported by Department of Homeland Security under
Contract No. FA8721-05-C-0003 with Carnegie Mellon University for the operation of the Software
Engineering Institute, a federally funded research and development center sponsored by the United States Department of Defense.
Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of Department of Homeland Security or the United States Department of Defense.
References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.
This report was prepared for the SEI Administrative Agent AFLCMC/PZM 20 Schilling Circle, Bldg 1305, 3rd floor Hanscom AFB, MA 01731-2125
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING
INSTITUTE MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON
UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR
PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE
OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY
WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK,
OR COPYRIGHT INFRINGEMENT.
This material has been approved for public release and unlimited distribution except as restricted below.
Internal use:* Permission to reproduce this material and to prepare derivative works from this material for internal use is granted, provided the copyright and “No Warranty” statements are included with all reproductions and derivative works.
External use:* This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other external and/or commercial use. Requests for permission should be directed to the Software Engineering Institute at email@example.com.
* These restrictions do not apply to U.S. government entities.
Carnegie Mellon® and CERT® are registered marks of Carnegie Mellon University.
DM-0000645
Table of Contents
1 Introduction
1.1 Related Work
1.2 Motivation
1.3 Data Sources
2 Method
3 Results
List of Figures
Figure 1: The average and minimum number of times an IP address change of a name server also changed the ASN, normalized by the number of IP changes. Each point is binned by IP change frequency for each data source. The maximum for both sources is 1 for all bins.
Figure 2: Log-log plot of observed name-server record TTL value frequency out of total name-server records observed. There are 760,796,769 total records represented. The figure does not show an outlier: 0.035% of the records had a TTL value greater than 604,800 and less than the maximum value of 4,294,944,960.
Figure 3: Number of name servers that changed IP address five or more times in a month. Solid red line indicates those servers possibly linked to pharmaceutical scams.
Abstract
In the process of categorizing malicious domains, distinguishing between suspicious and benign name servers can allow the name servers themselves to be acted against. Name servers do not normally change internet protocol (IP) addresses frequently. Domains that do change IP addresses quickly or often are said to exhibit IP flux, which can allow services, such as web pages that deliver malicious content, to circumvent defenders’ attempts to block their IP addresses. IP flux in a name server’s domain may be a sign that the name server is suspicious. This report demonstrates that name-server flux exists and is ongoing. Furthermore, there are two types of data that can reveal IP flux in domain name system (DNS) servers: passively collected DNS messages and the contents of several large, top-level domains’ official zone files.
1 Introduction
Detecting malicious domains is becoming an important task in limiting the impact of ne’er-do-wells on the internet. It has become a race between defenders developing new detection methods and adversaries developing new evasion methods. Domains are required to have only a few properties, so malicious domains can leave few traces. By focusing on the characteristics a domain must have in order to operate, such as a name server, security personnel can limit the ability of malicious domain controllers to avoid countermeasures. The domain name system (DNS) requires only two associations for a domain: its location and whom to ask about its location. Network administrators and security personnel have pursued restricting location, mainly via blocking internet protocol (IP) addresses, with some success. Blocking name servers, the entities that provide location information about domains, has been pursued with much less energy.
In the process of categorizing malicious domains, distinguishing between suspicious and benign name servers can allow the name servers themselves to be acted against, given sufficient evidence. Because name servers are one of the primary components of a well-functioning DNS, they do not change IP addresses frequently. Domains that do change IP addresses quickly or often are said to exhibit IP flux, which can make services, such as web pages that deliver malicious content, more resilient by circumventing defenders’ attempts to block their IP addresses. IP flux in a name server’s domain may be a sign that the name server is suspicious.
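To make the notion of IP flux concrete, the following minimal sketch counts how often the observed IP address of a name server changes within a calendar month and flags names that reach a threshold. The input layout (timestamp, name, address tuples), the function names `count_ip_changes` and `flag_flux`, and the sample data are all assumptions made for illustration; this is not the measurement method used in this report. The default threshold of five changes per month simply mirrors the caption of Figure 3.

```python
from collections import defaultdict
from datetime import datetime


def count_ip_changes(observations):
    """Count IP address changes per (name server, calendar month).

    observations: iterable of (timestamp, name, ip) tuples, where
    timestamp is a datetime. This input layout is hypothetical.
    """
    last_ip = {}                # (name, month) -> most recently seen IP
    changes = defaultdict(int)  # (name, month) -> number of changes seen
    for ts, name, ip in sorted(observations):
        key = (name, ts.strftime("%Y-%m"))
        # A change is counted only against an earlier observation
        # of the same name in the same month.
        if key in last_ip and last_ip[key] != ip:
            changes[key] += 1
        last_ip[key] = ip
    return dict(changes)


def flag_flux(observations, threshold=5):
    """Return the (name, month) keys with at least `threshold` changes."""
    counts = count_ip_changes(observations)
    return {key for key, n in counts.items() if n >= threshold}


# Illustrative data: ns1 moves to a new address every day, ns2 never moves.
obs = [(datetime(2013, 1, day), "ns1.example.com", f"192.0.2.{day}")
       for day in range(1, 8)]
obs += [(datetime(2013, 1, day), "ns2.example.com", "198.51.100.1")
        for day in range(1, 8)]

print(flag_flux(obs))
```

Because the sketch keys its state by month, a change that straddles a month boundary is not counted; a sliding window would be an equally reasonable choice.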
1.1 Related Work
In early 2008 the Internet Corporation for Assigned Names and Numbers (ICANN) Security and Stability Advisory Committee published an advisory detailing the existence of fast-flux networks [ICANN 2008]. This advisory detailed flux in domains based on DNS record content, a phenomenon commonly called fast flux, and in the name-server infrastructure used to support those domains. The ICANN report identified fast flux as malicious and stated that fast-flux hosting “is considered one of the most serious threats to online activities today” [ICANN 2008, p. 2]. The advisory did not qualify the maliciousness of fast flux, nor did it note any benign use cases for the activity.
Significant work has investigated how to find and block domains that demonstrate malicious activity. Some of these efforts are operational, and some are still in the research phase. Most specialize in a particular brand of malicious activity. Some operational blocking lists conceal their selection criteria to prevent adversaries from exploiting the rules. Some notable operational lists include Spamhaus, PhishTank, and Google Safe Browsing. Spamhaus maintains a few operational lists, each targeting aspects of malicious email [Spamhaus 2011]. PhishTank publishes lists of phishing URLs that are consumed by popular browsers to block phishing pages from reaching users. PhishTank takes a community-based approach, providing a site where “anyone can submit, verify, track and share phishing data” [PhishTank 2011]. Google also derives lists of phishing and malicious sites while it crawls the web, and it makes these lists accessible to the public using an application programming interface (API) [Google 2011]. While the effectiveness of blocking phishing seems to vary [Rasmussen 2010, 2011; Spring 2010], efforts to take down or block phishing sites have been demonstrated to shorten their lifetimes [Moore 2007].
Several papers have described blocking techniques that use passive DNS (pDNS). Some techniques also use zone files, the official lists of domain-to-name-server and name-server-to-IP-address mappings maintained by the registry. Antonakakis and colleagues developed a reputation-based classification system, called Notos, that uses pDNS monitoring data [Antonakakis 2010]. The Notos classification scheme divides its decision-making criteria into the broad categories of network-based, zone-based, and evidence-based. A similar team expanded these efforts with Kopis [Antonakakis 2011]. Bilge and colleagues designed the EXPOSURE system, which also uses pDNS as the data input. EXPOSURE introduces features based on time series and time to live (TTL) [Bilge 2011]. Others have described active detection of fast-flux domains [Hu 2011], and later behavioral analysis of the fast-flux networks has independently touched on name server use [Kadir 2012].
There have also been efforts to use URLs culled from spam traps, combined with active DNS behavior and registration information, to describe some properties of malicious domains [Hao 2011] as well as URL properties, excluding page content [Ma 2009a, 2009b]. Additionally, Felegyhazi, Kreibich, and Paxson used zone files to predict which domains would be used maliciously, based on the previous evidence of malicious activity by other domains using the same name server [Felegyhazi 2010]. Stoner finds malicious activity, specifically malicious fast flux and domain flux, using simpler methods [Stoner 2010]. That research uses only two features of a domain: the IP addresses it maps to and the associated autonomous system numbers (ASNs) in which the IP addresses reside.
1.2 Motivation
The ultimate goal of the current work is to hinder criminals’ free use of domain names as an accessory to their crimes. Except for Felegyhazi, Kreibich, and Paxson’s work [Felegyhazi 2010], current techniques can only reactively hinder criminals. All the operational lists currently in use (see Section 1.1) are reactive, so they can at best take away names only after some damage has been done. This is important in limiting damage, but it is not ideal. With some malicious use cases such as spam, as many as 55% of domains may be used within one day of registration; these domains also tend to be hosted on name servers that are detectably different from the average DNS infrastructure [Hao 2011]. The IP flux detection described in this report complements the malicious name-server detection methods described by Hao, Feamster, and Pandrangi [Hao 2011].
Detection via name-server behavior can improve the current state-of-the-art deterrence because it preempts at least some domains and is comprehensive in that all domains must be served by a name server. The particular aspect of name-server behavior we measure is name-server IP flux. This technique is independent of the particular use of the malicious domains hosted on the name server, so it complements techniques that first identify malicious domains by a specific use case (spam body URLs, command and control, etc.) and then identify their name servers.
In addition, it is an important contribution simply to report on the prevalence of fast-flux name servers. Active classification, such as reported by Hu, Knysz, and Shin [Hu 2011], is insufficiently scalable for a global view. We employ passive techniques that permit analysis of hundreds of millions of domains, rather than thousands. The 2008 ICANN advisory spurred measurement of the domain fast-flux phenomenon, as well as policy and technical recommendations by ICANN [Konings 2009]. However, previous reporting did not provide measurements of name-server flux. This report provides such measurements.
1.3 Data Sources
The general categories of data sources used in this work are top-level domain (TLD) zone files and pDNS traffic. The contents of zone files are generally reported to the registry by the registrars. New generic TLD (gTLD) operators are required to make this file available under certain conditions [ICANN 2011]. Other TLD operators do not operate under the same contract with ICANN and do not generally make their zone files available to anyone. Because gTLD files are available, we demonstrate our analysis using the com, org, net, biz, info, and mobi zone files.
Passive DNS collection was first described by Weimer [Weimer 2005]. We use a large pDNS source, the Security Information Exchange (SIE), for pDNS data. While the coverage of the SIE sensor array is incomplete and biased, there is evidence that it is wide, and it processes many tens of millions of distinct messages per day [Spring 2011]. The SIE is the best generally available source of pDNS data. The SIE data is delivered in resource record (RR) sets. One record set is all the resource records in a single message that share record name, class, type, and TTL, with the record data sorted by the standard DNS ordering and stored as the final fields of the record set:
• name—a unique identifier, and the subject of the record; it can specify a single host or a zone under which there are more names.
• class—identifies a set of types; in practice the only value observed is IN for the internet.
• type—specifies the expected data type and format, such as the IP address(es) of the name, the mail exchange to use to contact that domain, or the name server to ask for more information about the name