Phishing, a term coined in 1996, is a form of online identity theft (APWG,
2007). Early phishing attacks typically employed generalized lures.
For example, a phisher disguised himself as a large banking corporation or a popular
online auction site by replicating the target website. Over the past decade, however,
the definition of phishing has expanded. Phishers today use attack vectors such
as email, Trojan horse key loggers and man-in-the-middle attacks to trick their
victims (Hong, 2012).
Phishing is technically simple, but the damage it causes is tremendous. According to
security reports from CNERT/CC in China, 45 million adults lost a total
of 7.6 billion RMB directly to phishing in 2009 (CNERT/CC,
2010). The damage caused by phishing is not only monetary.
Indirect losses are much higher and include lost productivity,
the cost of maintaining a help desk to field calls, recovery costs and damage to
an online organization's reputation. These in turn cause a significant
loss of money, resources and time (Khatibi et al.,
2006; Sudha et al., 2007).
Another phenomenon is that only a small set of target sites is imitated
by phishers. To maximize financial profit, phishers usually select
famous e-commerce websites. For instance, in August 2011 the total number
of unique phishing reports submitted to APAC in China was 3,579. Four brands, namely
Taobao, Tencent, ICBC and CCB, were hijacked by phishing campaigns and accounted for
96.29% of the volume (APAC, 2011).
Quite a number of solutions to mitigate phishing attacks have been proposed
to date. Generally, past work can be classified into browser-side solutions
and server-side solutions (Huang et al., 2012).
Browser-side solutions usually embed anti-phishing measures as plug-ins
in end-users' browsers. Taking advantage of heuristics (Xiang
et al., 2011), visual similarity (Chen et
al., 2010), identity (He et al., 2011)
and machine learning (Abbasi et al., 2010; Zhang
et al., 2011), these measures can automatically detect phishing sites
and warn the end-user away from the phishing trick. However, this approach is entirely
passive; its effectiveness hinges on the user's ability, and end-users
are often ill-equipped to identify phishing attacks.
An alternative that has been widely adopted is the server-side solution,
which relies on authentication by the online organization to defend against
phishing attacks. A typical approach attempts to eliminate the phishing problem
at the server side by preventing phish from reaching potential
victims. Industry relies heavily on manually verified URL blacklists in combating
phish. Another authentication approach is to share a secret between server
and end-user, e.g., an image watermark (Topkara et al.,
2005; Huang et al., 2010; Singh
et al., 2011) or a fingerprint (Steel and Lu, 2008).
However, email filters and blacklist verification unavoidably produce false
positives and false negatives, and secret sharing requires user awareness and prior
The traditional, passive solutions, which provide users with tools to make decisions, may not be sufficient. In this paper, we propose an active anti-phishing solution that is carried out by the online service provider to protect end-users from making mistakes. The semi-fragile watermark consists of the URL, website identity characters and singular heuristics. After embedding the semi-fragile watermark into its webpages, the online organization can detect replicated phishing sites. Simulation experiments show that the solution can effectively thwart phishing attacks.
WEBPAGE INFORMATION HIDING ALGORITHM BASED ON EQUAL TAG
The webpage information hiding algorithm based on equal tags was proposed by Sun
et al. (2007) and is abbreviated as ET hereafter. The key idea is that the
attributes of a tag in a webpage may appear in any order, so the order of the attributes can
be changed without changing the rendered page or the file length. Tags that differ
only in the permutation of their attributes are called equal tags in this study,
as defined in the following:
Definition 1: Let T(a1, a2, ..., an) be a tag in HTML which has n attributes, where T represents the tag name and ai (1≤i≤n) denotes an attribute of the tag, of the form attribute name = value.
Definition 2: Let (a'1, a'2, ..., a'n) be a new permutation of (a1, a2, ..., an); then the tag T(a'1, a'2, ..., a'n) is an equal tag of T(a1, a2, ..., an).
A cover-webpage modified with equal tags is displayed identically in the browser. Equal tags therefore have the following properties:
||Property 1: Equal tags have the same function
||Property 2: T(a1, a2, ..., an) has n! equal tags
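The two properties can be checked on a small example. The sketch below is only an illustration, not part of the ET algorithm of Sun et al. (2007); the helper name `equal_tags` and the sample `input` tag are hypothetical. It enumerates every attribute ordering of one tag and confirms that there are n! of them and that all have the same length.

```python
from itertools import permutations
from math import factorial


def equal_tags(name, attrs):
    """Return every attribute ordering of a tag as an HTML start-tag string.

    `attrs` is a list of (attribute name, value) pairs with distinct names.
    """
    return [
        "<" + name + "".join(f' {k}="{v}"' for k, v in order) + ">"
        for order in permutations(attrs)
    ]


# Hypothetical tag with n = 3 attributes.
attrs = [("type", "text"), ("name", "user"), ("size", "20")]
tags = equal_tags("input", attrs)

print(len(tags) == factorial(len(attrs)))   # n! orderings (Property 2)
print(len({len(t) for t in tags}) == 1)     # file length is unchanged
for t in tags:
    print(t)
```

Because only the order of the attributes differs, a browser parses each variant into the same element, which is what Property 1 states.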
Semi-fragile watermarking is a recent trend in watermarking. A semi-fragile watermark must satisfy the following two conditions:
||It is robust to the provider's normal operations, such as updating
news, advertisements and so on
||It is fragile to the phisher's malicious operations, such as
changing the host name of the server form handler, the title and so on. The
phisher's activity in reality is shown in Fig. 1
A semi-fragile watermark, which is generated by the provider to indicate
the identity of the website, is embedded into the webpage with equal tags.
Fig. 1: Phishing activity in reality
When a suspicious webpage arrives, the provider compares the generated watermark
with the embedded watermark. If they are inconsistent, the spoofed webpage
is considered a phishing page. Because the work of defeating the phishing attack
is done by the website provider, we call this an active anti-phishing solution. The
solution does not need end-users' confirmation, and the detection criterion
is evaluated by the online server, which is more accurate than heuristics-based
methods. In the following, we present the details of the active anti-phishing solution
based on semi-fragile watermarks.
ACTIVE ANTI-PHISHING SOLUTION
Solution hypothesis: A phisher lures end-users to visit a phishing site. From our observation, such a site also satisfies the following hypotheses:
||Hypothesis 1: A phishing site impersonates a well-known
website by duplicating the whole or part of the target site in order to
show high visual similarity with its target
||Hypothesis 2: The identity of the phishing site is inconsistent
with that of the imitated website
||Hypothesis 3: In order to obtain users' information,
a phishing site always has a login form
The active anti-phishing solution exploits these hypotheses to defeat phishing attacks. In this study, the semi-fragile watermark embodies these hypotheses, and the generation method described next shows how.
Semi-fragile watermark generation method: The semi-fragile watermark is generated with formula 1.
where h is a hash function such as MD5 or SHA-1; the symbol || denotes concatenation; cr is the abbreviation of copyright; t denotes the content of the title tag; the tuple (t1, t2, t3, t4, t5) contains the five terms with the highest TF-IDF values, in descending order; dn is the abbreviation of domain name; url is the uniform resource locator that appears in the address bar; and sfh is the abbreviation of server form handler, whose value is the content of the server form handler.
The symbols cr and t and the tuple (t1, t2, t3, t4, t5) embody hypothesis 1; the symbols dn and url embody hypothesis 2; and sfh embodies hypothesis 3.
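As an illustration of the generation step, the following sketch hashes the listed fields with SHA-1. Formula 1 itself is not reproduced in this text, so the sketch assumes that the fields are simply concatenated in the order in which the symbols are explained, i.e. W = h(cr || t || t1 || t2 || t3 || t4 || t5 || dn || url || sfh); this order, the function name `generate_watermark` and all sample values are assumptions made for the example.

```python
import hashlib


def generate_watermark(cr, title, top_terms, dn, url, sfh, algo="sha1"):
    """Hash the identity fields of a webpage into a hex digest.

    ASSUMPTION: plain concatenation in the order
    cr, t, t1..t5, dn, url, sfh (formula 1 is not available here).
    """
    if len(top_terms) != 5:
        raise ValueError("expected the five highest-ranked TF-IDF terms")
    data = "".join([cr, title, *top_terms, dn, url, sfh]).encode("utf-8")
    return hashlib.new(algo, data).hexdigest()


# Hypothetical protected page.
terms = ["bank", "login", "account", "secure", "transfer"]
legit = generate_watermark(
    "Copyright Example Bank", "Example Bank Login", terms,
    "example-bank.test", "https://example-bank.test/login",
    "https://example-bank.test/auth",
)

# Same page copied by a phisher who only redirects the form handler.
forged = generate_watermark(
    "Copyright Example Bank", "Example Bank Login", terms,
    "example-bank.test", "https://example-bank.test/login",
    "http://evil.test/steal",
)

print(legit != forged)  # changing sfh alone changes the watermark
```

Any change to a hashed field, such as the server form handler, yields a different digest, which is the fragile side of the watermark. Content that is not hashed, such as news items, leaves the digest unchanged as long as the five highest-ranked terms stay the same.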
Active anti-phishing solution architecture: According to hypothesis 1, a phishing site always replicates the whole or part of the target site's source code to achieve visual similarity. The active anti-phishing solution based on semi-fragile watermarks can therefore thwart phishing attacks in which the phisher downloads the target webpage and modifies it slightly to lure victims.
The active anti-phishing solution is divided into two parts: an embedding part and a detection part. In the embedding part, the website service provider first generates the semi-fragile watermark and embeds it into the webpage tags, using the equal tag idea, to express the identity of the website. The flow chart of expressing the website identity is shown in Fig. 2.
The steps of this flow chart are as follows:
||Generate the semi-fragile watermark W with formula 1
||Use the ET embedding algorithm to embed the generated semi-fragile
watermark with the secret key k
||After the embedding step is done, output the stego-webpage
and deploy it on the website's host
Fig. 2: The flow of expressing the website identity
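The embedding step can be pictured with a generic permutation code. The sketch below is not the ET algorithm of Sun et al. (2007), whose details and use of the secret key k are not given in this text. It is a minimal stand-in that stores an integer in the attribute order of a single tag using the factorial number system; the names `embed` and `extract` are hypothetical.

```python
from math import factorial


def embed(attrs, value):
    """Reorder `attrs` so that the ordering encodes `value` (0 <= value < n!)."""
    pool = sorted(attrs)  # canonical order shared by embedder and detector
    n = len(pool)
    if not 0 <= value < factorial(n):
        raise ValueError("value does not fit into this tag")
    ordered = []
    for i in range(n, 0, -1):
        # Each factorial-base digit selects one of the remaining attributes.
        idx, value = divmod(value, factorial(i - 1))
        ordered.append(pool.pop(idx))
    return ordered


def extract(attrs):
    """Recover the integer encoded by the attribute order of one tag."""
    pool = sorted(attrs)
    value = 0
    for attr in attrs:
        idx = pool.index(attr)
        value += idx * factorial(len(pool) - 1)
        pool.pop(idx)
    return value


# Hypothetical tag with three attributes: 3! = 6 values can be stored.
attrs = [("type", "text"), ("name", "user"), ("size", "20")]
stego = embed(attrs, 4)
print(stego)
print(extract(stego) == 4)
```

A tag with n attributes can hold at most log2(n!) bits under this scheme, so a full hash digest would have to be spread over many tags of the page. How ET distributes the bits and applies the key k is left open here.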
In the detection part, when a suspicious webpage arrives, the provider compares the generated watermark with the embedded watermark; if they are inconsistent, the spoofed webpage is considered a phishing page. The detection flow chart is shown in Fig. 3.
The steps of this flow chart are as follows:
||Use the TF-IDF algorithm to filter out unrelated suspicious
webpages: only a webpage whose TF-IDF values are similar to those of the protected website
is sent to step 2; all others are output as legal
||Use formula 1 to generate the semi-fragile watermark W
||Use the ET detection algorithm to extract the embedded semi-fragile watermark W'
||Compare the semi-fragile watermarks W and W': if they are inconsistent,
the flag phishing is output; otherwise the flag legal is output
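The decision logic of the detection part can be sketched as follows. The sketch follows the rule stated earlier in this paper that an inconsistency between the generated and the embedded watermark marks a page as phishing. The function name `detect` and its arguments are hypothetical, and treating a missing watermark as a mismatch is an assumption that the text does not spell out.

```python
def detect(similar_to_protected, generated_w, extracted_w):
    """Return "legal" or "phishing" for one suspicious webpage.

    similar_to_protected: result of the TF-IDF filter (step 1)
    generated_w: watermark recomputed from the suspicious page (step 2)
    extracted_w: watermark read from its tags, or None if absent (step 3)
    """
    if not similar_to_protected:
        return "legal"  # unrelated page, filtered out in step 1
    # Step 4: a copied page keeps the embedded watermark, but its URL,
    # domain name or form handler no longer reproduce it.
    # ASSUMPTION: a missing watermark (None) also counts as a mismatch.
    return "legal" if generated_w == extracted_w else "phishing"


print(detect(False, "aa11", None))    # unrelated page
print(detect(True, "aa11", "aa11"))   # genuine page of the provider
print(detect(True, "bb22", "aa11"))   # copied page with a changed field
```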
EXPERIMENTAL RESULTS AND PERFORMANCE ANALYSIS
Experimental results: Here, a semi-fragile watermark generation experiment is shown. First, the homepages of PayPal and eBay were downloaded; the generated semi-fragile watermarks are listed in Tables 1 and 2. This information can be embedded into the homepage with the ET algorithm to represent the website's identity. If a phisher downloads the watermarked webpage and changes some part of the source code, the activity is detected by the active anti-phishing solution.
Fig. 3: The flow of detection of phishing site
Fig. 4(a-b): The appearance of PayPal, (a) Without semi-fragile watermark and (b) With embedded semi-fragile watermark
Fig. 5(a-b): The source code of PayPal, (a) Source code without embedded semi-fragile watermark and (b) Source code with embedded semi-fragile watermark
The generated semi-fragile watermark was embedded into the protected website with the ET algorithm. Figure 4 shows snapshots of the PayPal homepage, used here as the website to protect, without and with the embedded semi-fragile watermark. Figure 5 shows part of the source code of the pages in Fig. 4. Figure 4 shows that the two homepages are similar in appearance. This solution is therefore more covert than a visible image watermark, because the phisher cannot discover that an additional protection is in place. The code underlined in Fig. 5 shows the difference between the two homepages. Although the source code differs, the appearance in the browser is the same.
To verify the effectiveness of the algorithm, a phisher's activity was imitated.
From our observation, a phisher changes only a few places in
order to preserve the same appearance, such as the server form handler
or the content of the title.
Table 1: Semi-fragile watermark of PayPal
Table 2: Semi-fragile watermark of eBay
Based on this observation, two imitated phishing attacks were carried out as follows:
||Phishing attack #2 not only changed the content of the action
attribute in the form, as above, but also changed the title content to welcome
After the two imitated phishing attacks were carried out, the active anti-phishing solution verified the phishing activity; the results are listed in Table 3. The table shows that the semi-fragile watermark can detect the phishing activity, so the active anti-phishing solution can be used to protect the website owner.
Table 3: Phishing attacks on PayPal
Performance analysis: The performance of the active anti-phishing solution is analyzed in terms of usability, imperceptibility and security.
Usability: Several anti-phishing approaches published in the literature
use a visible image watermark (Topkara et al., 2005;
Huang et al., 2010; Singh
et al., 2011). As they describe, a watermark unique to each end-user
is embedded in an image to show the website's identity, so the end-user only has to remember
this unique image watermark and use it to distinguish legal websites from illegal ones.
Although these approaches give end-users a way to verify a website's
identity, end-users often ignore the watermark, which
makes these solutions ineffective.
Our anti-phishing solution is designed to help the website owner distinguish phishing sites automatically rather than manually, as in previous work. The heuristic used to verify phishing activity is actively embedded into the website's source code by the website provider, so the false negative rate of phishing detection should be very small. Another advantage is that the solution does not depend on the end-user to decide whether a site is phishing, which avoids the leakage of personal information caused by ignored warning flags.
Imperceptibility: The active solution uses the equal tag method to embed
the semi-fragile watermark into the webpage. Sun et
al. (2007) have shown that this method is imperceptible to humans,
so the phisher will not notice that the downloaded webpage carries identity information.
Security: As described above, the phisher will not find the embedded information. When phishers download the target site's source code, they change only a small part of it to cheat end-users and do not destroy the embedded watermark, which helps the website owner confirm the phishing activity. Our approach is therefore more secure than visible image watermark approaches, because it does not arouse the phisher's suspicion.
Phishing is an important problem that results in identity theft. Although
simple, phishing attacks are highly effective and have caused billions of dollars
of damage in the last couple of years. In many cases, the phisher does not directly
cause the economic damage but resells the illicitly obtained information on a
secondary market. Hence, phishing attacks are still an important problem and
solutions are required.
Phishers use webpages downloaded from the real website to make the phishing webpage appear exactly the same as the real one. In this paper, an active anti-phishing solution carried out by the online service provider was presented to protect end-users from making mistakes. The solution can effectively thwart phishing attacks in which the phisher downloads the target webpage and modifies it slightly to lure victims.
This study is supported by the National Natural Science Foundation of China (No. 61202496), the Hunan Provincial Natural Science Foundation of China (No. 10JJ4043, 10JJ5062), the Hunan Province Planned Science and Technology Key Project (No. 2010NK2003) and the Hunan Province Planned Science and Technology Project (No. 2010TZ4012).