How to Get the Code Inside an HTML File in Java

Repost

技术极客侠 2024-12-17 11:09:46


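I. Encoding in C#

What is the difference between HttpUtility.HtmlDecode / HttpUtility.HtmlEncode, Server.HtmlDecode / Server.HtmlEncode, and HttpServerUtility.HtmlDecode / HttpServerUtility.HtmlEncode?

And how do they differ from typical hand-written code like the following?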

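public static string htmlencode(string str)
{
    if (string.IsNullOrEmpty(str))
        return "";

    // Strings are immutable: Replace returns a new string, so the result must be assigned.
    // Note that "&" itself is not escaped by this hand-written version.
    str = str.Replace("<", "&lt;");
    str = str.Replace(">", "&gt;");
    str = str.Replace(" ", "&nbsp;");
    str = str.Replace("　", "&nbsp;&nbsp;");
    str = str.Replace("\"", "&quot;");
    str = str.Replace("'", "&#39;");
    str = str.Replace("\n", "<br/>");

    return str;
}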

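Answer:

HtmlEncode encodes characters that are not allowed to appear literally in HTML source. The characters usually encoded are "<", ">", "&", the double quote ("), and the single quote (').

HtmlDecode is the exact opposite of HtmlEncode: it decodes the text back to the original characters.

The HtmlEncode (HtmlDecode) method of the HttpServerUtility class is a convenient way to reach System.Web.HttpUtility.HtmlEncode (HtmlDecode) at run time from an ASP.NET Web application. Internally, it calls System.Web.HttpUtility.HtmlEncode (HtmlDecode) to encode (decode) the characters.

Server.HtmlEncode (Server.HtmlDecode) is simply the HtmlEncode (HtmlDecode) method of the HttpServerUtility instance exposed by the System.Web.UI.Page class.

The System.Web.UI.Page class has this property: public HttpServerUtility Server { get; }

So we can consider:

Server.HtmlEncode = the HtmlEncode method of the HttpServerUtility class = HttpUtility.HtmlEncode;

Server.HtmlDecode = the HtmlDecode method of the HttpServerUtility class = HttpUtility.HtmlDecode;

They are just wrappers provided to make the call more convenient.

Below is a very simple replacement test; the results are shown in the comments: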

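protected void Page_Load(object sender, EventArgs e)
{
    TestChar("<");      // less-than sign: replaced with &lt;
    TestChar(">");      // greater-than sign: replaced with &gt;
    TestChar(" ");      // ASCII (half-width) space: not replaced
    TestChar("　");     // full-width (CJK) space: not replaced
    TestChar("&");      // ampersand: replaced with &amp;
    TestChar("'");      // single quote: replaced with &#39;
    TestChar("\"");     // double quote: replaced with &quot;
    TestChar("\r");     // carriage return: not replaced
    TestChar("\n");     // line feed: not replaced
    TestChar("\r\n");   // carriage return + line feed: not replaced
}

// Writes the same input encoded by both APIs so the results can be compared.
protected void TestChar(String str)
{
    Response.Write(Server.HtmlEncode(str));
    Response.Write("----------------------");
    Response.Write(HttpUtility.HtmlEncode(str));
    Response.Write("<br/>");
}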

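So a hand-written replacement method is still necessary for the replacements that HtmlEncode does not perform, such as spaces and line breaks.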

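public static string htmlencode(string str)
{
    // Assign each result: Replace does not modify the string in place.
    str = str.Replace("<", "&lt;");
    str = str.Replace(">", "&gt;");
    str = str.Replace(" ", "&nbsp;");
    str = str.Replace("　", "&nbsp;");
    str = str.Replace("'", "&#39;");
    str = str.Replace("\"", "&quot;");
    str = str.Replace("\n", "<br/>");

    return str;
}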

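Looking at the implementation of HttpUtility.HtmlEncode with Reflector, we can see that it handles only five special characters (plus numeric references for characters in the range U+00A0 to U+00FF). Spaces and line breaks are not processed: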

public static unsafe void HtmlEncode(string value, TextWriter output)
{
    if (value != null)
    {
        if (output == null)
        {
            throw new ArgumentNullException("output");
        }
        int num = IndexOfHtmlEncodingChars(value, 0);
        if (num == -1)
        {
            output.Write(value);
        }
        else
        {
            int num2 = value.Length - num;
            fixed (char* str = value)
            {
                char* chPtr = str;
                char* chPtr2 = chPtr;
                while (num-- > 0)
                {
                    output.Write(chPtr2[0]);
                    chPtr2++;
                }
                while (num2-- > 0)
                {
                    char ch = chPtr2[0];
                    chPtr2++;
                    if (ch <= '>')
                    {
                        switch (ch)
                        {
                            case '&':
                            {
                                output.Write("&amp;");
                                continue;
                            }
                            case '\'':
                            {
                                output.Write("&#39;");
                                continue;
                            }
                            case '"':
                            {
                                output.Write("&quot;");
                                continue;
                            }
                            case '<':
                            {
                                output.Write("&lt;");
                                continue;
                            }
                            case '>':
                            {
                                output.Write("&gt;");
                                continue;
                            }
                        }
                        output.Write(ch);
                        continue;
                    }
                    if ((ch >= '\x00a0') && (ch < '\u0100'))
                    {
                        output.Write("&#");
                        output.Write(((int) ch).ToString(NumberFormatInfo.InvariantInfo));
                        output.Write(';');
                    }
                    else
                    {
                        output.Write(ch);
                    }
                }
            }
        }
    }
}
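
For readers who want to try these rules outside ASP.NET, they can be approximated in JavaScript (the language of the next section). This is a minimal sketch and not the framework's implementation: the names htmlEncode and NAMED_ENTITIES are assumptions made for illustration, and the function only mirrors what is visible in the decompiled method above (five special characters, plus numeric references for U+00A0 through U+00FF).

```javascript
// Hypothetical helper mirroring the rules of the decompiled HtmlEncode above.
const NAMED_ENTITIES = new Map([
  ["&", "&amp;"],
  ["'", "&#39;"],
  ['"', "&quot;"],
  ["<", "&lt;"],
  [">", "&gt;"],
]);

function htmlEncode(value) {
  if (value == null) return "";
  let out = "";
  for (let i = 0; i < value.length; i++) {
    const ch = value[i];
    const code = value.charCodeAt(i);
    if (NAMED_ENTITIES.has(ch)) {
      out += NAMED_ENTITIES.get(ch);
    } else if (code >= 0xa0 && code < 0x100) {
      // Latin-1 supplement characters become numeric character references.
      out += "&#" + code + ";";
    } else {
      // Everything else, including spaces, \r and \n, passes through unchanged.
      out += ch;
    }
  }
  return out;
}

console.log(htmlEncode("<b>Tom & 'Jerry'</b>"));
```

As in the C# test, spaces and line breaks come back untouched, which is why a separate replacement step is needed when they must be shown in HTML.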

II. Encoding and Decoding in JS

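1. escape/unescape
    escape: returns a string value (in Unicode format) containing the contents of charstring. Spaces, punctuation, accented characters, and all other non-ASCII characters are replaced with a %xx encoding, where xx is the hexadecimal number representing the character. Characters with a code above 0xFF are written as %uxxxx.
    unescape: returns the decoded string from a String object that was encoded with escape.
    Characters left unencoded (besides ASCII letters and digits): @ * _ + - . /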
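
The behavior described above can be checked with a short script. This is only an illustrative sketch: the sample strings are made up, and escape/unescape are legacy functions that are kept in JavaScript engines mainly for compatibility.

```javascript
// escape() and unescape() are legacy globals (ECMAScript Annex B).
// Sample inputs: a space, the unencoded set, a Latin-1 letter, a CJK character.
const samples = ["a b", "@*_+-./", "\u00e9", "\u4e2d"];

for (const s of samples) {
  const encoded = escape(s);
  const decoded = unescape(encoded);
  console.log(JSON.stringify(s), "->", encoded, "->", JSON.stringify(decoded));
}
```

The third and fourth samples show the two output forms: %E9 for a character below 0x100 and %u4E2D for one above it.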

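2. encodeURI/decodeURI
    encodeURI: returns an encoded URI. Passing the result to decodeURI returns the original string. encodeURI does not encode characters that have a special meaning in a URI, such as ":", "/", ";", and "?"; use encodeURIComponent to encode those characters.
    decodeURI: returns the decoded string from a String object that was encoded with encodeURI.
    Characters left unencoded (besides ASCII letters and digits): ; , / ? : @ & = + $ # - _ . ! ~ * ' ( )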
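
A small sketch of encodeURI on a complete address follows. The URL is a made-up example; the point is that the delimiters stay intact while the space and the non-ASCII characters are percent-encoded as UTF-8.

```javascript
// A made-up URL used only for illustration.
const url = "http://example.com/a b/index.html?name=中文&id=1#top";

const encodedUrl = encodeURI(url);
console.log(encodedUrl);

// decodeURI reverses the encoding and gives back the original string.
console.log(decodeURI(encodedUrl) === url);
```

Because ":", "/", "?", "&", "=", and "#" are left alone, the result is still a usable URL with the same structure.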

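3. encodeURIComponent/decodeURIComponent
    encodeURIComponent: returns an encoded URI component. Passing the result to decodeURIComponent returns the original string. Because encodeURIComponent also encodes the URI delimiters (for example "/", "?", "&", and "="), it is meant for a single component such as a query parameter value, not for a complete URI.
    decodeURIComponent: returns the decoded string from a String object that was encoded with encodeURIComponent.
    Characters left unencoded (besides ASCII letters and digits): - _ . ! ~ * ' ( )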
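
A typical use of encodeURIComponent is building a query string from individual values. The sketch below assumes a hypothetical helper named buildQuery and a made-up host; neither comes from the original article.

```javascript
// Hypothetical helper: encodes each key and value separately, then joins them.
function buildQuery(params) {
  return Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
}

const query = buildQuery({ q: "C# & JS", path: "/a/b?x=1" });
console.log("http://example.com/search?" + query);

// For comparison, encodeURI leaves the delimiters in the same value untouched.
console.log(encodeURI("/a/b?x=1"));
```

Encoding each value on its own keeps a "&" or "=" inside the data from being mistaken for a separator in the final query string.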
