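This is the first step of my graduation project: reading a local dataset (in JSON format) into the program. I'm using Java and IntelliJ IDEA.

The data I need to read looks like this (a sample is included at the end of the post):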


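As you can see, the data is itself a JSON array. Each JSON object in the array has many keys, and some of the values contain further nested JSON arrays.

I handled this in a fairly clumsy way. Since I didn't know how to parse a JSON array, I put braces {} around the whole dataset, which turns the original JSON array back into a single JSON object that fastjson can convert. There is one more issue (possibly just on my side): inside the big JSON object the array has no key, or rather its key is null. To make things easier, I added a key at the same time as the braces, and that made it easy to handle :)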
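For what it's worth, the wrapping step may not be strictly necessary: fastjson also has `JSON.parseArray`, which accepts text whose top-level value is an array. Below is a minimal sketch of that alternative, assuming fastjson 1.x is on the classpath. The class name `ParseArrayDirectly`, the helper `ids` and the shortened sample data are made up for illustration and are not part of my project code.

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;

public class ParseArrayDirectly {
    // Hypothetical helper: returns the "id" of every object in a top-level JSON array.
    public static String[] ids(String text) {
        // parseArray handles text that starts with '[' without any wrapper object
        JSONArray articles = JSON.parseArray(text);
        String[] result = new String[articles.size()];
        for (int i = 0; i < articles.size(); i++) {
            JSONObject article = articles.getJSONObject(i);
            result[i] = article.getString("id");
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real dataset
        String text = "[{\"id\": \"1802.00209v1\", \"year\": 2018},"
                + " {\"id\": \"1603.03827v1\", \"year\": 2016}]";
        for (String id : ids(text)) {
            System.out.println(id);
        }
    }
}
```

The rest of this post keeps the wrap-in-braces approach, since that is what my code actually does.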

First, read the local file.

While reading it, I add the braces and the key around the file contents.
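The read-and-wrap step can also be sketched with only the JDK. This is an illustration rather than the code from my project: the class `WrapJsonArray` and its two methods are hypothetical names, and the temporary file stands in for the real dataset.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WrapJsonArray {
    // Wraps raw JSON text as {"<key>":<content>} so that it becomes a JSON object.
    public static String wrap(String content, String key) {
        return "{\"" + key + "\":" + content + "}";
    }

    // Reads the whole file as UTF-8 and wraps it under the given key.
    public static String readAndWrap(Path file, String key) throws IOException {
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return wrap(content, key);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the real dataset
        Path tmp = Files.createTempFile("testJson", ".json");
        Files.write(tmp, "[{\"id\": \"1802.00209v1\"}]".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAndWrap(tmp, "article"));
        Files.delete(tmp);
    }
}
```

Reading the file in one call avoids the character-by-character loop. For a very large dataset a streaming parser would probably be a better fit, but I have not needed one here.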


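Once the wrapper is added, the nested JSON can be processed. The basic steps are:

1. Use fastjson to convert the string into a JSONObject [this gives a JSON object with a single key, the article key we added, whose value is the original JSON array]

2. Get the JSONArray stored under that key out of the JSONObject [the JSON array]

3. Iterate over the JSONArray, so that each JSON object nested in the original array can be processed.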


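If there is further nesting inside, it is handled the same way. Here the author value is itself in the form of a JSON array, so I wrote the solution as a method that can be reused later when processing tag.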


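There is one small pitfall when processing tag: JSON does not recognize None (it appears as the value of label in the tag string), so parsing it directly throws an error. None has to be converted to a null value first. Here I simply walk through the StringBuilder that is passed in and replace each occurrence.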
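One weakness of a plain search-and-replace is that it also rewrites the text None when it happens to appear inside a quoted value. Below is a slightly more careful sketch that only replaces the bare token and leaves quoted strings alone. It uses only the JDK; the class `NoneToNull` and its methods are hypothetical names for illustration, not part of my project.

```java
public class NoneToNull {
    // Replaces the bare token None with null, skipping anything inside quotes.
    public static String replaceBareNone(String text) {
        StringBuilder out = new StringBuilder(text.length());
        char quote = 0; // 0 means "not inside a quoted string"
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (quote != 0) {
                out.append(c);
                if (c == '\\' && i + 1 < text.length()) {
                    // Copy the escaped character as-is so \' does not end the string
                    out.append(text.charAt(i + 1));
                    i += 2;
                    continue;
                }
                if (c == quote) {
                    quote = 0; // closing quote reached
                }
                i++;
            } else if (c == '\'' || c == '"') {
                quote = c; // opening quote
                out.append(c);
                i++;
            } else if (text.startsWith("None", i)
                    && !isWordChar(text, i - 1) && !isWordChar(text, i + 4)) {
                out.append("null");
                i += 4;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // True if idx is inside the string and points at a letter, digit or underscore.
    private static boolean isWordChar(String s, int idx) {
        return idx >= 0 && idx < s.length()
                && (Character.isLetterOrDigit(s.charAt(idx)) || s.charAt(idx) == '_');
    }

    public static void main(String[] args) {
        String tag = "[{'term': 'cs.AI', 'label': None}, {'term': 'None of these', 'label': None}]";
        System.out.println(replaceBareNone(tag));
    }
}
```

With the made-up input in `main`, the two bare None tokens become null while the quoted 'None of these' is left untouched.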

This post mainly draws on

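In case anyone wants to try it out:

The first block is the code and the second block is the data for testing. Note that the code adds the {"article": ...} wrapper itself, so the file on disk should contain only the array, i.e. the part in [ ] below.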

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;


import java.io.*;



public class ReadFile implements Base{
	
		// Reads the JSON file, wraps the top-level array in {"article": ...} and prints every article
		public static void readJsonFile()
		{
			String path="Collaborative Filtering\\testJson.json";
			File jsonFile=new File(path);
			// try-with-resources closes the reader even if reading or parsing fails
			try (Reader reader=new InputStreamReader(new FileInputStream(jsonFile),java.nio.charset.StandardCharsets.UTF_8)) {
				StringBuilder sbf=new StringBuilder();
				sbf.append("{\"article\":");
				int ch=0;
				while((ch=reader.read())!=-1)
				{
					sbf.append((char) ch);
				}
				sbf.append("}");

				// The wrapper object has a single key, article, whose value is the original array
				JSONObject json=JSON.parseObject(sbf.toString());
				JSONArray jsonArray=json.getJSONArray("article");
				int length=jsonArray.size();
				for(int i=0;i<length;i++)
				{
					json=jsonArray.getJSONObject(i);

					// Handle author: the value is a string that itself holds a JSON array
					String tempArray=json.getString("author");
					String author[]=getInnerJson(tempArray,"name");

					String id=json.getString("id");
					String title=json.getString("title");
					int year=json.getIntValue("year");
					String summary=json.getString("summary");

					// Handle tag: replace None before parsing the inner array
					StringBuilder tag_=new StringBuilder(json.getString("tag"));
					checkUnfairWordInJson(tag_);
					String tag[]=getInnerJson(tag_.toString(),"term");

					Article article=new Article(author,id,summary,title,year,tag);
					System.out.println(article.toString());
				}
			} catch (Exception e) {
				e.printStackTrace();
			}

		}
		public static void main(String args[])
		{
			readJsonFile();

		}
		// Parses an inner JSON array (passed as a string) and returns the value of key from each element
		public static String[] getInnerJson(String array,String key)
		{
			// Wrap the array in {"temp": ...} so that it can be parsed as a JSON object
			StringBuilder sbd1=new StringBuilder(array);
			sbd1.insert(0,"{\"temp\":");
			sbd1.append("}");

			JSONObject wrapped=JSON.parseObject(sbd1.toString());
			JSONArray ja=wrapped.getJSONArray("temp");
			int size=ja.size();
			String values[]=new String[size];
			for(int i=0;i<size;i++)
			{
				JSONObject innerJson=ja.getJSONObject(i);
				values[i]=innerJson.getString(key);
			}
			return values;
		}
		// Replaces words that JSON does not accept, modifying sbd in place
		public static void checkUnfairWordInJson(StringBuilder sbd)
		{
			// For now only None is handled: every occurrence is replaced with null.
			// indexOf avoids reading past the end of the buffer when an 'N' sits near the end.
			int index=sbd.indexOf("None");
			while(index!=-1)
			{
				sbd.replace(index,index+4,"null");
				index=sbd.indexOf("None",index+4);
			}

		}

}
{
"article":[
  {
    "author": "[{'name': 'Ahmed Osman'}, {'name': 'Wojciech Samek'}]",
    "day": 1,
    "id": "1802.00209v1",
    "link": "[{'rel': 'alternate', 'href': 'http://arxiv.org/abs/1802.00209v1', 'type': 'text/html'}, {'rel': 'related', 'href': 'http://arxiv.org/pdf/1802.00209v1', 'type': 'application/pdf', 'title': 'pdf'}]",
    "month": 2,
    "summary": "We propose an architecture for VQA which utilizes recurrent layers to\ngenerate visual and textual attention. The memory characteristic of the\nproposed recurrent attention units offers a rich joint embedding of visual and\ntextual features and enables the model to reason relations between several\nparts of the image and question. Our single model outperforms the first place\nwinner on the VQA 1.0 dataset, performs within margin to the current\nstate-of-the-art ensemble model. We also experiment with replacing attention\nmechanisms in other state-of-the-art models with our implementation and show\nincreased accuracy. In both cases, our recurrent attention mechanism improves\nperformance in tasks requiring sequential or relational reasoning on the VQA\ndataset.",
    "tag": "[{'term': 'cs.AI', 'scheme': 'http://arxiv.org/schemas/atom','label': None}, {'term': 'cs.CL', 'scheme': 'http://arxiv.org/schemas/atom'}]",
    "title": "Dual Recurrent Attention Units for Visual Question Answering",
    "year": 2018
  },  {
    "author": "[{'name': 'Ji Young Lee'}, {'name': 'Franck Dernoncourt'}]",
    "day": 12,
    "id": "1603.03827v1",
    "link": "[{'rel': 'alternate', 'href': 'http://arxiv.org/abs/1603.03827v1', 'type': 'text/html'}, {'rel': 'related', 'href': 'http://arxiv.org/pdf/1603.03827v1', 'type': 'application/pdf', 'title': 'pdf'}]",
    "month": 3,
    "summary": "Recent approaches based on artificial neural networks (ANNs) have shown\npromising results for short-text classification. However, many short texts\noccur in sequences (e.g., sentences in a document or utterances in a dialog),\nand most existing ANN-based systems do not leverage the preceding short texts\nwhen classifying a subsequent one. In this work, we present a model based on\nrecurrent neural networks and convolutional neural networks that incorporates\nthe preceding short texts. Our model achieves state-of-the-art results on three\ndifferent datasets for dialog act prediction.",
    "tag": "[{'term': 'cs.CL', 'scheme': 'http://arxiv.org/schemas/atom', 'label': None}, {'term': 'cs.AI', 'scheme': 'http://arxiv.org/schemas/atom', 'label': None}, {'term': 'cs.LG', 'scheme': 'http://arxiv.org/schemas/atom', 'label': None}, {'term': 'cs.NE', 'scheme': 'http://arxiv.org/schemas/atom', 'label': None}, {'term': 'stat.ML', 'scheme': 'http://arxiv.org/schemas/atom', 'label': None}]",
    "title": "Sequential Short-Text Classification with Recurrent and Convolutional\n  Neural Networks",
    "year": 2016
  }

]
}