Core Java Volume II (8th Edition) – Reading Notes – Chapter 1

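1. Input stream: an object from which you can read a sequence of bytes (InputStream).

Output stream: an object to which you can write a sequence of bytes (OutputStream).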

2. The source or destination of a stream may be a file, a network connection, and so on.

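3. Because Unicode uses multiple bytes per character, byte-oriented streams are inconvenient for processing Unicode text. A separate hierarchy, Reader and Writer, was therefore introduced for character data (based on two-byte char code units).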

4. The byte-stream hierarchy has just two abstract base classes: InputStream and OutputStream.

abstract class InputStream   // simplified; public modifiers and "throws IOException" omitted
{
    abstract int read();   // the only abstract method

    int read(byte [] b, int off, int len);

    int available();

    long skip(long n);

    void close();
}

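read() reads a single byte and returns it as an int from 0 to 255, or -1 at the end of the stream. It blocks until a byte is available.

read(byte[] b, int off, int len) reads up to len bytes into the array b, storing them starting at index off, and returns the number of bytes actually read (or -1 at the end of the stream).

available() returns (an estimate of) the number of bytes that can be read without blocking. It does not signal the end of the stream with -1; at the end it returns 0.

When you are done with a stream, close it with close(); otherwise you may run out of operating-system resources such as file descriptors (the nofile limit on Linux).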
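The read/available contract above can be seen in a short sketch. It assumes an in-memory ByteArrayInputStream in place of a file, and the names ReadLoopDemo, readAll, and demo are made up for illustration.

```java
import java.io.*;

public class ReadLoopDemo {

    // Collects every byte from 'in', reading up to buf.length bytes per call.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        byte[] buf = new byte[4]; // deliberately small so several reads are needed
        int n;
        // read(b, off, len) may return fewer bytes than requested; -1 means end of stream
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            result.write(buf, 0, n); // copy only the n bytes actually read
        }
        return result.toByteArray();
    }

    // Returns "available-before bytes-read available-after".
    static String demo() throws IOException {
        InputStream in = new ByteArrayInputStream("hello world".getBytes("US-ASCII"));
        try {
            int before = in.available(); // all 11 bytes can be read without blocking
            int count = readAll(in).length;
            int after = in.available();  // 0 at the end of the stream, not -1
            return before + " " + count + " " + after;
        } finally {
            in.close(); // always release the stream
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo()); // 11 11 0
    }
}
```

Writing only the first n bytes of the buffer matters: the tail of buf still holds data from the previous call.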

abstract class OutputStream   // simplified; public modifiers and "throws IOException" omitted
{
    abstract void write(int b);   // writes the low-order 8 bits of b

    void write(byte [] b, int off, int len);

    void close();

    void flush();
}

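write() also blocks until the bytes have actually been written.

flush() empties the output stream's buffer, physically sending any buffered bytes to their destination. close() flushes the stream automatically.

5. All other byte streams extend InputStream/OutputStream and, like them, work with bytes.

6. Character streams, Reader and Writer, are for Unicode text: they read and write two-byte char values (UTF-16 code units) rather than single bytes.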

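abstract class Reader
{
    int read();   // one char; the actual abstract method is read(char[] cbuf, int off, int len)
}

abstract class Writer
{
    void write(int c);   // one char; the actual abstract method is write(char[] cbuf, int off, int len)
}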

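Reader works much like InputStream, except that read() returns a UTF-16 code unit (an integer between 0 and 65535); it likewise returns -1 at the end of the input.

Writer is analogous: write(int c) writes one UTF-16 code unit (only the low-order 16 bits of c are used).

7. Java SE 5 introduced four additional interfaces: Closeable, Flushable, Readable, and Appendable.

The first two simply declare the close() and flush() methods. InputStream, OutputStream, Reader, and Writer all implement Closeable; the output classes, OutputStream and Writer, also implement Flushable.

8. The Readable interface has a single method: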

interface Readable
{
    int read(CharBuffer cb) throws IOException;
}

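A CharBuffer holds chars and supports both sequential and random read/write access; it can be backed by an in-memory array or by a memory-mapped file. Reader implements Readable, so FileReader, BufferedReader, and the other readers all provide this method.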

The Appendable interface has two methods:

interface Appendable
{
    Appendable append(char c);
    Appendable append(CharSequence csq);
}

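They append a single character or a sequence of characters (not bytes). Among the stream classes, Writer implements Appendable (PrintStream does too); StringBuilder and CharBuffer also implement it.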

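9. Java builds complex functionality by composing (wrapping) simple streams. For example, reading from a file requires a FileInputStream, while reading numbers from a stream requires a DataInputStream; but DataInputStream's constructor accepts an InputStream, not a file name. So you chain them: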

FileInputStream fin = new FileInputStream("em.dat");
DataInputStream din = new DataInputStream(fin);
long l = din.readLong();

To make reading faster, add buffering: wrap a BufferedInputStream around the FileInputStream and pass that to the DataInputStream.
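A minimal sketch of the three-layer chain, assuming a temporary file stands in for em.dat; BufferedChainDemo and roundTrip are hypothetical names. The write side is buffered the same way with BufferedOutputStream.

```java
import java.io.*;

public class BufferedChainDemo {

    // Writes one long to a temporary file, then reads it back through
    // FileInputStream -> BufferedInputStream -> DataInputStream.
    static long roundTrip(long value) throws IOException {
        File f = File.createTempFile("em", ".dat");
        f.deleteOnExit();
        DataOutputStream dout = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(f)));
        dout.writeLong(value);
        dout.close(); // flushes the buffer, then closes the file

        // The buffer sits in the middle: DataInputStream parses the bytes,
        // BufferedInputStream fetches them from the file in large chunks.
        DataInputStream din = new DataInputStream(
                new BufferedInputStream(new FileInputStream(f)));
        try {
            return din.readLong();
        } finally {
            din.close(); // closing the outermost stream closes the whole chain
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(123456789L)); // 123456789
    }
}
```

The order matters: the stream with the methods you want to call (DataInputStream) goes on the outside.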

10. PushbackInputStream lets you "push back" bytes you have already read. If you need to look ahead at the input and then undo the read, use its unread(...) method.
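A minimal sketch of peeking with unread(), assuming a ByteArrayInputStream as the underlying stream; PushbackDemo, peek, and demo are hypothetical names. With the single-argument constructor the pushback buffer holds one byte, which is enough for one-byte lookahead.

```java
import java.io.*;

public class PushbackDemo {

    // Looks at the next byte without consuming it: read it, then push it back.
    static int peek(PushbackInputStream in) throws IOException {
        int b = in.read();
        if (b != -1) {
            in.unread(b); // the next read() returns this byte again
        }
        return b;
    }

    static String demo() throws IOException {
        PushbackInputStream in = new PushbackInputStream(
                new ByteArrayInputStream(new byte[] { 'A', 'B' }));
        StringBuilder sb = new StringBuilder();
        sb.append((char) peek(in));      // 'A', still unread
        sb.append((char) in.read());     // 'A' again
        sb.append((char) in.read());     // 'B'
        sb.append(' ').append(peek(in)); // -1: end of stream
        in.close();
        return sb.toString();            // "AAB -1"
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo());
    }
}
```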

11. ZipInputStream reads ZIP files (ZipOutputStream writes them). For example, to read doubles directly from a file inside a ZIP archive:

ZipInputStream zin = new ZipInputStream(new FileInputStream("e.zip"));
DataInputStream din = new DataInputStream(zin);

Reading a ZIP archive means stepping through its entries with getNextEntry(), for example:

// Requires: import java.io.*; import java.util.zip.*;
public void unzip(String zipFileName, String outputDirectory) throws IOException {
	ZipInputStream in = new ZipInputStream(new FileInputStream(zipFileName));
	ZipEntry z;
	while ((z = in.getNextEntry()) != null) {
		System.out.println("unzipping " + z.getName());
		File f = new File(outputDirectory, z.getName());
		if (z.isDirectory()) {
			// directory entry names end with "/"; mkdirs() also creates missing parents
			f.mkdirs();
			System.out.println("mkdir " + f.getPath());
		} else {
			// a file's parent directory may not have an entry of its own
			File parent = f.getParentFile();
			if (parent != null) {
				parent.mkdirs();
			}
			FileOutputStream out = new FileOutputStream(f);
			byte[] buf = new byte[4096];
			int len;
			// read() returns -1 at the end of the current entry
			while ((len = in.read(buf, 0, buf.length)) != -1) {
				out.write(buf, 0, len);
			}
			out.close();
		}
		in.closeEntry();
	}
	in.close();
}

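12. OutputStreamWriter turns a stream of Unicode characters into a stream of bytes in a chosen character encoding (the bytes then travel through the underlying OutputStream to a file, the network, etc.).

InputStreamReader does the reverse: it turns an input stream of bytes in a given encoding into a reader that delivers Unicode characters: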

InputStreamReader in = new InputStreamReader(new FileInputStream("xx.txt"), "utf-8");
StringBuilder sb = new StringBuilder();
char cbuf[] = new char[1024];
int len = 0;
while((len = in.read(cbuf, 0, cbuf.length))!=-1)
{
    sb.append(cbuf, 0, len);   // append only the len chars actually read
}
in.close();

This is the usual way to read a byte stream in a specific encoding. Likewise, to write text in another encoding, use OutputStreamWriter:

OutputStreamWriter(OutputStream out, String charsetName)
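A minimal sketch of how the chosen encoding changes the bytes produced, assuming an in-memory ByteArrayOutputStream stands in for a file. EncodingDemo and encode are hypothetical names, and the three Chinese characters are written as Unicode escapes so the source compiles whatever the compiler's default encoding is.

```java
import java.io.*;

public class EncodingDemo {

    // Encodes text through an OutputStreamWriter and returns the resulting bytes.
    static byte[] encode(String text, String charsetName) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(bytes, charsetName);
        out.write(text);
        out.close(); // flushes the encoder's internal buffer into 'bytes'
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "\u8ba1\u7b97\u6240"; // code points U+8BA1 U+7B97 U+6240
        System.out.println(encode(text, "UTF-8").length); // 9: three bytes per character
        System.out.println(encode(text, "GBK").length);   // 6: two bytes per character
    }
}
```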

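13. If you just need to read or write a text file in the platform's default encoding, you can use FileReader and FileWriter directly; both have constructors that take a file name, so you don't need to create an underlying FileInputStream/FileOutputStream: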

FileWriter(String fileName)

FileReader(String fileName)

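14. PrintWriter writes strings and numbers in text format (print, println, printf/format). Some constructors take a boolean autoFlush flag: when it is true, println, printf, and format flush the output buffer automatically. You can also specify the file's character encoding.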

import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

public class PRTest {
	public static void main(String[] args) {

		PrintWriter out = null;
		try {
			// the second argument is the file's character encoding
			out = new PrintWriter("test.txt", "utf-8");
			out.format("%s: %d%n", "计算所", 10);
			out.println("abcabc");

		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (UnsupportedEncodingException e) {
			e.printStackTrace();
		} finally {
			if (out != null) {
				out.close();   // also flushes the buffered output
			}
		}

	}
}

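15. Summary: to write data in binary format, use DataOutputStream; to write in text format, use PrintWriter.

16. To read text: BufferedReader (before JDK 5) or Scanner (JDK 5 and later).

17. BufferedReader can only read characters and whole lines (readLine()); it has no methods for parsing numbers. To read numbers and other formatted input, Scanner is the better choice.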
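A minimal sketch contrasting the two, assuming the input is a String rather than a file; ReadNumbersDemo and its methods are hypothetical names. Scanner parses the integers itself, while with BufferedReader you split each line and parse the tokens yourself.

```java
import java.io.*;
import java.util.Scanner;

public class ReadNumbersDemo {

    // Scanner parses numbers directly from the text.
    static int sumWithScanner(String text) {
        Scanner in = new Scanner(text);
        int sum = 0;
        while (in.hasNextInt()) {
            sum += in.nextInt();
        }
        return sum;
    }

    // BufferedReader only delivers lines; tokenizing and parsing are up to you.
    static int sumWithBufferedReader(String text) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader(text));
        int sum = 0;
        String line;
        while ((line = in.readLine()) != null) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) { // a blank line yields one empty token
                    sum += Integer.parseInt(token);
                }
            }
        }
        in.close();
        return sum;
    }

    public static void main(String[] args) throws IOException {
        String text = "1 2 3\n4 5\n";
        System.out.println(sumWithScanner(text));        // 15
        System.out.println(sumWithBufferedReader(text)); // 15
    }
}
```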

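18. Objects can also be stored in text format, for example with the fields separated by | (assuming no field contains |). Write each object as a String with PrintWriter; to read it back, read a line with Scanner and split it: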

String line = scan.nextLine();
String parts[] = line.split("\\|");

Note that | is a regular-expression metacharacter (alternation), so it must be escaped.
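A minimal sketch of the full round trip, assuming a StringWriter/String in place of a file and a hypothetical record with three fields (name, salary, year); PipeRecordDemo, write, and read are made-up names.

```java
import java.io.*;
import java.util.Scanner;

public class PipeRecordDemo {

    // Writes one record per line, fields separated by '|'.
    static String write(String name, double salary, int year) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        out.println(name + "|" + salary + "|" + year);
        out.close();
        return buffer.toString();
    }

    // Reads the record back: one line, then split on the escaped '|'.
    static String[] read(String text) {
        Scanner scan = new Scanner(text);
        String line = scan.nextLine();
        // an unescaped "|" is regex alternation of two empty patterns
        // and would split the line between every character
        return line.split("\\|");
    }

    public static void main(String[] args) {
        String[] parts = read(write("Harry Hacker", 35500.0, 1989));
        String name = parts[0];
        double salary = Double.parseDouble(parts[1]);
        int year = Integer.parseInt(parts[2]);
        System.out.println(name + ", " + salary + ", " + year); // Harry Hacker, 35500.0, 1989
    }
}
```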

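19. JDK 1.4 introduced java.nio.charset.Charset to represent character encodings.

20. Some common encodings:
ISO-8859-1: an 8-bit extension of ASCII (US-ASCII) with the accented letters of Western European languages, also known as Latin-1.
ISO-8859-15: like ISO-8859-1, but replaces some characters with French and Finnish letters and adds the euro sign.
UTF-8: needs no introduction.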

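Obtain a Charset by calling the static method Charset.forName:

Charset cset = Charset.forName("utf-8");

Charset names in Java are case-insensitive.

Each charset can have several aliases; aliases() returns them as a Set<String>: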

import java.nio.charset.Charset;

public class TestCharSetAlias {

    public static void main(String [] args) {
        Charset cset = Charset.forName("utf-8");
        for(String name: cset.aliases()) {
            System.out.println(name);
        }
    }
}

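21. If you are not sure which character sets your JDK supports, call the static method availableCharsets(), which returns a map of all available charsets keyed by name: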

import java.nio.charset.Charset;
import java.util.Map;

public class ListCharsets {

    public static void main(String [] args) {
        Map<String, Charset> map = Charset.availableCharsets();
        for(String name: map.keySet()) {
            System.out.println(name);
        }
    }
}

Sample output (the exact list depends on the JDK version and platform):

Big5
Big5-HKSCS
COMPOUND_TEXT
EUC-JP
EUC-KR
GB18030
GB2312
GBK
IBM-Thai
IBM00858
IBM01140
IBM01141
IBM01142
IBM01143
IBM01144
IBM01145
IBM01146
IBM01147
IBM01148
IBM01149
IBM037
IBM1026
IBM1047
IBM273
IBM277
IBM278
IBM280
IBM284
IBM285
IBM297
IBM420
IBM424
IBM437
IBM500
IBM775
IBM850
IBM852
IBM855
IBM857
IBM860
IBM861
IBM862
IBM863
IBM864
IBM865
IBM866
IBM868
IBM869
IBM870
IBM871
IBM918
ISO-2022-CN
ISO-2022-JP
ISO-2022-JP-2
ISO-2022-KR
ISO-8859-1
ISO-8859-13
ISO-8859-15
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
JIS_X0201
JIS_X0212-1990
KOI8-R
KOI8-U
Shift_JIS
TIS-620
US-ASCII
UTF-16
UTF-16BE
UTF-16LE
UTF-32
UTF-32BE
UTF-32LE
UTF-8
windows-1250
windows-1251
windows-1252
windows-1253
windows-1254
windows-1255
windows-1256
windows-1257
windows-1258
windows-31j
x-Big5-HKSCS-2001
x-Big5-Solaris
x-euc-jp-linux
x-EUC-TW
x-eucJP-Open
x-IBM1006
x-IBM1025
x-IBM1046
x-IBM1097
x-IBM1098
x-IBM1112
x-IBM1122
x-IBM1123
x-IBM1124
x-IBM1381
x-IBM1383
x-IBM33722
x-IBM737
x-IBM833
x-IBM834
x-IBM856
x-IBM874
x-IBM875
x-IBM921
x-IBM922
x-IBM930
x-IBM933
x-IBM935
x-IBM937
x-IBM939
x-IBM942
x-IBM942C
x-IBM943
x-IBM943C
x-IBM948
x-IBM949
x-IBM949C
x-IBM950
x-IBM964
x-IBM970
x-ISCII91
x-ISO-2022-CN-CNS
x-ISO-2022-CN-GB
x-iso-8859-11
x-JIS0208
x-JISAutoDetect
x-Johab
x-MacArabic
x-MacCentralEurope
x-MacCroatian
x-MacCyrillic
x-MacDingbat
x-MacGreek
x-MacHebrew
x-MacIceland
x-MacRoman
x-MacRomania
x-MacSymbol
x-MacThai
x-MacTurkish
x-MacUkraine
x-MS932_0213
x-MS950-HKSCS
x-MS950-HKSCS-XP
x-mswin-936
x-PCK
x-SJIS_0213
x-UTF-16LE-BOM
X-UTF-32BE-BOM
X-UTF-32LE-BOM
x-windows-50220
x-windows-50221
x-windows-874
x-windows-949
x-windows-950
x-windows-iso2022jp

22. Once you have a Charset, you can convert between Java's Unicode strings and byte sequences in that encoding. The following uses GBK as the example.

From Unicode to GBK:

import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class ConvCharset {

    public static void main(String [] args) throws Exception {
        Charset gbk_cset = Charset.forName("gbk");
        String str = "计算所"; // Java strings are always UTF-16 in memory
        ByteBuffer buffer = gbk_cset.encode(str); // Unicode -> GBK bytes
        byte[] gbk_bytes = new byte[buffer.remaining()];
        buffer.get(gbk_bytes);
        System.out.println(str.length());     // 3 chars
        System.out.println(gbk_bytes.length); // 6 bytes: 2 per Chinese character in GBK
    }
}

From GBK to Unicode:

I haven't managed to get this working yet, oddly.
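For reference, decoding is the mirror image of encoding: hand the GBK bytes to the GBK Charset. One plausible pitfall is decoding bytes with a different charset from the one that produced them (for instance, platform-default bytes interpreted as GBK). The sketch assumes the JDK includes the GBK charset (standard JDK builds do); GbkDecodeDemo and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class GbkDecodeDemo {

    // Decodes GBK bytes back into a Java (UTF-16) String.
    // Equivalent shortcut: new String(gbkBytes, Charset.forName("GBK"))
    static String decodeGbk(byte[] gbkBytes) {
        Charset gbk = Charset.forName("GBK");
        CharBuffer chars = gbk.decode(ByteBuffer.wrap(gbkBytes));
        return chars.toString();
    }

    // Unicode -> GBK -> Unicode must give back the original string.
    static boolean roundTrips(String original) {
        byte[] gbkBytes = original.getBytes(Charset.forName("GBK")); // encode
        return decodeGbk(gbkBytes).equals(original);                 // decode
    }

    public static void main(String[] args) {
        System.out.println(roundTrips("\u8ba1\u7b97\u6240")); // true
    }
}
```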

23. The DataOutput interface defines methods for writing data of each type in binary format:

writeInt
writeByte
writeDouble
....
writeUTF

Each type is written in a fixed number of bytes: an int always takes 4 bytes, a double 8.

In short, values are converted from Java's internal types to a binary representation, always in big-endian byte order.
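A minimal sketch that makes the byte order visible, assuming a ByteArrayOutputStream as the target; BigEndianDemo and intBytes are hypothetical names. The int 0x01020304 comes out with its most significant byte first.

```java
import java.io.*;

public class BigEndianDemo {

    // Returns the bytes DataOutputStream produces for one int.
    static byte[] intBytes(int value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(value); // always 4 bytes, most significant byte first
        out.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        for (byte b : intBytes(0x01020304)) {
            System.out.print(b + " "); // 1 2 3 4
        }
        System.out.println();
    }
}
```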

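writeUTF uses a modified version of UTF-8 (preceded by a 2-byte length) that essentially only the Java virtual machine uses, so avoid it for data that programs in other languages must read.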

DataOutputStream implements DataOutput. To write to a file, wrap it around a FileOutputStream (with a BufferedOutputStream in between if you like), for example:

DataOutputStream out  = new DataOutputStream(new FileOutputStream("file.dat"));

24. Correspondingly, DataInput defines the matching read methods. Whatever you write in binary with writeInt must be readable back with readInt, and so on.
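A minimal sketch of the write/read symmetry, assuming in-memory byte streams; DataRoundTripDemo and roundTrip are hypothetical names. The values must be read back in the same order and with the matching methods, and the byte count follows from the fixed sizes above.

```java
import java.io.*;

public class DataRoundTripDemo {

    // Writes an int, a double, and a string in binary, then reads them back
    // in the same order with the matching DataInput methods.
    static String roundTrip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(42);
        out.writeDouble(3.5);
        out.writeUTF("abc");
        out.close();
        // 4 (int) + 8 (double) + 2 (length prefix) + 3 ("abc") = 17 bytes

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        int i = in.readInt();
        double d = in.readDouble();
        String s = in.readUTF();
        in.close();
        return bytes.size() + ":" + i + ":" + d + ":" + s;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip()); // 17:42:3.5:abc
    }
}
```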

25. RandomAccessFile implements both DataInput and DataOutput. It works with files on disk and lets you read or write data at any position in the file:

RandomAccessFile in = new RandomAccessFile("file.dat", "r");
RandomAccessFile inOut = new RandomAccessFile("file.dat", "rw");

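Besides "r" and "rw", the mode can be "rws" or "rwd": "rws" requires every update to the file's content or metadata to be written synchronously to the storage device, while "rwd" requires this only for updates to the file's content.

seek() moves the file pointer to any byte position within the file.

getFilePointer() returns the current position of the file pointer.

length() returns the total number of bytes in the file.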
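A minimal sketch of random access to fixed-size records, assuming a temporary file and 4-byte int records; RandomAccessDemo and readAt are hypothetical names. Because every record has the same size, record k starts at byte 4 * k.

```java
import java.io.*;

public class RandomAccessDemo {

    // Writes n ints (0, 10, 20, ...) and then reads the one at index k
    // by seeking directly to its byte offset.
    static int readAt(int n, int k) throws IOException {
        File f = File.createTempFile("raf", ".dat");
        f.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
            for (int i = 0; i < n; i++) {
                raf.writeInt(i * 10);
            }
            System.out.println("length = " + raf.length());          // 4 * n bytes
            raf.seek(4L * k);                                         // jump to record k
            int value = raf.readInt();
            System.out.println("pointer = " + raf.getFilePointer()); // 4 * (k + 1)
            return value;
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAt(5, 3)); // 30
    }
}
```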

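26. The following example uses DataOutputStream and RandomAccessFile to write fixed-length records and then access them randomly.

Remember: every char in a String takes two bytes (writeChar writes 2 bytes per char)!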

import java.io.*;

public class RAFTest {

	public static void main(String [] args) {
		Person [] ps = new Person[3];

		ps[0] = new Person("lxr", 1);
		ps[1] = new Person("lhy", 2);
		ps[2] = new Person("coder3", 3);

		int max_str = 80;
		int REC_LEN = max_str*2+4;

		try {
			//Write
			DataOutputStream out = new DataOutputStream(new FileOutputStream("person.dat"));
			for(int i=0; i<ps.length; i++) {
				ps[i].Write(max_str, out);
			}
			out.close();

			//Read one by one and output
			RandomAccessFile raf = new RandomAccessFile("person.dat", "r");
			int nps = (int)raf.length()/REC_LEN;
			Person [] ps2 = new Person[nps];
			for(int i=0; i< nps; i++) {
				raf.seek(i*REC_LEN);
				ps2[i] = new Person();
				ps2[i].Read(max_str, raf);
				System.out.println(ps2[i]);
			}
			raf.close();

		} catch(Exception e) {
			e.printStackTrace();
		}

	}
}

class Person {

	public Person(String name, int id) {
		this.name = name;
		this.id = id;
	}

	public Person() {
	}

	public String getName() {
		return name;
	}

	public int getId() {
		return id;
	}

	public void setName(String name) {
		this.name = name;
	}

	public void setId(int id) {
		this.id = id;
	}

	public void Write(int str_len, DataOutput out) throws IOException {
		//Write name, at most str_len *2 bytes
		for(int i=0; i < str_len; i++) {
			char c = 0;
			if(i<name.length()) {
				c = name.charAt(i);
			}
			out.writeChar(c);
		}
		//Output id
		out.writeInt(id);
	}

	public void Read(int str_len, DataInput in) throws IOException {
		//Read name, skip if len < str_len
		StringBuilder sb = new StringBuilder();
		for(int i=0; i < str_len; i++) {
			char c = in.readChar();
			if(c!=0) {
				sb.append(c);
			}
		}
		this.name = sb.toString();
		//Read id
		this.id = in.readInt();
	}

	public String toString() {
		return this.name + " " + this.id;
	}

	private String name;
	private int id;
}

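27. ZipInputStream and ZipOutputStream read and write ZIP files.

Both have a nested structure: the stream contains a number of ZipEntry objects (the individual files inside the archive).

With a ZipInputStream, first call getNextEntry(), then call read() (or wrap the stream in a DataInputStream or similar). When read() returns -1, the current entry is finished: call closeEntry(), then call getNextEntry() again. When getNextEntry() returns null, all entries have been read.

With a ZipOutputStream, call putNextEntry() for each new file, write its contents, then call closeEntry().

Here is an example that writes and then reads a ZIP file: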

import java.io.*;
import java.util.*;
import java.util.zip.*;

public class ZipTest {

    public static void main(String [] args) throws Exception {
        //Write a zip file
        ZipOutputStream zout = new ZipOutputStream(new FileOutputStream("test.zip"));
        DataOutputStream dout = new DataOutputStream(zout);
        zout.putNextEntry(new ZipEntry("file1.txt"));
        // writeBytes writes plain text; writeUTF would prepend a 2-byte length
        // that the Scanner below would read back as garbage characters
        dout.writeBytes("I'm file1.txt\n");
        zout.closeEntry();
        zout.putNextEntry(new ZipEntry("file2.txt"));
        dout.writeBytes("I'm file2.txt\n");
        zout.closeEntry();
        dout.close(); // also closes zout, which writes the zip directory

        //Read zip file
        ZipInputStream zin = new ZipInputStream(new FileInputStream("test.zip"));
        ZipEntry entry = null;
        while( (entry = zin.getNextEntry() )!=null) {
            // A new Scanner is required for every entry!
            // Don't close it: that would close zin as well.
            Scanner scan = new Scanner(zin);
            StringBuilder str = new StringBuilder();
            while(scan.hasNextLine()){
                str.append(scan.nextLine()).append("\n");
            }
            System.out.println(entry.getName());
            System.out.println(str);
            System.out.println();
            zin.closeEntry();
        }
        zin.close();
    }
}

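Note: when reading, create a new Scanner for each entry. Once a Scanner has seen the end of its input (the end of the current entry), it does not try to read from the stream again, so it cannot be reused for the next entry.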

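28. Java provides an object serialization mechanism that can write any object to a stream and read it back later.

29. ObjectOutputStream writes objects and ObjectInputStream reads them. The objects' classes must implement the Serializable interface, which has no methods. So in principle serialization needs no further work: the streams automatically examine every field and serialize/deserialize them one by one.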
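A minimal sketch of a serialization round trip, assuming in-memory byte streams; SerializeDemo and the nested Employee class are hypothetical names. Employee only declares implements Serializable, and the streams handle its fields automatically.

```java
import java.io.*;

public class SerializeDemo {

    // Hypothetical example class; Serializable has no methods to implement.
    static class Employee implements Serializable {
        String name;
        double salary;
        Employee(String name, double salary) { this.name = name; this.salary = salary; }
    }

    // Serializes an object to a byte array with ObjectOutputStream.
    static byte[] toBytes(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(obj);
        out.close();
        return bytes.toByteArray();
    }

    // Reads the object back with ObjectInputStream; the caller casts the result.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
        try {
            return in.readObject();
        } finally {
            in.close();
        }
    }

    static String roundTrip() throws IOException, ClassNotFoundException {
        Employee e = (Employee) fromBytes(toBytes(new Employee("Harry", 50000)));
        return e.name + " " + e.salary;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // Harry 50000.0
    }
}
```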

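30. To preserve object identity (one object may be referenced by several others), each object written to the stream is given a unique serial number.

31. When an object that has already been written is encountered again, the stream stores only a reference to its serial number instead of a second copy.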
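A minimal sketch showing that shared references survive the round trip, assuming in-memory streams; SharedRefDemo, Manager, and Employee are hypothetical names. Both employees point at one Manager; after deserialization they again share a single (new) Manager object rather than two copies.

```java
import java.io.*;

public class SharedRefDemo {

    static class Manager implements Serializable {
        String name;
        Manager(String name) { this.name = name; }
    }

    static class Employee implements Serializable {
        Manager boss;
        Employee(Manager boss) { this.boss = boss; }
    }

    // Writes two employees that share one manager, reads them back, and
    // checks whether they still share a single Manager object.
    static boolean sharedAfterRoundTrip() throws IOException, ClassNotFoundException {
        Manager carl = new Manager("Carl");
        Employee[] staff = { new Employee(carl), new Employee(carl) };

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(staff); // carl is written once, then only referenced by serial number
        out.close();

        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Employee[] copy = (Employee[]) in.readObject();
        in.close();
        return copy[0].boss == copy[1].boss; // one object again, not two copies
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sharedAfterRoundTrip()); // true
    }
}
```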

32. To keep a field from being serialized, mark it transient:

public class LabelPoint implements Serializable {
    private String label;
    private transient int tmp_id;
}

33. To customize serialization and deserialization, define your own writeObject and readObject methods in the serializable class:

private void writeObject(ObjectOutputStream out) throws IOException;
private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException;
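A minimal sketch of these hooks, assuming a hypothetical LabeledPoint whose coordinates are transient and saved by hand; the class and method names are made up. defaultWriteObject/defaultReadObject handle the ordinary fields, and the extra values must be read in the same order they were written.

```java
import java.io.*;

public class CustomSerializationDemo {

    // x and y are transient, so default serialization skips them;
    // writeObject/readObject save and restore them manually.
    static class LabeledPoint implements Serializable {
        private String label;
        private transient double x, y;

        LabeledPoint(String label, double x, double y) {
            this.label = label; this.x = x; this.y = y;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject(); // writes the non-transient fields (label)
            out.writeDouble(x);
            out.writeDouble(y);
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();   // restores label
            x = in.readDouble();      // same order as in writeObject
            y = in.readDouble();
        }
    }

    static String roundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(new LabeledPoint("A", 1.5, 2.5));
        out.close();
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        LabeledPoint p = (LabeledPoint) in.readObject();
        in.close();
        return p.label + " " + p.x + " " + p.y;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // A 1.5 2.5
    }
}
```

Without the two methods, x and y would come back as 0.0, since transient fields get their default values on deserialization.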

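34. If the class should also take over saving and restoring the superclass's data fields, implement the Externalizable interface instead: its writeExternal and readExternal methods are responsible for the object's entire state. For classes in complex inheritance hierarchies where performance matters, Externalizable can also be faster.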
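A minimal sketch of an Externalizable class, assuming in-memory streams; ExternalizableDemo and Item are hypothetical names. Unlike Serializable, an Externalizable class needs a public no-argument constructor, which ObjectInputStream calls before readExternal.

```java
import java.io.*;

public class ExternalizableDemo {

    public static class Item implements Externalizable {
        private String name;
        private int count;

        public Item() { } // required: used during deserialization
        Item(String name, int count) { this.name = name; this.count = count; }

        // The class itself writes its whole state (it would also have to write
        // any superclass fields here).
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(count);
        }

        public void readExternal(ObjectInput in) throws IOException {
            name = in.readUTF();
            count = in.readInt();
        }
    }

    static String roundTrip() throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(new Item("widget", 3));
        out.close();
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        Item item = (Item) in.readObject();
        in.close();
        return item.name + " " + item.count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip()); // widget 3
    }
}
```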

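35. Serialization works correctly for enum types. For the older "typesafe enum" idiom (a class whose only instances are public static final constants), default serialization breaks the idiom: deserializing creates a new object that is not == to any of the constants. Be careful when serializing such classes (the usual fix is a readResolve method that returns the matching constant).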

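36. Serialization records a fingerprint of the class structure, so changing the class normally makes previously serialized data unreadable. To give the class an explicit version number instead, add: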

public static final long serialVersionUID = 42L;

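37. Serialization can also be used to clone an object (a deep copy): write it to a stream, then read it back. This is considerably slower than a clone method that copies the fields directly.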
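A minimal sketch of cloning through serialization, assuming in-memory byte streams; SerialCloneDemo, deepCopy, and demo are hypothetical names. Changing an element of the copy leaves the original untouched, which shows the copy is deep.

```java
import java.io.*;
import java.util.ArrayList;

public class SerialCloneDemo {

    // Deep copy: write the object to memory, then read it back.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepCopy(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(obj);
        out.close();
        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        try {
            return (T) in.readObject();
        } finally {
            in.close();
        }
    }

    static boolean demo() throws IOException, ClassNotFoundException {
        ArrayList<StringBuilder> original = new ArrayList<StringBuilder>();
        original.add(new StringBuilder("a"));
        ArrayList<StringBuilder> copy = deepCopy(original);
        copy.get(0).append("b"); // modify the copy's element only
        return original.get(0).toString().equals("a") && copy.get(0).toString().equals("ab");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // true
    }
}
```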

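38. The stream classes above are concerned with the contents of files. Managing the files themselves and interacting with the file system is the job of the File class.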

File f = new File("file.dat");
System.out.println(f.getAbsolutePath());
System.out.println(f.exists());

39. File also has a constructor that takes a directory and a name:

File(File dir, String name);

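A File object can represent either a file or a directory; use isFile() and isDirectory() to tell them apart.

File.separator is the platform's path separator: \ on Windows, / on Linux.

File can do much more, such as creating temporary files and listing the contents of directories.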
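A minimal sketch of these File features, assuming a throwaway temporary directory; FileDemo, listDemo, and the file names a.txt, b.txt, and sub are hypothetical. It uses the File(File dir, String name) constructor and sorts the entries with isFile()/isDirectory().

```java
import java.io.*;

public class FileDemo {

    // Creates a temporary directory with two files and one subdirectory,
    // then counts the entries by kind and cleans up.
    static String listDemo() throws IOException {
        File dir = File.createTempFile("demo", ""); // reserve a unique name...
        dir.delete();                                 // ...and turn it into a directory
        dir.mkdir();
        new File(dir, "a.txt").createNewFile();       // File(File dir, String name)
        new File(dir, "b.txt").createNewFile();
        new File(dir, "sub").mkdir();

        int files = 0, dirs = 0;
        for (File f : dir.listFiles()) {
            if (f.isFile()) files++;
            else if (f.isDirectory()) dirs++;
            f.delete(); // works for the subdirectory too, since it is empty
        }
        dir.delete();
        return "files=" + files + ", dirs=" + dirs;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("separator: " + File.separator);
        System.out.println(listDemo()); // files=2, dirs=1
    }
}
```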

40. JDK 1.4 introduced New I/O (java.nio), which supports the following operating-system features:

Character encodings
Nonblocking I/O
Memory-mapped files
File locking

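41. Operating systems can map a file (or a region of it) into memory, which can be much faster than repeated I/O calls.

Mapping a file with nio is simple:

(1) Open a FileInputStream as usual (for a writable mapping, use a RandomAccessFile opened in "rw" mode instead).
(2) Get its channel: FileChannel channel = stream.getChannel();
(3) Call the channel's map method to obtain the mapping: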

public abstract MappedByteBuffer map(FileChannel.MapMode mode,
                   long position,
                   long size);

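There are three modes (FileChannel.MapMode): READ_ONLY, READ_WRITE, and PRIVATE (the buffer is writable, but changes stay private to the buffer and are not written back to the file).

You then work with the returned MappedByteBuffer using the ByteBuffer methods.

Read one byte at a time with get(), or read primitive values in binary with getInt(), getDouble(), and so on. Use order() to choose big-endian or little-endian byte order.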
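A minimal sketch of steps (1) to (3), assuming a small temporary file; MappedFileDemo and readMapped are hypothetical names. The file holds two big-endian ints; the second is read after switching order() to little-endian, so its bytes 00 00 00 02 come out as 0x02000000.

```java
import java.io.*;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFileDemo {

    static int[] readMapped() throws IOException {
        File f = File.createTempFile("map", ".dat");
        f.deleteOnExit();
        DataOutputStream out = new DataOutputStream(new FileOutputStream(f));
        out.writeInt(1); // DataOutputStream writes big-endian
        out.writeInt(2);
        out.close();

        FileInputStream stream = new FileInputStream(f);             // (1)
        try {
            FileChannel channel = stream.getChannel();               // (2)
            MappedByteBuffer buffer =                                 // (3)
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            int first = buffer.getInt();            // 1: buffers default to big-endian
            buffer.order(ByteOrder.LITTLE_ENDIAN);
            int second = buffer.getInt();           // 00 00 00 02 read little-endian
            return new int[] { first, second };
        } finally {
            stream.close(); // also closes the channel; the mapping stays valid
        }
    }

    public static void main(String[] args) throws IOException {
        int[] values = readMapped();
        System.out.println(values[0] + " " + values[1]); // 1 33554432
    }
}
```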

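42. The Buffer superclass and its subclasses ByteBuffer, IntBuffer, DoubleBuffer, and so on were also introduced with nio.

43. A buffer is an array of values of the same type. Every buffer has a fixed capacity (set when it is created), a position (where the next read or write happens), a limit (values beyond it are meaningless), and an optional mark (for rereading, etc.).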

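44. Buffers are mainly used in a repeating write-then-read cycle:
(1) Initially the position is 0 and the limit equals the capacity. Keep calling put to add values to the buffer.
(2) When you run out of data or reach the capacity, switch to reading.
(3) Call flip(), which sets the limit to the current position and resets the position to 0.
(4) Keep calling get while remaining() (the limit minus the position) is positive. Finally, call clear() to return to the writing state.
To reread the buffer, use rewind() or mark()/reset().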
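The cycle above can be traced on a small CharBuffer. This is a minimal sketch; BufferCycleDemo and cycle are hypothetical names, and the returned string records what was read plus the final position and limit.

```java
import java.nio.CharBuffer;

public class BufferCycleDemo {

    static String cycle() {
        CharBuffer buf = CharBuffer.allocate(8); // (1) position 0, limit = capacity = 8
        buf.put('h').put('i');                    //     position 2
        buf.flip();                               // (3) limit 2, position 0
        StringBuilder read = new StringBuilder();
        while (buf.remaining() > 0) {             // (4) remaining() = limit - position
            read.append(buf.get());
        }
        buf.rewind();                             // reread: position 0, limit unchanged
        char again = buf.get();                   // 'h' a second time
        buf.clear();                              // back to writing: position 0, limit 8
        return read + " " + again + " " + buf.position() + " " + buf.limit();
    }

    public static void main(String[] args) {
        System.out.println(cycle()); // hi h 0 8
    }
}
```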

For a good explanation of Buffer and Channel, see this article:

http://www.cnblogs.com/focusj/archive/2011/11/03/2231583.html

45. To lock a file, again use FileChannel, which has a lock() method:

public final FileLock lock()

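It blocks until the lock is available. tryLock() does not block: it returns the lock immediately, or null if the lock is not available.

There are also versions that lock a region of the file and choose between an exclusive and a shared lock: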

public abstract FileLock lock(long position,
            long size,
            boolean shared)

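If shared is true, the lock is shared: other processes can also obtain shared locks, but none can obtain an exclusive one. If shared is false, the lock is exclusive and excludes all others.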
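A minimal sketch of non-blocking locking, assuming a temporary file; FileLockDemo and lockDemo are hypothetical names. An exclusive lock needs a channel opened for writing, hence the "rw" RandomAccessFile, and the lock is released in a finally block.

```java
import java.io.*;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class FileLockDemo {

    // Acquires an exclusive lock without blocking, then releases it.
    static boolean lockDemo() throws IOException {
        File f = File.createTempFile("lock", ".dat");
        f.deleteOnExit();
        RandomAccessFile raf = new RandomAccessFile(f, "rw"); // writable channel for an exclusive lock
        try {
            FileChannel channel = raf.getChannel();
            FileLock lock = channel.tryLock(); // null if another program holds a conflicting lock
            if (lock == null) {
                return false;
            }
            try {
                // work with the locked file here
                return lock.isValid() && !lock.isShared();
            } finally {
                lock.release();
            }
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lockDemo()); // true when no other process holds a lock
    }
}
```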

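46. File locks are held on behalf of the entire Java virtual machine, in effect per process. Two programs launched by the same JVM therefore cannot each lock the same file; if the JVM already holds an overlapping lock on the file, lock() throws an OverlappingFileLockException.

47. A regular expression describes a pattern of strings, for example:

[Jj]ava.+

This matches any string that starts with Java or java followed by at least one more character, such as javanese, Javascript, or Java** (but not Java by itself).

I won't go over regular-expression syntax here; it's used all the time.

The main Java regex APIs are the Pattern and Matcher classes (package java.util.regex).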

Pattern pattern = Pattern.compile("patternString");
Matcher matcher = pattern.matcher("string to be matched");
// Mode 1: the entire string must match
if (matcher.matches()) {
...
}
// Mode 2: find matching substrings one at a time
while(matcher.find()) {
......
}

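compile accepts optional flags:

CASE_INSENSITIVE: match regardless of letter case
MULTILINE: ^ and $ match at the start and end of each line, not just of the whole input
DOTALL: . matches any character, including line terminators

Java regular expressions also support groups:

group(int groupNum)

Group 0 is the entire match; groups 1, 2, ... are numbered in the order of their opening parentheses in the pattern.

Matcher.replaceAll replaces every match; in the replacement string, $n refers to group n.

Pattern.split splits a string, using matches of the regex as delimiters.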
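A minimal sketch of groups, replaceAll with $n, split, and a compile flag; RegexDemo, its methods, and the sample patterns are hypothetical. isJava also confirms that java.+ needs at least one character after "java".

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {

    // Group 0 is the whole match; groups 1 and 2 follow the opening parentheses.
    static String groups(String input) {
        Matcher m = Pattern.compile("(\\d+)-(\\d+)").matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group(0)).append('=').append(m.group(1))
              .append('+').append(m.group(2)).append(';');
        }
        return sb.toString();
    }

    // replaceAll with $n: swap the two numbers in every match.
    static String swap(String input) {
        return Pattern.compile("(\\d+)-(\\d+)").matcher(input).replaceAll("$2-$1");
    }

    // Pattern.split: a comma with optional surrounding whitespace is the delimiter.
    static String[] splitList(String input) {
        return Pattern.compile("\\s*,\\s*").split(input);
    }

    // CASE_INSENSITIVE flag; matches() requires the whole string to match.
    static boolean isJava(String input) {
        return Pattern.compile("java.+", Pattern.CASE_INSENSITIVE).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(groups("10-20 and 3-4"));            // 10-20=10+20;3-4=3+4;
        System.out.println(swap("10-20 and 3-4"));              // 20-10 and 4-3
        System.out.println(Arrays.toString(splitList("a , b,c"))); // [a, b, c]
        System.out.println(isJava("JavaScript") + " " + isJava("Java")); // true false
    }
}
```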

An example that prints the URLs of all <a href=""> links in a web page and then replaces each link's URL with #:

import java.io.*;
import java.util.regex.*;

public class RETest {
    public static void main(String [] args) throws IOException {
        //Read html
        BufferedReader reader = new BufferedReader(new FileReader("test.html"));
        char buf [] = new char[1024];
        int len;
        StringBuilder sb = new StringBuilder();
        while((len = reader.read(buf, 0, 1024))!=-1) {
            sb.append(buf, 0, len);
        }
        String html = sb.toString();
        //System.out.println(html);
        reader.close();

        //Regular Exp: group 1 = the URL, group 2 = the link text
        Pattern pt = Pattern.compile("<a\\s.*?href=\"([^\"]+)\"[^>]*>(.*?)</a>", Pattern.MULTILINE|Pattern.DOTALL);
        Matcher ma = pt.matcher(html);
        while(ma.find()) {
            System.out.println(ma.group(1));   // print each URL
        }

        //Replace every URL with #, keeping the link text via $2
        //(replaceAll resets the matcher before scanning again)
        String replaced = ma.replaceAll("<a href=\"#\">$2</a>");
        System.out.println(replaced);
    }
}

End of chapter.

四号程序员

Keep It Simple and Stupid
