Linux File I/O Management (Part 2): Redirection and Buffers

news 2024/11/16 20:25:27

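read and stat
- read
- stat
- Test
File descriptor allocation rule
Redirection and buffers
- Introduction
- Understanding redirection
- - dup2
- Understanding buffers
- - Flush policies
Strange code
- First program
- Second program
- Explanation
- - For the first program
- - For the second program
Improving the shell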

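Please make sure you understand the idea from Part 1 that in Linux everything is a file.

read and stat

Let's first meet two system calls: read and stat.

read

Required header:

#include <unistd.h>

The system call prototype:

ssize_t read(int fd, void *buf, size_t count);

Purpose: read up to count bytes from the file referred to by file descriptor fd into buf. The return value is the number of bytes actually read, 0 at end of file, or -1 on error.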

stat

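Required headers:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

The system call prototypes:

int stat(const char *path, struct stat *buf);
int fstat(int fd, struct stat *buf);
int lstat(const char *path, struct stat *buf);

Purpose: given a path or a file descriptor fd, fill in the struct stat that buf points to (buf is an output parameter). lstat behaves like stat, except that when path is a symbolic link it describes the link itself.

So what is struct stat? Literally it is the status of a file; in practice it holds the file's attributes and looks like this: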

struct stat {
    dev_t     st_dev;     /* ID of device containing file */
    ino_t     st_ino;     /* inode number */
    mode_t    st_mode;    /* protection */
    nlink_t   st_nlink;   /* number of hard links */
    uid_t     st_uid;     /* user ID of owner */
    gid_t     st_gid;     /* group ID of owner */
    dev_t     st_rdev;    /* device ID (if special file) */
    off_t     st_size;    /* total size, in bytes */
    blksize_t st_blksize; /* blocksize for file system I/O */
    blkcnt_t  st_blocks;  /* number of 512B blocks allocated */
    time_t    st_atime;   /* time of last access */
    time_t    st_mtime;   /* time of last modification */
    time_t    st_ctime;   /* time of last status change */
};

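We know that file = content + attributes, so any operation on a file works either on its content or on its attributes.

These three system calls work on attributes (they fetch them). One of the attributes above is st_size, the total size of the file in bytes.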

Test

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

const char* filename = "log.txt";

int main()
{
    struct stat st;
    int n = stat(filename, &st);
    if (n < 0) return 1;
    // st_size is an off_t (a signed type), so cast it before printing
    printf("file size: %ld\n", (long)st.st_size);

    int fd = open(filename, O_RDONLY);
    if (fd < 0)
    {
        perror("open");
        return 2;
    }

    // one extra byte for the terminating '\0'
    char* file_buffer = (char*) malloc(st.st_size + 1);
    if (file_buffer == NULL)
    {
        close(fd);
        return 3;
    }

    // read the whole file in one call and print it as a string
    ssize_t len = read(fd, file_buffer, st.st_size);
    if (len > 0)
    {
        file_buffer[len] = '\0';
        printf("%s", file_buffer);
    }

    free(file_buffer);
    close(fd);
    return 0;
}

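File descriptor allocation rule

We know the system uses a file descriptor to find the file a process is working with. But how is that descriptor assigned in the first place? In other words, what rule decides which index of the table a newly opened file gets?

Read the code below carefully: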

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

const char* filename = "log.txt";

int main()
{
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

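From what we learned in Part 1, fd here will be 3.

We saw earlier that the system opens file descriptors 0, 1 and 2 for every process. But what if I first close the file behind descriptor 0? Like this: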

const char* filename = "log.txt";
int main()
{
    close(0);
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("fd: %d\n", fd);
    close(fd);
    return 0;
}

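Then the answer is 0, because the allocation rule is: scan the process's own file descriptor table and hand out the smallest index that is not in use.

So if the code above closes descriptor 2 instead, the result is 2. If you close 1, though, nothing is displayed at all: descriptor 1 is the standard output stream, and once it is closed printf can no longer reach the screen.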

Redirection and buffers

Introduction

Let's look at the following code first:

const char* filename = "log.txt";
int main()
{
    close(1);
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    printf("printf -> fd: %d\n", fd);
    fprintf(stdout, "fprintf -> fd: %d\n", fd);
    fflush(stdout);
    close(fd);
    return 0;
}

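Clearly, standard output is closed first this time, so log.txt receives file descriptor 1. printf and fprintf then write to the standard output stream stdout, but the descriptor 1 wrapped inside stdout no longer refers to the display file in the kernel; it refers to log.txt.

So log.txt ends up containing: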

printf -> fd: 1
fprintf -> fd: 1

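Data that was meant for the display file ended up in another file. Isn't that exactly what redirection is?

So the essence of redirection is: changing, inside the kernel, what a particular index of the file descriptor table points to. The library layer above is not involved.

But doesn't the code above look odd? Why is the fflush(stdout); statement needed? Try commenting it out.

The result: log.txt is empty. Why? Didn't the process print the data to stdout? Nothing on the screen is understandable, but did the system just swallow the data?

In a sense, yes. The C standard library wraps files in a FILE structure; stdin, stdout and stderr are the typical objects. FILE holds many attributes of the file, including the descriptor (_fileno in glibc), and it also holds a pointer to a block of memory: the language-level file buffer.

Part 1 discussed the kernel-level file buffer, which comes with every struct file in the kernel. How do the two differ?

When you call printf or fprintf, the data is first copied into the language-level buffer; only later is it flushed, through the file descriptor, into the kernel-level buffer.

That is what fflush really does: it flushes the data held in the language-level buffer, through the file descriptor, into the kernel-level buffer.

Without fflush, the language-level buffer is normally flushed when the process exits, and its contents are copied into the kernel. The code above is not the normal case: close(fd); closes the file before the process exits, so by the time the exit flush runs, descriptor 1 no longer refers to any open file and there is no kernel-level buffer left to receive the data.

In other words, without fflush the data only tries to reach log.txt after log.txt has been closed. That is too late: log.txt stays empty, and the data appears to have been swallowed.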

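Understanding redirection

To the language layer, the file descriptors wrapped by the standard streams never change: standard input is 0 and standard output is 1. The library only knows integers, because integers are all the OS returns to it. Which file a given index actually points to is decided by the OS.

So the essence of redirection is getting the OS to change which file a fixed index of the file descriptor table (an array) points to. Once that is possible, redirection becomes flexible: point the index wherever you like.

How is that done in practice? Must we close the target descriptor first, as above, every time we redirect? No. Meet another system call: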

dup2

Required header:

#include <unistd.h>

The system call prototypes:

int dup(int oldfd);
int dup2(int oldfd, int newfd);

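Purpose: within the array struct file* fd_array[], copy the content stored at one file descriptor (array index) to another. dup picks the target itself (the smallest unused descriptor) and returns it; dup2 lets you name the target.

Note that the content of oldfd is copied into newfd, overwriting whatever newfd held (if newfd was open, it is closed first). After dup2 both descriptors point to the same file: the one oldfd originally pointed to.

Example: the redirection in the code above needs descriptor 1 to point to log.txt instead of the display file. Just call dup2(fd, 1);. Descriptors fd and 1 now both point to log.txt, and the goal is met.

log.txt also knows that two different descriptors point to it: struct file carries a reference count (called ref_count here, an int) that records how many descriptors currently refer to the file. When it drops to 0, the file's struct file is released.

Verifying with code: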

const char* filename = "log.txt";
int main()
{
    int fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    dup2(fd, 1);
    printf("printf -> fd: %d\n", fd);
    fprintf(stdout, "fprintf -> fd: %d\n", fd);
    close(fd);
    return 0;
}

This is the usual way to implement redirection: writes that would have gone to the display file go to log.txt instead.

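Understanding buffers

A buffer is a block of memory. It exists mainly for performance: it gives the upper layers an efficient I/O experience and so improves overall efficiency.

There are many buffers, some user-level and some kernel-level.

Almost every buffer brings two benefits:

Decoupling: you only hand data to the buffer; how the lower layer deals with it is not your concern
Efficiency: note that this does not mean data reaches the device any faster. Rather:
- The calling process becomes more efficient: it no longer has to care how data reaches the device. That job has been handed to the buffer, and the process can get on with its own work.
- Delivering data to the system (flushing I/O) becomes more efficient overall: system calls have a cost and need the cooperation of the OS, and the OS is busy, so you have to wait for it. Frequent system calls therefore hurt performance. For a function like printf:
  - If every call handed its data directly to the OS, many printf calls would inevitably hurt performance
  - If the data goes to a buffer instead, many printf calls only fill the buffer, and a single system call at the end flushes everything into the kernel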

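Flush policies

Most of the following applies to user-level buffers:

Immediate flush: for example the C library function fflush, or the Linux system call fsync (int fsync(int fd); immediately flushes the data of the file referred to by fd from the kernel buffer to the device)
Line buffering: the display uses line buffering (to match how people read)
Full buffering: flush only when the buffer is full (regular files)
Special cases:
- Process exit: on a normal exit the buffer is flushed automatically
- Forced flush, similar to immediate flush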

Strange code

First program

Here is the first one:

// test.c source
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    // C library
    printf("Hello printf!!!\n");
    fprintf(stdout, "Hello fprintf!!!\n");

    // system call
    const char* msg = "Hello write!!!\n";
    write(1, msg, strlen(msg));

    return 0;
}

Observed behavior:

[exercise@localhost redirection]$ ll
total 8
-rw-rw-r-- 1 exercise exercise   88 Sep 16 17:41 makefile
-rw-rw-r-- 1 exercise exercise 2082 Sep 19 16:57 test.c
[exercise@localhost redirection]$ make
gcc test.c -o Test -g
[exercise@localhost redirection]$ ./Test
Hello printf!!!
Hello fprintf!!!
Hello write!!!
[exercise@localhost redirection]$ ./Test > log.txt
[exercise@localhost redirection]$ cat log.txt
Hello write!!!
Hello printf!!!
Hello fprintf!!!
[exercise@localhost redirection]$

The effect is obvious: the lines come out in a different order!

Second program

Not so fast, there is a second program:

// test.c source
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    // C library
    printf("Hello printf!!!\n");
    fprintf(stdout, "Hello fprintf!!!\n");

    // system call
    const char* msg = "Hello write!!!\n";
    write(1, msg, strlen(msg));

    fork(); // the only difference from the first program
    return 0;
}

Observed behavior:

[exercise@localhost redirection]$ ll
total 8
-rw-rw-r-- 1 exercise exercise   88 Sep 16 17:41 makefile
-rw-rw-r-- 1 exercise exercise 2094 Sep 19 17:02 test.c
[exercise@localhost redirection]$ make
gcc test.c -o Test -g
[exercise@localhost redirection]$ ./Test
Hello printf!!!
Hello fprintf!!!
Hello write!!!
[exercise@localhost redirection]$ ./Test > log.txt
[exercise@localhost redirection]$ cat log.txt
Hello write!!!
Hello printf!!!
Hello fprintf!!!
Hello printf!!!
Hello fprintf!!!
[exercise@localhost redirection]$

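What? This one is even more absurd. Why is so much more printed after redirection?

Explanation

For the first program

Without redirection the data is printed to the screen, that is, to the display file, so the buffer's flush policy is line buffering.

The user-level buffer is therefore flushed into the kernel-level buffer every time \n is encountered, and no matter how the kernel-level buffer is flushed afterwards, the order is always: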

Hello printf!!!
Hello fprintf!!!
Hello write!!!

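With output redirected to a regular file, the policy becomes full buffering: the buffer is only flushed once it is full, so the data from printf and fprintf stays in the user-level buffer.

A system call is different: it delivers data straight into the kernel-level buffer. As a result Hello write!!! reaches the kernel first, while the data in the user-level buffer is only delivered when the process exits. That is why the order changes.

For the second program

The only difference from the first program is the extra fork() that creates a child process.

Without redirection the result matches the first program: the display file is line buffered, so each newline \n sends the data into the kernel and the order stays normal.

By the time fork() runs, all the data is already in the kernel, so neither the parent nor the child has anything left to flush on exit.

With redirection things change. Standard output now points to a regular file, so the policy is full buffering. As in the first program, write delivers its data to the kernel first, and the user-level buffer waits for process exit before it is flushed.

But after fork() there are two processes. The user-level buffer is ordinary process memory, so the child gets its own copy holding the same unflushed content. Both processes flush their buffer on exit, so the same data is flushed twice, and the following strings appear twice: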

Hello printf!!!
Hello fprintf!!!

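Improving the shell

Earlier we wrote a simple shell; for details see my post "Writing Your Own Linux Shell (in C)".

I strongly recommend reading that post first to understand how it works. Now that redirection has been covered, we can add it.

First, redirection has to be detected before the command is executed: CheckRedir(usercommand);

As for where the check goes relative to splitting the command line, I think either before or after works, depending on the implementation. Here it runs before the split, and global variables and macros hold the redirection file name and the redirection type respectively.

Then, inside the command execution function ExecuteCommmand(), the child process checks whether a redirection is present and, if so, calls dup2 to perform it before running the command.

Source (tested on CentOS 7.9, gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)):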

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <string.h>
#include <ctype.h>
#include <errno.h>
#include <sys/stat.h>
#include <fcntl.h>

#define SIZE 512
#define ZERO '\0'
#define SEP " "
#define NUM 32
#define SkipPath(pCwd) do { pCwd += (strlen(pCwd) - 1); while (*pCwd != '/') --pCwd; } while (0)
#define SkipSpace(line, pos) do { \
    while (1) { \
        if (isspace((unsigned char)line[pos])) pos++; \
        else break; \
    } \
} while (0)

#define None_Redir  0
#define In_Redir    1
#define Out_Redir   2
#define App_Redir   3

int redir_type = None_Redir;
char* filename = NULL;

char* gArgv[NUM];
char Cwd[SIZE];
int lastcode = 0;

const char* getHome()
{
    const char* home = getenv("HOME");
    if (home == NULL) return "/";
    return home;
}

const char* getUserName()
{
    const char* username = getenv("USER");
    if (username == NULL) return "None";
    return username;
}

const char* getHostName()
{
    const char* hostname = getenv("HOSTNAME");
    if (hostname == NULL) return "None";
    return hostname;
}

// temporary
const char* getCwd()
{
    const char* cwd = getenv("PWD");
    // fall back to "/" so that SkipPath always finds a '/'
    if (cwd == NULL) return "/";
    return cwd;
}

void MakeCommandLineAndPrint()
{
    char line[SIZE];
    const char* username = getUserName();
    const char* hostname = getHostName();
    const char* cwd = getCwd();

    SkipPath(cwd);
    snprintf(line, sizeof(line), "[%s@%s %s]> ", username, hostname, strlen(cwd) == 1 ? "/" : (cwd + 1));
    printf("%s", line);
    fflush(stdout);
}

int GetUserCommand(char command[], size_t size)
{
    char* s = fgets(command, size, stdin);
    if (s == NULL) return -1;
    // strip the trailing newline, if there is one
    size_t len = strlen(command);
    if (len > 0 && command[len - 1] == '\n') command[len - 1] = ZERO;
    return strlen(command);
}

void SplitCommand(char command[], int size)
{
    (void)size;
    gArgv[0] = strtok(command, SEP);
    int index = 1;
    // stop one slot early so the argument list always ends with NULL
    while (index < NUM - 1 && (gArgv[index] = strtok(NULL, SEP)) != NULL) index++;
    gArgv[index] = NULL;
}

void Die()
{
    exit(1);
}

void ExecuteCommmand()
{
    pid_t id = fork();
    if (id < 0) Die();
    else if (id == 0)
    {
        // child: set up redirection before replacing the program
        if (filename != NULL && redir_type != None_Redir)
        {
            int fd = -1;
            int target = 1; // standard output unless this is an input redirection
            if (redir_type == In_Redir)
            {
                fd = open(filename, O_RDONLY);
                target = 0;
            }
            else if (redir_type == Out_Redir)
            {
                fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC, 0666);
            }
            else if (redir_type == App_Redir)
            {
                fd = open(filename, O_WRONLY | O_CREAT | O_APPEND, 0666);
            }
            // report a failed open to the parent through the exit code
            if (fd < 0) exit(errno);
            dup2(fd, target);
            // the target descriptor keeps the file open; drop the extra one
            if (fd != target) close(fd);
        }

        execvp(gArgv[0], gArgv);
        exit(errno);
    }
    else
    {
        // parent
        int status = 0;
        pid_t rid = waitpid(id, &status, 0);
        if (rid > 0)
        {
            lastcode = WEXITSTATUS(status);
            if (lastcode != 0) printf("%s:%s:%d\n", gArgv[0], strerror(lastcode), lastcode);
        }
    }
}

void Cd()
{
    // path given to cd
    const char* path = gArgv[1];
    if (path == NULL) path = getHome();
    // path is non-NULL here, so change the working directory with the system call
    if (chdir(path) < 0)
    {
        perror("cd");
        return;
    }
    // fetch the new working directory
    char temp[SIZE * 2];
    if (getcwd(temp, sizeof(temp)) == NULL) return;
    // build the PWD environment variable
    snprintf(Cwd, sizeof(Cwd), "PWD=%s", temp);
    // update the environment
    putenv(Cwd);
}

int CheckBuildIn()
{
    int yes = 0;
    const char* enter_cmd = gArgv[0];
    if (enter_cmd == NULL) return yes;
    if (strcmp(enter_cmd, "cd") == 0)
    {
        yes = 1;
        Cd();
    }
    // gArgv[1] is NULL for a bare "echo", so check it before comparing
    else if (strcmp(enter_cmd, "echo") == 0 && gArgv[1] != NULL && strcmp(gArgv[1], "$?") == 0)
    {
        yes = 1;
        printf("%d\n", lastcode);
        lastcode = 0;
    }
    return yes;
}

void CheckRedir(char* line)
{
    // <, >, >>
    int pos = 0;
    int end = strlen(line);
    while (pos < end)
    {
        if (line[pos] == '>')
        {
            if (line[pos + 1] == '>')
            {
                // ">>": cut the command off here and skip both characters
                line[pos++] = 0;
                pos++;
                redir_type = App_Redir;
                SkipSpace(line, pos);
                filename = line + pos;
            }
            else
            {
                line[pos++] = 0;
                redir_type = Out_Redir;
                SkipSpace(line, pos);
                filename = line + pos;
            }
        }
        else if (line[pos] == '<')
        {
            line[pos++] = 0;
            redir_type = In_Redir;
            SkipSpace(line, pos);
            filename = line + pos;
        }
        else
        {
            pos++;
        }
    }
}

int main()
{
    int quit = 0;
    while (!quit)
    {
        // reset the redirection state
        redir_type = None_Redir;
        filename = NULL;

        // print our own command prompt
        MakeCommandLineAndPrint();

        // read the user's command string
        char usercommand[SIZE];
        int num = GetUserCommand(usercommand, sizeof(usercommand));
        if (num < 0) return 1;
        else if (num == 0) continue;

        // check for redirection
        CheckRedir(usercommand);

        // split the command string
        SplitCommand(usercommand, sizeof(usercommand));
        // nothing left to run (for example a line of spaces)
        if (gArgv[0] == NULL) continue;

        // check whether the command is a built-in
        num = CheckBuildIn();
        if (num) continue;

        // execute the command
        ExecuteCommmand();
    }
    return 0;
}