
FFMPEG Screen Recording (21) --- Enumerating All Visible Windows on Linux with X11 and Retrieving Their Titles, Icons, Thumbnails, Process Paths, and More

news 2025/12/17 12:47:51

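Enumerating Windows and Retrieving Window Information on Linux X11

On Linux, X11 is a very widely used window system that offers a rich API for managing and manipulating windows. This post walks through how to use X11 to enumerate the windows currently on the system and retrieve each window's title, screenshot, process path, and icon.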

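Background

Official documentation for beginners: Displays and Screens
Official documentation for beginners: Window System Objects
Official documentation: API manual

A few of the more important concepts, excerpted from the documentation: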

X Is Client / Server
The X Window System was designed to allow multiple programs to share access to a common set of hardware. This hardware includes both input devices such as mice and keyboards, and output devices: video adapters and the monitors connected to them. A single process was designated to be the controller of the hardware, multiplexing access to the applications. This controller process is called the X server, as it provides the services of the hardware devices to the client applications. In essence, the service the Xserver provides is access, through the keyboard, mouse and display, to the X user.

Like many client/server systems, the X server typically provides its service to many simultaneous clients. The X server runs longer than most of the clients do, and listens for incoming connections from new clients.

Many users will only ever use X on a standalone laptop or desktop system. In this setting, the X clients run on the same computer as the X server. However, X defines a stream protocol for clients / server communication. This protocol can be exposed over a network to allow clients to connect to a server on a different machine. Unfortunately, in this model, the client/server labeling can be confusing. You may have an X server running on the laptop in front of you, displaying graphics generated by an X client running on a powerful machine in a remote machine room. For most other protocols, the laptop would be a client of file sharing, http or similar services on the powerful remote machine. In such cases, it is important to remind yourself that keyboards and mice connect to the X server. It is also the one endpoint to which all the clients (terminal windows, web browsers, document editors) connect.

Displays and Screens
X divides the resources of a machine into Displays and Screens. A Display is typically all the devices connected to a single X server, and displaying a single session for a single user. Systems may have multiple displays, such as multi-seat setups, or even multiple virtual terminals on a system console. Each display has a set of input devices, and one or more Screens associated with it. A screen is a subset of the display across which windows can be displayed or moved - but windows cannot span across multiple screens or move from one screen to another. Input devices can interact with windows on all screens of an X server, such as moving the mouse cursor from one screen to another. Originally each Screen was a single display adaptor with a single monitor attached, but modern technologies have allowed multiple devices to be combined into logical screens or a single device split.

When connecting a client to an X server, you must specify which display to connect to, either via the $DISPLAY environment variable or an application option such as -display or --display. The full DISPLAY syntax is documented in the X(7) man page, but a typical display syntax is: hostname:display.screen

The "hostname" may be omitted for local connections, and ".screen" may also be left off to use the default screen, leaving the minimal display specification of :display, such as ":0" for the normal default X server on a machine.
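To make the hostname:display.screen syntax concrete, here is a minimal sketch of a parser for it. The DisplaySpec type and the parse_display function are hypothetical names made up for this illustration; they are not part of Xlib, which parses the string itself inside XOpenDisplay. The sketch only covers the simple form quoted above, not every variant described in X(7).

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Parsed form of a display string such as "remote:1.2" or ":0".
struct DisplaySpec {
    std::string host;  // empty for local connections
    int display = 0;   // display number (required)
    int screen = 0;    // screen number (0 when ".screen" is omitted)
};

// Hypothetical helper: parses "hostname:display.screen".
// Returns std::nullopt when the display number is missing or malformed.
std::optional<DisplaySpec> parse_display(const std::string &spec) {
    // Everything before the last ':' is treated as the host part.
    const std::size_t colon = spec.rfind(':');
    if (colon == std::string::npos) return std::nullopt;

    DisplaySpec out;
    out.host = spec.substr(0, colon);

    const std::string rest = spec.substr(colon + 1);
    const std::size_t dot = rest.find('.');
    const std::string display_part = rest.substr(0, dot);
    const std::string screen_part =
        dot == std::string::npos ? std::string() : rest.substr(dot + 1);

    // Accepts only short, purely decimal strings.
    auto to_int = [](const std::string &s) -> std::optional<int> {
        if (s.empty() || s.size() > 9) return std::nullopt;
        int value = 0;
        for (char c : s) {
            if (c < '0' || c > '9') return std::nullopt;
            value = value * 10 + (c - '0');
        }
        return value;
    };

    const std::optional<int> display = to_int(display_part);
    if (!display) return std::nullopt;
    out.display = *display;

    if (!screen_part.empty()) {
        const std::optional<int> screen = to_int(screen_part);
        if (!screen) return std::nullopt;
        out.screen = *screen;
    }
    return out;
}
```

For example, parse_display(":0") yields an empty host with display 0 and screen 0, while parse_display("remote:1.2") yields host "remote", display 1, screen 2. The sketch requires C++17 for std::optional.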

Windows
In X, a window is simply a region of the screen into which drawing can occur. Windows are placed in a tree hierarchy, with the root window being a server created window that covers the entire screen surface and which lives for the life of the server. All other windows are children of either the root window or another window. The UI elements that most users think of as windows are just one level of the window hierarchy.

At each level of the hierarchy, windows have a stacking order, controlling which portions of windows can be seen when sibling windows overlap each other. Clients can register for Visibility notifications to get an event whenever a window becomes more or less visible than it previously was, which they may use to optimize to only draw the visible portions of the window.

Clients running in traditional X environments will also receive Expose events when a portion of their window is uncovered and needs to be drawn because the X server does not know what contents were there. When the composite extension is active, clients will normally not receive expose events since composite puts the contents of each window in a separate, non-overlapped offscreen buffer, and then combines the visible segments of each window onscreen for display. Since clients cannot control when they will be used in a composited vs. legacy environment, they must still be prepared to handle Expose events on windows when they occur.

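In short: a Display is roughly a session, a Screen is a display adapter within the current Display (session), the Root Window is a rendering region that covers the whole screen and contains all the child windows, and a UI element is just another kind of window. (This summary is not precise. It is only a rough outline to help with retrieving the lists of windows and screens, so please read the official explanation and form your own understanding.)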

Environment Setup

First, install the required libraries and tools:

sudo apt-get install libx11-dev libxcomposite-dev libxext-dev

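Enumerating Windows

We use the XQueryTree function provided by X11 to enumerate the windows on the system. Called on the root window, it returns the root's direct children. Here is a simple example: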

#include <X11/Xlib.h>
#include <X11/Xatom.h>
#include <X11/Xutil.h>  // XTextProperty, XGetWMName
#include <unistd.h>     // readlink
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>

struct WindowInfo {
    ::Window window = 0;
    std::string title;
    std::string process_path;
    std::vector<uint8_t> icon_data;
    int icon_width = 0;
    int icon_height = 0;
};

std::vector<WindowInfo> enum_windows(Display *display) {
    std::vector<WindowInfo> windows;
    ::Window root = DefaultRootWindow(display);
    ::Window parent;
    ::Window *children = nullptr;
    unsigned int num_children = 0;

    if (!XQueryTree(display, root, &root, &parent, &children, &num_children))
        return windows;

    // Intern the atoms once. With only_if_exists = True, None means the atom does not exist.
    Atom pid_atom = XInternAtom(display, "_NET_WM_PID", True);
    Atom icon_atom = XInternAtom(display, "_NET_WM_ICON", True);

    for (unsigned int i = 0; i < num_children; ++i) {
        WindowInfo info;
        info.window = children[i];

        // Get the window title
        XTextProperty window_name;
        if (XGetWMName(display, children[i], &window_name) && window_name.value) {
            info.title = (char *)window_name.value;
            XFree(window_name.value);
        }

        // Get the process path
        if (pid_atom != None) {
            Atom actual_type;
            int actual_format;
            unsigned long nitems, bytes_after;
            unsigned char *prop = nullptr;
            if (XGetWindowProperty(display, children[i], pid_atom, 0, 1, False, XA_CARDINAL,
                                   &actual_type, &actual_format, &nitems, &bytes_after,
                                   &prop) == Success && prop) {
                if (actual_format == 32 && nitems > 0) {
                    // Xlib returns format-32 data as an array of long
                    pid_t pid = (pid_t)*(unsigned long *)prop;
                    char path[1024];
                    snprintf(path, sizeof(path), "/proc/%d/exe", (int)pid);
                    char exe_path[1024];
                    ssize_t len = readlink(path, exe_path, sizeof(exe_path) - 1);
                    if (len != -1) {
                        exe_path[len] = '\0';
                        info.process_path = exe_path;
                    }
                }
                XFree(prop);  // allocated even when the property is empty
            }
        }

        // Get the window icon (first image only)
        if (icon_atom != None) {
            Atom actual_type;
            int actual_format;
            unsigned long nitems, bytes_after;
            unsigned char *prop = nullptr;
            if (XGetWindowProperty(display, children[i], icon_atom, 0, (~0L), False, AnyPropertyType,
                                   &actual_type, &actual_format, &nitems, &bytes_after,
                                   &prop) == Success && prop) {
                if (actual_format == 32 && nitems > 2) {
                    unsigned long *data = (unsigned long *)prop;
                    unsigned long w = data[0], h = data[1];
                    // Make sure the declared size fits inside the returned data
                    if (w > 0 && h > 0 && w <= (nitems - 2) / h) {
                        info.icon_width = (int)w;
                        info.icon_height = (int)h;
                        // Raw copy of the long array: each pixel takes sizeof(unsigned long) bytes
                        info.icon_data.assign((uint8_t *)(data + 2), (uint8_t *)(data + 2 + w * h));
                    }
                }
                XFree(prop);
            }
        }

        windows.push_back(info);
    }

    if (children)
        XFree(children);
    return windows;
}

int main() {
    Display *display = XOpenDisplay(NULL);
    if (!display) {
        std::cerr << "Failed to open display" << std::endl;
        return -1;
    }

    std::vector<WindowInfo> windows = enum_windows(display);
    for (const auto &window : windows) {
        std::cout << "Window ID: " << window.window << std::endl;
        std::cout << "Title: " << window.title << std::endl;
        std::cout << "Process Path: " << window.process_path << std::endl;
        std::cout << "Icon Size: " << window.icon_width << "x" << window.icon_height << std::endl;
        std::cout << std::endl;
    }

    XCloseDisplay(display);
    return 0;
}
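The process-path step in the example above relies on a fixed 1024-byte buffer. As a sketch of a slightly more defensive variant, the helper below grows its buffer until the link target fits. The name exe_path_of is hypothetical and chosen only for this illustration. Note also that _NET_WM_PID is set by the client itself, so the result is only meaningful when the client runs on the same machine as this code.

```cpp
#include <cstddef>
#include <string>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: returns the executable path of `pid` by resolving
// /proc/<pid>/exe, or an empty string when the link cannot be read
// (process already exited, insufficient permissions, no procfs mounted).
std::string exe_path_of(pid_t pid) {
    const std::string link = "/proc/" + std::to_string(pid) + "/exe";
    std::vector<char> buf(256);
    for (;;) {
        const ssize_t len = readlink(link.c_str(), buf.data(), buf.size());
        if (len < 0) return std::string();
        // readlink() neither null-terminates nor reports truncation, so a
        // completely filled buffer may mean the path was cut off: grow and retry.
        if (static_cast<std::size_t>(len) < buf.size())
            return std::string(buf.data(), static_cast<std::size_t>(len));
        buf.resize(buf.size() * 2);
    }
}
```

Calling exe_path_of(getpid()) returns the path of the running program itself, which is a quick way to check that procfs is available.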

Capturing a Window Screenshot

To capture a screenshot of a window, we can use the XGetImage function. Here is an example:

#include <X11/Xlib.h>
#include <X11/Xutil.h>
#include <fstream>
#include <iostream>

// Writes the image as a binary PPM (P6) file.
// The fixed shifts assume a 24/32-bit visual with masks 0xFF0000 / 0x00FF00 / 0x0000FF.
void save_ximage_to_ppm(const char *filename, XImage *image) {
    std::ofstream ofs(filename, std::ios::binary);
    ofs << "P6\n" << image->width << " " << image->height << "\n255\n";
    for (int y = 0; y < image->height; ++y) {
        for (int x = 0; x < image->width; ++x) {
            unsigned long pixel = XGetPixel(image, x, y);
            unsigned char r = (pixel & image->red_mask) >> 16;
            unsigned char g = (pixel & image->green_mask) >> 8;
            unsigned char b = (pixel & image->blue_mask);
            ofs.put(r).put(g).put(b);
        }
    }
}

void capture_window(Display *display, ::Window window, const char *filename) {
    XWindowAttributes attributes;
    if (!XGetWindowAttributes(display, window, &attributes))
        return;
    // XGetImage raises a BadMatch error for windows that are not viewable
    if (attributes.map_state != IsViewable || attributes.width <= 0 || attributes.height <= 0)
        return;

    XImage *image = XGetImage(display, window, 0, 0, attributes.width, attributes.height,
                              AllPlanes, ZPixmap);
    if (image) {
        save_ximage_to_ppm(filename, image);
        XDestroyImage(image);
    }
}

int main() {
    Display *display = XOpenDisplay(NULL);
    if (!display) {
        std::cerr << "Failed to open display" << std::endl;
        return -1;
    }

    ::Window root = DefaultRootWindow(display);
    capture_window(display, root, "screenshot.ppm");

    XCloseDisplay(display);
    return 0;
}
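The PPM writer above shifts the red, green, and blue channels by fixed amounts (16, 8, 0), which only matches visuals whose masks are 0xFF0000, 0x00FF00, and 0x0000FF. As a sketch of a more general approach, the helper below derives the shift from the mask itself and scales the result to 8 bits. The name channel_from_mask is hypothetical, and the sketch assumes each mask is one contiguous run of bits.

```cpp
#include <cstdint>

// Hypothetical helper: extracts one color channel from `pixel` using the
// channel's bit mask (for example XImage::red_mask) and scales it to 0..255.
// Assumes the mask is a single contiguous run of bits.
std::uint8_t channel_from_mask(unsigned long pixel, unsigned long mask) {
    if (mask == 0) return 0;

    // Shift both down until the lowest set bit of the mask reaches bit 0.
    unsigned long value = pixel & mask;
    while ((mask & 1UL) == 0) {
        mask >>= 1;
        value >>= 1;
    }

    // `mask` now equals the channel's maximum value (0xFF, 0x3F, 0x1F, ...),
    // so this rescales the value to 8 bits with rounding.
    return static_cast<std::uint8_t>((value * 255UL + mask / 2) / mask);
}
```

Inside the pixel loop, the red channel would then be obtained as channel_from_mask(pixel, image->red_mask), and likewise for green and blue.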

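Putting It All Together

We can combine the pieces above into one complete program that enumerates the windows and retrieves each window's title, screenshot, process path, and icon.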

#include <X11/Xlib.h>
#include <X11/Xatom.h>
#include <X11/Xutil.h>
#include <unistd.h>  // readlink
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>

struct WindowInfo {
    ::Window window = 0;
    std::string title;
    std::string process_path;
    std::vector<uint8_t> icon_data;
    int icon_width = 0;
    int icon_height = 0;
};

// Ignore X protocol errors (for example BadWindow for a window destroyed during
// enumeration, or BadMatch from XGetImage). The default handler would exit the program.
static int ignore_x_error(Display *, XErrorEvent *) { return 0; }

// Writes the image as a binary PPM (P6) file.
// The fixed shifts assume a 24/32-bit visual with masks 0xFF0000 / 0x00FF00 / 0x0000FF.
void save_ximage_to_ppm(const char *filename, XImage *image) {
    std::ofstream ofs(filename, std::ios::binary);
    ofs << "P6\n" << image->width << " " << image->height << "\n255\n";
    for (int y = 0; y < image->height; ++y) {
        for (int x = 0; x < image->width; ++x) {
            unsigned long pixel = XGetPixel(image, x, y);
            unsigned char r = (pixel & image->red_mask) >> 16;
            unsigned char g = (pixel & image->green_mask) >> 8;
            unsigned char b = (pixel & image->blue_mask);
            ofs.put(r).put(g).put(b);
        }
    }
}

// Returns true when a screenshot was written.
bool capture_window(Display *display, ::Window window, const char *filename) {
    XWindowAttributes attributes;
    if (!XGetWindowAttributes(display, window, &attributes))
        return false;
    // XGetImage raises a BadMatch error for windows that are not viewable
    if (attributes.map_state != IsViewable || attributes.width <= 0 || attributes.height <= 0)
        return false;

    XImage *image = XGetImage(display, window, 0, 0, attributes.width, attributes.height,
                              AllPlanes, ZPixmap);
    if (!image)
        return false;
    save_ximage_to_ppm(filename, image);
    XDestroyImage(image);
    return true;
}

std::vector<WindowInfo> enum_windows(Display *display) {
    std::vector<WindowInfo> windows;
    ::Window root = DefaultRootWindow(display);
    ::Window parent;
    ::Window *children = nullptr;
    unsigned int num_children = 0;

    if (!XQueryTree(display, root, &root, &parent, &children, &num_children))
        return windows;

    // Intern the atoms once. With only_if_exists = True, None means the atom does not exist.
    Atom pid_atom = XInternAtom(display, "_NET_WM_PID", True);
    Atom icon_atom = XInternAtom(display, "_NET_WM_ICON", True);

    for (unsigned int i = 0; i < num_children; ++i) {
        WindowInfo info;
        info.window = children[i];

        // Get the window title
        XTextProperty window_name;
        if (XGetWMName(display, children[i], &window_name) && window_name.value) {
            info.title = (char *)window_name.value;
            XFree(window_name.value);
        }

        // Get the process path
        if (pid_atom != None) {
            Atom actual_type;
            int actual_format;
            unsigned long nitems, bytes_after;
            unsigned char *prop = nullptr;
            if (XGetWindowProperty(display, children[i], pid_atom, 0, 1, False, XA_CARDINAL,
                                   &actual_type, &actual_format, &nitems, &bytes_after,
                                   &prop) == Success && prop) {
                if (actual_format == 32 && nitems > 0) {
                    // Xlib returns format-32 data as an array of long
                    pid_t pid = (pid_t)*(unsigned long *)prop;
                    char path[1024];
                    snprintf(path, sizeof(path), "/proc/%d/exe", (int)pid);
                    char exe_path[1024];
                    ssize_t len = readlink(path, exe_path, sizeof(exe_path) - 1);
                    if (len != -1) {
                        exe_path[len] = '\0';
                        info.process_path = exe_path;
                    }
                }
                XFree(prop);  // allocated even when the property is empty
            }
        }

        // Get the window icon (first image only)
        if (icon_atom != None) {
            Atom actual_type;
            int actual_format;
            unsigned long nitems, bytes_after;
            unsigned char *prop = nullptr;
            if (XGetWindowProperty(display, children[i], icon_atom, 0, (~0L), False, AnyPropertyType,
                                   &actual_type, &actual_format, &nitems, &bytes_after,
                                   &prop) == Success && prop) {
                if (actual_format == 32 && nitems > 2) {
                    unsigned long *data = (unsigned long *)prop;
                    unsigned long w = data[0], h = data[1];
                    // Make sure the declared size fits inside the returned data
                    if (w > 0 && h > 0 && w <= (nitems - 2) / h) {
                        info.icon_width = (int)w;
                        info.icon_height = (int)h;
                        // Raw copy of the long array: each pixel takes sizeof(unsigned long) bytes
                        info.icon_data.assign((uint8_t *)(data + 2), (uint8_t *)(data + 2 + w * h));
                    }
                }
                XFree(prop);
            }
        }

        windows.push_back(info);
    }

    if (children)
        XFree(children);
    return windows;
}

int main() {
    Display *display = XOpenDisplay(NULL);
    if (!display) {
        std::cerr << "Failed to open display" << std::endl;
        return -1;
    }
    XSetErrorHandler(ignore_x_error);

    std::vector<WindowInfo> windows = enum_windows(display);
    for (const auto &window : windows) {
        std::cout << "Window ID: " << window.window << std::endl;
        std::cout << "Title: " << window.title << std::endl;
        std::cout << "Process Path: " << window.process_path << std::endl;
        std::cout << "Icon Size: " << window.icon_width << "x" << window.icon_height << std::endl;

        // Save a screenshot of the window
        std::string screenshot_filename = "screenshot_" + std::to_string(window.window) + ".ppm";
        if (capture_window(display, window.window, screenshot_filename.c_str()))
            std::cout << "Screenshot saved to: " << screenshot_filename << std::endl;
        std::cout << std::endl;
    }

    XCloseDisplay(display);
    return 0;
}
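The icon code in the program above reads only the first image in _NET_WM_ICON. According to the EWMH specification, the property may contain several icons of different sizes, stored back to back as width, height, and then width * height pixels. The following sketch splits such an array into its images and picks the largest one. IconEntry, split_icons, and largest_icon are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <vector>

// One image inside a _NET_WM_ICON property.
struct IconEntry {
    unsigned long width;
    unsigned long height;
    std::size_t pixel_offset;  // index of the image's first pixel in the array
};

// Hypothetical helper: splits the property array (an array of long, as Xlib
// returns format-32 data) into its images. Stops at the first empty,
// malformed, or truncated entry.
std::vector<IconEntry> split_icons(const unsigned long *data, std::size_t nitems) {
    std::vector<IconEntry> icons;
    std::size_t pos = 0;
    while (nitems - pos >= 2) {
        const unsigned long w = data[pos];
        const unsigned long h = data[pos + 1];
        const std::size_t remaining = nitems - pos - 2;
        // Reject images that claim more pixels than remain. The division
        // form avoids overflow in w * h.
        if (w == 0 || h == 0 || w > remaining / h) break;
        icons.push_back({w, h, pos + 2});
        pos += 2 + static_cast<std::size_t>(w) * h;
    }
    return icons;
}

// Picks the image with the most pixels; returns nullptr when there is none.
const IconEntry *largest_icon(const std::vector<IconEntry> &icons) {
    const IconEntry *best = nullptr;
    for (const IconEntry &icon : icons) {
        if (!best || icon.width * icon.height > best->width * best->height)
            best = &icon;
    }
    return best;
}
```

With the prop and nitems values returned by XGetWindowProperty, the call would look like split_icons((const unsigned long *)prop, nitems); the chosen entry's pixels then start at data[entry.pixel_offset].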

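Special Handling When Reading Icon Data

In the get_window_icon function, the implementation differs between 32-bit and 64-bit architectures. The reason is that Xlib returns 32-bit property data as an array of long, so the in-memory size of each element depends on the architecture. The reasons and the two implementations are explained in detail below.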

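Reasons

The data is stored differently:
- On 32-bit architectures, unsigned long is 32 bits (4 bytes) wide.
- On 64-bit architectures, unsigned long is 64 bits (8 bytes) wide, but the actual data occupies only the low 32 bits; the high 32 bits are padding.
The data is therefore processed differently:
- On 32-bit architectures, the whole memory block can be copied directly, because every element is exactly one 32-bit pixel with no padding in between.
- On 64-bit architectures, the elements must be processed one by one, because each element is 64 bits wide while only its low 32 bits carry data.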

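Implementation

32-bit implementation

On 32-bit architectures the whole memory block is copied directly, because the data is contiguous and each element is 32 bits wide.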

#if defined(TRAA_ARCH_32_BITS)
icon_data.assign(prop + 2 * sizeof(unsigned long), prop + nitems * sizeof(unsigned long));
#endif

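Here std::vector::assign copies the icon data from prop into the icon_data vector. Since prop is a byte pointer, prop + 2 * sizeof(unsigned long) skips the first two elements (width and height), and prop + nitems * sizeof(unsigned long) marks the end of the property data, so the whole remaining block is copied.

64-bit implementation

On 64-bit architectures the elements must be processed one by one, because each element is 64 bits wide while only its low 32 bits carry data.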

#elif defined(TRAA_ARCH_64_BITS)
// TODO: this can be optimized by using some SIMD instructions.
icon_data.resize(width * height * desktop_frame::bytes_per_pixel);
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        unsigned long pixel = data[2 + y * width + x];
        // _NET_WM_ICON pixels are packed ARGB: A is the high byte, B the low byte
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 3] = (pixel >> (24)) & 0xff; // A
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 2] = (pixel >> (16)) & 0xff; // R
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 1] = (pixel >> (8)) & 0xff;  // G
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 0] = pixel & 0xff;           // B
    }
}
#endif

Here each pixel is processed individually: its ARGB value is split into bytes and stored in the icon_data vector. The steps are as follows:

Resize icon_data:
```
icon_data.resize(width * height * desktop_frame::bytes_per_pixel);
```
This resizes icon_data so that it can hold all the pixel data.

Process the pixels one by one:

for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        unsigned long pixel = data[2 + y * width + x];
        // _NET_WM_ICON pixels are packed ARGB: A is the high byte, B the low byte
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 3] = (pixel >> (24)) & 0xff; // A
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 2] = (pixel >> (16)) & 0xff; // R
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 1] = (pixel >> (8)) & 0xff;  // G
        icon_data[(y * width + x) * desktop_frame::bytes_per_pixel + 0] = pixel & 0xff;           // B
    }
}
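The per-pixel loop also works unchanged on 32-bit builds, so one possible simplification is a single code path without the architecture macros. The sketch below is such a variant with an added bounds check. The name icon_to_bgra is hypothetical, and the sketch assumes 4 bytes per output pixel, since the value of desktop_frame::bytes_per_pixel is not shown in this post.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts width * height ARGB pixels, stored one per
// unsigned long as Xlib returns them, into tightly packed bytes in
// B, G, R, A order (assumes 4 bytes per output pixel).
// `pixels` points at the first pixel (past width and height) and `count` is
// the number of array elements available from there.
// Returns an empty vector when the dimensions do not fit in `count`.
std::vector<std::uint8_t> icon_to_bgra(const unsigned long *pixels, std::size_t count,
                                       std::size_t width, std::size_t height) {
    std::vector<std::uint8_t> out;
    if (width == 0 || height == 0 || width > count / height) return out;

    const std::size_t n = width * height;
    out.resize(n * 4);
    for (std::size_t i = 0; i < n; ++i) {
        // Only the low 32 bits carry data, whatever sizeof(unsigned long) is.
        const std::uint32_t argb = static_cast<std::uint32_t>(pixels[i] & 0xFFFFFFFFUL);
        out[i * 4 + 0] = static_cast<std::uint8_t>(argb & 0xFF);          // B
        out[i * 4 + 1] = static_cast<std::uint8_t>((argb >> 8) & 0xFF);   // G
        out[i * 4 + 2] = static_cast<std::uint8_t>((argb >> 16) & 0xFF);  // R
        out[i * 4 + 3] = static_cast<std::uint8_t>((argb >> 24) & 0xFF);  // A
    }
    return out;
}
```

Given the data and nitems values from XGetWindowProperty, the call would be icon_to_bgra(data + 2, nitems - 2, data[0], data[1]), after first checking that nitems is greater than 2. The trade-off is that 32-bit builds lose the single block copy, which is unlikely to matter for icon-sized images.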