# movie_wordcloud

**Repository Path**: zheng_yongtao/movie_wordcloud

## Basic Information

- **Project Name**: movie_wordcloud
- **Description**: Douban review analysis, including sentiment analysis and keyword extraction.
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 0
- **Created**: 2025-01-31
- **Last Updated**: 2025-08-14

## Categories & Tags

**Categories**: Uncategorized

**Tags**: Python, Spider, 豆瓣, 电影, movie

## README

## Foreword

> This year's Spring Festival season has a lot of films. With so many to choose from, how do you pick what to watch? You may be looking forward to several of them but don't want to waste time on a dud, and you'd like to read reviews without running into spoilers. One answer is to write a script that pulls a film's comments and analyzes them, so you can see what audiences think of it from the comments themselves.

## Results

### 封神第二部:战火西岐

![封神第二部:战火西岐_词云.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/4f4afe8ab8994d68834916b1f4e2e761~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738404972&x-orig-sign=qWFXKhatDEaxf6OkI5TtIdCgb7s%3D)

![封神第二部:战火西岐_情感分布.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/c4b6e5ae52744a9c89c495910a82d730~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738404980&x-orig-sign=fn7lxnrZhe2dOVLGSeyzIMoZ2NA%3D)

![封神第二部:战火西岐_趋势图.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/fc0669a2d22047d0b2cf9c4014560ff7~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738404988&x-orig-sign=yLlhZ%2FamdtT62Ep7PBEiZfsdWks%3D)

### 唐探1900

![唐探1900_词云.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/2b6b012c5ddb4417a92b53c30820dc20~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405016&x-orig-sign=7yYvYJuURHNuGGmsCeIRiZaRWSA%3D)

![唐探1900_情感分布.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/542957a82d694e58b8c396fb421dd6d2~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405022&x-orig-sign=5%2FMjrZ3OxiYtTNjDFlB4%2BZEKNRU%3D)

![唐探1900_趋势图.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/890efdb5382b46db992c322caee2dc08~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405028&x-orig-sign=9Ge4VPnM%2FWozgmPb%2Badr2rS9NTY%3D)

### 哪吒之魔童闹海

![哪吒之魔童闹海_词云.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/1f1b311535b446f6badd1e7c713225ed~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405041&x-orig-sign=xa%2FqPzSU8uSjVqwL5bSaTVX%2BzOM%3D)

![哪吒之魔童闹海_情感分布.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/90fa6782188a447d9b83a390ff8dd504~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405048&x-orig-sign=3erjvRWHd8G4wMWSJ9Xsrs%2BqHTo%3D)

![哪吒之魔童闹海_趋势图.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/72e28469d6704082962a568f55c21821~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405054&x-orig-sign=7oIQbDZztSJT9SysLeMo0GhY9qs%3D)

### 射雕英雄传:侠之大者

![](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/edcc4ed13d924f35ad974376bde1c024~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D\&rk3s=e9ecf3d6\&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018\&x-orig-expires=1738404884\&x-orig-sign=5bGniHYlHB675I37O%2BXQMQwFdGc%3D)

![](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/32645508485c45d190fce35017e76c8c~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D\&rk3s=e9ecf3d6\&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018\&x-orig-expires=1738404884\&x-orig-sign=yNB0EUVtkYyrPUqy2gP0IKAfKkU%3D)

![](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/ebb185b7304844008bd8dc1386f5f7fc~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D\&rk3s=e9ecf3d6\&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018\&x-orig-expires=1738404885\&x-orig-sign=IiIB%2FCwWaH4IZB03uGsjEYlmeys%3D)

### 蛟龙行动

![蛟龙行动_词云.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/4cd553ccc0e8460c8bea77a1b358cba8~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405076&x-orig-sign=5hefewqp06qk6vrx%2FPT%2FxHPnPT0%3D)

![蛟龙行动_情感分布.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/d7f2833af2734223b4f29d3cccd19322~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405082&x-orig-sign=FtsXYSiJTMsAH%2BRMQHwURRtAsXs%3D)

![蛟龙行动_趋势图.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/fa2af1c2234743b3bf24e76ecad46076~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405088&x-orig-sign=eTmo%2BaiRWmTum%2Buh7C7ZoFj3JoA%3D)

### 熊出没·重启未来

![熊出没·重启未来_词云.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/09edc562b9844df6906cb35187393448~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405121&x-orig-sign=S48NeB43IGCe6wa3VuVRsLFdDP0%3D)

![熊出没·重启未来_情感分布.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/1737ad58c7084e1d99e29c03752927b0~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405126&x-orig-sign=UtFgoWC5u9WxU5tyDYvwCIyvZHA%3D)

![熊出没·重启未来_趋势图.png](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/5a8fdb9587ad4973a29055aead7c54c3~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D&rk3s=e9ecf3d6&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018&x-orig-expires=1738405134&x-orig-sign=0Kt%2BTqgZ7ZtfHIlm1Sfht6nuyXw%3D)

## Implementation

### 1. Global Font Initialization

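Start by configuring the global font so that Chinese text renders correctly in the charts. The script picks a font list according to the operating system (Windows, or Mac/Linux). Note that matplotlib generally falls back to a default font with a warning, rather than raising, when none of the listed fonts is installed, so it is worth opening `font_test.png` to confirm that the title renders. The `apt` command in the failure message applies to Debian/Ubuntu-style Linux systems.

```python
# ================ Global font initialization ================
# Must be set before the other imports
import os
import sys

import matplotlib as mpl

mpl.use('Agg')  # Non-interactive backend, for environments without a GUI
import matplotlib.pyplot as plt

# Configure system fonts (chosen per OS: Windows vs. Mac/Linux)
try:
    if os.name == 'nt':  # Windows
        mpl.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei']
    else:  # Mac/Linux
        mpl.rcParams['font.sans-serif'] = ['PingFang HK', 'Noto Sans CJK SC', 'WenQuanYi Zen Hei']
    mpl.rcParams['axes.unicode_minus'] = False  # Render the minus sign correctly
    plt.rcParams['font.size'] = 12  # Global font size

    # Verify the font configuration by saving a figure with a Chinese title
    test_fig, test_ax = plt.subplots()
    test_ax.set_title("中文测试")
    test_fig.savefig(os.path.join(os.getcwd(), 'font_test.png'))
    plt.close(test_fig)
    print("✅ 系统字体配置验证通过")  # "Font configuration verified"
except Exception as e:
    # Report the failure and suggest installing a CJK font
    print("❌ 字体配置失败:", str(e))
    print("请执行以下解决方案:")
    print("1. Windows系统:安装[微软雅黑](https://learn.microsoft.com/zh-cn/typography/font-list/microsoft-yahei)")
    print("2. Mac/Linux系统:执行安装命令:sudo apt install fonts-noto-cjk")
    sys.exit(1)
```

### 2. Configuration

The configuration covers the movies to analyze, the output directory, the number of pages to fetch, the number of concurrent threads, the proxy IP pool, the font path, the stop-word file path, and the sentiment thresholds.

```python
# ================== Configuration ==================
CONFIG = {
    # Movies to analyze (Douban ID: movie title)
    'movies': {
        '30181250': '封神第二部:战火西岐',
        '36282639': '唐探1900',
        '34780991': '哪吒之魔童闹海',
        '36289423': '射雕英雄传:侠之大者',
        '35295960': '蛟龙行动',
    },
    'output_dir': './reports',  # Output directory
    'page_limit': 20,  # Pages of comments to fetch per movie
    'max_workers': 5,  # Number of concurrent threads
    'proxy_pool': [  # Proxy IP pool
        # 'http://ip1:port',
        # 'http://ip2:port'
    ],
    'filterRoleNames': True,  # Exclude names scraped from the cast page from the keywords
    'font_path': './font/NotoSansCJKMedium.otf',  # Font file
    'filterText': './filterText.txt',  # Extra words to exclude, one per line
    'stopwords': './stopwords.txt',  # Path to the stop-word file
    'sentiment_threshold': (0.4, 0.6)  # Scores below 0.4 are negative, 0.4 to 0.6 neutral, above 0.6 positive
}
```

### 3. The MovieAnalyzer Class

`MovieAnalyzer` is the core class of the script. Its methods handle data fetching, text processing, analysis, and report generation. The methods are shown one at a time below, outside the class body.

#### (1) Initialization

Initializes the **UserAgent** object, creates the output directory, and loads the custom dictionary and the stop-word file.

```python
import os

import jieba
from fake_useragent import UserAgent  # Assumed source of UserAgent (the fake-useragent package)

def __init__(self):
    self.ua = UserAgent()
    os.makedirs(CONFIG['output_dir'], exist_ok=True)
    # Initialize the tokenizer
    jieba.load_userdict('./userdict.txt')  # Custom dictionary
    # Load the stop words
    with open(CONFIG['stopwords'], 'r', encoding='utf-8') as f:
        self.stopwords = set(f.read().splitlines())
```

#### (2) Generating Dynamic Request Headers

Builds request headers with a randomly chosen User-Agent so that requests appear to come from different browsers, which makes the script less likely to be identified as a crawler.
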
```python
def get_headers(self):
    """Build dynamic request headers."""
    return {
        'User-Agent': self.ua.random,
        'Referer': 'https://movie.douban.com/'
    }
```

#### (3) Getting a Proxy IP

Picks a random proxy from the proxy **IP** pool, or returns **None** if the pool is empty.

```python
import random

def get_proxy(self):
    """Pick a random proxy IP."""
    return random.choice(CONFIG['proxy_pool']) if CONFIG['proxy_pool'] else None
```

#### (4) Fetching Data

A thread-safe data-fetching method. It first requests the movie's cast list, then uses a thread pool to fetch the configured number of comment pages concurrently.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

import requests
from bs4 import BeautifulSoup

def fetch_data(self, movie_id):
    """Thread-safe data fetching."""
    all_comments = []
    character_blacklist = []
    try:
        # Fetch the cast list
        url = f'https://movie.douban.com/subject/{movie_id}/celebrities'
        proxy = self.get_proxy()
        # The URL is HTTPS, so the proxy is mapped for both schemes
        resp = requests.get(url, headers=self.get_headers(),
                            proxies={'http': proxy, 'https': proxy}, timeout=15)
        soup = BeautifulSoup(resp.text, 'html.parser')
        if CONFIG['filterRoleNames']:
            character_blacklist = [li.find('span', class_='name').text
                                   for li in soup.select('li.celebrity')[:8]]
        # Fetch the short comments
        with ThreadPoolExecutor(max_workers=CONFIG['max_workers']) as executor:
            futures = []
            for page in range(CONFIG['page_limit']):
                futures.append(
                    executor.submit(self._fetch_page_comments, movie_id, page)
                )
                time.sleep(random.uniform(0.5, 1.5))  # Stagger the submissions
            for future in futures:
                all_comments.extend(future.result())
    except Exception as e:
        print(f'电影{movie_id}数据获取异常: {str(e)}')
    return {
        'comments': all_comments,
        'characters': character_blacklist
    }
```

#### (5) Fetching a Single Page of Comments

Fetches one page of comments: it sends the request with the requests library, parses the HTML, and extracts the comment text. The `start` parameter advances in steps of 20 per page.

```python
def _fetch_page_comments(self, movie_id, page):
    """Fetch one page of comments."""
    try:
        url = f'https://movie.douban.com/subject/{movie_id}/comments?start={page * 20}'
        proxy = self.get_proxy()
        resp = requests.get(url, headers=self.get_headers(),
                            proxies={'http': proxy, 'https': proxy}, timeout=10)
        soup = BeautifulSoup(resp.text, 'html.parser')
        comments = [self._clean_text(span.get_text())
                    for span in soup.select('span.short')]
        time.sleep(random.uniform(1, 3))  # Random pause between requests
        return comments
    except Exception:
        # Any failure yields an empty page so the other pages still count
        return []
```

#### (6) Text Cleaning

Cleans a comment by removing HTML tags, @mentions, text in 【】 brackets, and other noise.

```python
import re

def _clean_text(self, text):
    """Advanced text cleaning."""
    text = re.sub(r'<[^>]+>', '', text)  # Strip HTML tags
    text = re.sub(r'@\w+\s?', '', text)  # Remove @mentions
    text = re.sub(r'【.*?】', '', text)  # Remove text in 【】 brackets
    text = re.sub(r'[^\w\u4e00-\u9fff]', ' ', text)  # Keep Chinese and word characters, replace the rest with spaces
    return text.strip()
```

#### (7) Review Analysis

Calls `fetch_data` to get the data, runs sentiment analysis and text processing on the comments, and then generates the reports.

```python
from concurrent.futures import ThreadPoolExecutor

def analyze_movie(self, movie_id, movie_name):
    """Core analysis pipeline."""
    print(f'🎬 正在分析《{movie_name}》...')
    data = self.fetch_data(movie_id)
    if not data['comments']:
        print(f'⚠️ 《{movie_name}》无有效评论')  # No usable comments
        return
    # Sentiment analysis and text processing
    sentiment_results = []
    words = []
    characters_arr = []
    filter_text = []
    # Split each cast entry on whitespace so every name part is blacklisted
    for ch in data['characters']:
        split_string = ch.split()
        characters_arr.extend(split_string)
    with open(CONFIG['filterText'], 'r', encoding='utf-8') as f:
        filter_text = set(f.read().splitlines())
    blacklist = set(characters_arr + list(filter_text))
    print("blacklist", blacklist)
    with ThreadPoolExecutor(max_workers=4) as executor:
        futures = []
        for comment in data['comments']:
            futures.append(executor.submit(self._process_comment, comment, blacklist))
        for future in futures:
            result = future.result()
            if result:
                words.extend(result['words'])
                sentiment_results.append(result['sentiment'])
    # Generate the reports
    self._generate_wordcloud(words, movie_name)
    self._generate_sentiment_chart(sentiment_results, movie_name)
    self._generate_full_report(words, sentiment_results, movie_name)
```

#### (8) Processing a Single Comment

Processes a single comment: it computes a sentiment score, tokenizes the text, and removes stop words and blacklisted words.

```python
import jieba
from snownlp import SnowNLP

def _process_comment(self, comment, blacklist):
    """Process one comment (including sentiment analysis)."""
    try:
        # Sentiment analysis
        s = SnowNLP(comment)
        sentiment = s.sentiments
        # Tokenize, then drop single-character tokens, stop words, and blacklisted words
        seg = jieba.lcut(comment)
        filtered_words = [w for w in seg
                          if len(w) > 1 and w not in self.stopwords and w not in blacklist]
        return {
            'words': filtered_words,
            'sentiment': sentiment
        }
    except Exception:
        return None
```

#### (9) Generating the Word Cloud

Generates a word cloud from keyword frequencies.

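The frequency step on its own can be sketched as follows. This is a minimal illustration rather than the project's code: the helper name `keyword_frequencies` and the sample tokens are hypothetical, and it relies only on `collections.Counter` from the standard library. The resulting word-to-count mapping is the kind of input that `WordCloud.generate_from_frequencies` accepts.

```python
from collections import Counter


def keyword_frequencies(words, top_n=None):
    """Count how often each keyword occurs; optionally keep only the top_n."""
    freq = Counter(words)
    if top_n is not None:
        # most_common returns (word, count) pairs sorted by descending count
        return dict(freq.most_common(top_n))
    return dict(freq)


# Hypothetical tokens, as a tokenizer might produce them:
# 特效 = special effects, 剧情 = plot, 演员 = actors
sample_words = ["特效", "剧情", "特效", "演员", "剧情", "特效"]
freq = keyword_frequencies(sample_words)
print(freq)  # {'特效': 3, '剧情': 2, '演员': 1}
```

Passing `top_n` is one way to keep rare words out of the mapping before drawing; the word cloud's own `max_words` setting serves a similar purpose at render time.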
![](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/a6f0cf6b4cf34da090ea2d7a32eadf9b~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D\&rk3s=e9ecf3d6\&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018\&x-orig-expires=1738404885\&x-orig-sign=R9vO4eG3jAh%2FOfWW50DnCbcH%2FkA%3D)

```python
import os
from collections import Counter

import matplotlib.pyplot as plt
from wordcloud import WordCloud

def _generate_wordcloud(self, words, movie_name):
    """Generate an advanced word cloud."""
    freq = Counter(words)
    wc = WordCloud(
        font_path=CONFIG['font_path'],
        width=1600,
        height=1200,
        background_color='white',
        colormap='tab20',
        max_words=200,
        contour_width=1,
        contour_color='steelblue'
    ).generate_from_frequencies(freq)
    plt.figure(figsize=(20, 15))
    plt.imshow(wc, interpolation='bilinear')
    plt.axis('off')
    # Saved as "<movie name>_词云.png" (词云 = word cloud)
    plt.savefig(os.path.join(CONFIG['output_dir'], f'{movie_name}_词云.png'),
                bbox_inches='tight', dpi=300)
    plt.close()
```

#### (10) Generating the Sentiment Distribution Chart

Generates a pie chart of the sentiment distribution from the sentiment analysis results.

![](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/25a00364d535416b9625616fd050095c~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D\&rk3s=e9ecf3d6\&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018\&x-orig-expires=1738404885\&x-orig-sign=HySlScgQsGwgLujkBrtAlt13cfs%3D)

```python
def _generate_sentiment_chart(self, sentiments, movie_name):
    """Generate the sentiment distribution pie chart."""
    low, high = CONFIG['sentiment_threshold']
    counts = {
        '负面': sum(1 for s in sentiments if s < low),  # Negative
        '中性': sum(1 for s in sentiments if low <= s <= high),  # Neutral
        '正面': sum(1 for s in sentiments if s > high)  # Positive
    }
    plt.figure(figsize=(10, 10))
    plt.pie(
        list(counts.values()),
        labels=list(counts.keys()),
        autopct='%1.1f%%',
        colors=['#ff9999', '#66b3ff', '#99ff99'],
        startangle=90
    )
    plt.title(f'《{movie_name}》评论情感分布', fontsize=14)
    # Saved as "<movie name>_情感分布.png" (情感分布 = sentiment distribution)
    plt.savefig(os.path.join(CONFIG['output_dir'], f'{movie_name}_情感分布.png'),
                bbox_inches='tight', dpi=150)
    plt.close()
```

#### (11) Exporting to Excel

Saves the sentiment and keyword analysis results to an Excel file, and generates a chart of the most frequent keywords (saved as the "trend chart").

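Both the pie chart in step (10) and this report bucket scores with the same pair of thresholds. That rule can be sketched as standalone helpers, assuming scores between 0 and 1 such as SnowNLP produces. The names `classify_sentiment` and `summarize_sentiments` and the English labels are assumptions made for illustration (the project's code writes Chinese labels inline), and the empty-list guard is an addition that avoids a division by zero when no comment was scored.

```python
def classify_sentiment(score, thresholds=(0.4, 0.6)):
    """Bucket a score in [0, 1] as negative, neutral, or positive."""
    low, high = thresholds
    if score < low:
        return "negative"
    if score <= high:
        return "neutral"
    return "positive"


def summarize_sentiments(scores, thresholds=(0.4, 0.6)):
    """Return the total, the mean score, and the positive/negative rates."""
    total = len(scores)
    if total == 0:
        # Hypothetical guard: there is nothing to average without scores
        return {"total": 0, "mean": None,
                "positive_rate": None, "negative_rate": None}
    labels = [classify_sentiment(s, thresholds) for s in scores]
    return {
        "total": total,
        "mean": sum(scores) / total,
        "positive_rate": labels.count("positive") / total,
        "negative_rate": labels.count("negative") / total,
    }


sample_scores = [0.1, 0.4, 0.5, 0.6, 0.9]  # Hypothetical scores
summary = summarize_sentiments(sample_scores)
print([classify_sentiment(s) for s in sample_scores])
print(summary)
```

Keeping the rule in one function means the chart and the spreadsheet cannot drift apart if the thresholds or the boundary handling change later.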
![](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/0ef0c0cb8f2e437382cbb4ce304ea43f~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D\&rk3s=e9ecf3d6\&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018\&x-orig-expires=1738404884\&x-orig-sign=FOL%2BvM1yFG0HOJqTkXC73rnaS3s%3D)

![](https://p0-xtjj-private.juejin.cn/tos-cn-i-73owjymdk6/fef56aa9c2324629a3d9c2157b7c1015~tplv-73owjymdk6-jj-mark-v1:0:0:0:0:5o6Y6YeR5oqA5pyv56S-5Yy6IEAgSlllb250dQ==:q75.awebp?policy=eyJ2bSI6MywidWlkIjoiNDQwMjQ0MjkwNzI3Mjk0In0%3D\&rk3s=e9ecf3d6\&x-orig-authkey=f32326d3454f2ac7e96d3d06cdbb035152127018\&x-orig-expires=1738404884\&x-orig-sign=n8vcqSn5HIkJN7x2KiSknRIFuzA%3D)

```python
import os
from collections import Counter

import matplotlib.pyplot as plt
import pandas as pd

def _generate_full_report(self, words, sentiments, movie_name):
    """Generate the full analysis report."""
    low, high = CONFIG['sentiment_threshold']
    # Sentiment data (columns: score, category)
    df_sentiment = pd.DataFrame({
        '情感得分': sentiments,
        '情感分类': ['正面' if s > high else '中性' if s >= low else '负面'
                   for s in sentiments]
    })
    # Keyword data (columns: keyword, count)
    freq = Counter(words)
    df_keywords = pd.DataFrame(freq.most_common(50), columns=['关键词', '频次'])
    # Save to Excel (sheets: sentiment analysis, keyword analysis, overview)
    with pd.ExcelWriter(os.path.join(CONFIG['output_dir'], f'{movie_name}_分析报告.xlsx')) as writer:
        df_sentiment.to_excel(writer, sheet_name='情感分析', index=False)
        df_keywords.to_excel(writer, sheet_name='关键词分析', index=False)
        # Summary statistics: total comments, mean score, positive rate, negative rate
        stats = pd.DataFrame({
            '指标': ['总评论数', '平均情感得分', '正面率', '负面率'],
            '数值': [
                len(sentiments),
                sum(sentiments) / len(sentiments),
                sum(1 for s in sentiments if s > high) / len(sentiments),
                sum(1 for s in sentiments if s < low) / len(sentiments)
            ]
        })
        stats.to_excel(writer, sheet_name='统计概览', index=False)
    # Draw the top-15 keyword bar chart on an explicit Axes so that figsize
    # applies; without ax=, pandas opens a separate figure of its own
    fig, ax = plt.subplots(figsize=(12, 6))
    df_keywords.head(15).plot.bar(x='关键词', y='频次', legend=False, ax=ax)
    ax.set_title(f'《{movie_name}》高频关键词TOP15')
    fig.tight_layout()
    fig.savefig(os.path.join(CONFIG['output_dir'], f'{movie_name}_趋势图.png'), dpi=150)
    plt.close(fig)
```

## Source Code

### gitee

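Source code URL:

### github

Source code URL:

***

**🌟 If you find this helpful, please give it a star\~**

**🖊 If you spot any problems or mistakes, feel free to point them out. PRs are welcome\~**

**📬 If there is a component or idea you would like to see implemented, get in touch\~**

***

## WeChat Official Account

Follow the official account 『`前端也能这么有趣`』 for more fun content.

Send **加群** ("join group") to the official account to join the group chat and learn (or slack off) together\~

## Afterword

> 🎉 This is JYeontu, currently a front-end engineer. I work through algorithm problems when I have time and enjoy playing badminton 🏸. I also like to write, both as a record for myself 📋 and in the hope that it helps others a little. Please bear with any rough writing 🙇, and do point out mistakes; I will take them seriously and improve 😊. I occasionally post fun articles on my official account 『`前端也能这么有趣`』, so feel free to follow it if you are interested. Thank you all for your support, and see you in the next post 🙌.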