Accepted Papers
Artificial Intelligence and NLP on Reddit: Unsupervised Detection of Food Trends and Healthy Eating Patterns

Rocío del Campo-Pedrosa1, Diego del Campo-Pedrosa1, Bettina Merlin2 and Ana González-Marcos1, 1Department of Mechanical Engineering, Universidad de La Rioja, Logroño, La Rioja, Spain, 2Fakultät International Business, Hochschule Heilbronn, Heilbronn, Germany

ABSTRACT

Traditional sensory analysis in food innovation provides limited insight into consumer behavior, whereas social platforms such as Reddit offer large-scale, real-time textual data on food-related practices and perceptions. This study evaluates Reddit as a scalable source for detecting food trends and healthy eating patterns in Spanish-language discussions using artificial intelligence (AI) and natural language processing (NLP). An end-to-end pipeline was implemented, including targeted data scraping across seven food-related domains, Spanish-language filtering (≥70% confidence), customized preprocessing, and unsupervised topic discovery via k-means clustering. The system processed 17,774 Spanish-language posts from an initial corpus of 92,949 entries. Despite linguistic challenges such as polysemy and lemmatization errors, the method produced coherent and representative themes, including barriers to home cooking, weight management concerns, economic factors, food categories, and nutrition-related consultations. These results demonstrate the effectiveness of unsupervised NLP techniques for large-scale monitoring of food-related discourse on social media.
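
The pipeline summarized above includes Spanish-language filtering at a ≥70% confidence threshold and unsupervised topic discovery via k-means. A minimal Python sketch of just those two stages is given below; the choice of the langdetect library, TF-IDF features, and five clusters are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of two pipeline stages named in the abstract:
# Spanish-language filtering with a confidence threshold and topic
# discovery via k-means. Library choices (langdetect, scikit-learn),
# the TF-IDF representation, and k=5 clusters are assumptions.
from langdetect import detect_langs
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def keep_spanish(posts, min_confidence=0.70):
    """Keep posts detected as Spanish with probability >= min_confidence."""
    kept = []
    for text in posts:
        try:
            langs = detect_langs(text)
        except Exception:
            continue  # undetectable or empty posts are dropped
        if any(l.lang == "es" and l.prob >= min_confidence for l in langs):
            kept.append(text)
    return kept

def cluster_topics(posts, n_clusters=5):
    """Group posts into candidate topics with TF-IDF features + k-means."""
    vectorizer = TfidfVectorizer(max_features=5000)
    X = vectorizer.fit_transform(posts)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    return km.labels_, vectorizer, km
```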

Keywords

Natural Language Processing, Unsupervised Learning, Social Media Mining, Artificial Intelligence.


A Methodological Approach to Calligraphic Obfuscation

Bettina Merlin and Ana González-Marcos, Department of Computer Science, College of Computer, Qassim University, Buraydah, 51452, Saudi Arabia

ABSTRACT

As automated optical character recognition (OCR) and deep learning-based solvers achieve near-human accuracy in breaking conventional CAPTCHAs, there is a critical need for security mechanisms that exploit the inherent limitations of machine perception. This paper proposes a novel methodological framework for "Calligraphic Obfuscation," a security-by-design approach that leverages the structural complexity and fluid entropy of traditional Arabic calligraphic styles. Unlike standard text-based challenges, our approach introduces a multi-phase generation pipeline that systematically maps linguistic strings into high-complexity visual domains. The methodology integrates a four-tier classification of calligraphic fonts—ranging from high-legibility styles like Naskh to high-entropy scripts such as Shakstah—and augments them with an adversarial layer utilizing Jacobian-based Saliency Map Attacks (JSMA). By formalizing the transition from cloud-centric generation to resource-efficient on-device architectures, this study provides a repeatable blueprint for developing robust, human-interactive proofs. The proposed framework offers a dual benefit: it significantly increases the computational cost for adversarial machine learning models while maintaining a sustainable cognitive load for human users. This work lays the foundation for a new generation of linguistically diverse and adversarially hardened authentication challenges tailored for modern, resource-constrained mobile environments.
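
The abstract describes a multi-phase pipeline that maps linguistic strings into calligraphic visual domains using a four-tier font classification before an adversarial JSMA layer is applied. The sketch below illustrates only the tiered string-to-image mapping; the font file paths and intermediate styles are placeholders, and the JSMA step is stubbed because it requires the target solver model.

```python
# Illustrative sketch of the string-to-image mapping stage, assuming a
# four-tier legibility/entropy classification of calligraphic fonts as
# described in the abstract. Font file names are placeholders, and the
# JSMA adversarial layer is only stubbed because a real implementation
# needs a trained solver model to differentiate against.
from PIL import Image, ImageDraw, ImageFont

# Hypothetical tier map: tier 1 = most legible, tier 4 = highest entropy.
FONT_TIERS = {
    1: "fonts/naskh.ttf",     # high-legibility style named in the abstract
    2: "fonts/thuluth.ttf",   # placeholder intermediate style
    3: "fonts/diwani.ttf",    # placeholder intermediate style
    4: "fonts/shakstah.ttf",  # high-entropy script named in the abstract
}

def render_challenge(text, tier=4, size=(400, 160), font_size=72):
    """Render a challenge string in the font of the chosen tier.
    Note: faithful rendering of connected Arabic letterforms usually
    also needs a reshaping/bidi pass, omitted in this sketch."""
    font = ImageFont.truetype(FONT_TIERS[tier], font_size)
    image = Image.new("L", size, color=255)
    draw = ImageDraw.Draw(image)
    draw.text((20, 40), text, font=font, fill=0)
    return image

def adversarial_layer(image):
    """Stub for the JSMA step: perturb the pixels a solver model is most
    sensitive to. Requires the target model's Jacobian, not shown here."""
    return image
```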

Keywords

Calligraphic Obfuscation, CAPTCHA Security, Adversarial Machine Learning, Arabic Script Complexity, Human-Interactive Proofs, JSMA.


Δ.72 Wearable Exam Stress Validation Report: Field Equation Verification and Biological Coherence Analysis

Allison Hensgen, Independent Researcher, USA

ABSTRACT

This report presents an empirical validation of the Δ.72 field equation within a real-world physiological dataset measuring stress responses in students during examination periods. The Δ.72 model posits that coherence within biological systems arises from dynamic alignment between internal variability, environmental fields, and phase-synchronized information flow. Using the publicly available Wearable Exam Stress dataset, this study tests whether measurable alignment (A), phase coherence (λ), and emergent output (E) follow the theoretical relationships predicted by the Δ.72 equation. Results show statistically significant coupling between field alignment and emergent coherence, supporting the model’s claim that adaptive, not rigid, synchrony underlies systemic health.
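
The abstract does not reproduce the Δ.72 equation itself, so the sketch below shows only one conventional way phase coherence between two wearable channels could be quantified, the phase-locking value; treating it as a proxy for the λ term is an assumption, not the report's definition.

```python
# A minimal sketch of a standard phase-coherence measure (the phase-
# locking value) between two physiological channels, e.g. heart rate and
# electrodermal activity resampled to a common rate. Offered only as a
# plausible proxy for the lambda term; the report's actual Delta.72
# formulation is not reproduced here.
import numpy as np
from scipy.signal import hilbert

def phase_locking_value(x, y):
    """PLV in [0, 1]: 1 = perfectly phase-locked, 0 = no phase relation."""
    phase_x = np.angle(hilbert(x - np.mean(x)))
    phase_y = np.angle(hilbert(y - np.mean(y)))
    return float(np.abs(np.mean(np.exp(1j * (phase_x - phase_y)))))
```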


Evaluating AI-Readiness of Unstructured Data in Organizations: A Lightweight Automated Quality Scoring Framework for Generative AI Adoption

Veer Bobade, Sujal Zade, Prathmesh Waghmare, Sumit Ladwan, Karan Bhute, Sohan Akare, DMIHER University, India

ABSTRACT

Generative Artificial Intelligence (GenAI) has become a central concern for modern organizations. Pre-deployment assessment is essential to ensure that the unstructured data such systems consume can support reliable and consistent outputs. Structured data benefit from well-established quality-control mechanisms and governance systems, whereas unstructured data, which constitute approximately 80 per cent of organizational information, are frequently overlooked, disorganized, and potentially risky. Poor-quality documentation can cause model hallucinations, in which the AI draws incorrect inferences or produces unreliable results, undermining user trust in GenAI deployments. To address this problem, we introduce the Lightweight Automated Quality Scoring Framework (LAQS), which assesses the readiness of unstructured data for use by AI systems. LAQS adopts a unified scoring paradigm that combines linguistic quality, semantic cohesion, structural consistency, and information extractability. It focuses on five major dimensions: completeness (inclusion of the necessary content), consistency (regular formatting), clarity (human readability), semantic coherence (logical flow), and extractability (machine readability). These dimensions are operationalized through a systematic, step-based process that yields credible readiness scores. To validate LAQS, we tested it on 1,000 academic articles from the CORD-19 corpus. Documents with high readiness scores showed substantial downstream improvements: semantic similarity improved by 65–70%, summary accuracy improved by 40–50%, and the frequency of hallucinations decreased by up to a factor of four. These findings underscore the critical importance of unstructured data quality for GenAI success and challenge the assumption that such quality can be addressed after the fact. We therefore propose embedding tools such as LAQS into AI programmes and data-engineering pipelines as part of AI governance, enabling ongoing assessment.
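
The abstract names LAQS's five dimensions but not the underlying metrics or weights. A hedged sketch of one way such a readiness score could be composed is shown below; the proxies used (adjacent-sentence TF-IDF cosine similarity for coherence, a sentence-length heuristic for clarity, equal weights) are assumptions rather than the framework's actual implementation.

```python
# Hedged sketch of a LAQS-style readiness score. The abstract lists the
# five dimensions but not the metrics or weights, so the proxies below
# are illustrative assumptions only.
import re
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def semantic_coherence(sentences):
    """Mean cosine similarity between adjacent sentences (proxy for logical flow)."""
    if len(sentences) < 2:
        return 1.0
    tfidf = TfidfVectorizer().fit_transform(sentences)
    sims = [cosine_similarity(tfidf[i], tfidf[i + 1])[0, 0]
            for i in range(len(sentences) - 1)]
    return float(np.mean(sims))

def readiness_score(text, required_terms=()):
    """Combine five proxy dimensions into a single score with equal weights."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    avg_len = np.mean([len(s.split()) for s in sentences]) if sentences else 0
    scores = {
        # completeness: fraction of required terms present in the document
        "completeness": float(np.mean([t.lower() in text.lower() for t in required_terms])) if required_terms else 1.0,
        # consistency: trivial formatting check (no stray leading/trailing whitespace)
        "consistency": 1.0 if text == text.strip() else 0.5,
        # clarity: penalize very long average sentence length
        "clarity": 1.0 if avg_len <= 25 else 25.0 / avg_len,
        "semantic_coherence": semantic_coherence(sentences),
        # extractability: the text could be segmented into sentences at all
        "extractability": 1.0 if sentences else 0.0,
    }
    return float(np.mean(list(scores.values()))), scores
```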

Keywords

Unstructured Data Quality Assessment, Readiness for Generative AI, Governance in Data Management, Reducing Hallucinations, Maintaining Semantic Coherence.
