Humanities and Arts
The study represented text documents as tf-idf (term frequency-inverse document frequency) vectors and used cosine similarity to compare source documents with suspicious documents. A high similarity score signals text overlap and therefore potential plagiarism.
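The idea can be illustrated with a minimal sketch. It is not the study's implementation: the tokenizer, the smoothed idf formula, the function names, and the three sample sentences are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumption: smoothed idf, log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: count * idf[t] for t, count in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(docs):
    """Pairwise cosine similarity for every pair of documents."""
    vectors = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vectors] for a in vectors]


# Hypothetical sample texts, not taken from the study's corpus.
docs = [
    "the quick brown fox jumps over the lazy dog",
    "a quick brown fox jumped over a lazy dog",
    "stock markets fell sharply on monday",
]
matrix = similarity_matrix(docs)
```

In this toy input the first two sentences share most of their words, so `matrix[0][1]` is well above `matrix[0][2]`, which is zero because the first and third sentences have no terms in common.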
The model built the document vectors and produced a cosine similarity matrix covering all documents. This matrix was used to pick out documents manually, which were then checked for overlap with online tools. Although the algorithm was not extended to retrieve related documents automatically, the manual comparisons provided some validation of the model's effectiveness.
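As a rough illustration of how such a matrix can guide manual checking, the sketch below ranks document pairs by similarity so that the highest-scoring pairs are inspected first. The function name `top_pairs`, the labels, and the matrix values are invented for the example and are not results from the study.

```python
def top_pairs(matrix, labels, k=3):
    """Return the k most similar distinct document pairs, highest first."""
    pairs = []
    for i in range(len(matrix)):
        # Only the upper triangle: the matrix is symmetric and the
        # diagonal (a document compared with itself) is uninformative.
        for j in range(i + 1, len(matrix)):
            pairs.append((matrix[i][j], labels[i], labels[j]))
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs[:k]


# Hypothetical labels and similarity values, for illustration only.
labels = ["source_1", "source_2", "suspicious_1"]
matrix = [
    [1.00, 0.10, 0.82],
    [0.10, 1.00, 0.05],
    [0.82, 0.05, 1.00],
]
shortlist = top_pairs(matrix, labels, k=2)
```

Here the first entry of `shortlist` pairs `source_1` with `suspicious_1`, the pair a reviewer would open first.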
The primary limitation is that the algorithm was not extended to retrieve related documents automatically for detailed analysis. The model's accuracy was also not evaluated. A pipeline that converts texts to n-grams for in-depth analysis, followed by an accuracy evaluation, could make the model more effective at detecting plagiarism.
The model could be improved by extending the algorithm to retrieve related documents automatically and by converting texts to n-grams for detailed analysis. This would allow a more accurate evaluation of the model and make it easier to identify the best-matching document as the likely source of plagiarism.
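One possible shape for that n-gram step is sketched below. It assumes word trigrams and a containment score (the share of the suspicious document's n-grams that also occur in a candidate source). The study does not specify either choice, and all names and sample texts are hypothetical.

```python
import re


def word_ngrams(text, n=3):
    """Set of word n-grams in a text (assumption: lowercase alphanumeric tokens)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also appear in the source."""
    suspicious_grams = word_ngrams(suspicious, n)
    if not suspicious_grams:
        return 0.0
    shared = suspicious_grams & word_ngrams(source, n)
    return len(shared) / len(suspicious_grams)


def best_match(suspicious, sources, n=3):
    """Return (name, score) of the candidate source with the highest containment."""
    scores = {name: containment(suspicious, text, n) for name, text in sources.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical candidate sources and suspicious text.
sources = {
    "source_a": "plagiarism detection compares a suspicious document against known sources",
    "source_b": "the weather was mild and the harvest came early that year",
}
suspicious = "our tool compares a suspicious document against known sources quickly"
name, score = best_match(suspicious, sources)
```

In this example `best_match` selects `source_a`, which shares five of the suspicious text's eight trigrams (a score of 0.625). In a fuller pipeline the candidate sources would first be narrowed down using the cosine similarity matrix, with the n-gram comparison run only on that shortlist.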
This study contributes to the field by providing preliminary steps towards creating an external plagiarism detection tool. It highlights the potential of using tf-idf scores and cosine similarity measures in identifying text overlap, offering a foundation for developing more advanced plagiarism detection systems.
Overall, the study offers insights into the development and evaluation of plagiarism detection tools, emphasizing the importance of accurate and efficient methods in maintaining academic integrity.
Show by month | Manuscript | Video Summary |
---|---|---|
2025 February | 7 | 7 |
2025 January | 71 | 71 |
2024 December | 12 | 12 |
Total | 90 | 90 |