Research – Anna Řechtáčková

SIGCSE TS 2025 – Diagnosable Code Duplication in Introductory Programming

Links for a paper presentation at SIGCSE TS 2025

  • Abstract
    • Code quality is an important aspect of programming education, with duplicate code being a common issue. To help students learn to avoid code duplication, it is useful to provide them with actionable, specific feedback, not just a generic code duplication warning. In this paper, we introduce the concept of diagnosable code duplication, provide an overview of its various types, and propose a framework for automatic detection. We apply the framework to an introductory programming dataset to demonstrate its ability to provide specific feedback and reveal non-trivial differences in detected cases compared to simpler detectors.
  • When: Thu 27 Feb 2025 14:22 - 14:41 at Meeting Room 406 - CS Research/Tools
    • Time and location are preliminary and subject to change
  • Paper – ACM DL, PDF
  • Presentation – Google Slides, PDF TBA
  • Supplementary materials
  • Authors
    • Anna Řechtáčková; Masaryk University, Czechia; Scholar
    • Radek Pelánek; Masaryk University, Czechia; Scholar

Not-syntax-driven duplication example

def arrow(n):
    for y in range(n // 2):
        for x in range(n):
            if n // 2 + y == x:
                print("*", end="")
            else:
                print(".", end="")
        print()
    print(n * "*")
    for y in range(n // 2):
        for x in range(n):
            if x + y + 2 == n:
                print("*", end="")
            else:
                print(".", end="")
        print()

The code prints ASCII art of an arrow.

>>> arrow(7)
...*...
....*..
.....*.
*******
.....*.
....*..
...*...

The code that prints the top part of the image duplicates the code that prints the bottom part, but because the line of stars is printed in between, the two blocks cannot easily be merged by local restructuring.

The duplication can be removed by printing the image as a grid instead of in parts and deciding the value of each pixel (* or .) separately, as is done in the following code.

Code with the duplication fixed.

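def arrow(n):
    for i in range(n):
        for j in range(n):
            # A cell is a star if it lies on the middle row
            # or on one of the two diagonals of the arrowhead.
            if (
                i == n // 2
                or j - i == n // 2
                or j + i == 3 * (n // 2)
            ):
                c = '*'
            else:
                c = '.'
            print(c, end="")
        # End the current row of the grid.
        print()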
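
One way to check that the rewrite preserves behavior is to compare the rows produced by both versions. The sketch below is not part of the paper. It restates the two functions so that they return rows instead of printing them, and the names arrow_parts_rows, arrow_grid_rows, and same_output are hypothetical.

```python
def arrow_parts_rows(n):
    """Rows of the part-by-part version (top, middle line, bottom)."""
    half = n // 2
    rows = []
    # Top part: the star moves right from the middle column.
    for y in range(half):
        rows.append("".join("*" if half + y == x else "." for x in range(n)))
    # Middle line of stars.
    rows.append(n * "*")
    # Bottom part: the star moves back toward the middle column.
    for y in range(half):
        rows.append("".join("*" if x + y + 2 == n else "." for x in range(n)))
    return rows


def arrow_grid_rows(n):
    """Rows of the grid version, where each cell is decided separately."""
    half = n // 2
    return [
        "".join(
            "*" if i == half or j - i == half or j + i == 3 * half else "."
            for j in range(n)
        )
        for i in range(n)
    ]


def same_output(n):
    """Return True if both versions produce identical rows for this n."""
    return arrow_parts_rows(n) == arrow_grid_rows(n)
```

Under these assumptions, same_output(n) holds for odd n such as 7. For even n the part-by-part version produces n + 1 rows while the grid version produces n, so the two differ there.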