A systematic assessment of ChatGPT for constructing two-level fractional factorial designs

Abstract

Design of experiments concerns cost-efficient experimental plans that fast-track product innovation and improvement. In the literature, the most commonly used plans are two-level fractional factorial designs, which study each factor at two levels. Traditionally, these designs are obtained from catalogs in standard textbooks or statistical software. However, modern generative artificial intelligence systems such as ChatGPT can now produce two-level fractional factorial designs. To our knowledge, there is no systematic assessment of the quality of these designs. In this paper, we therefore evaluate ChatGPT's ability to generate two-level fractional factorial designs with 8, 16, and 32 runs and 4 to 31 factors. To this end, we use prompt engineering techniques to develop a high-quality set of tasks that serve as input to ChatGPT. We compare the designs obtained by ChatGPT with the best-known designs in terms of the resolution and minimum aberration criteria. We show that ChatGPT can construct good 8-, 16-, and 32-run designs with up to nine factors, but it fails to produce good designs with more factors.
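To make the two quality criteria concrete, here is a minimal sketch (not the evaluation code used in the paper) of how a regular two-level fractional factorial design can be built from generators and scored by resolution and word-length pattern. It assumes factors are indexed from 0 (A = 0, B = 1, and so on), that the first `n_basic` factors form a full factorial in 2^n_basic runs, and that each added factor is the product of some basic columns. The function names (`build_design`, `word_length_pattern`, `resolution`, `aberration_key`) are hypothetical. The two generator sets are standard textbook illustrations for a 16-run design in six factors: E = ABC, F = BCD gives a resolution IV design, while E = AB, F = CD gives a resolution III design. Under the minimum aberration criterion, the design with the lexicographically smaller vector (A1, A2, ..., Am) of word counts is preferred.

```python
from collections import Counter
from itertools import combinations, product


def build_design(n_basic, generators):
    """Return a 2**n_basic-run two-level design coded -1/+1.

    generators: one tuple of basic-factor indices per added factor;
    the added column is the elementwise product of those basic columns.
    """
    runs = []
    for base in product((-1, 1), repeat=n_basic):
        row = list(base)
        for gen in generators:
            value = 1
            for j in gen:
                value *= base[j]
            row.append(value)
        runs.append(row)
    return runs


def word_length_pattern(n_basic, generators):
    """Count the words of each length in the defining relation."""
    # Generator word for added factor i: that factor plus its basic factors.
    words = [frozenset(gen) | {n_basic + i} for i, gen in enumerate(generators)]
    lengths = Counter()
    # The defining relation holds all products of nonempty sets of generator
    # words; multiplying words corresponds to a symmetric difference.
    for r in range(1, len(words) + 1):
        for subset in combinations(words, r):
            word = frozenset()
            for w in subset:
                word = word ^ w
            lengths[len(word)] += 1
    return dict(sorted(lengths.items()))


def resolution(n_basic, generators):
    """Resolution = length of the shortest word in the defining relation."""
    return min(word_length_pattern(n_basic, generators))


def aberration_key(n_basic, generators):
    """Vector (A1, ..., Am); lexicographically smaller means less aberration."""
    m = n_basic + len(generators)
    wlp = word_length_pattern(n_basic, generators)
    return tuple(wlp.get(length, 0) for length in range(1, m + 1))


good = [(0, 1, 2), (1, 2, 3)]  # E = ABC, F = BCD
poor = [(0, 1), (2, 3)]        # E = AB, F = CD
print(word_length_pattern(4, good), resolution(4, good))  # {4: 3} 4
print(word_length_pattern(4, poor), resolution(4, poor))  # {3: 2, 6: 1} 3
```

Checking a candidate design this way only requires its generators, so a design proposed by any source, including a chatbot, can be compared against a catalog design on the same footing.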

Alan Roberto Vazquez
Assistant Professor

Data scientist working on the construction and analysis of cost-effective experimental plans using optimization techniques and artificial intelligence